java.lang.Object
  comirva.config.WebCrawlingConfig
public class WebCrawlingConfig
This class represents the configuration for a simple WebCrawler. The WebCrawler queries an arbitrary search engine to obtain a list of URLs and then crawls them. An instance of this class is used to pass the configuration to the WebCrawler instance.
Constructor Summary | |
---|---|
WebCrawlingConfig(String searchEngineURL, int numberOfRetries, int intervalBetweenRetries, int firstRequestedPageNumber, String additionalKeywords, boolean additionalKeywordsAfterSearchString, int numberOfPages, String pathStoreRetrievedPages, String pathExternalCrawler, boolean isStoreURLList, boolean isQuoteSearchTerms) | Creates a new instance of a WebCrawling-Configuration. |
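
As a rough illustration of the constructor above, the following sketch builds a configuration and reads two values back. The `WebCrawlingConfig` class in this block is a hypothetical stand-in that only mirrors the documented constructor signature and getter names so that the example compiles without CoMIRVA on the classpath. The search engine URL, the storage path, and every argument value are placeholders, not recommended settings.

```java
final class ConfigExample {
    // Builds a configuration with placeholder values (all of them assumptions).
    static WebCrawlingConfig sampleConfig() {
        return new WebCrawlingConfig(
                "http://search.example/search?q=", // searchEngineURL (placeholder)
                3,             // numberOfRetries
                10,            // intervalBetweenRetries, in seconds
                0,             // firstRequestedPageNumber
                "music",       // additionalKeywords
                true,          // additionalKeywordsAfterSearchString
                20,            // numberOfPages
                "/tmp/crawl",  // pathStoreRetrievedPages (placeholder)
                "wget",        // pathExternalCrawler
                true,          // isStoreURLList
                true);         // isQuoteSearchTerms
    }

    public static void main(String[] args) {
        WebCrawlingConfig cfg = sampleConfig();
        System.out.println(cfg.getSearchEngineURL());
        System.out.println(cfg.getNumberOfRequestedPages());
    }
}

// Stand-in for comirva.config.WebCrawlingConfig. It copies only the documented
// constructor signature and getter names; the real class may differ internally.
final class WebCrawlingConfig {
    private final String searchEngineURL;
    private final int numberOfRetries;
    private final int intervalBetweenRetries;
    private final int firstRequestedPageNumber;
    private final String additionalKeywords;
    private final boolean additionalKeywordsAfterSearchString;
    private final int numberOfPages;
    private final String pathStoreRetrievedPages;
    private final String pathExternalCrawler;
    private final boolean storeURLList;
    private final boolean quoteSearchTerms;

    WebCrawlingConfig(String searchEngineURL, int numberOfRetries, int intervalBetweenRetries,
            int firstRequestedPageNumber, String additionalKeywords,
            boolean additionalKeywordsAfterSearchString, int numberOfPages,
            String pathStoreRetrievedPages, String pathExternalCrawler,
            boolean isStoreURLList, boolean isQuoteSearchTerms) {
        this.searchEngineURL = searchEngineURL;
        this.numberOfRetries = numberOfRetries;
        this.intervalBetweenRetries = intervalBetweenRetries;
        this.firstRequestedPageNumber = firstRequestedPageNumber;
        this.additionalKeywords = additionalKeywords;
        this.additionalKeywordsAfterSearchString = additionalKeywordsAfterSearchString;
        this.numberOfPages = numberOfPages;
        this.pathStoreRetrievedPages = pathStoreRetrievedPages;
        this.pathExternalCrawler = pathExternalCrawler;
        this.storeURLList = isStoreURLList;
        this.quoteSearchTerms = isQuoteSearchTerms;
    }

    String getSearchEngineURL() { return searchEngineURL; }
    int getNumberOfRetries() { return numberOfRetries; }
    int getIntervalBetweenRetries() { return intervalBetweenRetries; }
    int getFirstRequestedPageNumber() { return firstRequestedPageNumber; }
    String getAdditionalKeywords() { return additionalKeywords; }
    boolean getAdditionalKeywordsAfterSearchString() { return additionalKeywordsAfterSearchString; }
    int getNumberOfRequestedPages() { return numberOfPages; }
    String getPathStoreRetrievedPages() { return pathStoreRetrievedPages; }
    String getPathExternalCrawler() { return pathExternalCrawler; }
    boolean isStoreURLList() { return storeURLList; }
    boolean isQuoteSearchTerms() { return quoteSearchTerms; }
}
```

Note that the constructor parameter `numberOfPages` is read back through `getNumberOfRequestedPages()`, so the parameter and getter names do not match one-to-one.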
Method Summary | |
---|---|
String | getAdditionalKeywords() Returns the additional keywords to be added to the search string. |
boolean | getAdditionalKeywordsAfterSearchString() Returns whether additional keywords are to be placed after the search string or before it. |
int | getFirstRequestedPageNumber() Returns the number of the first requested page (usually 0). |
int | getIntervalBetweenRetries() Returns the interval between two retries in case of failure (in seconds). |
int | getNumberOfRequestedPages() Returns the number of pages that should be returned by the search engine and subsequently crawled. |
int | getNumberOfRetries() Returns the number of retries if issuing a search query fails. |
String | getPathExternalCrawler() Returns the command needed to start the external crawler. |
String | getPathStoreRetrievedPages() Returns the root directory where all retrieved web pages are to be stored. |
String | getSearchEngineURL() Returns the URL of the search engine to be used. |
boolean | isQuoteSearchTerms() Returns whether the search terms should be automatically quoted (i.e., whether phrase search is used). |
boolean | isStoreURLList() Returns whether a list of all crawled URLs should be stored for every query term. |
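
To make the interplay of `getAdditionalKeywords()`, `getAdditionalKeywordsAfterSearchString()` and `isQuoteSearchTerms()` more concrete, here is a minimal sketch of how a query string could be assembled from those three values. The class `QuerySketch`, the method `buildQuery`, the space separator, and the choice to quote only the search term are assumptions made for illustration. This page does not describe how the WebCrawler actually builds its queries.

```java
final class QuerySketch {
    /**
     * Combines a search term with the additional keywords.
     * The parameters correspond to the values returned by getAdditionalKeywords(),
     * getAdditionalKeywordsAfterSearchString() and isQuoteSearchTerms().
     */
    static String buildQuery(String searchTerm, String additionalKeywords,
                             boolean keywordsAfterSearchString, boolean quoteSearchTerms) {
        // Assumption: phrase search is requested by wrapping the term in double quotes.
        String term = quoteSearchTerms ? "\"" + searchTerm + "\"" : searchTerm;
        if (additionalKeywords == null || additionalKeywords.isEmpty()) {
            return term;
        }
        // Assumption: term and keywords are separated by a single space.
        return keywordsAfterSearchString
                ? term + " " + additionalKeywords
                : additionalKeywords + " " + term;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Miles Davis", "music", true, true));
        System.out.println(buildQuery("Miles Davis", "music", false, false));
    }
}
```

A real crawler would still need to URL-encode the result before appending it to the value of `getSearchEngineURL()`.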
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public WebCrawlingConfig(String searchEngineURL, int numberOfRetries, int intervalBetweenRetries, int firstRequestedPageNumber, String additionalKeywords, boolean additionalKeywordsAfterSearchString, int numberOfPages, String pathStoreRetrievedPages, String pathExternalCrawler, boolean isStoreURLList, boolean isQuoteSearchTerms)
Parameters:
searchEngineURL - a String containing the URL of the search engine
numberOfRetries - the number of retries in case of failure
intervalBetweenRetries - the interval between two retries (in seconds)
firstRequestedPageNumber - the number (index) of the first requested page
additionalKeywords - additional keywords in the query
additionalKeywordsAfterSearchString - whether the additional keywords are to be placed after (or before) the search string
numberOfPages - number of pages to retrieve
pathStoreRetrievedPages - local path where the retrieved HTML documents should be stored
pathExternalCrawler - command to run wget
isStoreURLList - flag to determine whether a list of all retrieved URLs should be stored for every query term
isQuoteSearchTerms - whether the search terms should automatically be quoted (phrase search)
Method Detail |
---|
public String getSearchEngineURL()
public int getNumberOfRetries()
Specified by: getNumberOfRetries in interface AnySearchConfig
public int getIntervalBetweenRetries()
Specified by: getIntervalBetweenRetries in interface AnySearchConfig
public int getFirstRequestedPageNumber()
Specified by: getFirstRequestedPageNumber in interface AnySearchConfig
public String getAdditionalKeywords()
public boolean getAdditionalKeywordsAfterSearchString()
Returns: true if additional keywords should be placed after the search string, false if they are placed before the search string
public int getNumberOfRequestedPages()
Specified by: getNumberOfRequestedPages in interface AnySearchConfig
public String getPathStoreRetrievedPages()
public String getPathExternalCrawler()
public boolean isStoreURLList()
Returns: true if a text file containing all crawled URLs is to be stored for every query term, false if information on crawled URLs is to be discarded
public boolean isQuoteSearchTerms()
Returns: true if all search terms are automatically quoted, false if the search will be conducted using the search terms as they are
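
The values of `getNumberOfRetries()` and `getIntervalBetweenRetries()` suggest a retry loop around each search request. The sketch below shows one possible shape of such a loop. The class `RetrySketch` and the method `callWithRetries` are hypothetical, and the sketch assumes that "number of retries" means additional attempts after a first failed one, which this page does not state.

```java
import java.util.concurrent.Callable;

final class RetrySketch {
    /**
     * Runs the query once and, on failure, up to numberOfRetries further times,
     * sleeping intervalBetweenRetries seconds between attempts.
     */
    static <T> T callWithRetries(Callable<T> query, int numberOfRetries,
                                 int intervalBetweenRetries) throws Exception {
        if (numberOfRetries < 0) {
            throw new IllegalArgumentException("numberOfRetries must not be negative");
        }
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= numberOfRetries; attempt++) {
            try {
                return query.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < numberOfRetries) {
                    // The configured interval is given in seconds.
                    Thread.sleep(intervalBetweenRetries * 1000L);
                }
            }
        }
        // All attempts failed: report the most recent failure.
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated query that fails twice before it succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated failure");
            }
            return "ok";
        }, 3, 0);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Under this reading, a configuration with `numberOfRetries` set to 3 makes at most four requests per query.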