java.lang.Object
  comirva.web.crawling.AnySearch
public class AnySearch
This class provides simple access to the results of search engines using Google-like parameters. It can be used directly to crawl the web by defining a search engine's URL and a query.
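The calling pattern implied by the three documented methods can be sketched as follows. This is only an illustration under stated assumptions: `SearchResultSource`, `SearchUsageSketch`, `collect`, `canned` and `toUrl` are hypothetical names that are not part of CoMIRVA. The interface merely mirrors the documented method signatures so that the sketch compiles without the library, and treating a timed-out search as having no usable results is an assumption rather than documented behaviour.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical stand-in mirroring the three documented AnySearch methods,
// so this sketch compiles without the CoMIRVA library on the classpath.
interface SearchResultSource {
    int getPageCount();
    URL[] getResultURLs(int maxNumber);
    boolean timedOut();
}

class SearchUsageSketch {

    // Assumption: a timed-out search has no usable results, so an empty array is returned.
    static URL[] collect(SearchResultSource search, int maxNumber) {
        if (search.timedOut()) {
            return new URL[0];
        }
        return search.getResultURLs(maxNumber);
    }

    // Canned in-memory source used only to exercise collect(); performs no network access.
    static SearchResultSource canned(boolean expired, String... urls) {
        return new SearchResultSource() {
            public int getPageCount() {
                return urls.length;
            }

            public URL[] getResultURLs(int maxNumber) {
                int n = Math.min(urls.length, Math.max(0, maxNumber));
                URL[] out = new URL[n];
                for (int i = 0; i < n; i++) {
                    out[i] = toUrl(urls[i]);
                }
                return out;
            }

            public boolean timedOut() {
                return expired;
            }
        };
    }

    // Converts a string to a URL, rethrowing the checked exception as unchecked.
    static URL toUrl(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        SearchResultSource ok = canned(false, "http://example.org/a", "http://example.org/b");
        System.out.println("pages reported: " + ok.getPageCount());
        for (URL url : collect(ok, 1)) {
            System.out.println(url);
        }
    }
}
```

With the real class, the object passed to `collect` would be replaced by direct calls on an `AnySearch` instance. How an `AnySearchConfig` is built is not covered by this page, so it is left out here.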
Field Summary | |
---|---|
static int | MAX_RETRIES |
static int | MAX_WAITTIME |
static int | RESULTS_TO_REQUEST |
static int | RETRY_INTERVAL |
Constructor Summary | |
---|---|
AnySearch(AnySearchConfig asCfg, String engineURL, String query) | Creates a new AnySearch instance to crawl the web. |
Method Summary | |
---|---|
int | getPageCount() - Returns the number of web pages the search engine returned for the query. |
URL[] | getResultURLs(int maxNumber) - Returns a URL array containing the URLs that the search engine returned for the query. |
boolean | timedOut() - Indicates whether the connection to retrieve the full page exceeded the time limit. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static int RESULTS_TO_REQUEST
public static int MAX_RETRIES
public static int RETRY_INTERVAL
public static int MAX_WAITTIME
Constructor Detail |
---|
public AnySearch(AnySearchConfig asCfg, String engineURL, String query) throws WebCrawlException
Parameters:
asCfg - an AnySearchConfig containing the configuration
engineURL - a String with the URL of the search engine to be used
query - a String specifying the exact search query
Throws:
WebCrawlException
Method Detail |
---|
public int getPageCount()
public URL[] getResultURLs(int maxNumber)
Parameters:
maxNumber - the maximum number of URLs to return (if more than maxNumber URLs were found, only maxNumber are returned)
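The `maxNumber` contract described above (return everything found, but never more than `maxNumber` entries) can be illustrated with a minimal sketch. `ResultLimitSketch`, `limit` and `sample` are hypothetical names introduced for illustration only; this is not the CoMIRVA implementation, and rejecting a negative `maxNumber` is an assumption that the documentation neither confirms nor rules out.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;

// Hypothetical helper, not part of CoMIRVA: illustrates the documented maxNumber cap.
class ResultLimitSketch {

    // Returns at most maxNumber URLs, keeping the original order.
    static URL[] limit(URL[] found, int maxNumber) {
        if (maxNumber < 0) {
            // Assumption: a negative limit is treated as a caller error.
            throw new IllegalArgumentException("maxNumber must not be negative");
        }
        // Arrays.copyOf pads with nulls when asked for more than exist, so cap at the length.
        return Arrays.copyOf(found, Math.min(found.length, maxNumber));
    }

    // Builds three placeholder URLs for demonstration purposes.
    static URL[] sample() {
        try {
            return new URL[] {
                URI.create("http://example.org/a").toURL(),
                URI.create("http://example.org/b").toURL(),
                URI.create("http://example.org/c").toURL()
            };
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URL[] found = sample();
        System.out.println(limit(found, 2).length);  // fewer requested than found
        System.out.println(limit(found, 10).length); // more requested than found
    }
}
```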
public boolean timedOut()