AnySearch

comirva.web.crawling
Class AnySearch

java.lang.Object
  comirva.web.crawling.AnySearch

public class AnySearch
extends Object

This class provides simple access to the results of search engines using Google-like parameters. It can be used directly to crawl the web by defining a search engine's URL and a query.

Field Summary
`static int`	`MAX_RETRIES`
`static int`	`MAX_WAITTIME`
`static int`	`RESULTS_TO_REQUEST`
`static int`	`RETRY_INTERVAL`

Constructor Summary
`AnySearch(AnySearchConfig asCfg, String engineURL, String query)` Creates a new AnySearch instance to crawl the web.

Method Summary
`int`	`getPageCount()` Returns the number of web pages the search engine returned for the query.
`URL[]`	`getResultURLs(int maxNumber)` Returns a URL array containing the URLs that the search engine returned for the query.
`boolean`	`timedOut()` Indicates whether the connection to retrieve the full page exceeded the time limit.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

RESULTS_TO_REQUEST

public static int RESULTS_TO_REQUEST

MAX_RETRIES

public static int MAX_RETRIES

RETRY_INTERVAL

public static int RETRY_INTERVAL

MAX_WAITTIME

public static int MAX_WAITTIME

Constructor Detail

AnySearch

public AnySearch(AnySearchConfig asCfg,
                 String engineURL,
                 String query)
          throws WebCrawlException

Creates a new AnySearch instance to crawl the web.

Parameters:
asCfg - an AnySearchConfig containing the configuration
engineURL - a String with the URL of the search engine to be used
query - a String specifying the exact search query
Throws:
WebCrawlException

Method Detail

getPageCount

public int getPageCount()

Returns the number of web pages the search engine returned for the query.

Returns: the number of web pages found

getResultURLs

public URL[] getResultURLs(int maxNumber)

Returns a URL array containing the URLs that the search engine returned for the query.

Parameters:
maxNumber - the maximum number of URLs to return (if more than maxNumber URLs were found, only maxNumber are returned)
Returns: a URL[] containing the URLs
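The capping rule described for maxNumber can be illustrated with a minimal, self-contained sketch. The class `ResultCapSketch` and the helper `capResults` are hypothetical names introduced for illustration; they are not part of comirva.web.crawling, and the sketch assumes that order is preserved and that the first maxNumber entries are kept, which this page does not state.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Arrays;

public class ResultCapSketch {

    // Hypothetical helper (not part of AnySearch): returns at most maxNumber
    // URLs from the found array, keeping the original order.
    public static URL[] capResults(URL[] found, int maxNumber) {
        if (maxNumber < 0) {
            throw new IllegalArgumentException("maxNumber must not be negative");
        }
        // copyOf truncates when the requested length is shorter than the array
        return Arrays.copyOf(found, Math.min(found.length, maxNumber));
    }

    public static void main(String[] args) throws MalformedURLException {
        URL[] found = {
            URI.create("http://example.org/a").toURL(),
            URI.create("http://example.org/b").toURL(),
            URI.create("http://example.org/c").toURL()
        };
        // More URLs found than requested: only maxNumber are returned
        System.out.println(capResults(found, 2).length);  // prints 2
        // Fewer URLs found than requested: all of them are returned
        System.out.println(capResults(found, 10).length); // prints 3
    }
}
```

Under these assumptions, a caller never receives more than maxNumber entries, but may receive fewer, so the length of the returned array should be checked rather than assumed.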

timedOut

public boolean timedOut()

Indicates whether the connection to retrieve the full page exceeded the time limit.

Returns: true if a timeout occurred, false otherwise
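One way the three documented methods might be combined is sketched below. Because the construction of AnySearchConfig is not described on this page, the sketch does not call the real class: `SearchResults` is a hypothetical stand-in interface that mirrors the three method signatures above, and `fake` is an illustrative in-memory implementation. Checking timedOut() before reading the results is an assumption about sensible usage, not documented behavior.

```java
import java.net.URL;
import java.util.Arrays;

public class AnySearchUsageSketch {

    // Hypothetical stand-in mirroring the documented AnySearch method
    // signatures; it is not part of comirva.web.crawling.
    public interface SearchResults {
        int getPageCount();
        URL[] getResultURLs(int maxNumber);
        boolean timedOut();
    }

    // Summarizes a finished search. The timeout flag is checked first
    // (an assumption: results after a timeout may be incomplete).
    public static String describe(SearchResults search, int maxNumber) {
        if (search.timedOut()) {
            return "timed out";
        }
        URL[] urls = search.getResultURLs(maxNumber);
        return search.getPageCount() + " pages reported, "
                + urls.length + " URLs retrieved";
    }

    // Illustrative in-memory implementation used only for this demonstration.
    public static SearchResults fake(final int pageCount,
                                     final boolean timedOut,
                                     final URL... urls) {
        return new SearchResults() {
            public int getPageCount() { return pageCount; }
            public URL[] getResultURLs(int maxNumber) {
                int n = Math.min(urls.length, Math.max(0, maxNumber));
                return Arrays.copyOf(urls, n);
            }
            public boolean timedOut() { return timedOut; }
        };
    }

    public static void main(String[] args) {
        URL[] three = new URL[3]; // placeholder entries; contents are not inspected
        System.out.println(describe(fake(1200, false, three), 2));
        System.out.println(describe(fake(0, true, three), 2));
    }
}
```

With the real class, the same pattern would presumably wrap the constructor call in a try block that handles WebCrawlException, since the constructor is declared to throw it.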
