Class Summary |
CrawlListCreator |
Creates a list of all URLs to be crawled from the info.xml files. |
CrawlListManager |
This class manages the maintenance of the
list of URLs to fetch. |
DownloadControlData |
Class to hold the data structure used to
manage downloads (in particular, to enforce
a minimum delay between successive queries
to the same host address). |
DownloadControlDataVector |
Extension of Vector that copes with the special
requirements of DownloadControlData. |
ExaleadRetriever |
This class automatically queries Exalead and stores
the resulting URLs in text files. |
GoldenRetriever |
This class implements the retrieval of the set of web pages whose URLs
were listed by the class CrawlListCreator. |
GoldenRetriever_ProcessedIndexCorrector |
This class analyzes crawling.txt and writes the file processed_idx.txt,
which contains the indices (with respect to crawling.txt) of all URLs that have
actually been retrieved, determined by checking whether the files exist on the hard disk. |
RetrievalData |
Class to hold the data structure for the
retrieval of one URL into one file. |
SearchResultsAnalyzer |
Analyzes the info.xml files stored for a crawl and prints out some statistical measures. |
SubsetCollectionCreation_Linux |
Creates a subset of the retrieved Web pages of the complete artist collection,
given a text file with the complete paths to the crawl directories
of each artist that should be included. |
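
The per-host time limit mentioned for DownloadControlData can be illustrated with a minimal sketch. The class name HostThrottle, the method tryFetch, and the millisecond-based delay are assumptions made for illustration only; they are not taken from this package, and the actual DownloadControlData implementation may work differently.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the documented package: it only
// illustrates enforcing a minimum delay between queries to the same host.
class HostThrottle {
    private final long minDelayMillis;
    // Time (in milliseconds) of the last permitted fetch, keyed by lower-cased host name.
    private final Map<String, Long> lastFetch = new HashMap<>();

    HostThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Returns true and records the fetch if the host of the given URL
     * may be queried at time nowMillis; returns false if the previous
     * permitted fetch from the same host was less than minDelayMillis ago.
     */
    boolean tryFetch(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        host = host.toLowerCase();
        Long last = lastFetch.get(host);
        if (last != null && nowMillis - last < minDelayMillis) {
            return false; // too soon for this host; the caller should try another URL
        }
        lastFetch.put(host, nowMillis);
        return true;
    }
}
```

Passing the current time as a parameter, rather than reading the system clock inside the method, keeps the sketch deterministic; a crawler loop would typically pass System.currentTimeMillis() and skip to a URL on a different host whenever tryFetch returns false.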