Class Summary |
CrawlListCreator |
Creates a list of all URLs to be crawled from the info.xml files. |
ExaleadRetriever |
This class automatically queries Exalead and stores
the resulting URLs in text files. |
GoldenRetriever |
This class retrieves the set of web pages whose URLs are listed in the
crawl list generated by the class CrawlListCreator. |
GoldenRetriever_ProcessedIndexCorrector |
This class analyzes crawling.txt and writes the file processed_idx.txt,
containing the indices (with respect to crawling.txt) of all URLs that have actually
been retrieved, determined by checking whether the corresponding files reside on the hard disk. |
SearchResultsAnalyzer |
Analyzes the info.xml files stored for a crawl and prints out some statistical measures. |
SubsetCollectionCreation_Linux |
Creates a subset of the Web pages retrieved for the complete artist collection,
given a text file containing the full path to the crawl directory
of each artist that should be included. |
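
The index correction described for GoldenRetriever_ProcessedIndexCorrector can be illustrated with a minimal sketch. It is not the class's actual implementation. It assumes that crawling.txt holds one URL per line, that indices are zero-based line numbers, and that the page for line i is stored as "<i>.html" in the crawl directory. The class name ProcessedIndexSketch, its method names, and the file naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ProcessedIndexSketch {

    // Assumption: the page for the URL on line i is stored as "<i>.html".
    static Path pageFileFor(Path pageDir, int index) {
        return pageDir.resolve(index + ".html");
    }

    // Returns the zero-based line indices of crawling.txt whose page file exists on disk.
    static List<Integer> findRetrievedIndices(Path crawlList, Path pageDir) throws IOException {
        List<String> urls = Files.readAllLines(crawlList);
        List<Integer> retrieved = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (Files.isRegularFile(pageFileFor(pageDir, i))) {
                retrieved.add(i);
            }
        }
        return retrieved;
    }

    // Writes one index per line, e.g. to processed_idx.txt.
    static void writeIndexFile(List<Integer> indices, Path out) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int idx : indices) {
            lines.add(Integer.toString(idx));
        }
        Files.write(out, lines);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ProcessedIndexSketch <crawl-dir>");
            return;
        }
        Path crawlDir = Paths.get(args[0]);
        List<Integer> indices = findRetrievedIndices(crawlDir.resolve("crawling.txt"), crawlDir);
        writeIndexFile(indices, crawlDir.resolve("processed_idx.txt"));
    }
}
```

Checking the disk rather than trusting a log written during the crawl means the index file stays correct even if a crawl was interrupted or files were removed afterwards.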