||Creates a list of all URLs to be crawled from the info.xml files.
||This class maintains the
list of URLs to fetch.
||Class to hold the data structure used to
manage downloads (in particular, to enforce
a minimum time interval between queries to the
same host address).
||Extension to Vector to cope with special
requirements of the DownloadControlData.
||This class automatically queries Exalead and stores
the resulting URLs in text files.
||This class implements the retrieval of a set of web pages generated by
the class CrawlListCreator.
||This class analyzes crawling.txt and writes the file processed_idx.txt,
containing the indices (with respect to crawling.txt) of all URLs that have actually
been retrieved, determined by checking whether the corresponding files exist on the hard disk.
||Class to hold the data structure for the
retrieval of one URL into one file.
||Analyzes the info.xml files stored for a crawl and prints out some statistical measures.
||Creates a subset of the complete artist collection's retrieved Web pages
given a text file with the complete paths to the crawl dirs
of each artist that should be included.
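
The download-management entry above mentions keeping a minimum time between queries to the same host address. As a rough illustration of that idea only, the following sketch tracks, per host, the earliest time the next query may start. It is not the actual DownloadControlData implementation: the class name HostThrottle, the method reserve, and the parameter minIntervalMillis are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of the documented package). */
public class HostThrottle {
    // Assumed politeness interval between two queries to one host.
    private final long minIntervalMillis;
    // Per host: earliest time (in ms) at which the next query may start.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public HostThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Reserves the next query slot for the given host and returns how many
     * milliseconds the caller should wait before issuing the query.
     */
    public synchronized long reserve(String host, long nowMillis) {
        long earliest = nextAllowed.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowed.put(host, start + minIntervalMillis);
        return start - nowMillis;
    }
}
```

A caller would typically pass System.currentTimeMillis() as nowMillis and sleep for the returned number of milliseconds before fetching. Passing the time in explicitly, rather than reading the clock inside the method, keeps the sketch easy to test.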