comirva.web.crawling.agmis

Package comirva.web.crawling.agmis

Class Summary
CrawlListCreator	Creates a list of all URLs to be crawled from the info.xml files.
CrawlListManager	Manages the maintenance of the list of URLs to fetch.
DownloadControlData	Holds the data structure used to manage downloads (in particular, to enforce a minimum time interval between successive queries to the same host).
DownloadControlDataVector	Extension of Vector that handles the special requirements of DownloadControlData.
ExaleadRetriever	Automatically queries Exalead and stores the resulting URLs in text files.
GoldenRetriever	Retrieves the set of Web pages whose URLs are listed by the class CrawlListCreator.
GoldenRetriever_ProcessedIndexCorrector	Analyzes crawling.txt and writes the file processed_idx.txt, which contains the indices (with respect to crawling.txt) of all URLs that have actually been retrieved, determined by checking whether the corresponding files exist on the hard disk.
RetrievalData	Class to hold the data structure for the retrieval of one URL into one file.
SearchResultsAnalyzer	Analyzes the info.xml files stored for a crawl and prints out some statistical measures.
SubsetCollectionCreation_Linux	Creates a subset of the complete artist collection's retrieved Web pages, given a text file listing the full paths to the crawl directories of each artist to be included.
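
The minimum time interval between queries to the same host, which DownloadControlData is described as enforcing, can be illustrated with a small per-host tracker. This is a hypothetical sketch, not the CoMIRVA code: the class HostPolitenessTracker, its methods, and the choice to pass the current time in explicitly (so the logic is easy to test) are assumptions made for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the CoMIRVA implementation): remembers when each
 * host was last queried and reports how long a caller must still wait before
 * the same host may be queried again.
 */
final class HostPolitenessTracker {
    private final long minIntervalMillis;                  // assumed minimum gap per host
    private final Map<String, Long> lastAccess = new HashMap<>();

    HostPolitenessTracker(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Extracts the host part of a URL, lower-cased; returns "" if the URL has no host. */
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase();
    }

    /** Milliseconds still to wait at time nowMillis before this URL's host may be queried. */
    synchronized long remainingDelay(String url, long nowMillis) {
        Long last = lastAccess.get(hostOf(url));
        if (last == null) {
            return 0L;                                      // host never queried: no wait
        }
        return Math.max(0L, last + minIntervalMillis - nowMillis);
    }

    /** Records that this URL's host was queried at time nowMillis. */
    synchronized void recordAccess(String url, long nowMillis) {
        lastAccess.put(hostOf(url), nowMillis);
    }
}
```

A download thread would call remainingDelay(url, System.currentTimeMillis()), sleep for the returned time if it is positive, fetch the page, and then call recordAccess. Keying on the host rather than the full URL is what spreads requests across servers while throttling repeated hits on one server.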
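
The index correction performed by GoldenRetriever_ProcessedIndexCorrector can be sketched as follows. The sketch assumes that crawling.txt holds one URL per line, that indices are 0-based line numbers, and that the caller supplies the mapping from an index to the file on disk. The class ProcessedIndexSketch and the file layout used in the usage note are hypothetical; the actual layout written by GoldenRetriever is not described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch: rebuild processed_idx.txt from the files actually present on disk. */
final class ProcessedIndexSketch {

    /** Returns the 0-based indices of crawl-list entries whose retrieved file exists. */
    static List<Integer> processedIndices(List<String> crawlList, IntFunction<Path> fileForIndex) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < crawlList.size(); i++) {
            if (Files.exists(fileForIndex.apply(i))) {
                result.add(i);
            }
        }
        return result;
    }

    /** Reads crawlingTxt, checks every entry, and writes one processed index per line. */
    static void run(Path crawlingTxt, Path processedIdxTxt, IntFunction<Path> fileForIndex)
            throws IOException {
        List<String> urls = Files.readAllLines(crawlingTxt, StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (int idx : processedIndices(urls, fileForIndex)) {
            lines.add(Integer.toString(idx));
        }
        Files.write(processedIdxTxt, lines, StandardCharsets.UTF_8);
    }
}
```

For example, if retrieved pages were stored as dir/<index>.html (a hypothetical layout), the mapping would be i -> dir.resolve(i + ".html"). Passing the mapping in as a function keeps the existence check independent of whatever naming scheme the crawler really uses.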