Package comirva.web.indexing

Class Summary
HTMLAnalyzer A Lucene word analyzer tailored to HTML files.
HTMLDocument A utility for making Lucene Documents for HTML documents.
HTMLIndexer Indexer for HTML documents.
IncludeTermsFilter Filters a token stream, keeping only words that occur in a given word list.
SimMeasure_Lucene Based on the output of TermWeighting_Lucene, calculates similarities or distances between term profiles (e.g., TFxIDF weights).
TermWeighting_Lucene Based on a Lucene index, extracts term weights (TF, DF, and IDF) and writes them to a file.
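
To make the quantities named in the last two entries concrete, here is a minimal, self-contained sketch of term weighting (TF, DF, TFxIDF) and a similarity measure between term profiles. It does not use Lucene or any class from this package. The class name TfIdfSketch, its method names, the IDF formula ln(N / DF), and the choice of cosine similarity are all assumptions made for illustration; the formulas that TermWeighting_Lucene and SimMeasure_Lucene actually use may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of comirva.web.indexing.
public class TfIdfSketch {

    // TF: raw count of each term in one tokenized document.
    public static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // DF: number of documents in the collection that contain each term.
    public static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<>(doc)) { // count each term once per document
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    // TFxIDF profile of one document, assuming idf = ln(N / df).
    public static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int n) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
            Integer d = df.get(e.getKey());
            if (d == null) {
                continue; // term unknown to the collection: no weight assigned
            }
            weights.put(e.getKey(), e.getValue() * Math.log((double) n / d));
        }
        return weights;
    }

    // Cosine similarity between two sparse term profiles; 0.0 if either is all zeros.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Toy collection of already-tokenized documents (made-up data).
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("jazz", "piano", "jazz"),
                Arrays.asList("jazz", "guitar"),
                Arrays.asList("rock", "guitar"));
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Double> p0 = tfIdf(docs.get(0), df, docs.size());
        Map<String, Double> p1 = tfIdf(docs.get(1), df, docs.size());
        System.out.println("profile 0: " + p0);
        System.out.println("cosine(0, 1) = " + cosine(p0, p1));
    }
}
```

In this sketch a term profile is a sparse map from term to weight, so two documents that share no terms get a similarity of 0, and a document compared with itself gets 1.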