Class Summary |
HTMLAnalyzer |
A Lucene word analyzer tailored to HTML files. |
HTMLDocument |
A utility for making Lucene Documents for HTML documents. |
HTMLIndexer |
Indexer for HTML documents. |
IncludeTermsFilter |
Filters a token stream, keeping only the words that occur in a word list. |
SimMeasure_Lucene |
Calculates similarities or distances between term profiles (e.g., TFxIDF weights),
based on the output of TermWeighting_Lucene. |
TermWeighting_Lucene |
Extracts term weights (TF, DF, and IDF) from a Lucene index and writes them to a file. |
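
The term weights (TF, DF, IDF) and the similarity between term profiles mentioned above can be illustrated with a minimal, Lucene-free sketch. It is not the code of TermWeighting_Lucene or SimMeasure_Lucene: the class name TermWeightSketch, all method names, the IDF variant ln(N / DF), and the choice of cosine similarity are assumptions made for illustration, and the actual classes may weight and compare terms differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the documented package.
final class TermWeightSketch {

    /** TF: raw count of each term in one tokenized document. */
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** DF: number of documents in which each term occurs at least once. */
    static Map<String, Integer> documentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // A set ensures each term is counted once per document.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** IDF, assumed here to be ln(N / DF); other variants exist. */
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    /** TFxIDF profile of one document; terms missing from df are skipped. */
    static Map<String, Double> tfIdfProfile(List<String> doc,
                                            Map<String, Integer> df,
                                            int numDocs) {
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFrequencies(doc).entrySet()) {
            Integer docFreq = df.get(e.getKey());
            if (docFreq == null || docFreq == 0) {
                continue;
            }
            profile.put(e.getKey(), e.getValue() * idf(numDocs, docFreq));
        }
        return profile;
    }

    /** Cosine similarity of two sparse profiles; 0.0 if either has zero norm. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Three tiny, already tokenized documents (made-up sample data).
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("html", "index", "html"),
                Arrays.asList("html", "lucene"),
                Arrays.asList("term", "weight"));
        Map<String, Integer> df = documentFrequencies(docs);
        Map<String, Double> p0 = tfIdfProfile(docs.get(0), df, docs.size());
        Map<String, Double> p1 = tfIdfProfile(docs.get(1), df, docs.size());
        System.out.println("DF: " + df);
        System.out.println("Profile of document 0: " + p0);
        System.out.println("Cosine(doc 0, doc 1): " + cosine(p0, p1));
    }
}
```

In this sketch the profiles are sparse maps from term to weight, so documents with no shared terms get a similarity of 0.0 without any vector being allocated over the full vocabulary. A distance, if one is needed, could be derived as 1 minus the cosine value.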