|
Class Summary |
| HTMLAnalyzer |
A Lucene word analyzer tailored to HTML files. |
| HTMLDocument |
A utility for making Lucene Documents for HTML documents. |
| HTMLIndexer |
Indexer for HTML documents. |
| IncludeTermsFilter |
Filters a token stream, keeping only words that occur in a word list. |
| SimMeasure_Lucene |
Calculates similarities or distances between term profiles (e.g., TFxIDF
weights), based on the output of TermWeighting_Lucene. |
| TermWeighting_Lucene |
Extracts term weights (TF, DF, and IDF) from a Lucene index and writes them to a file. |
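
The term weights and similarities mentioned in the summary above can be illustrated with a minimal sketch in plain Java. It does not use the Lucene API and is not the implementation of TermWeighting_Lucene or SimMeasure_Lucene. The class name TfIdfSketch, its method names, the weighting formula TF x ln(N / DF), and the choice of cosine similarity are all assumptions made for illustration; the actual classes may use different variants.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of this package.
final class TfIdfSketch {

    // Assumed IDF variant: natural log of (number of documents / document frequency).
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / docFreq);
    }

    // Builds a term profile (term -> TF x IDF weight) for one document.
    // termFreqs: term -> count in this document (TF)
    // docFreqs:  term -> number of documents containing the term (DF)
    static Map<String, Double> tfIdf(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df == 0) {
                continue; // skip terms with no known document frequency
            }
            weights.put(e.getKey(), e.getValue() * idf(numDocs, df));
        }
        return weights;
    }

    // Cosine similarity between two term profiles; returns 0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

In this sketch, a profile compared with itself has a cosine similarity of 1 (up to floating-point rounding), and two profiles with no terms in common have a similarity of 0. A term that occurs in every document receives a weight of 0, because ln(N / N) = 0.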