IncludeTermsFilter

comirva.web.indexing
Class IncludeTermsFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              comirva.web.indexing.IncludeTermsFilter

All Implemented Interfaces:: Closeable

public final class IncludeTermsFilter
extends org.apache.lucene.analysis.TokenFilter

Passes through only those tokens of a token stream whose terms occur in a given word list; all other tokens are dropped.
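The include-only behaviour described above can be illustrated with a minimal standalone sketch. It does not use any Lucene classes: plain strings stand in for tokens, and the class and method names (`IncludeOnlySketch`, `keepIncluded`) are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration only: plain strings stand in for Lucene tokens.
class IncludeOnlySketch {

    // Keeps the tokens that occur in includeWords, preserving their order.
    static List<String> keepIncluded(List<String> tokens, Set<String> includeWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (includeWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        Set<String> include = new HashSet<>(Arrays.asList("quick", "fox"));
        System.out.println(keepIncluded(tokens, include)); // prints [quick, fox]
    }
}
```

This is the inverse of a stop-word filter: the word list names what survives, not what is removed.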

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`IncludeTermsFilter(boolean enablePositionIncrements, org.apache.lucene.analysis.TokenStream in, Set<?> includeWords)` Constructs a filter which includes only words in a list.
`IncludeTermsFilter(boolean enablePositionIncrements, org.apache.lucene.analysis.TokenStream input, Set<?> includeWords, boolean ignoreCase)` Construct a token stream filtering the given input.

Method Summary
`boolean`	`getEnablePositionIncrements()`
`static boolean`	`getEnablePositionIncrementsVersionDefault(org.apache.lucene.util.Version matchVersion)` Returns version-dependent default for enablePositionIncrements.
`boolean`	`incrementToken()` Returns the next input token whose term is in the include-word set.
`static Set<Object>`	`makeStopSet(List<?> stopWords)` Builds a Set from a list of words, appropriate for passing into the IncludeTermsFilter constructor.
`static Set<Object>`	`makeStopSet(List<?> stopWords, boolean ignoreCase)`
`static Set<Object>`	`makeStopSet(String... stopWords)` Builds a Set from an array of words, appropriate for passing into the IncludeTermsFilter constructor.
`static Set<Object>`	`makeStopSet(String[] stopWords, boolean ignoreCase)`
`void`	`setEnablePositionIncrements(boolean enable)` If `true`, this filter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed tokens).

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end, reset`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

IncludeTermsFilter

public IncludeTermsFilter(boolean enablePositionIncrements,
                          org.apache.lucene.analysis.TokenStream input,
                          Set<?> includeWords,
                          boolean ignoreCase)

Constructs a token stream that filters the given input. If includeWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since the CharArraySet itself controls case sensitivity.

If includeWords is not an instance of CharArraySet, a new CharArraySet is constructed and ignoreCase specifies the case sensitivity of that set.

Parameters:: enablePositionIncrements - true if token positions should account for the removed words; input - Input TokenStream; includeWords - A Set of Strings or char[] or any other toString()-able set representing the words to include; ignoreCase - if true, all words are lower cased first
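The effect of ignoreCase on a word set that is not already a CharArraySet can be sketched as follows. This is a standalone approximation, not the class's actual implementation: a plain `HashSet` stands in for CharArraySet, and `IgnoreCaseSketch` and `accepts` are hypothetical names. The case where a ready-made CharArraySet is passed in (and ignoreCase is ignored) is not modelled.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Standalone illustration only; a plain HashSet stands in for CharArraySet.
class IgnoreCaseSketch {
    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    IgnoreCaseSketch(Collection<?> includeWords, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (Object word : includeWords) {
            // char[] has no useful toString(), so convert it explicitly.
            String text = (word instanceof char[]) ? new String((char[]) word) : word.toString();
            words.add(normalize(text));
        }
    }

    // Lower-cases the text only when case is to be ignored.
    private String normalize(String text) {
        return ignoreCase ? text.toLowerCase(Locale.ROOT) : text;
    }

    // True if the term would be kept by an include-only filter.
    boolean accepts(String term) {
        return words.contains(normalize(term));
    }

    public static void main(String[] args) {
        Collection<Object> raw = Arrays.<Object>asList("Fox", "quick".toCharArray());
        System.out.println(new IgnoreCaseSketch(raw, true).accepts("FOX"));  // prints true
        System.out.println(new IgnoreCaseSketch(raw, false).accepts("FOX")); // prints false
    }
}
```

Lower-casing both the stored words and each looked-up term is one simple way to get case-insensitive matching; the real CharArraySet may do this differently.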

IncludeTermsFilter

public IncludeTermsFilter(boolean enablePositionIncrements,
                          org.apache.lucene.analysis.TokenStream in,
                          Set<?> includeWords)

Constructs a filter which keeps only those words from the input TokenStream that are named in the Set.

Parameters:: enablePositionIncrements - true if token positions should account for the removed words; in - Input stream; includeWords - A Set of Strings or char[] or any other toString()-able set representing the words to include
See Also:: makeStopSet(java.lang.String[])

Method Detail

makeStopSet

public static final Set<Object> makeStopSet(String... stopWords)

Builds a Set from an array of words, appropriate for passing into the IncludeTermsFilter constructor. This allows the set to be built once, when an Analyzer is constructed, and then reused.

See Also:: makeStopSet(java.lang.String[], boolean), passing false to ignoreCase
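The caching idea can be sketched without Lucene as below. Everything here is an assumption made for illustration: `makeWordSet` is a hypothetical, case-sensitive stand-in for makeStopSet(String...), and `AnalyzerLike` is a hypothetical holder that plays the role of an Analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Standalone illustration only; no Lucene classes are involved.
class CachedWordSetSketch {

    // Hypothetical stand-in for makeStopSet(String...): case-sensitive, read-only.
    static Set<String> makeWordSet(String... words) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    // Hypothetical analyzer-like holder: builds the set once in its constructor.
    static final class AnalyzerLike {
        private final Set<String> includeWords;

        AnalyzerLike(String... words) {
            this.includeWords = makeWordSet(words);
        }

        // Every filter created later would share this one cached set.
        Set<String> wordsForNewFilter() {
            return includeWords;
        }
    }

    public static void main(String[] args) {
        AnalyzerLike analyzer = new AnalyzerLike("quick", "fox");
        // The same instance is handed out each time, so the set is built only once.
        System.out.println(analyzer.wordsForNewFilter() == analyzer.wordsForNewFilter()); // prints true
    }
}
```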

makeStopSet

public static final Set<Object> makeStopSet(List<?> stopWords)

Builds a Set from a list of words, appropriate for passing into the IncludeTermsFilter constructor. This allows the set to be built once, when an Analyzer is constructed, and then reused.

Parameters:: stopWords - A List of Strings or char[] or any other toString()-able list representing the words to include
Returns:: A Set (CharArraySet) containing the words
See Also:: makeStopSet(java.util.List, boolean), passing false to ignoreCase

makeStopSet

public static final Set<Object> makeStopSet(String[] stopWords,
                                            boolean ignoreCase)

Parameters:: stopWords - An array of stopwords; ignoreCase - If true, all words are lower cased first.
Returns:: a Set containing the words

makeStopSet

public static final Set<Object> makeStopSet(List<?> stopWords,
                                            boolean ignoreCase)

Parameters:: stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords; ignoreCase - if true, all words are lower cased first
Returns:: A Set (CharArraySet) containing the words

incrementToken

public final boolean incrementToken()
                             throws IOException

Returns the next input token whose term is in the include-word set.

Specified by:: incrementToken in class org.apache.lucene.analysis.TokenStream

Throws:: IOException

getEnablePositionIncrementsVersionDefault

public static boolean getEnablePositionIncrementsVersionDefault(org.apache.lucene.util.Version matchVersion)

Returns the version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. For a matchVersion earlier than Lucene 2.9 this returns false; for 2.9 or later it returns true.

getEnablePositionIncrements

public boolean getEnablePositionIncrements()

See Also:: setEnablePositionIncrements(boolean).

setEnablePositionIncrements

public void setEnablePositionIncrements(boolean enable)

If true, this filter will preserve positions of the incoming tokens (i.e., accumulate and set position increments of the removed tokens). Generally, true is best as it does not lose information (positions of the original tokens) during indexing.

When enabled, each time a token is omitted, the position increment of the following token is increased by the omitted token's increment.

NOTE: be sure to also set QueryParser.setEnablePositionIncrements(boolean) if you use QueryParser to create queries.
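The accumulation of position increments can be illustrated with a small standalone sketch. It assumes every incoming token has a position increment of 1 and uses plain strings instead of Lucene tokens; `PositionIncrementSketch` and `increments` are hypothetical names for this illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Standalone illustration only; assumes each input token has increment 1.
class PositionIncrementSketch {

    // Returns the position increment assigned to each kept token.
    static List<Integer> increments(List<String> tokens, Set<String> includeWords,
                                    boolean enablePositionIncrements) {
        List<Integer> result = new ArrayList<>();
        int skipped = 0; // increments of omitted tokens since the last kept token
        for (String token : tokens) {
            if (includeWords.contains(token)) {
                // With the flag on, the kept token absorbs the skipped positions.
                result.add(enablePositionIncrements ? 1 + skipped : 1);
                skipped = 0;
            } else {
                skipped++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        Set<String> include = new HashSet<>(Arrays.asList("quick", "fox"));
        System.out.println(increments(tokens, include, true));  // prints [2, 2]
        System.out.println(increments(tokens, include, false)); // prints [1, 1]
    }
}
```

With the flag enabled, "quick" and "fox" keep their original positions 2 and 4; with it disabled, they appear adjacent at positions 1 and 2, which can change how phrase queries match.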
