Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).
Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined.
Parameters:

this      | The analyzer to use
data      | The input data to analyze
terms_out | A TermList to place the generated tokens in.
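For illustration, here is a minimal, self-contained C sketch of the behaviour described above: it splits the input into word-like constituents without applying any filtering or normalization. It is only a model of the documented semantics, not the library's actual tokenizer; the splitting rule (runs of alphanumeric characters) and the sample input are assumptions made for this example.

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    /* Sample input; the real API accepts arbitrary data,
       normally (but not necessarily) a UTF-8 string. */
    const char *data = "Hello, tokenized World!";
    const char *p = data;

    while (*p) {
        /* Skip separators between constituents. */
        while (*p && !isalnum((unsigned char)*p))
            p++;

        /* Collect one constituent (here: a run of alphanumerics). */
        const char *start = p;
        while (*p && isalnum((unsigned char)*p))
            p++;

        /* Emit the token as-is: "World" keeps its capital W,
           since tokenization guarantees no normalization. */
        if (p > start)
            printf("token: %.*s\n", (int)(p - start), start);
    }
    return 0;
}

Running this prints "Hello", "tokenized", and "World" on separate lines; in the real API the generated tokens would instead be placed in the TermList passed as terms_out.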