Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).
Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined.
Parameters:

this      | The analyzer to use
data      | The input data to analyze
terms_out | A TermList to place the generated tokens in.
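For illustration, here is a minimal, self-contained C sketch of the behaviour described above: it splits the input into word-like constituents without applying any filtering or normalization. It is only a model of the documented semantics, not the library's actual tokenizer; the splitting rule (runs of alphanumeric characters) and the sample input are assumptions made for this example.

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    /* Sample input; the real API accepts arbitrary data,
       normally (but not necessarily) a UTF-8 string. */
    const char *data = "Hello, tokenized World!";
    const char *p = data;

    while (*p) {
        /* Skip separators between constituents. */
        while (*p && !isalnum((unsigned char)*p))
            p++;

        /* Collect one constituent (here: a run of alphanumerics). */
        const char *start = p;
        while (*p && isalnum((unsigned char)*p))
            p++;

        /* Emit the token as-is: "World" keeps its capital W,
           since tokenization guarantees no normalization. */
        if (p > start)
            printf("token: %.*s\n", (int)(p - start), start);
    }
    return 0;
}

Running this prints "Hello", "tokenized", and "World" on separate lines; in the real API the generated tokens would instead be placed in the TermList passed as terms_out.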