tokenize


Description:

public virtual void tokenize (string data, TermList terms_out)

Tokenize some input data (which is normally, but not necessarily, a UTF-8 string).

Tokenization splits the input data into constituents (in most cases words), but does not run them through any of the term filters set for the analyzer. Whether the tokenization process itself performs any normalization is undefined, so callers should not rely on the case or form of the returned tokens.

Parameters:

this

The analyzer to use.

data

The input data to analyze.

terms_out

A TermList to place the generated tokens in.
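
Example:

The following is a minimal usage sketch, not taken from the library's own documentation. It assumes the `dee-1.0` Vala bindings, uses `Dee.TextAnalyzer` as the concrete analyzer, and assumes that `Dee.TermList` has to be created through plain GObject construction because the bindings may not expose a public constructor. No expected output is shown, since whether tokenization normalizes its input is undefined.

```vala
// Assumed build command: valac --pkg dee-1.0 tokenize-example.vala

int main (string[] args) {
    // Dee.TextAnalyzer is used here as a concrete Dee.Analyzer (assumption:
    // any other analyzer subclass would be called the same way).
    Dee.Analyzer analyzer = new Dee.TextAnalyzer ();

    // Assumption: the term list is instantiated via GObject construction.
    var terms = (Dee.TermList) Object.new (typeof (Dee.TermList));

    // Split the input into tokens. Term filters registered on the analyzer
    // are NOT applied by tokenize ().
    analyzer.tokenize ("Hello brave new world", terms);

    // Walk the generated tokens.
    for (uint i = 0; i < terms.num_terms (); i++) {
        stdout.printf ("%s\n", terms.get_term (i));
    }

    return 0;
}
```

When the term filters should be applied as well, `analyze` is the method to call instead of `tokenize`.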