tokenize_and_fold
Description:
[ Version ( since = "2.40" ) ]
public string[] tokenize_and_fold (string translit_locale, out string[] ascii_alternates)
Tokenises string and performs folding on each token.
A token is a non-empty sequence of alphanumeric characters in the source string, separated by non-alphanumeric characters. An "alphanumeric" character for this purpose is one that matches isalnum or ismark.
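The token boundary rule above can be illustrated with a minimal sketch. The helper `split_into_tokens` is hypothetical and is not part of GLib; it only mirrors the splitting rule described here and performs no normalisation or case folding.

```vala
// Hypothetical helper (not a GLib API): collects runs of characters
// for which isalnum () or ismark () is true, and treats every other
// character as a separator.
string[] split_into_tokens (string source) {
    string[] tokens = {};
    var current = new StringBuilder ();
    int index = 0;
    unichar c;

    while (source.get_next_char (ref index, out c)) {
        if (c.isalnum () || c.ismark ()) {
            current.append_unichar (c);
        } else if (current.len > 0) {
            // A separator ends the current token; empty tokens are never emitted.
            tokens += current.str;
            current.truncate (0);
        }
    }
    if (current.len > 0) {
        tokens += current.str;
    }
    return tokens;
}

void main () {
    // Splits into "foo", "bar" and "baz42".
    foreach (unowned string token in split_into_tokens ("foo-bar, baz42!")) {
        print ("%s\n", token);
    }
}
```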
Each token is then (Unicode) normalised and case-folded. If ascii_alternates is non-null and some of the returned tokens contain non-ASCII characters, ASCII alternatives will be generated.
The number of ASCII alternatives that are generated and the method for generating them are unspecified, but translit_locale (if specified) may improve the transliteration if the language of the source string is known.
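A short usage sketch follows, assuming a German source string and the locale code 'de'. The input text is illustrative only. Because the number and form of the ASCII alternates are unspecified, the example prints whatever is returned rather than assuming a particular result.

```vala
void main () {
    string source = "Grüße aus Köln";
    string[] ascii_alternates;

    // "de" hints that the source text is German, which may improve
    // the transliteration used for the ASCII alternates.
    string[] tokens = source.tokenize_and_fold ("de", out ascii_alternates);

    foreach (unowned string token in tokens) {
        print ("token: %s\n", token);
    }

    // May be empty if no token contains non-ASCII characters.
    foreach (unowned string alternate in ascii_alternates) {
        print ("ASCII alternate: %s\n", alternate);
    }
}
```

Indexing both the folded tokens and their ASCII alternates is one way to let a search for plain ASCII input match text that contains accented characters.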
Parameters:
ascii_alternates |
a return location for ASCII alternates |
string |
a string |
translit_locale |
the language code (like 'de' or 'en_GB') from which string originates |
Returns:
the folded tokens |