2010 Jun 09
TermGenerator incorrectly tokenizes German text which contains special characters
...s that after indexing text which contains special characters
like ä, ö, ü and ß, using TermGenerator::index_text (
http://xapian.org/docs/sourcedoc/html/classXapian_1_1TermGenerator.html#b358784fa685139e8bdd71d37f39573e),
terms get cut off (stopped) after the special character. For example the
term gesundheitsschädlich is indexed as gesundheitsschä and Zgesundheitsschä
(stemmed).
All character encodings are set to UTF-8, and the MySQL database also uses
UTF-8 encoding.
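One way to double-check the encoding claim is to inspect the raw bytes of a string just before it is passed to the indexer. The following is only a minimal sketch: describeEncoding() is a hypothetical helper, not part of Xapian or of the code in this post, and it assumes that text arriving in some other encoding is a possible cause worth ruling out. The post does not confirm that this is the cause.

```php
<?php
// Hypothetical helper (not from the original post): reports whether a
// string is well-formed UTF-8 and returns its raw bytes as hex.
function describeEncoding(string $text): array
{
    return [
        // With the /u modifier, preg_match returns 1 for well-formed
        // UTF-8 input and false for malformed input.
        'valid_utf8' => preg_match('//u', $text) === 1,
        'hex'        => bin2hex($text),
    ];
}

// "ä" in UTF-8 is the two bytes C3 A4.
$utf8 = "gesundheitssch\xC3\xA4dlich";

// The same word in ISO-8859-1 uses the single byte E4,
// which is not a well-formed UTF-8 sequence in this position.
$latin1 = "gesundheitssch\xE4dlich";

var_dump(describeEncoding($utf8)['valid_utf8']);   // bool(true)
var_dump(describeEncoding($latin1)['valid_utf8']); // bool(false)
```

If the string fetched from the database fails this check, or its hex dump shows something other than c3a4 for the ä, the data is probably being re-encoded somewhere between MySQL and the indexing call.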
$lIndexer = new XapianTermGenerator();
$lStemmer = new XapianStem(XapianHelper::GetStemmer($pLanguage)); // 'german'
$lIndexer->...