tata 668
2009-Apr-05 23:18 UTC
[Xapian-discuss] TermGenerator question for the single quote character
Hi,

I use the TermGenerator to index the French text "Cela m'excite" (without the quotes). When I search for "excite" after indexing, I need it to be found, since "excite" is a word on its own.

Currently "excite" is not found but "m'excite" is...

Is there a setting I'm missing so that the single quote character acts as a word delimiter?

Thanks for the help!
Julien
Olly Betts
2009-Apr-06 13:31 UTC
[Xapian-discuss] TermGenerator question for the single quote character
On Sun, Apr 05, 2009 at 07:18:08PM -0400, tata 668 wrote:
> I use the TermGenerator to index the french text "Cela m'excite"
> (without the quotes). When I do a search for "excite" after this
> indexation, I need it to be found. "excite" is a word on is own.
>
> Currently "excite" is not found but "m'excite" is...

In 1.0.0, we changed to treating apostrophes as part of a word, and updated to a newer version of Snowball where the English stemmer deals with them.

I think the correct way for this to work is for the other stemmers to also handle apostrophes (at least if their languages use them) as otherwise the word tokenisation required depends on the stemmer.

> Is there a setting I'm missing so that the single quote character act as
> a word delimiter?

No, there's no such setting currently.

Cheers,
Olly
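
Since there is no such setting, one possible workaround (not something suggested in this thread) is to replace apostrophes with spaces before handing the text to the TermGenerator. The sketch below is only the preprocessing step; the helper name split_on_apostrophes is hypothetical, and it assumes the same substitution is also applied to query strings before they are parsed, otherwise a search for "m'excite" would no longer match.

```python
import re

# Straight apostrophe plus the typographic right single quote (U+2019).
# Which characters to treat as delimiters is an assumption; adjust as needed.
_APOSTROPHES = re.compile("['\u2019]")


def split_on_apostrophes(text):
    """Replace apostrophes with spaces so each side is tokenised as its own word."""
    return _APOSTROPHES.sub(" ", text)


if __name__ == "__main__":
    # The cleaned string is what would then be passed to
    # TermGenerator.index_text() in place of the raw text.
    print(split_on_apostrophes("Cela m'excite"))
```

The trade-off is that elided forms such as "m" and "l" get indexed as separate terms, and English words like "don't" are split as well, so this probably only makes sense for text known to be French.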