Jarrod Roberson
2007-Feb-13 16:16 UTC
[Xapian-discuss] Stem doesn't remove ' from end of words that are possessive?
If I stem a word with "'s" at the end, it strips off the s and leaves the ' hanging. This means that possessives don't stem down to a useful term. Removing the "'" as well as the trailing "s" would be more useful behavior.

For example: uncle's stems down to uncle'. If it stemmed down to just "uncle", that would be much more useful, I believe.

Is this the intended behavior? If so, why?
Richard Boulton
2007-Feb-13 16:39 UTC
[Xapian-discuss] Stem doesn't remove ' from end of words that are possessive?
Jarrod Roberson wrote:
> If I stem a word with " 's " at the end it strips off the s and leaves the
> ' hanging.
> this means that possessives don't stem down to a useful term.
> Removing the " ' " as well as the trailing " s " would be a more useful
> behavior.
>
> For example:
> uncle's stems down to uncle'
> if it stemmed down to just " uncle " that would be a much more useful
> behavior I would believe.
>
> Is this the intended behavior? If so why?

The current stemming algorithms (as used in Xapian version 0.9.9) don't have any special code for handling apostrophes at all. Olly is right in the middle of updating SVN HEAD to the latest versions of the stemming algorithms, in which the English stemmer handles apostrophes. This means that "uncle", "uncle's" and "uncles'" will all stem to "uncl" in the next release of Xapian.

We may also need to do some work on the query parser and text processing code in Omega to ensure that apostrophes are passed through to the stemming algorithm correctly; I'm not sure exactly which characters currently get stripped out before the text reaches the stemmers.

--
Richard
Olly Betts
2007-Feb-13 16:40 UTC
[Xapian-discuss] Stem doesn't remove ' from end of words that are possessive?
On Tue, Feb 13, 2007 at 11:16:40AM -0500, Jarrod Roberson wrote:
> For example:
> uncle's stems down to uncle'
> if it stemmed down to just " uncle " that would be a much more useful
> behavior I would believe.

I'm actually in the middle of integrating the latest version of the Snowball stemmers, which change the handling of apostrophes as you suggest. So this will be fixed in Xapian 1.0.

Cheers,
Olly
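
For anyone who wants to see the effect of the apostrophe handling discussed above without waiting for the release, here is a minimal sketch in Python. It is not Xapian or Snowball code: the function name `strip_possessive` is made up for illustration, and it only mimics the possessive-removal step (dropping the longest of `'s'`, `'s` or `'` from the end of a word) that the newer English stemmer is understood to perform before its other steps. It does no stemming beyond that.

```python
def strip_possessive(word: str) -> str:
    """Drop a trailing possessive marker from `word`.

    Hypothetical helper for illustration only. The suffixes are tried
    longest first so that "'s'" is not mistaken for a bare "'".
    """
    for suffix in ("'s'", "'s", "'"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("uncle", "uncle's", "uncles'"):
        print(f"{w} -> {strip_possessive(w)}")
```

After this step "uncle's" becomes "uncle" and "uncles'" becomes "uncles"; the remaining steps of the stemmer are what would then take both down to "uncl", as Richard describes. Until the updated stemmers are available, a pre-processing pass along these lines before calling the stemmer is one possible workaround, assuming the apostrophe survives tokenisation in the first place.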