Mike Boone
2008-Aug-13 16:55 UTC
[Xapian-discuss] STEM_SOME (was: Custom Stemmng and QueryParser)
I just realized I wasn't replying to the list on my previous messages. Doh! Thanks to Matthew for replying thus far.

The issue appears to boil down to this. I am trying to parse a query with STEM_SOME set. It's described in the docs as "Search for stemmed forms of terms except for those which start with a capital letter".

I am custom-stemming a few words, which are stored in the index prefixed with XZ.

I tried to prefix these myself before sending them to the query parser, but they get stemmed anyway:

"XZiis AND sharp" (no quotes) gets parsed as Xapian::Query((xziis:(pos=1) AND Zsharp:(pos=2))). The first term should be XZiis.

If I try to use add_prefix('custom','XZ'), "custom:iis AND sharp" (no quotes) is parsed as Xapian::Query((ZXZii:(pos=1) AND Zsharp:(pos=2))).

What I'm trying to get is: Xapian::Query((XZiis:(pos=1) AND Zsharp:(pos=2)))

How do I get there? This is Xapian 1.0.7 and the PHP bindings.

Thanks!
Mike Boone.
http://boonedocks.net/mike/
Olly Betts
2008-Aug-20 01:48 UTC
[Xapian-discuss] STEM_SOME (was: Custom Stemmng and QueryParser)
On Wed, Aug 13, 2008 at 12:55:46PM -0400, Mike Boone wrote:
> The issue appears to boil down to this. I am trying to parse a query
> with STEM_SOME set. It's described in the docs as "Search for stemmed
> forms of terms except for those which start with a capital letter".
>
> I am custom-stemming a few words, which are stored in the index
> prefixed with XZ.
>
> I tried to prefix these myself before sending them to the query
> parser, but they get stemmed anyway:

I should insert the standard warning here that it's generally not a good idea to try to "adjust" the input to the QueryParser. Since it aims to parse potentially free-form input from users as well as boolean structure, you'd generally have to build the equivalent of QueryParser and the equivalent of un-QueryParser to avoid unexpected handling of some cases.

There ought to be a way to control parsing and manipulation of terms by slotting bits of code into the QueryParser framework, but currently there are just some settings like a stemmer object to use and the stemming strategy.

> "XZiis AND sharp" (no quotes) gets parsed as Xapian::Query((xziis:(pos=1) AND
> Zsharp:(pos=2))). The first term should be XZiis.

No, the word the user specified is "XZiis". The "XZ" here isn't a term-prefix - that's an implementation detail invisible to the user. If this worked how you seem to want, then a query for "Swordplay" would be interpreted as an "S" prefixed term and actually match "wordplay" in the title instead!

> If I try to use add_prefix('custom','XZ'), "custom:iis AND sharp" (no
> quotes) is parsed as Xapian::Query((ZXZii:(pos=1) AND
> Zsharp:(pos=2))).

Yes, because "iis" starts with a lower-case "i", so with STEM_SOME we stem it.

> What I'm trying to get is: Xapian::Query((XZiis:(pos=1) AND Zsharp:(pos=2)))

Try this:

custom:IIS AND sharp

Cheers,
Olly
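
For anyone reading this in the archive, the capital-letter rule discussed above can be modelled in a few lines of plain Python. This is only a toy sketch, not Xapian's implementation: `toy_stem` is a hypothetical stand-in for a real English stemmer (it just strips one trailing "s"), and `stem_some_term` is an invented helper name. It assumes, based on the parsed queries quoted in this thread, that stemmed terms get a leading "Z" in front of any field prefix and that unstemmed terms are lower-cased and given only the field prefix. The real QueryParser applies further rules that are not modelled here.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    if len(word) > 2 and word.endswith("s"):
        return word[:-1]
    return word


def stem_some_term(word, field_prefix=""):
    """Toy model of a STEM_SOME-style rule for a single query word.

    field_prefix is the term prefix mapped to the field, e.g. "XZ" for
    a field registered with add_prefix('custom', 'XZ').
    """
    lowered = word.lower()
    if word[:1].isupper():
        # Words starting with a capital letter are left unstemmed;
        # only the field prefix (if any) is added.
        return field_prefix + lowered
    # Everything else is stemmed and marked with a leading "Z".
    return "Z" + field_prefix + toy_stem(lowered)


if __name__ == "__main__":
    examples = [("iis", "XZ"), ("IIS", "XZ"), ("sharp", ""), ("XZiis", "")]
    for word, prefix in examples:
        print(word, "->", stem_some_term(word, prefix))
```

Under these assumptions the model reproduces the terms quoted in the thread: "custom:iis" becomes ZXZii, "sharp" becomes Zsharp, and the bare word "XZiis" becomes xziis. It also shows why capitalising the word, as in "custom:IIS", would be expected to yield XZiis.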