On Sat, Jun 06, 2009 at 12:50:30AM +0300, Silviu-Ionut Ganceanu wrote:
> I need to modify the stemming for a couple of words (a blacklist) and for
> all the other to use the usual snowball stemmer.
>
> The "natural" way of doing it would be to derive from Stem and override
> operator ()... but I am using *python-bindings*. Would this be possible?
Not currently. The big problem is that it requires fairly major incompatible
API changes, so it's waiting for the next major version. There's a relevant
ticket:
http://trac.xapian.org/ticket/186
And a branch with an experimental implementation:
http://trac.xapian.org/browser/branches/stemrefcnt
> If not I have two other solutions in mind:
>
> - add a custom stemmer to Xapian
That would work, and is probably simpler than the second idea.
> - write custom index & search methods in python using add_posting & hacks
>   to modify the query tree respectively
There isn't really a way to modify a query tree: Query objects are
immutable, and there aren't methods to read through an existing tree so
that you could build a modified version. Doing your own query parsing is
probably the way to implement this approach.
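As a rough sketch of that approach (not tested against a real database), you
could tokenise the text yourself, run each word through whatever stemming
function you like, and hand the resulting terms to add_posting at index time
and to your own query construction at search time. The helper names below
(terms_with_positions, index_text, query_terms) are made up for illustration,
and the tokenisation is deliberately naive. `stem` is any callable mapping a
word to its stem, and `doc` is anything with an add_posting(term, position)
method, which is assumed here to be a xapian.Document:

```python
import re

# Naive tokeniser: runs of word characters. Real text needs more care.
_WORD_RE = re.compile(r"\w+")


def terms_with_positions(text, stem):
    """Return (term, position) pairs, with positions starting at 1."""
    words = (m.group().lower() for m in _WORD_RE.finditer(text))
    return [(stem(word), pos) for pos, word in enumerate(words, start=1)]


def index_text(doc, text, stem):
    """Add one posting per word to doc, using the supplied stem callable.

    doc only needs an add_posting(term, position) method; it is assumed
    to be a xapian.Document in real use.
    """
    for term, pos in terms_with_positions(text, stem):
        doc.add_posting(term, pos)


def query_terms(query_text, stem):
    """Turn the user's query string into terms, stemmed like the index."""
    return [term for term, _ in terms_with_positions(query_text, stem)]
```

At search time the list from query_terms() could then be combined into a
single query, for example with xapian.Query(xapian.Query.OP_AND, terms). The
important part is that indexing and searching use the same stem callable, so
the exceptions apply on both sides.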
> Both solutions are not too appealing.
>
> What would be the easiest way to do it?
You could add a "words not to stem" feature to the Xapian::Stem class
(or equivalent functionality such as a "stem 'X' to 'Y'" exception
list). I think that would work.
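To illustrate the behaviour I have in mind, here is a minimal sketch in
Python rather than a patch to Xapian::Stem. ExceptionStemmer, no_stem and
overrides are hypothetical names, not part of Xapian, and toy_stem is just a
stand-in so the example runs without the bindings. In real use you would
presumably pass a xapian.Stem("english") object as the wrapped callable:

```python
class ExceptionStemmer:
    """Wrap a stemming callable with an exception list (hypothetical helper)."""

    def __init__(self, stem, no_stem=(), overrides=None):
        self._stem = stem
        # Words that must be left unstemmed.
        self._no_stem = frozenset(word.lower() for word in no_stem)
        # Explicit "stem 'X' to 'Y'" mappings.
        self._overrides = {
            word.lower(): target for word, target in (overrides or {}).items()
        }

    def __call__(self, word):
        key = word.lower()
        if key in self._overrides:
            return self._overrides[key]
        if key in self._no_stem:
            return key
        # Everything else goes through the usual stemmer.
        return self._stem(key)


def toy_stem(word):
    """Stand-in for a real stemmer: strips a single trailing 's'."""
    return word[:-1] if word.endswith("s") else word


stem = ExceptionStemmer(toy_stem, no_stem={"news"}, overrides={"mice": "mouse"})
```

Here stem("cats") goes through the wrapped stemmer, stem("news") is returned
unchanged, and stem("mice") is mapped straight to "mouse". Explicit overrides
take priority over the "do not stem" list.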
Cheers,
Olly