On Sat, Jun 06, 2009 at 12:50:30AM +0300, Silviu-Ionut Ganceanu wrote:
> I need to modify the stemming for a couple of words (a blacklist) and for
> all the other to use the usual snowball stemmer.
> 
> The "natural" way of doing it would be to derive from Stem and override
> operator ()... but I am using *python-bindings*. Would this be possible?
Not currently.  The big problem is that supporting this requires fairly
major incompatible API changes, so it's currently slated to wait for the
next major version.  There's a relevant ticket:
http://trac.xapian.org/ticket/186
And a branch with an experimental implementation:
http://trac.xapian.org/browser/branches/stemrefcnt
> If not I have two other solutions in mind:
> 
>    - add a custom stemmer to Xapian
That would work, and is probably simpler than the second idea.
>    - write custom index & search methods in python using add_posting &
>    hacks to modify the query tree respectively
There isn't really a way to modify a query tree (they're immutable, and
there aren't methods to read through an existing tree so you can build a
modified version).  Probably doing your own query parsing is the way to
implement this approach.
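
As a rough sketch of that approach (not tested against Xapian itself):
the key point is to run document text and query text through the same
term-generation function, so both sides agree on which words are left
unstemmed.  The names terms_with_positions and toy_stem are made up for
illustration, and toy_stem is only a stand-in for a real stemmer.

```python
import re

def terms_with_positions(text, stem):
    """Split text into lowercase words and return (term, position) pairs.

    `stem` is any callable mapping a word to its term form.
    """
    words = re.findall(r"\w+", text.lower())
    return [(stem(word), pos) for pos, word in enumerate(words, start=1)]

def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: keeps blacklisted words
    # intact and otherwise strips one trailing "s" from longer words.
    if word in {"news"}:
        return word
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word

# Index side: each pair would be passed to doc.add_posting(term, pos).
index_terms = terms_with_positions("News about cats", toy_stem)

# Query side: the same function yields the terms to build a query from,
# e.g. xapian.Query(xapian.Query.OP_AND, query_terms).
query_terms = [term for term, _ in terms_with_positions("cats news", toy_stem)]
```

This only handles plain word lists; anything like boolean operators or
phrase searches would need to be parsed by your own code too.
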
> Both solutions are not too appealing.
> 
> What would be the easiest way to do it?
You could add a "words not to stem" feature to the Xapian::Stem class
(or equivalent functionality such as a "stem 'X' to 'Y'" exception
list).  I think that would work.
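
To illustrate the behaviour such a feature would have, here is a minimal
Python sketch of a wrapper around an existing stemmer.  It is not part
of Xapian: the class name ExceptionStemmer and its parameters are
hypothetical, and strip_s is a toy stemmer used so the example runs
without Xapian installed.  With the Python bindings it should be
possible to pass a xapian.Stem object as base_stem, assuming it is
callable on a single word.

```python
class ExceptionStemmer:
    """Wrap a stemmer with a "do not stem" list and a "stem X to Y" map."""

    def __init__(self, base_stem, no_stem=(), overrides=None):
        self.base_stem = base_stem              # callable: word -> stem
        self.no_stem = frozenset(no_stem)       # words returned unchanged
        self.overrides = dict(overrides or {})  # explicit word -> stem map

    def __call__(self, word):
        # Explicit "stem X to Y" entries take priority.
        if word in self.overrides:
            return self.overrides[word]
        # Blacklisted words are passed through untouched.
        if word in self.no_stem:
            return word
        # Everything else goes to the usual stemmer.
        return self.base_stem(word)

def strip_s(word):
    # Toy stemmer for demonstration only: drop one trailing "s".
    return word[:-1] if word.endswith("s") else word

stemmer = ExceptionStemmer(strip_s, no_stem={"news"},
                           overrides={"mice": "mouse"})
```

The wrapper has to be applied by your own code at both index time and
search time, since the built-in indexing and query parsing classes
would still call the plain stemmer.
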
Cheers,
    Olly