emmanuel at engelhart.org
2010-May-27 13:20 UTC
[Xapian-discuss] Problem with stop words by indexing
On Thu 15/04/10 02:36, "Olly Betts" olly at survex.com wrote:
> On Mon, Apr 05, 2010 at 07:13:02PM +0200, Emmanuel Engelhart wrote:
> > I am trying to remove stop words during indexing, and I have no stemming.
> > I have tried with a simple example but it does not work at all.
> >
> > I have my WritableDatabase and my TermGenerator (indexer) and they work
> > well together: I can index texts and search through the database
> > correctly.
> >
> > But if I add (before indexing my texts):
> > Xapian::SimpleStopper stopper;
> > stopper.add("testword");
> > indexer.set_stopper(&stopper);
> >
> > ... the result is exactly the same as before. I have checked with delve
> > and "testword" is indexed.
>
> http://article.gmane.org/gmane.comp.search.xapian.general/7571
>
> Looks like I failed to add that note to the API docs - now done.
>
> This ought to be more configurable, as should some other things in
> TermGenerator. I'm thinking we should look at how to improve TermGenerator
> in 1.3.x.

The 1.3.x release is a little too far away for my use case (I am only talking here about the ability to remove unstemmed stop words).

I have changed (in termgenerator_internal.cc, line 129) the default value of stop_mode from STOPWORDS_INDEX_UNSTEMMED_ONLY to STOPWORDS_IGNORE, and Xapian now does exactly what I want.

Wouldn't it be possible to simply add a property "stopper_strategy" to the TermGenerator (or Stopper) class, and a method to modify it (like set_stopper_strategy())?

Emmanuel
On Thu, May 27, 2010 at 03:20:36PM +0200, emmanuel at engelhart.org wrote:
> I have changed (in termgenerator_internal.cc, line 129) the default value of
> stop_mode from STOPWORDS_INDEX_UNSTEMMED_ONLY to STOPWORDS_IGNORE, and Xapian
> now does exactly what I want.
>
> Wouldn't it be possible to simply add a property "stopper_strategy" to the
> TermGenerator (or Stopper) class, and a method to modify it (like
> set_stopper_strategy())?

Sure, want to work up a patch?

Cheers,
    Olly