On Sun, Dec 09, 2007 at 08:16:17AM +0100, Jesper Krogh wrote:
> The queryparser in my setup is using strategy STEM_SOME which seem to
> give the best handling of the data in our setup.
>
> But the queryparser doesn't really seem to be consistent.
> doc:test
> Running query 'Xapian::Query(ZDOCTYPEtest:(pos=1))'
>
> Here it applies stemming to the term before running the query (Z-prefix)
>
> doc:1234
> Running query 'Xapian::Query(DOCTYPE1234:(pos=1))'
>
> There it skips the stemming.
>
> What is the reason for behaving different based on user-input?

From http://www.xapian.org/docs/termgenerator.html:

    Now we index all terms lowercased with positional information, and
    also stemmed with a 'Z' prefix (unless they start with a digit)
    [...]

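Indexing a term which starts with a digit twice (once as-is and once
with the 'Z' prefix) just bloats the database, so no 'Z' form is indexed
for such terms and the query parser leaves them unstemmed to match.
I'm not aware of a language where words can start with a digit, and it
can actually harm retrieval if we attempt to stem part numbers and other
codes.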
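
As a rough illustration only (this is not Xapian's actual code), the rule
can be sketched in Python. The function names are made up for this example,
and toy_stem is a hypothetical stand-in for a real stemmer that merely
strips one trailing "s":

```python
def toy_stem(word):
    # Hypothetical stand-in for a real stemmer: strips one trailing "s".
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def starts_with_digit(word):
    # word[:1] avoids an IndexError on an empty string.
    return word[:1].isdigit()


def index_terms(word, prefix=""):
    # Index side: always emit the lowercased term; add the 'Z' stemmed
    # form only when the word does not start with a digit.
    word = word.lower()
    terms = [prefix + word]
    if not starts_with_digit(word):
        terms.append("Z" + prefix + toy_stem(word))
    return terms


def query_term(word, prefix=""):
    # Query side: pick the form that the index side would have produced.
    word = word.lower()
    if starts_with_digit(word):
        return prefix + word
    return "Z" + prefix + toy_stem(word)


print(index_terms("test", "DOCTYPE"))  # ['DOCTYPEtest', 'ZDOCTYPEtest']
print(index_terms("1234", "DOCTYPE"))  # ['DOCTYPE1234']
print(query_term("test", "DOCTYPE"))   # ZDOCTYPEtest
print(query_term("1234", "DOCTYPE"))   # DOCTYPE1234
```

The two query results have the same shape as the ZDOCTYPEtest and
DOCTYPE1234 terms in the queries quoted above.
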
Cheers,
Olly