On Sun, Oct 17, 2004 at 01:40:28PM -0400, Mike Boone wrote:
> We're currently running Xapian 0.8.1 via PHP. I am trying to search on the
> term 'C#' in our keyword list. If I run the stemmer (English) independently,
> 'c#' is stemmed to 'c#', but it appears that when I parse the term using the
> QueryParser, it is truncated to plain 'c'. For a similar search, 'C++' stems
> properly in both the stemmer and the QueryParser.
>
> Is there a list of which characters are thrown out by the QueryParser, and
> is there any way to use the QueryParser, yet keep the desired characters?
Are you indexing "c#" as a term? Our indexers (omindex and scriptindex)
currently don't (which ought to be fixed next time we make indexing
changes), and the QueryParser is set up in line with this - there's no
point in it generating search terms that aren't in the index.
If you do have "c#" as a term, you'll have to modify the queryparser
source for now, as this isn't currently configurable. Look for the call
to C_isnotsign - currently this keeps trailing + and - in the term (e.g.
C++, Cl-, Mg2+). You'll want to allow "#" there as well.
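To make the behaviour described above concrete, here is a rough sketch in Python of a term scanner that keeps trailing sign characters. It is not the QueryParser's actual code: scan_term, TRAILING_SIGNS and the lowercasing step are all made up for illustration. The only thing taken from the description above is the rule itself, that a term is a run of alphanumerics plus any trailing + or -, and that "#" would need adding to that set.

```python
# Illustrative only: NOT Xapian's QueryParser source. All names here
# (scan_term, TRAILING_SIGNS) are hypothetical.

TRAILING_SIGNS = "+-"  # trailing characters kept today, per the text above


def scan_term(text, start=0, allowed_signs=TRAILING_SIGNS):
    """Return the term starting at text[start].

    A term is a run of alphanumeric characters followed by any run of
    characters from allowed_signs. Lowercasing is an assumption made
    here to mirror the 'C#' -> 'c' example in the question.
    """
    end = start
    # Consume the alphanumeric body of the term.
    while end < len(text) and text[end].isalnum():
        end += 1
    if end == start:
        return ""  # no term starts at this position
    # Keep any trailing sign characters that are allowed.
    while end < len(text) and text[end] in allowed_signs:
        end += 1
    return text[start:end].lower()


if __name__ == "__main__":
    for query in ("C++", "Cl-", "Mg2+", "C#"):
        print(query, "->", scan_term(query),
              "| with '#':", scan_term(query, allowed_signs="+-#"))
```

With the default set, "C#" comes back as "c", which matches the truncation reported; passing allowed_signs="+-#" keeps it as "c#". This only helps if the indexer applies the same rule, otherwise the query term won't match anything in the index.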
Cheers,
Olly