Hello.

There are several problems I couldn't find a solution for.

1. QueryParser does not perform stemming

I am working with PHP5 and use the Xapian wrapper written by Daniel Ménard. I build a query using parseQuery. The output of the parsed query shows that the terms are not stemmed, although a stemmer is set (see code snippet):

    # create a XapianDatabase object to search in
    $db = new XapianDatabase($path2db);
    # every query needs a XapianEnquire object, i.e. one specifying the database to search in
    $enquire = new XapianEnquire($db);
    # create a XapianQueryParser object
    $myQueryParser = new XapianQueryParser();
    $myQueryParser->setDatabase($db);
    $stemmer = new XapianStemmer("german");
    $myQueryParser->setStemmer($stemmer);
    $myQueryParser->setStemmingStrategy(STEM_ALL);
    #$querystring = removeUmlaute($querystring);
    # wildcard search
    $myQuery = $myQueryParser->parseQuery($querystring, Xapian::FLAG_PHRASE|Xapian::FLAG_BOOLEAN|Xapian::FLAG_LOVEHATE|Xapian::FLAG_WILDCARD);
    ...

So what am I doing wrong?

The second thing I wondered about: is there any way to stop QueryParser from lowercasing the query string? At least for exact phrase matching, I would find this quite useful. (The data is indexed in both upper and lower case.)

Another thing is the encoding of non-ASCII characters (I hope I didn't miss something in the mailing list postings). After applying the UTF-8 patch for Xapian 0.9.5, umlauts such as ö cause errors when a term is parsed (e.g. Köln is split into 'k' and 'n'). Surprisingly, using the unpatched xapian-core and building a query without QueryParser gives exact matches when searching for, for example, 'Köln'. So what's going on here?

Thanks for any help,
DD
Olly Betts
2006-May-17 22:55 UTC
[Xapian-discuss] QueryParser lowercase / uppercase and stemming
On Wed, May 17, 2006 at 04:17:39PM +0200, dd wrote:
> 1. QueryParser does not perform stemming

It does!

> $myQueryParser->setStemmingStrategy(STEM_ALL);

I'm not that familiar with Daniel's wrappers, but my guess is that STEM_ALL isn't the correct name for this constant, so you're passing the string "STEM_ALL" here, which probably gets interpreted as 0, meaning "don't stem anything". One of PHP's nastier features that...

> Another thing is the encoding of non ascii chars (I hope I didn't miss
> something in the postings of the mailing list). After applying UTF-8
> patch for xapian version 0.9.5, characters like ö cause a mistake in
> parsing a term (e.g. Köln is processed to 'k' and 'n').

You want to modify accentnormalisingitor.h too:
http://article.gmane.org/gmane.comp.search.xapian.general/1927

> Surprisingly using the unpatched xapian-cores and building a query
> without queryparser results in exact matches when searching for
> example for 'Köln'.

If you create a term with Query("term"), you get *exactly* what you pass as the term (even arbitrary binary data: if you pass a C++ std::string containing zero bytes, the term will contain zero bytes). But by its nature, QueryParser has to split the passed string up, so there's the issue of what is and isn't a "word character".

Cheers,
    Olly
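The two mechanisms Olly describes can be illustrated in plain PHP. This is a sketch only: it makes no Xapian calls, splitTerms() is a hypothetical helper (not Xapian's tokenizer), the two regular expressions are assumed stand-ins for "what counts as a word character", and the source file is assumed to be saved as UTF-8. Part 1 shows why an unknown name like STEM_ALL ends up as 0. PHP 5 substitutes the name as a string, while newer PHP versions throw an error instead, which is why the code casts the literal string rather than using the bare name. Part 2 contrasts a verbatim term with a split-and-lowercase step. It does not reproduce the exact 'k'/'n' split reported above, which depends on the patched 0.9.5 code.

```php
<?php
// Illustrative sketch in plain PHP; no Xapian calls are made and
// splitTerms() is a hypothetical helper, not Xapian's tokenizer.

// Part 1: an undefined constant name. PHP 5 substitutes the string
// "STEM_ALL" (with a notice if E_NOTICE is enabled); cast to an
// integer, that string becomes 0.
$stemAllDefined = defined('STEM_ALL'); // false unless some code defines it
$stemAllAsInt   = (int) 'STEM_ALL';    // 0

// Part 2: turning a query string into terms.
// Query("term")-style: the string is used verbatim as a single term.
$verbatim = array("Köln");

// Parser-style: split on runs of non-word characters, drop empty pieces,
// then lowercase (strtolower() only changes ASCII letters here).
function splitTerms($text, $nonWordPattern)
{
    $parts = preg_split($nonWordPattern, $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('strtolower', $parts);
}

// ASCII-only idea of a word character: the two UTF-8 bytes of "ö"
// act as a separator, so the word falls apart.
$asciiTerms = splitTerms("Köln", '/[^A-Za-z0-9]+/');     // "k", "ln"

// Unicode-aware idea (PCRE /u mode, \p{L} = any letter): one term.
$unicodeTerms = splitTerms("Köln", '/[^\p{L}\p{N}]+/u'); // "köln"

var_dump($stemAllDefined, $stemAllAsInt, $verbatim, $asciiTerms, $unicodeTerms);
```

The Unicode-aware pattern keeps the umlaut inside the word, which is the kind of "word character" decision a UTF-8-aware parser has to make.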
Olly Betts
2006-May-17 23:31 UTC
On Wed, May 17, 2006 at 04:17:39PM +0200, dd wrote:
> The second thing I wondered about, is there any possibility to forbid
> queryparser lowercasing of the query string. At least for exact phrase
> matching I found this quite meaningful. (Data is indexed both, upper- and
> lowercase)

I just realised I missed this.

I'm not convinced it's actually a sensible way to index. The only example I know of where it's useful is NeXT computers, which got merged into Apple about a decade ago. Especially in these days of ubiquitous web search, nobody sane would pick a common word and just vary the capitalisation to name their product or company. And enough people will ignore the official spelling and write "NEXT Computers" or "Next computers" that being pedantic about capitalisation also hurts retrieval performance.

But it shouldn't be too hard to add an option for it. I'll take a look when I'm next fiddling with the QueryParser.

Cheers,
    Olly
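Olly's retrieval argument can be made concrete with a toy count. This is a hypothetical sketch: termFor() and countMatches() are illustrative helpers, and the $preserveCase flag stands in for the "don't lowercase" option under discussion. It is not an existing QueryParser setting. The indexed word is assumed to be the official spelling "NeXT", and the user spellings are the variants Olly mentions.

```php
<?php
// Hypothetical sketch of a "don't lowercase" option; termFor() and
// countMatches() are illustrative helpers, not part of Xapian's QueryParser.
function termFor($word, $preserveCase)
{
    return $preserveCase ? $word : strtolower($word);
}

// Count how many of the users' spellings would match the indexed word.
function countMatches($indexedWord, array $userSpellings, $preserveCase)
{
    $indexed = termFor($indexedWord, $preserveCase);
    $hits = 0;
    foreach ($userSpellings as $spelling) {
        if (termFor($spelling, $preserveCase) === $indexed) {
            $hits++;
        }
    }
    return $hits;
}

// The official spelling plus the variants Olly expects people to type.
$spellings = array("NeXT", "NEXT", "Next");

$caseSensitiveHits   = countMatches("NeXT", $spellings, true);  // 1
$caseInsensitiveHits = countMatches("NeXT", $spellings, false); // 3

echo "case-sensitive: $caseSensitiveHits of " . count($spellings) . "\n";
echo "lowercased:     $caseInsensitiveHits of " . count($spellings) . "\n";
```

Preserving case only helps users who type the official spelling exactly. Lowercasing matches all three variants, at the cost of no longer telling "NeXT" apart from the common word "next".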
Daniel Ménard
2006-May-22 16:42 UTC
> I am working with PHP5 and use the xapian wrapper written by Daniel
> Ménard

I suggest that you use Olly's wrapper instead of mine: the wrapper I wrote was just an experiment and I don't intend to update it...

Olly did a lot of work to have SWIG generate an object-oriented wrapper, which is far better than mine: it is complete (mine was not) and it will be included in the Xapian build process, so it will always be up to date.

See: http://www.oligarchy.co.uk/xapian/patches/xapian9.phps

Best regards,

DM
--
Daniel Ménard
Banque de Données Santé Publique
Avenue du Professeur Léon Bernard
35043 Rennes Cédex
Tel. (+33) 2.99.02.29.42
Fax (+33) 2.99.02.26.28
E-mail Daniel.Menard@Bdsp.tm.fr
http://www.bdsp.tm.fr