Ted Jordan
2005-Dec-30 00:44 UTC
[Xapian-discuss] Query Parser, filenames and compound words
When I submit a filename to the query parser, it breaks it up.

Example:

/home/user/file_name.ext

becomes

Xapian::Query((home:(pos=1) PHRASE 5 user:(pos=2) PHRASE 5 file:(pos=3) PHRASE 5 name:(pos=4) PHRASE 5 ext:(pos=5)))

which does not find the document. If I run a single-term query without using the query parser, then I do find the document.

The Query Parser also breaks up hyphenated terms.

Example:

open-minded

becomes

Xapian::Query((open:(pos=1) PHRASE 2 minded:(pos=2)))

instead of

Xapian::Query((open-minded:(pos=1)))

which does not find the indexed term "open-minded".

Any ideas would be much appreciated.

Thanks,
-Ted.
Olly Betts
2005-Dec-30 03:52 UTC
[Xapian-discuss] Query Parser, filenames and compound words
On Fri, Dec 30, 2005 at 12:43:22AM +0000, Ted Jordan wrote:
> When I submit a filename to the query parser it breaks it up
>
> Example:
>
> /home/user/file_name.ext
>
> becomes
>
> Xapian::Query((home:(pos=1) PHRASE 5 user:(pos=2) PHRASE 5 file:(pos=3) PHRASE
> 5 name:(pos=4) PHRASE 5 ext:(pos=5)))
>
> which does not find the document.

The QueryParser currently expects you to have tokenised text in a similar way to how Omega's indexers do (this is because historically the QueryParser was part of Omega, and was then split off into a more generic class).

Ultimately there should be some way to tell the QueryParser how you tokenised (or it should be able to work it out by being able to test terms in the database). Currently you can say if and how stemming was done, but not much else. Hopefully I'll be able to address this in the release after the one I'm currently trying to get out the door.

For now, I'm afraid you either need to index like Omega does (look at indextext.cc in the omega sources for the full details), or parse query strings yourself.

Cheers,
Olly
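
As an illustration of the "parse query strings yourself" option, here is a minimal sketch of a whitespace-only splitter. It assumes the indexer stored whole filenames and hyphenated words as single terms, exactly as typed, which is what the original post suggests. The helper name split_query_terms is hypothetical and not part of Xapian, and any normalisation the indexer applied (lowercasing, prefixes, stemming) would have to be mirrored here.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xapian.
// Splits a raw query string on whitespace only, so that a path such as
// "/home/user/file_name.ext" or a hyphenated word such as "open-minded"
// survives as a single term instead of being broken into a phrase.
std::vector<std::string> split_query_terms(const std::string& raw) {
    std::vector<std::string> terms;
    std::istringstream in(raw);
    std::string token;
    // operator>> skips leading whitespace and reads up to the next
    // whitespace character, so punctuation inside a token is preserved.
    while (in >> token) {
        terms.push_back(token);
    }
    return terms;
}
```

Each resulting string could then be passed to the single-term Xapian::Query constructor, and the individual queries combined with an operator such as Xapian::Query::OP_AND. That bypasses the QueryParser entirely, so none of its syntax (quoted phrases, boolean operators, prefixes) is available unless you handle it yourself.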