Matti Heinonen
2006-Jun-09 09:04 UTC
[Xapian-discuss] I'm having problems using queryparser's wild cards with Python
Hello,
I'm having trouble with queryparser using python bindings. Using
wildcards yields an empty query although there are matching terms in the
database.
I'm running
* xapian 0.9.6
  with the utf-8 patch
  http://search.gmane.org/~xapian/xapian-qp-utf8-0.9.2.patch
  and the transliteration patch
  http://article.gmane.org/gmane.comp.search.xapian.general/1927
* python 2.4.2
* the data is in Finnish and in Swedish
Running a small test programme yields:
$python test.py "terveyspalvelut"
Query string is terveyspalvelut
TEST QUERYPARSER
Parsed query to Xapian::Query(terveyspalvelut:(pos=1))
Found these docs: [48, 143, 74, 150, 31, 11, 20, 103, 92, 36]
TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut',
'terveyspalvelutoiminnan']
$python test.py "terveyspalvelut*"
Query string is terveyspalvelut*
TEST QUERYPARSER
Parsed query to Xapian::Query()
Found these docs: []
TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut',
'terveyspalvelutoiminnan']
Here's my test programme:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import sys
import xapian
# Querystring is taken from shell. Encode to utf-8.
query = sys.argv[1].encode("utf-8")
print "Query string is %s" % (query,)
print
DB = xapian.Database("xapian")
### Test queryparser
print "TEST QUERYPARSER"
# Set up query
qp = xapian.QueryParser()
qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)
parsed_query = qp.parse_query(query,
    xapian.QueryParser.FLAG_BOOLEAN | xapian.QueryParser.FLAG_PHRASE |
    xapian.QueryParser.FLAG_LOVEHATE | xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE |
    xapian.QueryParser.FLAG_WILDCARD)
print "Parsed query to %s" % (parsed_query.get_description(),)
# Do query, print out results
enquire = xapian.Enquire(DB)
enquire.set_query(parsed_query)
mset = enquire.get_mset(0,10)
print "Found these docs: %s" % ([ data[0] for data in mset ],)
print
### Truncate "by hand" to check if they are present
print "TRUNCATE"
# Set up term for iteration (ie. drop "*" at the end if present)
if query[-1] == "*":
    term = query[:-1]
else:
    term = query
print "Term is %s" % (term,)
# Iterate over matching terms
term_iterator = DB.allterms_begin()
term_iterator.skip_to(term)
matching_terms = []
cut_point = len(term)
while True:
    candidate_term = term_iterator.get_term()
    if candidate_term[:cut_point] != term:
        break
    matching_terms.append(candidate_term)
    term_iterator.next()
print "Found these terms: %s" % (matching_terms,)
print
Am I missing something? I'd rather avoid writing my own queryparser as
Xapian's queryparser seems to have all the features I need (and more!).
However, right truncation is a necessity for my project.
Yours,
Matti Heinonen
--
Matti Heinonen | email: matti.heinonen@uta.fi
Atk-erikoistutkija | tel: +358 3 215 8523
Yhteiskuntatieteellinen tietoarkisto FSD | fax: +358 3 215 8519
FIN-33014 TAMPEREEN YLIOPISTO | WWW: http://www.fsd.uta.fi/
Olly Betts
2006-Jun-09 11:21 UTC
[Xapian-discuss] I'm having problems using queryparser's wild cards with Python
On Fri, Jun 09, 2006 at 11:06:48AM +0300, Matti Heinonen wrote:
> I'm having trouble with queryparser using python bindings. Using
> wildcards yields an empty query although there are matching terms in the
> database.

> # Set up query
> qp = xapian.QueryParser()
> qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)

You need to tell the queryparser which database to use:

qp.set_database(DB)

> parsed_query =
> qp.parse_query(query,xapian.QueryParser.FLAG_BOOLEAN|xapian.QueryParser.FLAG_PHRASE|xapian.QueryParser.FLAG_LOVEHATE|xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE|xapian.QueryParser.FLAG_WILDCARD)

Cheers,
Olly
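
For anyone wondering why the missing database leads to an empty query rather than an error: as far as I understand it, a trailing "*" can only be expanded by listing the terms that share the prefix, and those terms live in the database. The sketch below is plain Python for a current interpreter and does not use Xapian at all. The function expand_wildcard and the term list are made up for illustration (only the two "terveyspalvelut" terms come from the output above), and it is an assumption about what the parser does internally, not a copy of its code.

```python
from bisect import bisect_left


def expand_wildcard(prefix, sorted_terms):
    """Return every term in sorted_terms that starts with prefix.

    sorted_terms stands in for the database's sorted term list
    (hypothetical; a real database would be walked with a term iterator).
    """
    matches = []
    # Jump to the first term >= prefix, much like skip_to() in the test programme.
    start = bisect_left(sorted_terms, prefix)
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # the list is sorted, so no later term can match
        matches.append(term)
    return matches


# Illustrative term list; "terveys" and "tieto" are invented filler terms.
terms = sorted(["terveys", "terveyspalvelut", "terveyspalvelutoiminnan", "tieto"])

# With a term list available, the prefix expands to the matching terms.
print(expand_wildcard("terveyspalvelut", terms))

# With no term list (no database set), there is nothing to expand into.
print(expand_wildcard("terveyspalvelut", []))
```

The second call mirrors the situation in the original test programme: with nothing to enumerate, the expansion is empty, which would explain the empty Xapian::Query() once the wildcard flag is in play.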