Matti Heinonen
2006-Jun-09 09:04 UTC
[Xapian-discuss] I'm having problems using queryparser's wild cards with Python
Hello,

I'm having trouble with the queryparser using the Python bindings. Using wildcards yields an empty query although there are matching terms in the database.

I'm running
* xapian 0.9.6 with the utf-8 patch http://search.gmane.org/~xapian/xapian-qp-utf8-0.9.2.patch and the transliteration patch http://article.gmane.org/gmane.comp.search.xapian.general/1927
* python 2.4.2
* the data is in Finnish and in Swedish

Running a small test programme yields:

$python test.py "terveyspalvelut"
Query string is terveyspalvelut

TEST QUERYPARSER
Parsed query to Xapian::Query(terveyspalvelut:(pos=1))
Found these docs: [48, 143, 74, 150, 31, 11, 20, 103, 92, 36]

TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']

$python test.py "terveyspalvelut*"
Query string is terveyspalvelut*

TEST QUERYPARSER
Parsed query to Xapian::Query()
Found these docs: []

TRUNCATE
Term is terveyspalvelut
Found these terms: ['terveyspalvelut', 'terveyspalvelutoiminnan']

Here's my test programme:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import xapian

# Querystring is taken from shell. Encode to utf-8.
query = sys.argv[1].encode("utf-8")
print "Query string is %s" % (query,)
print

DB = xapian.Database("xapian")

### Test queryparser
print "TEST QUERYPARSER"

# Set up query
qp = xapian.QueryParser()
qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)
parsed_query = qp.parse_query(query,xapian.QueryParser.FLAG_BOOLEAN|xapian.QueryParser.FLAG_PHRASE|xapian.QueryParser.FLAG_LOVEHATE|xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE|xapian.QueryParser.FLAG_WILDCARD)
print "Parsed query to %s" % (parsed_query.get_description(),)

# Do query, print out results
enquire = xapian.Enquire(DB)
enquire.set_query(parsed_query)
mset = enquire.get_mset(0,10)
print "Found these docs: %s" % ([ data[0] for data in mset ],)
print

### Truncate "by hand" to check if they are present
print "TRUNCATE"

# Set up term for iteration (i.e. drop "*" at the end if present)
if query[-1] == "*":
    term = query[:-1]
else:
    term = query
print "Term is %s" % (term,)

# Iterate over matching terms
term_iterator = DB.allterms_begin()
term_iterator.skip_to(term)
matching_terms = [];
cut_point = len(term)
while True:
    candidate_term = term_iterator.get_term()
    if candidate_term[:cut_point] != term:
        break
    matching_terms.append(candidate_term)
    term_iterator.next()
print "Found these terms: %s" % (matching_terms,)
print

Am I missing something? I'd rather avoid writing my own queryparser, as Xapian's queryparser seems to have all the features I need (and more!). However, right truncation is a necessity for my project.

Yours,
Matti Heinonen

-- 
Matti Heinonen                           | email: matti.heinonen@uta.fi
Atk-erikoistutkija                       | tel: +358 3 215 8523
Yhteiskuntatieteellinen tietoarkisto FSD | fax: +358 3 215 8519
FIN-33014 TAMPEREEN YLIOPISTO            | WWW: http://www.fsd.uta.fi/
Olly Betts
2006-Jun-09 11:21 UTC
[Xapian-discuss] I'm having problems using queryparser's wild cards with Python
On Fri, Jun 09, 2006 at 11:06:48AM +0300, Matti Heinonen wrote:
> I'm having trouble with queryparser using python bindings. Using
> wildcards yields an empty query although there are matching terms in the
> database.

> # Set up query
> qp = xapian.QueryParser()
> qp.set_stemming_strategy(xapian.QueryParser.STEM_NONE)

You need to tell the queryparser which database to use:

qp.set_database(DB)

> parsed_query =
> qp.parse_query(query,xapian.QueryParser.FLAG_BOOLEAN|xapian.QueryParser.FLAG_PHRASE|xapian.QueryParser.FLAG_LOVEHATE|xapian.QueryParser.FLAG_BOOLEAN_ANY_CASE|xapian.QueryParser.FLAG_WILDCARD)

Cheers,
    Olly
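
For readers following the thread: the reason the database matters is that a trailing wildcard has to be expanded into the terms that actually exist, so a parser with no term list to consult has nothing to expand to. The following is only a rough sketch of that idea in plain Python, not Xapian's implementation and not using the xapian module. The function `expand_wildcard` and the term list `TERMS` are hypothetical names made up for illustration; the prefix scan mirrors the "truncate by hand" loop in the original test programme.

```python
from bisect import bisect_left


def expand_wildcard(pattern, sorted_terms):
    """Return the terms matched by a pattern with an optional trailing '*'.

    Hypothetical helper for illustration only; sorted_terms must be sorted.
    """
    if not pattern.endswith("*"):
        # No wildcard: the pattern itself is the only term.
        return [pattern]
    prefix = pattern[:-1]
    # Jump to the first term >= prefix, similar in spirit to skip_to()
    # on a term iterator.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches


# Hypothetical term list standing in for the terms stored in a database.
TERMS = sorted(["terveys", "terveyspalvelut",
                "terveyspalvelutoiminnan", "tutkimus"])

# With a term list available, the wildcard expands to real terms.
with_db = expand_wildcard("terveyspalvelut*", TERMS)

# With no term list (the situation before qp.set_database(DB) is called),
# the same wildcard expands to nothing, which gives an empty query.
without_db = expand_wildcard("terveyspalvelut*", [])
```

In the test programme itself, the fix is the single call Olly gives above, made before `parse_query`.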