Cedric Jeanneret
2009-Apr-21 10:44 UTC
[Xapian-discuss] Xapian 1.0.7-3.1 (python libs) : can't find anything
Hello,
I have a "little" problem: my Xapian search engine cannot find
anything. Let me explain.
Here at the office we have a file server with about 250 GB of files (pdf, odt, doc,
txt, ...) which are indexed daily with omindex. That part works fine.
Then I wrote a small app with Pylons, using the Xapian Python library. It
worked for a few days, and now it cannot find anything unless we put a wildcard
inside the word, which is not really user-friendly.
Here's how I index my file server:
omindex -D <path-to-indexes>/indexes/commun -U http://some-url/ \
    --stemmer=french /path-to-files/commun
and here's a sample of how I search through it:
def SearchXapian(qp, db, strq, nbres, page):
    strq = strq.encode('utf-8')
    database = xapian.Database()
    for i in db:
        d = xapian.Database(i)
        database.add_database(d)
    enquire = xapian.Enquire(database)
    qp.set_database(database)
    query = qp.parse_query(strq)
    enquire.set_query(query)
    matches = enquire.get_mset(int(page)*int(nbres), int(nbres))
qp: a xapian.QueryParser, without any options (I tried with the French stemmer, but it
made no difference)
db: a list of my databases (we can search through 3 indexes; this list contains the
paths to the activated indexes)
strq: the query string (without any modification)
nbres: results per page (used for pagination later in the code, not needed
here), default 10
page: used for pagination, default 0
When I just print matches.get_matches_estimated(), it always returns 0, and
when I try with a "*" inside my word, it works more or less.
Any clue, idea?
If you need some more code, please ask me.
Regards,
C. Jeanneret
--
Cédric Jeanneret | System Administrator
021 619 10 32 | Camptocamp SA
cedric.jeanneret at camptocamp.com | PSE-A / EPFL
Richard Boulton
2009-Apr-21 11:25 UTC
[Xapian-discuss] Xapian 1.0.7-3.1 (python libs) : can't find anything
First, take a look at http://trac.xapian.org/wiki/FAQ/NoMatches . In particular, see the part about using delve to look at your DB.

It's hard to guess what exactly is going on without an actual example of a document which should be returned, and the search entered to find it. If I were you, I'd use delve to display the terms in a document, and then try to build a query which should return that document. You can display the exact query used from Python by doing "print str(query)".

As a total guess, I'd say the problem you're coming up against is with stemming. Perhaps when you tried enabling French stemming you didn't do it correctly: could you paste the code you used to try this? Note that you'll need to call set_stemmer on the queryparser, and then call set_stemming_strategy(STEM_SOME) to match omindex's stemming strategy.

--
Richard
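
The stemming setup described above can be sketched roughly as follows. This is only an illustration, assuming the Xapian Python bindings are installed and that the database was indexed with the French stemmer as in the omindex command earlier in the thread. The helper names build_query_parser and describe_query are made up for this example and are not part of Xapian.

```python
def build_query_parser(database, language="french"):
    """Return a QueryParser set up to stem queries the way omindex does.

    Hypothetical helper: `database` is an already opened xapian.Database,
    and `language` must match the --stemmer value used at indexing time.
    """
    import xapian  # imported here so the module loads without the bindings

    qp = xapian.QueryParser()
    qp.set_database(database)
    # Without a stemmer the parser leaves query words unstemmed.
    qp.set_stemmer(xapian.Stem(language))
    # STEM_SOME is the strategy named in the reply as matching omindex.
    qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
    return qp


def describe_query(qp, text):
    """Parse `text` and return the query plus its printable description."""
    query = qp.parse_query(text)
    # str(query) shows the terms actually searched for, which can be
    # compared with the terms that delve lists for a document.
    return query, str(query)
```

In SearchXapian, the parser returned by build_query_parser(database) would take the place of the unconfigured qp, and the description from describe_query can be logged before calling enquire.set_query(query).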