Cedric Jeanneret
2009-Apr-21 10:44 UTC
[Xapian-discuss] Xapian 1.0.7-3.1 (python libs) : can't find anything
Hello,

I have a "little" problem: my Xapian search engine cannot find anything.

Let me explain. Here at the office we have a file server with about 250 GB of files (pdf, odt, doc, txt, ...) which are indexed daily with omindex. That part works fine. Then I wrote a small app with Pylons, using the Xapian Python library. It worked for a few days, and now it cannot find anything unless we put a wildcard inside the word... which is not really user-friendly.

Here's how I index my file server:

    omindex -D <path-to-indexes>/indexes/commun -U http://some-url/ --stemmer=french /path-to-files/commun

and here's a sample of how I search through it:

    import xapian

    def SearchXapian(qp, db, strq, nbres, page):
        strq = strq.encode('utf-8')
        # Combine every activated index into one searchable database
        database = xapian.Database()
        for i in db:
            d = xapian.Database(i)
            database.add_database(d)
        enquire = xapian.Enquire(database)
        qp.set_database(database)
        query = qp.parse_query(strq)
        enquire.set_query(query)
        # Fetch one page of results: skip page*nbres hits, return at most nbres
        matches = enquire.get_mset(int(page)*int(nbres), int(nbres))
        # ... (pagination and display code omitted)

qp: a xapian.QueryParser, without any options (I tried with the French stemmer, but it made no difference)
db: a list of my databases (we can search through 3 indexes; this list contains the paths to the activated indexes)
strq: the query string (without any modifications)
nbres: results per page (used for pagination later in the code, not relevant here), default 10
page: used for pagination, default 0

When I print only matches.get_matches_estimated(), it always returns 0... and when I try with a "*" inside my word, it works more or less...

Any clues or ideas? If you need more code, please ask.

Regards,
C. Jeanneret

-- 
Cédric Jeanneret | System Administrator
021 619 10 32 | Camptocamp SA
cedric.jeanneret at camptocamp.com | PSE-A / EPFL
Richard Boulton
2009-Apr-21 11:25 UTC
[Xapian-discuss] Xapian 1.0.7-3.1 (python libs) : can't find anything
First, take a look at http://trac.xapian.org/wiki/FAQ/NoMatches, in particular the part about using delve to look at your database.

It's hard to guess exactly what is going on without an actual example of a document which should be returned and the search entered to find it. If I were you, I'd use delve to display the terms in a document, and then try to build a query which should return that document. You can display the exact query used from Python with "print str(query)".

As a total guess, I'd say the problem you're coming up against is with stemming; perhaps when you tried enabling French stemming you didn't do it correctly. Could you paste the code you used to try this? Note that you'll need to call set_stemmer() on the QueryParser, and then call set_stemming_strategy(xapian.QueryParser.STEM_SOME) to match omindex's stemming strategy.

-- 
Richard