Alec Thomas
2007-Feb-09 00:18 UTC
[Xapian-devel] Fetching document content by Q term in Python
Hello,

I'd like to be able to retrieve the index's stored copy of the document
text and tried the following:

    terms = self.db.allterms()
    terms.skip_to('Q' + uri.encode('utf-8'))
    term = terms.next()
    doc = self.db.get_document(term[1])
    print doc.get_data()

I just wildly guessed that [1] was the docid, but of course it isn't. So the
question is, how do I get a docid out of a term?

Or if I'm completely on the wrong track, how do I get the document from
a Q term?

Thanks,
Alec
Olly Betts
2007-Feb-09 08:01 UTC
[Xapian-devel] Fetching document content by Q term in Python
On Fri, Feb 09, 2007 at 11:18:10AM +1100, Alec Thomas wrote:
> I'd like to be able to retrieve the index's stored copy of the document
> text and tried the following:
>
>     terms = self.db.allterms()
>     terms.skip_to('Q' + uri.encode('utf-8'))
>     term = terms.next()
>     doc = self.db.get_document(term[1])
>     print doc.get_data()
>
> I just wildly guessed that [1] was the docid, but of course it isn't. So the
> question is, how do I get a docid out of a term?

This will print the data from each document indexed by a particular term:

    term = 'Q' + uri.encode('utf-8')
    for docid in self.db.postlist(term):
        doc = self.db.get_document(docid)
        print doc.get_data()

You get a PostingIter from db.postlist(term) - see python/docs/bindings.html
for details.

> Or if I'm completely on the wrong track, how do I get the document from
> a Q term?

Alternatively, you can run a search for the Q-prefixed term. The above is
a little less work though.

Cheers,
    Olly
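
For readers on newer Python bindings, the postlist approach above could be
wrapped up roughly as follows. This is only a sketch under a few assumptions:
the helper name documents_for_term is made up for illustration, db is taken
to be an already-open database object exposing postlist() and get_document()
as in the reply, and the items yielded by postlist() are assumed to be either
plain docids (as in the loop above) or objects carrying a docid attribute.

```python
def documents_for_term(db, term):
    """Return the stored data of every document indexed by `term`.

    `db` is assumed to offer postlist(term) and get_document(docid),
    as used in the reply above. This helper is illustrative only.
    """
    results = []
    for posting in db.postlist(term):
        # Assumption: an item is either a bare docid or an object with a
        # .docid attribute; handle both rather than rely on one behaviour.
        docid = getattr(posting, "docid", posting)
        results.append(db.get_document(docid).get_data())
    return results


# Hypothetical usage (path and uri are placeholders, not from the thread):
#   import xapian
#   db = xapian.Database("/path/to/index")
#   for data in documents_for_term(db, "Q" + uri):
#       print(data)
```

Since a Q term is normally a unique ID, the returned list would usually hold
zero or one entries; an empty list means no document carries that term.
Depending on the bindings and Python version, get_data() may return bytes
rather than text.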