Jean-Francois Dockes
2009-Nov-13 14:08 UTC
[Xapian-discuss] Using xapian for general indexed storage
Hello,

Two questions about using Xapian as a gdbm stand-in for an auxiliary
database:

- I am currently using single-term documents having the key as a single
  term, and the (small) associated data chunk stored in the document data
  record. Is this still the right way to do it?

- There was an answer on the mailing list two years ago, saying that
  storing a few megabytes in the document data records was ok. Does
  this still hold, or would it be preferable to use file-system storage?
  There is no question of peeking inside the data, it's opaque.

  http://article.gmane.org/gmane.comp.search.xapian.general/4730/match=document+data+record+storing

Regards,
jf
Olly Betts
2009-Nov-15 12:36 UTC
[Xapian-discuss] Using xapian for general indexed storage
On Fri, Nov 13, 2009 at 03:08:56PM +0100, Jean-Francois Dockes wrote:
> Two questions about using Xapian as a gdbm stand-in for an auxiliary
> database:
>
> - I am currently using single-term documents having the key as a single
>   term, and the (small) associated data chunk stored in the document data
>   record. Is this still the right way to do it?

If you're just wanting a key/value store, it would be a bit more efficient
to just store them as user metadata (as you wouldn't have the
termname->docid translation stage):

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#4a8d53e528bda6cee8e507b95f5c0b31

Note, however, that Xapian currently tries to compress document data with
zlib but doesn't try to compress user metadata. This may change in the
future - I don't think it was a deliberate decision, just due to where the
user metadata is stored.

> - There was an answer on the mailing list two years ago, saying that
>   storing a few megabytes in the document data records was ok. Does
>   this still hold, or would it be preferable to use file-system storage ?
>   There is no question of peeking inside the data, it's opaque.
>
>   http://article.gmane.org/gmane.comp.search.xapian.general/4730/match=document+data+record+storing

Just to be clear, when Richard says "2MB is fine", he means "2MB is well
within the upper limit" rather than that he particularly recommends doing
it, since he goes on to say:

    It's probably a mistake to try storing that much data, anyway; while
    it should work, you'll end up with a single very large file in the
    Xapian database directory holding the records, which might be a pain
    when taking backups, etc. Also, Xapian doesn't provide you with any
    ability to perform random access on the document data - you have to
    read it all into memory to access it: if the data was stored in a
    file, the operating system can access it much more efficiently.

The same applies to user metadata.
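To illustrate the random-access point, here is a minimal sketch using only the Python standard library. It is not Xapian code: the helper name `read_slice`, the temporary file, and the 2 MiB payload are all made up for the illustration. It shows how a small slice of a large blob kept on the file system can be read with a seek, without pulling the whole blob into memory the way a document data record would have to be.

```python
import os
import tempfile


def read_slice(path, offset, length):
    """Read `length` bytes starting at `offset` without loading the whole file.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        f.seek(offset)          # jump straight to the wanted position
        return f.read(length)   # only this many bytes are read


# Build a 2 MiB opaque blob in a temporary file (illustrative data only).
payload = bytes(range(256)) * 8192
fd, blob_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# Fetch 16 bytes from the middle of the blob.
chunk = read_slice(blob_path, 1_000_000, 16)

# Remove the temporary file again.
os.remove(blob_path)
```

With the blob stored in a document data record or in user metadata, the equivalent operation would mean fetching the full value first and slicing it afterwards.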
The advantage of using Xapian is that you can get changes committed
atomically along with other changes to the database. And it does avoid
having to open a file for each item you want to read.

Cheers,
    Olly
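Since the reply notes that user metadata is not compressed, one possible approach is to compress values yourself before storing them. The sketch below is an assumption-laden illustration, not the Xapian authors' recommendation: `put`, `get`, and `FakeMetadataStore` are hypothetical names, and the fake store only mimics the `set_metadata(key, value)` / `get_metadata(key)` methods of `Xapian::WritableDatabase` (including returning an empty value for an unset key) so that the example runs without Xapian installed.

```python
import zlib


class FakeMetadataStore:
    """Hypothetical in-memory stand-in for a Xapian WritableDatabase.

    Only mimics set_metadata()/get_metadata(); a real database would be
    opened with the xapian bindings instead.
    """

    def __init__(self):
        self._data = {}

    def set_metadata(self, key, value):
        self._data[key] = value

    def get_metadata(self, key):
        # Mirrors Xapian's behaviour of returning an empty value
        # when the key has not been set.
        return self._data.get(key, b"")


def put(db, key, value):
    """Store `value` (bytes) under `key`, compressed with zlib."""
    db.set_metadata(key, zlib.compress(value))


def get(db, key):
    """Return the decompressed value for `key`, or None if it is unset."""
    raw = db.get_metadata(key)
    return zlib.decompress(raw) if raw else None


db = FakeMetadataStore()
put(db, "config", b"opaque data " * 1000)
restored = get(db, "config")
```

With a real writable database, the stored values would only become durable once the changes are committed, which is also where the atomicity mentioned above comes from. Whether the Python bindings return `bytes` or `str` from `get_metadata` depends on the bindings version, so treat the `bytes` handling here as an assumption.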