I need a database where the last three months' documents need to be searchable; anything beyond that can be archived. So I'm thinking about implementing a "rolling" database where I have a database per month and combine them into one. The latest database would be writable and the previous two read-only. When the month ends I would close all the existing databases and reopen them to include the new month.

For example, December would look like this:

  200912 -> writable
  200911 -> read-only
  200910 -> read-only

And January like this:

  201001 -> writable
  200912 -> read-only
  200911 -> read-only

I'm hoping to use this scheme for a number of reasons: the latest database is read *significantly* more often than any of the earlier databases; to be able to manage the ever-growing size of the database; and to be able to compact the read-only databases.

I'm using the ruby bindings and I've got a couple of questions.

1) Is it possible to close a database? I can flush the database and set the database object to nil, but I can't force the database destructor to be called even if I run the garbage collector.

2) Is there a good way of calculating an optimal size for a Xapian database? For example, I will be getting about 3 million documents a month; should I be rolling every month, every two months, etc.?

3) Is there a better way to do this?

rgh
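The month arithmetic behind the rollover can be sketched in plain Ruby. This is only an illustration of the naming scheme above: the helper name rolling_window and the hash keys are made up for the example, and it does not touch Xapian at all.

```ruby
require "date"

# Hypothetical helper: lists the monthly databases in the search window,
# newest first. Only the newest month is marked writable.
def rolling_window(today, months = 3)
  first_of_month = Date.new(today.year, today.month, 1)
  (0...months).map do |back|
    month = first_of_month << back # Date#<< steps back by whole months
    { name: month.strftime("%Y%m"), writable: back.zero? }
  end
end

rolling_window(Date.new(2010, 1, 15)).each do |db|
  puts "#{db[:name]} -> #{db[:writable] ? 'writable' : 'read-only'}"
end
# Prints:
# 201001 -> writable
# 200912 -> read-only
# 200911 -> read-only
```

Anchoring on the first day of the month avoids end-of-month surprises when stepping back from, say, the 31st.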
On Wed, Dec 02, 2009 at 01:11:21PM +1100, Richard Heycock wrote:
> I'm using the ruby bindings and I've got a couple of questions.
>
> 1) Is it possible to close a database? I can flush the database and set
>    the database object to nil, but I can't force the database destructor
>    to be called even if I run the garbage collector.

Not explicitly in 1.0.x. 1.1.x adds a close() method, mostly because of the issues with delayed destruction in scripting languages.

> 2) Is there a good way of calculating an optimal size for a Xapian
>    database? For example, I will be getting about 3 million documents
>    a month; should I be rolling every month, every two months, etc.?

Optimal by what metric? I think you'll probably need to decide what your criteria are, and then try it with different setups on realistic test data to get a useful answer to this.

> 3) Is there a better way to do this?

If your search load is high, it might be worth trying two databases: one for the older data and one being updated. Keep the monthly databases around, though, so you can create a new "older data" database covering the desired period when you roll over.

Cheers,
    Olly
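One possible way to build the "older data" database at rollover is to merge the read-only months with the xapian-compact tool, which takes one or more source databases followed by a destination. The sketch below only assembles the command line; the helper name compact_command, the destination name "older", and the assumption that each monthly database lives in a directory named after its month are all illustrative, not something stated in the thread.

```ruby
require "date"

# Hypothetical helper: builds the argv for merging the read-only months
# (the months before the current one) into a single "older data" database.
def compact_command(today, older_months: 2, dest: "older")
  first_of_month = Date.new(today.year, today.month, 1)
  sources = (1..older_months).map do |back|
    (first_of_month << back).strftime("%Y%m")
  end
  # Sources are listed oldest first, followed by the destination.
  ["xapian-compact", *sources.reverse, dest]
end

p compact_command(Date.new(2010, 1, 15))
# → ["xapian-compact", "200911", "200912", "older"]
```

The array could then be handed to system(*cmd) once the paths match your layout; searching would combine "older" with the current writable month.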