I need a database in which the last three months' documents are
searchable; anything older than that can be archived. So I'm thinking about
implementing a "rolling" database: one database per month, with the
monthly databases combined into one for searching. The latest database would be writable and the
previous two read-only. When the month ends I would close all the
existing databases and reopen them to include the new month.
For example, December would look like this:
200912 -> writable
200911 -> read-only
200910 -> read-only
And January like this:
201001 -> writable
200912 -> read-only
200911 -> read-only
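The window in these examples can be derived from the current date. The following is a minimal sketch, assuming the YYYYMM directory naming used above; `rolling_window` is a hypothetical helper, not part of the Xapian bindings. It only computes names: opening the first one as a `Xapian::WritableDatabase` and combining the others for searching (for example with `add_database`) is left out.

```ruby
require 'date'

# Hypothetical helper: names of the monthly databases that should be open
# for a given date, newest first. The first entry is the writable month;
# the remaining entries are the read-only ones.
def rolling_window(date, months = 3)
  # Date#<< steps back whole calendar months (clamping the day if needed).
  (0...months).map { |i| (date << i).strftime('%Y%m') }
end

writable, *read_only = rolling_window(Date.new(2010, 1, 15))
puts "#{writable} -> writable"
read_only.each { |name| puts "#{name} -> read-only" }
```

Running it for a January date reproduces the January layout listed above; at month end, recomputing the window yields the set of databases to reopen.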
I'm hoping to use this scheme for a number of reasons: the latest database
is read *significantly* more often than any of the earlier databases;
it would let me manage the ever-growing size of the database; and it would
let me compact the read-only databases.
I'm using the Ruby bindings and I've got a couple of questions.
1) Is it possible to close a database? I can flush the database and set
the database object to nil, but I can't force the database destructor
to be called, even if I run the garbage collector.
2) Is there a good way of calculating an optimal size for a Xapian
database? For example, I will be getting about 3 million documents
a month; should I be rolling every month, every two months, etc.?
3) Is there a better way to do this?
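On question 1, the usual Ruby way to avoid depending on the garbage collector is to scope the resource to a block and release it in an `ensure` clause. This is a minimal sketch under the assumption that the database object responds to `close` (available only in bindings whose Xapian version provides a close method); `with_database` is a hypothetical helper name. On older bindings it can only fall back to `flush`, which commits pending changes but does not release the handle.

```ruby
# Hypothetical helper: yield a database object, then release it
# deterministically when the block exits, even if the block raises.
def with_database(db)
  yield db
ensure
  if db.respond_to?(:close)
    # Assumed to exist only in bindings that provide a close method.
    db.close
  elsif db.respond_to?(:flush)
    # Fallback: commits pending changes but does NOT release the handle.
    db.flush
  end
end
```

Usage would look like `with_database(Xapian::WritableDatabase.new(path, Xapian::DB_CREATE_OR_OPEN)) { |db| ... }`, where `path` is a placeholder for the month's database directory.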
rgh
On Wed, Dec 02, 2009 at 01:11:21PM +1100, Richard Heycock wrote:
> I'm using the Ruby bindings and I've got a couple of questions.
>
> 1) Is it possible to close a database? I can flush the database and set
> the database object to nil, but I can't force the database destructor
> to be called, even if I run the garbage collector.

Not explicitly in 1.0.x. 1.1.x adds a close() method, mostly because of
the issues with delayed destruction in scripting languages.

> 2) Is there a good way of calculating an optimal size for a Xapian
> database? For example, I will be getting about 3 million documents
> a month; should I be rolling every month, every two months, etc.?

Optimal by what metric? I think you'll probably need to decide what your
criteria are, and then try it with different setups on realistic test
data to get a useful answer to this.

> 3) Is there a better way to do this?

If your search load is high, it might be worth trying two databases (one
for the older data and one being updated), but keep the monthly databases
around so you can create a new "older data" database covering the desired
period when you roll over.

Cheers,
    Olly
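The two-database suggestion implies a small rollover step: work out which month becomes the database being updated, and rebuild the "older data" database from the monthly databases that are kept around. Below is a minimal sketch of that planning step. It assumes YYYYMM directory names and a hypothetical `older-YYYYMM` destination name, and it relies on `xapian-compact` accepting one or more source databases followed by a destination. It only builds the command; it does not run it.

```ruby
require 'date'

# Hypothetical rollover planner for a layout with one database being
# updated plus one merged "older data" database.
def rollover_plan(date, months = 3)
  names = (0...months).map { |i| (date << i).strftime('%Y%m') }
  current, *older = names
  {
    writable: current,
    # Sources listed oldest first, then the destination directory.
    compact_cmd: ['xapian-compact', *older.reverse, "older-#{current}"]
  }
end

plan = rollover_plan(Date.new(2010, 1, 1))
puts plan[:writable]
puts plan[:compact_cmd].join(' ')
```

When the tool is installed, the command could be executed with `system(*plan[:compact_cmd])`. Once the new merged database is built, searches would switch from the previous "older data" database to the new one.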