Philip Neustrom
2006-Mar-26 03:02 UTC
[Xapian-discuss] Spreading a database across multiple machines
Hey all,

I'm working on a project that contains lots of little sub-sites which I want to act autonomously. Right now each site has its own Xapian database, and when a search is performed that site-specific database is queried. I want to add a 'global' search across all of these databases, but I also want the individual site searches to behave, when run individually, as if each site's database were the only one.

It seems like the logical thing to do would be to create a Database object and then call add_database() for each database. However, I'm looking at a situation in which there could be thousands of independent databases, and doing add_database() for each possible site seems like it could be inefficient.

Is there a way to maintain a single database that can be queried on a site-specific basis and act like a site-specific one -- e.g. with the probabilities/results weighted according to some site-specific tag? (Then, if I wanted to divide the master database up due to space concerns, I would do so using add_database(), but it would logically be one large database.)

--Philip Neustrom
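The add_database() approach described above can be sketched roughly as follows, assuming the Python Xapian bindings (the `xapian` module); the helper name and paths are hypothetical, so treat this as an outline rather than tested code:

```python
# Sketch of combining many per-site Xapian databases into one logical
# read-only database, assuming the Python Xapian bindings. The helper
# name and paths are hypothetical.
try:
    import xapian
except ImportError:
    xapian = None  # bindings not installed; the sketch below still parses

def open_global_database(site_db_paths):
    """Combine many per-site databases into one logical Database."""
    if xapian is None:
        raise RuntimeError("the Python Xapian bindings are not installed")
    combined = xapian.Database()
    for path in site_db_paths:
        # Each add_database() call opens that site's on-disk tables, so
        # with thousands of sites this costs file descriptors and time.
        combined.add_database(xapian.Database(path))
    return combined

# Hypothetical usage -- search all sites at once:
# db = open_global_database(["/srv/sites/a/db", "/srv/sites/b/db"])
# enquire = xapian.Enquire(db)
```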
Olly Betts
2006-Mar-30 13:15 UTC
[Xapian-discuss] Spreading a database across multiple machines
On Sat, Mar 25, 2006 at 06:01:59PM -0800, Philip Neustrom wrote:
> It seems like the logical thing to do would be to create a Database
> object and then add_database() for each database. However, I'm
> looking at a situation in which there could be thousands of
> independent databases, and doing add_database() for each possible site
> seems like it could be inefficient in this case.

You'll eventually hit the per-process file descriptor limit too.

> Is there a way to maintain a single database that can be queried on a
> site-specific basis and act like it's a site-specific -- e.g. the
> probability/results are weighted according to some site-specific tag?

No. The problem is that you can't calculate those statistics efficiently from the information stored. Precalculating them as content is added might be possible, but is a big change.

Are the statistics from a combined database different enough to matter? If so, I'd suggest building a merged database for the global search, but keeping the individual databases if you want the stats to be exact for each subcollection.

If you're using flint, then xapian-compact has a "--multipass" option which will cope with merging thousands of databases. I suspect quartzcompact won't cope, but you can always merge in several goes by hand.

Cheers,
Olly
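To illustrate why the combined-database statistics can matter: probabilistic weighting schemes such as BM25 include an inverse-document-frequency component that depends on how many documents are in the database being searched and how many of them contain the term. A toy stdlib-Python sketch (not Xapian code; all numbers invented) shows how a term that is rare on one site but common collection-wide gets weighted very differently:

```python
import math

# Toy illustration (not Xapian code; the numbers are invented) of why
# merged-database statistics differ from per-site ones. Probabilistic
# weighting gives each term an idf-like component, roughly log(N / df),
# where N is the number of documents in the database searched and df is
# how many of them contain the term.

def idf(n_docs, doc_freq):
    """Simplified inverse-document-frequency component: log(N / df)."""
    return math.log(n_docs / doc_freq)

# A term that is rare on one small site but common across the collection:
site_idf = idf(n_docs=500, doc_freq=5)                # one site's database
global_idf = idf(n_docs=2_000_000, doc_freq=800_000)  # merged database

print(f"per-site idf: {site_idf:.2f}")  # ~4.61 -- looks highly selective
print(f"global idf:   {global_idf:.2f}")  # ~0.92 -- looks common

# Because these differ, ranking a site-restricted query against the merged
# database's statistics will not reproduce the site-only ranking exactly,
# hence the suggestion to keep the per-site databases when exact
# per-subcollection stats matter.
```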