Marco Tabini
2005-Jun-13 03:56 UTC
[Xapian-discuss] Separting results from multiple databases
Hello,

I'm playing around with Xapian and I'm wondering whether it's possible to retrieve the estimated number of documents returned by each database that is part of a query.

For example, suppose that I have two databases, one that stores news items and another one that stores articles. When a search is performed, by default I want to return a set that contains matches from both dbs. However, I also want to give the user an idea of how many of the matches come from each database.

1. Is this possible without running the query again against either db?

2. As a side question, is there a significant performance hit in combining multiple databases as opposed to using a single db? In that case, how could I separate the different types of data to achieve the result I described above?

Thanks much!

Marco
Olly Betts
2005-Jun-13 17:55 UTC
[Xapian-discuss] Separting results from multiple databases
On Sun, Jun 12, 2005 at 10:55:27PM -0400, Marco Tabini wrote:
> I'm playing around with Xapian and I'm wondering whether it's possible to
> retrieve the estimated number of documents returned by each database that is
> part of a query.

No. Currently statistics for each term are merged, then the estimates calculated.

This is likely to change though. I'm planning to change to storing the first and last document id which each term indexes and use the query's structure to apply intersections, unions, etc to these ranges. This should improve the estimate statistics, but it is probably best done per database, and then summed. It would be pretty easy to make per-database statistics available then.

> 1. Is this possible without running the query again against either db?

No, although this probably won't be very expensive to do as most of the database blocks you'll need will be cached from the first query. Generally it's the I/O which takes the time (unless the database is small, in which case it's quick anyway!)

> 2. As a side question, is there a significant performance hit in combining
> multiple databases as opposed to using a single db?

Shouldn't be much. The main hit will be that separate databases will usually be smaller, so need less I/O.

Cheers,
Olly
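
As a rough illustration of one way to give users a per-database breakdown without a second query, the sketch below tallies the document ids already retrieved from a combined search. It does not come from the thread. It assumes that a combined database interleaves document ids, so that with n databases the match with combined docid d belongs to database number (d - 1) % n, counted in the order the databases were added. Check that assumption against the Xapian version in use. The function names are hypothetical, and the result counts only the matches actually retrieved, not an estimate for the whole result set.

```python
def source_database(docid, n_databases):
    """Index (0-based, in add order) of the database a combined docid maps to.

    Assumption: combined docids are interleaved across the databases.
    """
    return (docid - 1) % n_databases


def local_docid(docid, n_databases):
    """Docid within its own database, under the same interleaving assumption."""
    return (docid - 1) // n_databases + 1


def tally_by_database(docids, n_databases):
    """Count how many of the retrieved docids fall in each database."""
    if n_databases < 1:
        raise ValueError("n_databases must be at least 1")
    counts = [0] * n_databases
    for docid in docids:
        counts[source_database(docid, n_databases)] += 1
    return counts
```

With the Python bindings, the docids would typically be collected from the match set, for example with something like `[m.docid for m in mset]`. For a figure covering all matches rather than just the retrieved ones, running the query again against each database separately, as suggested above, remains the straightforward route.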