Vasiliy Sergeev
2008-Feb-12 13:38 UTC
[Xapian-discuss] problem with multi-database search, xapian 0.9.10
Hi everyone!

I use Xapian 0.9.10 with the PHP bindings. My databases are split by month, so when a search spans two months I use a multi-database search. But I ran into a problem when the search goes across 2 DBs:

27365280 28956512
Fatal error: Uncaught exception 'Exception' with message 'DocNotFoundError: Document 2089165965 not found.' in /usr/local/newsserver/classes/Xapian/MSetIterator.php:35
Stack trace:
#0 /usr/local/newsserver/classes/Xapian/MSetIterator.php(35): msetiterator_get_document(Resource id #28)
#1 /usr/local/newsserver/classes/DocTest.php(42): Xapian_MSetIterator->get_document()
#2 /usr/local/newsserver/classes/DocTest.php(61): DocTest->__construct()
#3 {main}
  thrown in /usr/local/newsserver/classes/Xapian/MSetIterator.php on line 35

First comes a row of document ids, then the exception appears for a missing document whose id is about 10 times larger. The same search run against each of these DBs separately works fine, but the multi-DB search fails.

As far as I know, this problem means that something is wrong with the mixed document ids, right?

I suspect 32bit overflowing. The same search in single DB results with document ids like 4236476938 which is strangely close to the MAX of 32bit integer.

I plan to migrate to xapian 1.0.5 but I want to know why this problem appears to be sure it won't happen with latest xapian version.

Please advise.

Thanks,
Vaso.
Olly Betts
2008-Feb-16 20:31 UTC
[Xapian-discuss] problem with multi-database search, xapian 0.9.10
On Tue, Feb 12, 2008 at 07:38:24PM +0600, Vasiliy Sergeev wrote:
> I suspect 32bit overflowing. The same search in single DB results with
> document ids like 4236476938 which is strangely close to the MAX of
> 32bit integer.

Yes, the multidb code doesn't currently check for the mapped docid wrapping round.

> I plan to migrate to xapian 1.0.5 but I want to know why this problem
> appears to be sure it won't happen with latest xapian version.

This is unchanged in 1.0.5.

If you make use of more than half the docid space in each of the two subdatabases, there's not much we can do. We need to map the docids from the subdatabases to/from the docids of the combined database. So the "fix" would be to throw an exception in this case, which isn't going to help you much...

I assume you don't actually have 4 billion documents in each database? If you do, then your only option is to recompile Xapian with a 64-bit Xapian::docid type.

Although you can set your own docids to create a sparse usage pattern, it's probably not a good idea to. The backend uses delta encoding on docids to compress posting lists, which means that the compression won't be as good. You'll probably waste more space than you save by not storing the UID as a term, and less compressed posting lists affect all searches whereas the UID terms won't have much overhead at search time (since they'll all appear adjacently in the Btree).

Cheers,
    Olly
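
To make the wrap-around concrete, here is a minimal Python sketch of the docid mapping described above. It assumes the combined database interleaves subdatabase docids as (sub_docid - 1) * n + shard + 1 and that the result is stored in an unsigned 32-bit type; the formula, the function names and the 32-bit mask are illustrative assumptions, not code taken from Xapian. Under those assumptions, a subdatabase docid above 2^31 loses its top bit on the round trip. That may explain the reported missing id: 2089165965 + 2^31 = 4236649613, which is in the same range as the single-DB docid quoted in the question, though that is an inference rather than something confirmed in the thread.

```python
# Sketch only: assumed interleaved mapping, truncated to 32 bits to mimic
# an unsigned 32-bit docid type. Names here are hypothetical.
DOCID_BITS = 32
DOCID_MASK = (1 << DOCID_BITS) - 1


def combined_docid(sub_docid, shard, n_shards):
    """Map a subdatabase docid to a combined docid, wrapping at 32 bits."""
    return ((sub_docid - 1) * n_shards + shard + 1) & DOCID_MASK


def split_docid(docid, n_shards):
    """Map a combined docid back to (shard index, subdatabase docid)."""
    shard = (docid - 1) % n_shards
    sub_docid = (docid - 1) // n_shards + 1
    return shard, sub_docid


def fits(sub_docid, shard, n_shards):
    """True if the combined docid is representable without wrapping."""
    return (sub_docid - 1) * n_shards + shard + 1 <= DOCID_MASK


if __name__ == "__main__":
    # A small docid survives the round trip through the combined space.
    print(split_docid(combined_docid(27365280, 1, 2), 2))
    # A docid above 2**31 wraps and comes back as a different document.
    wrapped = combined_docid(4236476938, 0, 2)
    print(wrapped, split_docid(wrapped, 2), fits(4236476938, 0, 2))
```

A check like fits() is the kind of test that would let the multidb code throw an exception instead of silently wrapping, which matches the "fix" mentioned in the reply. Keeping docids dense and small in each monthly database avoids the problem altogether.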