Hi!

I am trying to add a large number of documents to a database. After about 33400 documents I get the following exception for each new document:

  Key too long: length was 254 bytes, maximum length of a key is BTREE_MAX_KEY_LEN

I am pretty sure that this is not my fault. In my code, I limited the maximum term length to 200 bytes (with no \0 bytes), and one of the documents where the error occurred has a maximum term length of 11. If I add one of the documents that throws these exceptions to an empty database, everything works fine.

This is weird because I tested Xapian a long time ago with a much bigger amount of dummy data (about 150,000 documents) with no problems. Now, during the final test with real-world data, this exception happens for no reason that I can explain.

I am using libxapian11 version 0.9.6-4.99dapper and the 0.9.6 Perl bindings (but I don't think the problem is with these bindings).

quartzcheck on the odd database says:

  record:
  baseA blocksize=8K items=33425 lastblock=10010 revision=2728 levels=2 root=8991
  B-tree checked okay
  record table structure checked OK

  termlist:
  baseA blocksize=8K items=33424 lastblock=4873 revision=2728 levels=2 root=2252
  B-tree checked okay
  termlist table structure checked OK

  postlist:
  baseA blocksize=8K items=337722 lastblock=6406 revision=2728 levels=2 root=16
  B-tree checked okay
  postlist table structure checked OK

  position:
  baseA blocksize=8K items=2381373 lastblock=6595 revision=2728 levels=2 root=2311
  B-tree checked okay
  position table structure checked OK

  value:
  baseA blocksize=8K items=0 lastblock=0 revision=2728 levels=0 root=(faked)
  void B-tree checked okay
  value table structure checked OK

  No errors found

I'm trying to isolate this incident further, but at the moment I am pretty clueless and don't even have an idea where to start. Do you have any ideas?

Regards,
mrks

On Thu, Sep 28, 2006 at 05:30:03PM +0200, Markus W?rle wrote:
> Key too long: length was 254 bytes, maximum length of a key is
> BTREE_MAX_KEY_LEN
>
> I am pretty sure that this is not my fault. In my code, I limited the
> maximum term length to 200 bytes (and no \0 bytes), and one of the
> documents where the error occurred has a maximum term length of 11.

I don't think it can fire under any other circumstances. Are you aware that terms added by both add_term and add_posting matter? So for example, a unique id term containing a URL or pathname can be an issue.

> If I'm trying to add one of the documents that throws these exceptions to
> an empty database, everything works fine.

There's a rather unhelpful feature in how this currently works. There's not currently any explicit check on the term length. Instead, this exception is thrown deep inside the backend, which checks the length of the keys for the B-tree tables. For the position table, the key includes the term name and document id, and the document id is encoded in such a way that a larger document id can take up more bytes. So the same term may be fine in document 1, but cause an exception in document 1000.

And yes, I know this is rubbish - it's not trivial to add an explicit check currently because of the zero byte encoding issue.

You could try adding the suspect document to an empty database using "replace_document()" with a large document id (e.g. 33400 - the total number of documents in the original system). That should reproduce your problem if you have the culprit.

Cheers,
Olly
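
To illustrate the point that the same term can fit for document 1 but not for document 1000, here is a toy model in Python. The 7-bit variable-length integer encoding and the 252-byte limit are assumptions chosen for illustration only; they are not taken from Xapian's source, and the real key layout (including the zero byte encoding) may well differ.

```python
# Toy model only: NOT Xapian's actual on-disk key format.
ASSUMED_MAX_KEY_LEN = 252  # assumed value, for illustration


def encode_docid(docid: int) -> bytes:
    """Encode a non-negative integer with a generic 7-bit varint scheme.

    Larger numbers need more bytes, which is the property that matters here.
    """
    if docid < 0:
        raise ValueError("docid must be non-negative")
    out = bytearray()
    while True:
        low_bits = docid & 0x7F
        docid >>= 7
        if docid:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)  # last byte
            return bytes(out)


def key_length(term: bytes, docid: int) -> int:
    """Length of a hypothetical position-table key: term plus encoded docid."""
    return len(term) + len(encode_docid(docid))


def fits(term: bytes, docid: int, limit: int = ASSUMED_MAX_KEY_LEN) -> bool:
    """True if the hypothetical key stays within the assumed limit."""
    return key_length(term, docid) <= limit


if __name__ == "__main__":
    term = b"a" * 251
    for docid in (1, 1000, 33400):
        print(docid, key_length(term, docid), fits(term, docid))
```

In this model a 251-byte term fits for document id 1 but not for 1000 or 33400, which matches the shape of the behaviour described above even though the numbers themselves are made up.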

On 28.09.2006 at 17:30, Markus W?rle wrote:
> Hi!
>
> I am trying to add a large number of documents to a database. After about
> 33400 documents I get the following exception for each new document:
>
> Key too long: length was 254 bytes, maximum length of a key is
> BTREE_MAX_KEY_LEN

I found the problem in my code. It was hard to find because this error does not seem to occur until a flush to disk, and the error message contains no reference to the key where it fails. I patched my libxapian to be more verbose and finally found it.

It seems that Xapian does not detect the error until it flushes to disk, and if the flush fails, it seems to keep the unflushed, erroring content in memory. After a new valid document is added, it notices that it is still over the "I should flush now" threshold, tries to flush again, fails again, and so on...

Hope this helps anyone out there (so my question has not been totally useless ;-)

Regards,
mrks
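
Since the exception does not name the offending key and only shows up at flush time, one way to catch such a bug earlier is to check every term in application code before it is handed to the indexer. Below is a minimal Python sketch of that idea. The function name `oversized_terms` is hypothetical, and the 200-byte threshold is simply the figure from the first message, not a documented Xapian constant. The check has to cover every term, including unique id terms built from URLs or pathnames, and it measures bytes rather than characters.

```python
# Hypothetical pre-indexing check; independent of any Xapian API.
ASSUMED_MAX_TERM_BYTES = 200  # figure from the first message, an assumption


def oversized_terms(terms, max_bytes=ASSUMED_MAX_TERM_BYTES):
    """Return the terms whose byte length exceeds max_bytes.

    str terms are measured after UTF-8 encoding, so multi-byte
    characters count with their full encoded size.
    """
    too_long = []
    for term in terms:
        raw = term.encode("utf-8") if isinstance(term, str) else bytes(term)
        if len(raw) > max_bytes:
            too_long.append(term)
    return too_long


if __name__ == "__main__":
    # Made-up example terms; the "Q" prefix for the id term is illustrative.
    doc_terms = ["xapian", "Q" + "http://example.com/" + "x" * 300]
    for term in oversized_terms(doc_terms):
        print("term too long (%d bytes): %r" % (len(term.encode("utf-8")), term))
```

Rejecting or truncating such terms before they reach the database keeps a single bad document from blocking every later flush.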