Christiano Anderson
2005-Dec-09 16:17 UTC
[Xapian-discuss] Key too long: length was 254 bytes
Hello,

I have two databases (base1 and base2), and I am trying to merge these databases into a single one (finaldb) by running the following command:

$ quartzcompact base1 base2 finaldb

After some minutes it returns this error:

postlist ...quartzcompact: Key too long: length was 254 bytes, maximum length of a key is BTREE_MAX_KEY_LEN bytes

What could be wrong with the databases and how can I solve this problem?

Thanks a lot,
Cheers,
Christiano
On Fri, Dec 09, 2005 at 02:24:16PM -0200, Christiano Anderson wrote:
> $ quartzcompact base1 base2 finaldb
>
> After some minutes it returns this error: postlist ...quartzcompact:
> Key too long: length was 254 bytes, maximum length of a key is
> BTREE_MAX_KEY_LEN bytes
>
> What could be wrong with the databases and how can I solve this problem?

The btree manager which Quartz uses has a maximum key length of 252 bytes. But because the keys contain more than just term names, the maximum safe length for a term is 240 bytes (or perhaps a few more, but 240 is certainly safe). There's one further wrinkle - any zero bytes in a term require 2 bytes in the quartz key.

The problem you've run into is that the length check currently only looks at the assembled key in quartz, not at the term length. The assembled key for some of the Btree tables has the document id encoded using a variable length coding, so bigger document ids need more bytes. I suspect that base2 has a 252 byte key in one of these tables, and that when the databases are merged the document id becomes larger, so the key overflows.

Really we should vet the term lengths themselves to stop this situation happening, but I'm afraid we don't currently. So you'll need to find out what is producing such long terms and make it stop doing so!

If it's something like a URL, you might want to look at how Omega handles this by hashing the tail of long URLs. The code is in omindex.cc, function make_url_term.

Cheers,
Olly
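
To make the advice in the reply concrete, here is a minimal Python sketch of vetting term lengths before indexing. It is not Omega's make_url_term and not part of any Xapian API: the function names, the choice of SHA-1, and the prefix-plus-digest layout are all assumptions made for illustration. The only figures taken from the reply are the 240 byte safe limit and the rule that a zero byte costs 2 bytes in the quartz key.

```python
import hashlib

# Safe term length quoted in the reply; everything else here is hypothetical.
MAX_SAFE_TERM_LEN = 240
DIGEST_HEX_LEN = 40  # length of a SHA-1 hex digest


def quartz_term_cost(term: bytes) -> int:
    """Bytes a term would need in a quartz key: zero bytes count twice."""
    return len(term) + term.count(b"\x00")


def shorten_term(term: bytes, max_len: int = MAX_SAFE_TERM_LEN) -> bytes:
    """Return the term unchanged if it fits, else a prefix plus a hash of the tail."""
    if quartz_term_cost(term) <= max_len:
        return term

    # Keep as much of the prefix as fits once room is reserved for the digest.
    budget = max_len - DIGEST_HEX_LEN
    keep = 0
    cost = 0
    for byte in term:
        step = 2 if byte == 0 else 1
        if cost + step > budget:
            break
        cost += step
        keep += 1

    # Hash the part that was cut off so distinct long terms stay distinct.
    digest = hashlib.sha1(term[keep:]).hexdigest().encode("ascii")
    return term[:keep] + digest


if __name__ == "__main__":
    long_url_term = b"U" + b"http://example.com/" + b"a" * 400
    safe = shorten_term(long_url_term)
    print(len(long_url_term), "->", quartz_term_cost(safe))
```

Running every term through a check like this before adding it to a document keeps the per-term cost at or below 240 bytes, which leaves headroom for the extra key data, such as the variable-length document id, that caused the overflow during the merge.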