Hi,
I have a quick question about the BTREE_MAX_KEY_LEN constant and
what happens when it is exceeded.
I have an app which is indexing a very large set of (Japanese)
documents, and some of the keys are rather long, garbage-like 400+
byte nuggets of text. When my app attempts to index these, Xapian
balks and throws an exception:
Exception: Key too long: length was 446 bytes, maximum length of a key
is BTREE_MAX_KEY_LEN bytes
I have no problems if I first check that each new posting token (key)
does not exceed the 252-byte maximum specified here:
http://www.xapian.org/docs/sourcedoc/html/btree_8h.html#b8d8c0c3cbbcec113aa5e3f5edace5dd
However, I noticed that if I run the program
without checking the posting token length before attempting to add it,
it will sometimes throw the exception and keep on trucking, yet
sometimes it will throw the exception and then throw a segmentation
fault and unceremoniously die.
As far as I can tell it is the over-long posting token that is
causing the problem in both cases. It may be that the exception then
causes the posting index to also get out of sync?
    for (int i = 0; i < index_tokens.size(); i++)
        newdocument.add_posting(index_tokens[i], i);
In code like this, an exception for an overlong posting token would
then... cause the posting index value to get screwed up and possibly
cause a segmentation fault?
I fixed it and am not having any more problems:
    for (int i = 0; i < index_tokens.size(); i++) {
        if (index_tokens[i].size() <= 252) {
            newdocument.add_posting(index_tokens[i], posting_cntr);
            posting_cntr++;
        }
    }
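The same filtering can also be pulled out into a standalone helper that does not touch Xapian at all, which makes it easy to check separately. This is only a sketch: filter_tokens and kMaxTermLen are names made up for illustration, and the 252-byte limit is simply the figure from the page linked above, not something read from the library at run time.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed limit in bytes, copied from the documentation page quoted above.
const std::size_t kMaxTermLen = 252;

// Hypothetical helper: drops every token longer than max_len bytes and
// numbers the surviving tokens consecutively, starting from 0.
std::vector<std::pair<std::string, unsigned> >
filter_tokens(const std::vector<std::string>& tokens,
              std::size_t max_len = kMaxTermLen)
{
    std::vector<std::pair<std::string, unsigned> > kept;
    unsigned pos = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i].size() <= max_len) {
            kept.push_back(std::make_pair(tokens[i], pos));
            ++pos;  // only advances for tokens that are kept
        }
    }
    return kept;
}
```

Each resulting pair could then be passed on as newdocument.add_posting(p.first, p.second), so that skipped tokens leave no gaps in the positions.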
I would just like to know whether my assessment of what is/was causing
this problem is accurate.
Thanks,
Joe