Started a new thread - don't want to hijack the previous one (or carry on
hijacking it).
On Thu, June 10, 2010 05:17, Olly Betts wrote:
>> My issue is that exceptions (ie, "Exception: Key too long: length
>> was...")
>
> You are hitting the Btree key size limit. For flint and chert, this
> translates to a term length limit of 245 bytes.
> If you are using Xapian >= 1.0.3 then the term limit should be checked
> when you call add_document() or replace_document().
I'm using trunk, r13989.
Ok, I have my stupid hat on this morning, so please bear with me:
...
# $raw_text could contain up to 110k of text.
$analyzer->index_text ($raw_text, ...);
$index->add_spelling(...foreach word in $raw_text...);
...
$index->add_document($xpdoc);
...
Now, when you say I need to truncate my term lengths to 240, what exactly
are we talking about? Truncating $raw_text is obviously not it; are we
talking about making sure that each term/word in $raw_text does not exceed
240 bytes? There *is* a lot of junk out there (base64/ascii/etc) where
this limit will be exceeded.
What about add_spelling()? Presumably it would be a good idea to truncate
the words therein as well? What is the hard limit, and what is the
suggested/sane limit?
...or am I completely off the point here? :)
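
To make the question concrete, here is a minimal sketch of the per-word check being described, written in Python for brevity rather than against the Xapian bindings. It assumes the limit applies to the encoded byte length of each word (not to $raw_text as a whole) and uses the 245-byte figure quoted above. The names MAX_TERM_BYTES and usable_words are hypothetical, and any term prefix would presumably also count towards the limit.

```python
# Sketch only: drop words whose encoded form is too long to be a term.
MAX_TERM_BYTES = 245  # figure quoted above for flint and chert (assumed here)


def usable_words(text, limit=MAX_TERM_BYTES):
    """Yield whitespace-separated words whose UTF-8 form fits in `limit` bytes."""
    for word in text.split():
        # Measure bytes, not characters: multi-byte characters count for more.
        if len(word.encode("utf-8")) <= limit:
            yield word


if __name__ == "__main__":
    junk = "x" * 300  # stands in for a base64 blob
    print(list(usable_words("some text " + junk + " more")))
```

The same filter could be applied to the words passed to add_spelling(), if the limit turns out to apply there as well.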
> If you're getting an
> error later then either your terms have zero bytes in (which currently
> need to be escaped in the Btree keys) or there's a bug (in which case a
> testcase would be useful).
ok - will do some digging to see whether my data has 0x0 in it (and
replace if so).
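
A rough sketch of that check, again in Python rather than the actual indexing script: it looks for zero bytes in the text and replaces them before indexing. The helper names are hypothetical, and replacing with a space is just one choice.

```python
# Sketch only: detect and replace zero bytes in text before indexing.
def has_zero_byte(text):
    """Return True if the text contains a zero byte once encoded as UTF-8."""
    data = text if isinstance(text, bytes) else text.encode("utf-8")
    return b"\x00" in data


def replace_zero_bytes(text, replacement=" "):
    """Return `text` with every NUL character replaced (a space by default)."""
    return text.replace("\x00", replacement)


if __name__ == "__main__":
    sample = "abc\x00def"
    if has_zero_byte(sample):
        sample = replace_zero_bytes(sample)
    print(repr(sample))
```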
>
>> Other times, the exception will occur followed by another: "Unexpected
>> end of table when reading continuation of tag..." -- this is probably
>> because of the unhandled initial exception.
>
> An exception shouldn't cause problems like that. Again, a testcase would
> be useful.
ok - will check out the specific index run where I observed this.
Thanks
Henry