On Tue, Feb 27, 2018 at 10:54:14PM +0000, Eric Wong wrote:
> Hello, I noticed a problem with DatabaseCorruptError exceptions
> with public-inbox and I guess it's user error...
>
> The problem is public-inbox was calling replace_document to
> modify the DB while iterating through a PostingIterator.

A quick peer at the script suggests you're iterating the postlist for a
term in a WritableDatabase and removing that term from the documents it
indexes.

I think it's reasonable to expect that to work, but I can certainly
believe that it's currently not handled correctly in all cases. The
postlist data is chunked, so where the chunk boundaries lie probably
affects whether things go wrong. Also, iterating a term which has
already been modified since the last commit needs special handling, so
that might affect when this manifests.

I'll investigate deeper when I get a chance.

Your workaround of reading in all the postlist entries for the term
you're currently working on before modifying any of the documents is
the one I would have suggested.
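
For illustration, that workaround might look roughly like the Python
sketch below. The method names postlist(), get_document(),
replace_document(), remove_term() and add_boolean_term() mirror the
Xapian API, but FakeDocument and FakeDatabase are hypothetical in-memory
stand-ins so the sketch runs without Xapian installed (they do not
reproduce the corruption), and the 'Gold'/'Gnew' terms and the retag()
helper are invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-ins: only the method names mirror Xapian.
Posting = namedtuple("Posting", "docid")


class FakeDocument:
    def __init__(self, terms=()):
        self.terms = set(terms)

    def remove_term(self, term):
        self.terms.remove(term)

    def add_boolean_term(self, term):
        self.terms.add(term)


class FakeDatabase:
    def __init__(self):
        self.docs = {}

    def postlist(self, term):
        # Yield one posting per document indexed by `term`, in docid order.
        for docid in sorted(self.docs):
            if term in self.docs[docid].terms:
                yield Posting(docid)

    def get_document(self, docid):
        # Return a copy, so changes only land via replace_document().
        return FakeDocument(self.docs[docid].terms)

    def replace_document(self, docid, doc):
        self.docs[docid] = doc


def retag(db, old_term, new_term):
    # Read every docid for old_term up front; once the list is built,
    # no postlist iterator is live while documents are being replaced.
    docids = [posting.docid for posting in db.postlist(old_term)]
    for docid in docids:
        doc = db.get_document(docid)
        doc.remove_term(old_term)
        doc.add_boolean_term(new_term)
        db.replace_document(docid, doc)
    return docids


db = FakeDatabase()
db.docs = {
    1: FakeDocument({"Gold", "foo"}),
    2: FakeDocument({"bar"}),
    3: FakeDocument({"Gold"}),
}
changed = retag(db, "Gold", "Gnew")
```

The point is only the ordering: collect the docids first, then modify.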

BTW, using add_boolean_term() for your 'G' terms would probably be a
good idea - currently they'll have a non-zero within-document frequency
(wdf), and so contribute to document length. Generally that's not
desirable for filter terms, and it can also make adding/removing them
less efficient as the document length also needs updating (the case here
is probably OK for that, as you add a replacement term so the document
length is unchanged).
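
To make the wdf point concrete, here is a toy Python model. It assumes
only that document length is the sum of the wdf values of a document's
terms, and that add_boolean_term(term) behaves like add_term(term, 0).
ToyDocument and the 'Gfoo' term are hypothetical and not part of Xapian.

```python
class ToyDocument:
    """Toy model: tracks wdf per term, nothing else."""

    def __init__(self):
        self.wdf = {}

    def add_term(self, term, wdfinc=1):
        # Each call raises the term's wdf by wdfinc.
        self.wdf[term] = self.wdf.get(term, 0) + wdfinc

    def add_boolean_term(self, term):
        # Filter term: present in the document, but with wdf 0.
        self.add_term(term, 0)

    def length(self):
        # Document length modelled as the sum of all wdf values.
        return sum(self.wdf.values())


weighted = ToyDocument()
for term in ("hello", "world"):
    weighted.add_term(term)
weighted.add_term("Gfoo")          # filter term added with wdf 1

filtered = ToyDocument()
for term in ("hello", "world"):
    filtered.add_term(term)
filtered.add_boolean_term("Gfoo")  # filter term added with wdf 0
```

In this model weighted.length() is 3 while filtered.length() is 2, so
only the add_term() version makes the 'G' term count towards length.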

Cheers,
Olly