* Olly Betts <20190826225824.5ws3poy57ou2ahvc at survex.com> :
Wrote on Mon, 26 Aug 2019 23:58:24 +0100:
> On Mon, Aug 26, 2019 at 09:42:09AM +0200, Jean-Francois Dockes wrote:
>> A Recoll user gets the following message while indexing:
>>
>> "Attempted to delete or modify an entry in a non-existent posting
>> list for #bannerholder"
>>
>> The exception happens during a commit call. Xapian version 1.4.11,
>> Debian Buster
>>
>> A little more detail here:
>> https://opensourceprojects.eu/p/recoll1/tickets/108/
>>
>> I asked if this was reproducible, and to run the indexing in
>> single-thread mode to simplify the situation.
>
> It's worth running xapian-check on the database to see what it reports.
I'm getting the same errors on a recoll database with xapian-1.4.11
when I run recollindex.
I happened to have a snapshot of the database from *before* the
indexing run. xapian-check on the snapshot does not reveal any errors
(see first attachment).
Running xapian-check on the database after running recollindex (which
reports a failure) gives the output in the second attachment. The
salient point seems to be the following diagnostic, which is printed
for the postlist table:
postlist:
baseA blocksize=8K items=2105479 lastblock=28233 revision=80 levels=2 root=2
Failed to check B-tree: DatabaseError: Items not in sorted order
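For comparing the two attachments table by table, a small script along
these lines might help. It is only a sketch: the function name
summarize_check and the SAMPLE text are made up for illustration, and
the parsing assumes nothing beyond the line shapes visible in the
xapian-check output quoted in this message.

```python
import re

# A table section starts with a bare "name:" line, e.g. "postlist:".
TABLE_HEADER = re.compile(r"^(\w+):\s*$")

def summarize_check(output):
    """Map each table name to 'ok', 'unused', or its failure message."""
    status = {}
    table = None
    for raw in output.splitlines():
        line = raw.strip()
        header = TABLE_HEADER.match(line)
        if header:
            table = header.group(1)
            status[table] = "unknown"
        elif table is None:
            continue  # text before the first table header
        elif line.startswith("Failed to check B-tree:"):
            # Keep everything after the first colon as the message.
            status[table] = line.split(":", 1)[1].strip()
        elif line.endswith("table structure checked OK"):
            if status[table] == "unknown":
                status[table] = "ok"
        elif line.startswith("Lazily created"):
            status[table] = "unused"
    return status

# Abbreviated from the second attachment (hypothetical trimming).
SAMPLE = """\
record:
baseA blocksize=8K items=2452 lastblock=267 revision=80 levels=1 root=43
B-tree checked okay
record table structure checked OK
postlist:
baseA blocksize=8K items=2105479 lastblock=28233 revision=80 levels=2 root=2
Failed to check B-tree: DatabaseError: Items not in sorted order
spelling:
Lazily created, and not yet used.
Total errors found: 1
"""

for name, state in summarize_check(SAMPLE).items():
    print(name, "->", state)
```

Run against both attachments, this would show postlist as the only
table whose state differs between the snapshot and the current
database.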
> Also might be interesting to check what the posting list for that term
> is:
>
> xapian-delve ~/.recoll/xapiandb -vv -t '#bannerholder'
I ran multithreaded recollindex a few times and got several errors
like these:
:2:rcldb/rcldb.cpp:1989::Db::doFlush: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -only
:2:rcldb/rcldb.cpp:1989::Db::doFlush: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -those-
:2:rcldb/rcldb.cpp:1955::Db::waitUpdIdle: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -cols
Then I searched, came across this mailing list thread, and ran
xapian-delve xapiandb -vv -t '-cols'
I got this output:
Posting List for term '-cols' (termfreq 1, collfreq 2, wdf_max 2): 4412 2 51781
Then I added the following line
thrQSizes = -1 -1 -1
to recoll.conf, which I believe makes recollindex run in
single-threaded mode. I ran recollindex again, and it bailed out after
printing the first error message:
:2:rcldb/rcldb.cpp:1989::Db::doFlush: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -only
However, this time
xapian-delve xapiandb -vv -t '-only'
returns
term '-only' not in database
which is very surprising.
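To repeat this check for every term named in the log rather than one
at a time, something like the following could be used. It is a sketch
under assumptions: failing_terms and delve_command are hypothetical
helper names, the regular expression assumes that terms contain no
whitespace, and the command line simply mirrors the xapian-delve
invocation used above without running it.

```python
import re

# Tolerates the line wrap in the middle of the message, as seen in the log.
TERM_RE = re.compile(r"non-existent\s+posting\s+list\s+for\s+(\S+)")

def failing_terms(log_text):
    """Return the distinct terms named in flush() failures, in log order."""
    seen = []
    for term in TERM_RE.findall(log_text):
        if term not in seen:
            seen.append(term)
    return seen

def delve_command(db_path, term):
    """Build (but do not run) the xapian-delve call used in this message."""
    return ["xapian-delve", db_path, "-vv", "-t", term]

# Log excerpt copied from this message.
LOG = """\
:2:rcldb/rcldb.cpp:1989::Db::doFlush: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -only
:2:rcldb/rcldb.cpp:1989::Db::doFlush: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -those-
:2:rcldb/rcldb.cpp:1955::Db::waitUpdIdle: flush() failed: Attempted to delete or
modify an entry in a non-existent posting list for -cols
"""

for term in failing_terms(LOG):
    print(" ".join(delve_command("xapiandb", term)))
```

Each printed command can then be run by hand (or passed to
subprocess.run) to see which of the terms still have a posting list.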
>> I'm not too sure if a Recoll bug could cause this, or if this has to
>> be a Xapian issue, I can open a ticket if more appropriate.
>
> It shouldn't be possible to cause this via valid use of the API, but
> bugs in the application could - for example a stray memory write, or
> writing to a file descriptor which is open for writing by Xapian (most
> likely case is probably a stale handle which the application has closed
> and was reallocated to Xapian). Since 1.3.4 we avoid fds < 3 for
> writable database tables, which at least means writing to a closed
> stdout or stderr can't corrupt the database now, but it's hard to fully
> protect against writes to our fds (https://trac.xapian.org/ticket/651
> has some ideas).
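To make the stale-handle scenario concrete, here is a minimal
illustration of how a write through a closed descriptor can land in a
file opened later. It is not Recoll or Xapian code: the file names are
hypothetical, and it relies on the POSIX rule that open() returns the
lowest free descriptor number, with no other thread opening files in
between.

```python
import os
import tempfile

def stale_fd_write():
    """Return (fd_was_reused, contents of the 'database' file)."""
    with tempfile.TemporaryDirectory() as tmp:
        app_log = os.path.join(tmp, "app.log")        # hypothetical name
        db_table = os.path.join(tmp, "postlist.tbl")  # hypothetical name
        flags = os.O_WRONLY | os.O_CREAT

        app_fd = os.open(app_log, flags, 0o600)
        os.close(app_fd)  # the application closes its handle ...

        # ... and the same number is handed to the next open().
        db_fd = os.open(db_table, flags, 0o600)
        reused = (db_fd == app_fd)

        if reused:
            # Buggy code still holding the old number writes "its" log.
            os.write(app_fd, b"stray log line\n")

        os.close(db_fd)
        with open(db_table, "rb") as fh:
            return reused, fh.read()

reused, data = stale_fd_write()
print("descriptor reused:", reused)
print("bytes that ended up in the table file:", data)
```

When the number is reused, the stray bytes end up in the file that was
opened second, which is the kind of silent damage described in the
quoted paragraph.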
>
> There aren't any currently known database backend bugs present in
> 1.4.11, but it certainly could be an unknown Xapian bug - if it's
> reproducible with non-private data I'm happy to take a look.
>
> If the database was created with an older Xapian version originally, it
> might be due to an already fixed bug (e.g. the cursor handling one fixed
> in 1.4.7) and the corruption has just escaped notice because nothing
> tried to read or update that part of the database since.
This is likely the case. The timestamp on the snapshot is from 2015.
However, each (failed) run of recollindex does seem to update the
xapian database successfully: I get correct results from queries, and
the results include the latest indexed data.
[I'm holding off on rebuilding the database in case someone has some
ideas on what may be going on.]
-------------- next part --------------
record:
baseA blocksize=8K items=1930 lastblock=269 revision=72 levels=1 root=266
B-tree checked okay
record table structure checked OK
termlist:
baseA blocksize=8K items=3860 lastblock=6197 revision=72 levels=2 root=5601
B-tree checked okay
termlist table structure checked OK
postlist:
baseA blocksize=8K items=1400286 lastblock=20434 revision=72 levels=2 root=7
B-tree checked okay
postlist table structure checked OK
position:
baseA blocksize=8K items=8607230 lastblock=45578 revision=72 levels=2 root=45100
B-tree checked okay
position table structure checked OK
spelling:
Lazily created, and not yet used.
synonym:
Lazily created, and not yet used.
No errors found
-------------- next part --------------
record:
baseA blocksize=8K items=2452 lastblock=267 revision=80 levels=1 root=43
B-tree checked okay
record table structure checked OK
termlist:
baseA blocksize=8K items=4904 lastblock=7361 revision=80 levels=2 root=1715
B-tree checked okay
termlist table structure checked OK
postlist:
baseA blocksize=8K items=2105479 lastblock=28233 revision=80 levels=2 root=2
Failed to check B-tree: DatabaseError: Items not in sorted order
position:
baseA blocksize=8K items=12095659 lastblock=65112 revision=80 levels=2 root=11486
B-tree checked okay
position table structure checked OK
spelling:
Lazily created, and not yet used.
synonym:
Lazily created, and not yet used.
Total errors found: 1