Hi,
A Recoll user is reporting an index corruption problem. In general, index
corruption happens from time to time with Recoll, because of crashes,
reboots, misc Recoll bugs, etc.
The strange thing here is that xapian-check does not seem to detect anything.
In a nutshell, some document numbers seem to point to a data blackhole: the
docids are returned when searching for the file/doc unique identifying
term, but then get_document() fails. A later replace_document() succeeds,
but on the next indexing pass, same issue.
// success
Xapian::PostingIterator docid = db.postlist_begin(uniterm);
// then failure:
Xapian::Document xdoc = db.get_document(*docid);
In this situation, Recoll will try to update the doc. replace_document()
then succeeds, and this repeats on the next indexing pass.
This is with Xapian 1.2.16
Here follows a slightly edited version of what the user reports about
experiments run with pure xapian-check/delve:
Recoll 1.21.3 + Xapian 1.2.16 with two external indices (on a network
server) and one local index. The setup has been running fine for weeks and
the external indices update on a cron job overnight.
Today, I searched for a term that I know is in many documents and can be
found (my last name). No documents were found in the Recoll GUI.
I then searched in one external index on the command line
"recoll -c -t -q term" and received the following response:
:2:../rcldb/rclquery.cpp:358:xenquire->get_mset: exception: Document 6 not found
Recoll query: ((term...)) -1 results
:2:../rcldb/rclquery.cpp:392:enquire->get_mset: exception: Document 6 not found
I then went through and checked as above (after installing xapian-tools). I
ran the xapian-check on both external indices and both had no problems.
I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
terms, which included 'term' and seemed to be reasonable for a document. I
then ran "delve -r 6 ./xapiandb -d" and got the following:
Data for record #6:
Error: DocNotFoundError: Document 6 not found
And the output from xapian-check:
============
record:
baseB blocksize=8K items=84507 lastblock=3379 revision=157 levels=2 root=12
B-tree checked okay
record table structure checked OK
termlist:
baseB blocksize=8K items=169014 lastblock=24090 revision=157 levels=2 root=5
B-tree checked okay
termlist table structure checked OK
postlist:
baseB blocksize=8K items=8727966 lastblock=66596 revision=157 levels=3 root=113
B-tree checked okay
postlist table structure checked OK
position:
baseB blocksize=8K items=34905667 lastblock=109114 revision=157 levels=2 root=11167
B-tree checked okay
position table structure checked OK
spelling:
Lazily created, and not yet used.
synonym:
baseB blocksize=8K items=255128 lastblock=4844 revision=157 levels=2 root=2
B-tree checked okay
synonym table: Don't know how to check structure
No errors found
============
The whole report is here:
https://bitbucket.org/medoc/recoll/issues/257/query-returns-no-results-when-document-is
Look for the 'Bob Cargill' section. Unfortunately, the issue was appended
to an older one (corruption too, but detected by xapian-check, so nothing
extraordinary there).
To repeat, the issue here is not that the index is corrupted, but that
xapian-check does not see it. Is there some more thorough test which could
be run?
Cheers,
J.F. Dockes
On Fri, Jan 08, 2016 at 08:11:48AM +0100, Jean-Francois Dockes wrote:
> A Recoll user is reporting an index corruption problem. In general, index
> corruption happens from time to time with Recoll, because of crashes,
> reboots, misc Recoll bugs, etc.
>
> The strange thing here is that xapian-check does not seem to detect anything.

Checking the database checks the B-tree structure, checks that the contents
of most of the tables make sense, and does some cross-checking between
tables, but the latter in particular is far from exhaustive.

Looking at the exception message, if it is lacking a trailing '.' (as quoted
below), then it's a corrupted entry (or chunk) in the list of document
lengths, but if it has a trailing '.', then it's a missing entry in the
record table. (I'm not sure if this punctuation difference was a fiendishly
cunning deliberate plan or careless inconsistency...)

We probably ought to cross-check the two - that shouldn't be costly to do.

> This is with Xapian 1.2.16

My guess is that the corruption is caused by the same bug as #645, which was
fixed in 1.2.21.

> I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
> which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
> terms, which included 'term' and seemed to be reasonable for a document I
> then ran "delve -r 6 ./xapiandb -d" and got the following:
>
> Data for record #6:
>
> Error: DocNotFoundError: Document 6 not found

Hmm, if you're getting it with '-d' there, that makes me suspect a missing
record table entry.

> To repeat, the issue here is not that the index is corrupted, but that
> xapian-check does not see it. Is there some more thorough test which could
> be run ?

You could try:

delve -t '' ./xapiandb

That will list the document lengths, so you can see if document 6 is in
that list or not.

Cheers,
Olly
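
The cross-check between the document-length list and the record table suggested above could be sketched roughly as follows. This is only an illustration in Python, not Xapian code: the function name cross_check and its two inputs are hypothetical, and it assumes the document IDs from each table have already been extracted into plain collections of integers by some other means.

```python
def cross_check(doclen_ids, record_ids):
    """Compare two collections of document IDs and report mismatches.

    Both arguments are hypothetical inputs: the IDs found in the list of
    document lengths and the IDs found in the record table.
    """
    doclen_ids = set(doclen_ids)
    record_ids = set(record_ids)
    return {
        # IDs with a record but no document length entry
        "missing_doclen": sorted(record_ids - doclen_ids),
        # IDs with a document length but no record entry
        "missing_record": sorted(doclen_ids - record_ids),
    }


if __name__ == "__main__":
    # Made-up IDs, purely for illustration.
    print(cross_check(doclen_ids=[1, 2, 3], record_ids=[2, 3, 4]))
```

An empty result in both lists would mean the two tables agree on which documents exist. It would say nothing about whether the contents of those entries are sound.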
Olly Betts <olly <at> survex.com> writes:
>
> You could try:
>
> delve -t '' ./xapiandb
>
> That will list the document lengths, so you can see if document 6 is in
> that list or not.

I am the Recoll user mentioned in the first post above. I still have a copy
of the (potentially) corrupted index and I did the requested testing.

I ran

delve -t '' ./xapiandb

on the index and it returned a very long list of document IDs, separated by
spaces. I then ran

delve -t '' ./xapiandb | grep " 6 "

and it returned nothing. So, document 6 was not in the list. There were
other documents missing from the index as well, so I ran

delve -t '' ./xapiandb | head -c 100

The first ID was 257, then it began sequentially from 356. Looks like the
first approximately 350 document IDs are "missing."

I will look into the bug you listed to see if it might be related. If there
is anything else that I can do, please let me know.
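
For anyone repeating this test, the grep and head steps can be replaced by a small script that parses the ID list and reports which document IDs are absent. The sketch below is in Python and makes one assumption taken from the description above: that the output of delve -t '' is a whitespace-separated list of document IDs, possibly with some non-numeric header text that can be ignored. The function names are hypothetical. Matching whole tokens also avoids a weakness of grep " 6 ", which would miss an ID sitting at the very start or end of a line.

```python
def parse_docids(delve_output):
    """Return the sorted document IDs found in whitespace-separated text.

    Tokens that are not plain ASCII digits (for example header text) are
    skipped. The output format is an assumption, see the note above.
    """
    return sorted(
        {int(tok) for tok in delve_output.split()
         if tok.isascii() and tok.isdigit()}
    )


def missing_docids(docids, upto=None):
    """List the IDs in 1..upto (default: the largest ID seen) that are absent."""
    present = set(docids)
    last = upto if upto is not None else max(present, default=0)
    return [d for d in range(1, last + 1) if d not in present]


if __name__ == "__main__":
    # Made-up sample shaped like the report: first ID 257, then 356 onwards.
    sample = "257 356 357 358 359 360"
    ids = parse_docids(sample)
    gaps = missing_docids(ids)
    print("document 6 present:", 6 in ids)
    print("first ID:", ids[0], "- number of absent IDs:", len(gaps))
```

Gaps in the ID sequence are not proof of corruption on their own, since deleted documents also leave holes. The suspicious case is an ID that is absent here but still returned by a term search, as with document 6.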