Hi,

A Recoll user is reporting an index corruption problem. In general, index corruption happens from time to time with Recoll, because of crashes, reboots, misc Recoll bugs, etc.

The strange thing here is that xapian-check does not seem to detect anything.

In a nutshell, some document numbers seem to point to a data black hole: the docids are returned when searching for the file/doc unique identifying term, but then get_document() fails. A later replace_document() succeeds, but on the next indexing pass, same issue.

    // Success: the posting list for the unique term yields a docid
    Xapian::PostingIterator docid = db.postlist_begin(uniterm);
    // Then failure: fetching that document fails
    Xapian::Document xdoc = db.get_document(*docid);

In this situation, Recoll will try to update the doc. replace_document() then succeeds, and this repeats on the next indexing pass.

This is with Xapian 1.2.16.

Here follows a slightly edited version of what the user reports about experiments run with pure xapian-check/delve:

Recoll 1.21.3 + Xapian 1.2.16 with two external indices (on a network server) and one local index. The setup has been running fine for weeks and the external indices update on a cron job overnight.

Today, I searched for a term that I know is in many documents and can be found (my last name). No documents were found in the Recoll GUI. I then searched in one external index on the command line with "recoll -c -t -q term" and received the following response:

    :2:../rcldb/rclquery.cpp:358:xenquire->get_mset: exception: Document 6 not found
    Recoll query: ((term...))
    -1 results
    :2:../rcldb/rclquery.cpp:392:enquire->get_mset: exception: Document 6 not found

I then went through and checked as above (after installing xapian-tools). I ran xapian-check on both external indices and neither showed any problems.

I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of which was 6.
I then ran "delve -r 6 ./xapiandb" and got a long list of terms, which included 'term' and seemed to be reasonable for a document. I then ran "delve -r 6 ./xapiandb -d" and got the following:

    Data for record #6:
    Error: DocNotFoundError: Document 6 not found

And the output from xapian-check:

    ===========
    record:
    baseB blocksize=8K items=84507 lastblock=3379 revision=157 levels=2 root=12
    B-tree checked okay
    record table structure checked OK
    termlist:
    baseB blocksize=8K items=169014 lastblock=24090 revision=157 levels=2 root=5
    B-tree checked okay
    termlist table structure checked OK
    postlist:
    baseB blocksize=8K items=8727966 lastblock=66596 revision=157 levels=3 root=113
    B-tree checked okay
    postlist table structure checked OK
    position:
    baseB blocksize=8K items=34905667 lastblock=109114 revision=157 levels=2 root=11167
    B-tree checked okay
    position table structure checked OK
    spelling:
    Lazily created, and not yet used.
    synonym:
    baseB blocksize=8K items=255128 lastblock=4844 revision=157 levels=2 root=2
    B-tree checked okay
    synonym table: Don't know how to check structure
    No errors found
    ============

The whole report is here:

https://bitbucket.org/medoc/recoll/issues/257/query-returns-no-results-when-document-is

Look for the 'Bob Cargill' section. Unfortunately, the issue was appended to an older one (corruption too, but detected by xapian-check, so nothing extraordinary there).

To repeat, the issue here is not that the index is corrupted, but that xapian-check does not see it. Is there some more thorough test which could be run?

Cheers,
J.F. Dockes

On Fri, Jan 08, 2016 at 08:11:48AM +0100, Jean-Francois Dockes wrote:
> A Recoll user is reporting an index corruption problem. In general, index
> corruption happens from time to time with Recoll, because of crashes,
> reboots, misc Recoll bugs, etc.
>
> The strange thing here is that xapian-check does not seem to detect anything.

Checking the database checks the B-tree structure, checks that the contents of most of the tables make sense, and does some cross-checking between tables, but the latter in particular is far from exhaustive.

Looking at the exception message, if it is lacking a trailing '.' (as quoted below), then it's a corrupted entry (or chunk) in the list of document lengths, but if it has a trailing '.', then it's a missing entry in the record table. (I'm not sure if this punctuation difference was a fiendishly cunning deliberate plan or careless inconsistency...)

We probably ought to cross-check the two - that shouldn't be costly to do.

> This is with Xapian 1.2.16

My guess is that the corruption is caused by the same bug as #645, which was fixed in 1.2.21.

> I then ran "delve -t term ./xapiandb" and found a long list of IDs, one of
> which was 6. I then ran "delve -r 6 ./xapiandb" and got a long list of
> terms, which included 'term' and seemed to be reasonable for a document I
> then ran "delve -r 6 ./xapiandb -d" and got the following:
>
> Data for record #6:
>
> Error: DocNotFoundError: Document 6 not found

Hmm, if you're getting it with '-d' there, that makes me suspect a missing record table entry.

> To repeat, the issue here is not that the index is corrupted, but that
> xapian-check does not see it. Is there some more thorough test which could
> be run ?

You could try:

    delve -t '' ./xapiandb

That will list the document lengths, so you can see if document 6 is in that list or not.

Cheers,
    Olly

Olly Betts <olly <at> survex.com> writes:
> You could try:
>
>     delve -t '' ./xapiandb
>
> That will list the document lengths, so you can see if document 6 is in
> that list or not.

I am the Recoll user mentioned in the first post above. I still have a copy of the (potentially) corrupted index and I did the requested testing.

I ran

    delve -t '' ./xapiandb

on the index and it returned a very long list of document IDs, separated by spaces. I then ran

    delve -t '' ./xapiandb | grep " 6 "

and it returned nothing. So, document 6 was not in the list.

There were other documents missing from the index as well, so I ran

    delve -t '' ./xapiandb | head -c 100

The first ID was 257, then it began sequentially from 356. It looks like approximately the first 350 document IDs are "missing."

I will look into the bug you listed to see if it might be related. If there is anything else that I can do, please let me know.
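
The comparison of ID lists described in this thread can also be done with exact token matching instead of grep, which misses an ID sitting at the very start or end of a line. What follows is only a minimal Python sketch, not part of Recoll or Xapian. It assumes the ID lists printed by delve have been saved as whitespace-separated numbers with any header text removed, and all function names are hypothetical. Gaps in document IDs can also result from ordinary deletions, so a gap by itself does not prove corruption; an ID that appears in a term's posting list but not in the document length list is the stronger signal.

```python
import re


def parse_docids(listing):
    """Return the sorted, de-duplicated docids found in a whitespace-separated listing.

    Assumption: any header text printed by delve has already been removed.
    Tokens that are not plain decimal numbers are ignored.
    """
    return sorted({int(tok) for tok in listing.split()
                   if re.fullmatch(r"[0-9]+", tok)})


def missing_ranges(docids, first=1):
    """Return inclusive (start, end) ranges of ids absent between `first` and max(docids)."""
    gaps = []
    expected = first
    for docid in sorted(docids):
        if docid > expected:
            gaps.append((expected, docid - 1))
        expected = max(expected, docid + 1)
    return gaps


def dangling_ids(term_ids, doclen_ids):
    """Return ids present in a term's posting list but absent from the doc length list."""
    return sorted(set(term_ids) - set(doclen_ids))


if __name__ == "__main__":
    # Hypothetical data shaped like the lists reported in this thread.
    doclen_ids = parse_docids("257 356 357 358")
    term_ids = parse_docids("6 257 357")
    print(missing_ranges(doclen_ids))
    print(dangling_ids(term_ids, doclen_ids))
```

Saving the output of "delve -t term ./xapiandb" and "delve -t '' ./xapiandb" to files and feeding both through parse_docids() and dangling_ids() would list every document in the same state as document 6, rather than checking one ID at a time.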