Yesterday a 3 GB+ Xapian database for a production web app became corrupt, reporting a "Cannot open tables at consistent revisions" runtime error on read/write ops against it. I'm sharing this experience and the limited data I collected, in case anyone else runs into this or has interesting suggestions.

It seemed worth trying some quick fixes, given that a full Xapian db rebuild on this host with production data takes several days. I got some ideas from this 2008 post by Olly:

http://article.gmane.org/gmane.comp.search.xapian.general/6333

The results of my xapian-check run are copied below. Since it reported mostly success with the "baseB" files, I first tried moving aside the Xapian *.baseA db files, with the exception of postlist.baseA. FWIW, there was no postlist.baseB file in existence. A possible clue? Let me know if anything else stands out.

I then tried some Xapian read/write ops and still got the "Cannot open tables at consistent revisions" error in all test cases.

Next I tried moving back the *.baseA files and moving aside all of the *.baseB files. Success, or so it appears! Both the search rebuild (writes) and web app search (reads) worked.

I have seen a couple of Xapian db corruptions in the past, but this was the first instance since we upgraded to Xapian C++ core v1.0.12 and Search::Xapian Perl XS v1.0.12.0. I haven't been able to corrupt a Xapian db intentionally, but we are thinking about adding a Xapian "redo log" to track individual search transactions in our web app.
- Alex

***

# xapian-check /var/db/Xapian
Database couldn't be opened for reading: DatabaseCorruptError: Cannot open tables at consistent revisions
Continuing check anyway
record:
baseB blocksize=8K items=1135281 lastblock=1957 revision=40949 levels=2 root=1954
B-tree checked okay
record table structure checked OK

termlist:
baseB blocksize=8K items=1135281 lastblock=126481 revision=40949 levels=2 root=126443
B-tree checked okay
termlist table structure checked OK

postlist:
baseA blocksize=8K items=10620877 lastblock=299736 revision=40948 levels=3 root=42
B-tree checked okay
postlist table structure checked OK

position:
baseB blocksize=8K items=184206075 lastblock=516275 revision=40949 levels=3 root=516059
B-tree checked okay
position table structure checked OK

value:
baseB blocksize=8K items=1135281 lastblock=7232 revision=40949 levels=2 root=7228
B-tree checked okay
value table structure checked OK

spelling:
Lazily created, and not yet used.

synonym:
Lazily created, and not yet used.

Total errors found: 1
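The manual recovery steps in the post (move one set of base files out of the database directory, test, and move them back if needed) can be sketched as a small helper. This is a hypothetical sketch, not a Xapian tool: the function name `move_aside` and the backup-directory layout are made up for illustration, and it only assumes the `<table>.baseA` / `<table>.baseB` file naming seen in the xapian-check output. It moves files rather than deleting them, so every step stays reversible.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def move_aside(db_dir: Path, base: str, backup_dir: Path,
               keep: Iterable[str] = ()) -> List[Path]:
    """Move every <table>.base<base> file from db_dir into backup_dir.

    Hypothetical helper: files are moved, never deleted, so the step can be
    undone by moving them back. Tables named in `keep` (e.g. "postlist")
    are left in place.
    """
    keep_tables = set(keep)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # e.g. base="B" matches record.baseB, termlist.baseB, ...
    for path in sorted(db_dir.glob(f"*.base{base}")):
        if path.stem in keep_tables:  # "postlist.baseA" has stem "postlist"
            continue
        target = backup_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

For example, the first attempt in the post corresponds to `move_aside(db, "A", aside, keep=["postlist"])` and the second to `move_aside(db, "B", aside)`, where `db` and `aside` are whatever paths apply on your host. Stop all readers and writers and take a full copy of the database directory before trying anything like this.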
On Mon, Aug 24, 2009 at 02:44:43PM -0600, Alex Viggio wrote:
> It seemed worth trying some quick fixes given a full Xapian db rebuild
> on this host and production data runs several days. Got some ideas from
> this 2008 post by Olly:
>
> http://article.gmane.org/gmane.comp.search.xapian.general/6333

Yes, that's still a useful overview.

> Results of my xapian-check run copied below. Since it reported mostly
> success with the "baseB" files, I first tried moving aside the Xapian
> *.baseA db files with the exception of postlist.baseA. FWIW, there was
> no postlist.baseB file in existence. A possible clue? Let me know if
> anything else stands out.

I should point out that currently xapian-check checks the latest version of each table individually, even though opening the database needs a consistent revision across all the tables. So the reason you see baseB for most tables is that baseB is newer. The lack of a baseA for postlist isn't particularly significant; it probably just means that the postlist table was modified so its baseA revision is no longer valid.

> I then tried some Xapian read/write ops and still got the "Cannot open
> tables at consistent revisions" error in all test cases.

Since there's no postlist.baseB, you aren't going to be able to open the baseB revision of the database.

> Next I tried moving back the *.baseA files and moving aside all of the
> *.baseB files. Success -- or so it appears! Both the search rebuild
> (writes) and web app search (reads) worked.

This recovery should have happened automatically, and it would be useful to understand why it didn't in this case. Do you still have the original base files? If so, how large are they?

> I have seen a couple of Xapian db corruptions in the past, but this was
> the first instance since we upgraded to Xapian C++ core v1.0.12 and
> Search::Xapian Perl XS v1.0.12.0.

1.0.10 fixed an issue which could lead to DatabaseCorruptError if the disk filled up. There hasn't been anything since then.

Cheers,
    Olly
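Olly's point, that the database can only be opened at a revision every table has a base file for, can be illustrated with a toy model. This is a simplified sketch of the rule as described in the thread, not Xapian's actual open logic; the names `newest_consistent_revision` and `without_base` are hypothetical. The baseB revisions and the postlist.baseA revision come from the xapian-check run; the baseA revisions of the other tables were not reported, so 40948 is an assumption inferred from the database opening once the *.baseB files were moved aside. The lazily created spelling and synonym tables are left out.

```python
from typing import Dict, Optional, Tuple

# Toy model (assumption, not Xapian's code): table name -> {base letter: revision}.
Tables = Dict[str, Dict[str, int]]


def newest_consistent_revision(tables: Tables) -> Optional[int]:
    """Return the highest revision every table has a base file for, or None."""
    common = None
    for bases in tables.values():
        revisions = set(bases.values())
        common = revisions if common is None else common & revisions
    return max(common) if common else None


def without_base(tables: Tables, base: str, keep: Tuple[str, ...] = ()) -> Tables:
    """Simulate moving aside one base letter's files, except for tables in keep."""
    return {
        name: {b: rev for b, rev in bases.items() if b != base or name in keep}
        for name, bases in tables.items()
    }


# ASSUMPTION: unreported baseA revisions, inferred from the successful recovery.
ASSUMED_BASE_A = 40948

original = {
    "record": {"A": ASSUMED_BASE_A, "B": 40949},
    "termlist": {"A": ASSUMED_BASE_A, "B": 40949},
    "postlist": {"A": 40948},  # no postlist.baseB existed
    "position": {"A": ASSUMED_BASE_A, "B": 40949},
    "value": {"A": ASSUMED_BASE_A, "B": 40949},
}

# All files present: revision 40948 is still available in every table.
print(newest_consistent_revision(original))  # 40948
# First attempt: *.baseA moved aside except postlist.baseA -> no common revision.
print(newest_consistent_revision(without_base(original, "A", keep=("postlist",))))  # None
# Second attempt: *.baseB moved aside -> opens at 40948.
print(newest_consistent_revision(without_base(original, "B")))  # 40948
```

Under these assumptions, the untouched database still had a consistent revision (40948) available, which is why Olly expected the recovery to happen automatically.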