thr3ads.net - Xapian discuss - [Xapian-discuss] Rebuilding corrupt databases from .DB files. [Apr 2012]

Graham Jones

2012-Apr-16 00:15 UTC

[Xapian-discuss] Rebuilding corrupt databases from .DB files.

We've had some catastrophic filesystem failures that have left us with
corrupted databases containing empty files, and no backup for about 15TB of
our data. Recreating the 15TB from source data backups is possible but will
take a very long time.

I'm hoping that, given all of the .DB files are still intact, there may be
some way to extract their contents and rebuild the other tables.

This is one of our 800 databases that has been corrupted and you can see which
files are empty:
drwxr-xr-x 1152 search search       40960 Apr 15 23:06 ..
drwxr-xr-x    2 search search        4096 Jan 31 17:39 .
-rw-r--r--    1 search search           0 Jan 31 17:39 iamchert
-rw-r--r--    1 search search           0 Jan 31 17:39 position.baseB
-rw-r--r--    1 search search  2191785984 Jan 31 17:39 position.DB
-rw-r--r--    1 search search           0 Jan 31 17:37 position.baseA
-rw-r--r--    1 search search           0 Jan 31 17:37 termlist.baseB
-rw-r--r--    1 search search  3953393664 Jan 31 17:37 termlist.DB
-rw-r--r--    1 search search           0 Jan 31 17:37 record.baseB
-rw-r--r--    1 search search 10681057280 Jan 31 17:37 record.DB
-rw-r--r--    1 search search           0 Jan 31 17:37 termlist.baseA
-rw-r--r--    1 search search           0 Jan 31 17:15 postlist.baseB
-rw-r--r--    1 search search  5559812096 Jan 31 17:15 postlist.DB
-rw-r--r--    1 search search           0 Jan 31 17:15 record.baseA
-rw-r--r--    1 search search           0 Jan 31 17:10 postlist.baseA

I have already tried recreating the .baseA and iamchert files by copying them
from similar databases (as these seem to be identical save for the UUID in
iamchert) but can't get the database to be usable without the .baseB files.
I've also tried using copydatabase but that doesn't work.

Can someone tell me what is in the .baseB files, and whether their contents
can be recreated from the .DB files if I were to write something that can read
and process the files at a low level?

Olly Betts

2012-Apr-16 00:56 UTC

On Mon, Apr 16, 2012 at 10:15:08AM +1000, Graham Jones wrote:
> I have already tried recreating the .baseA and iamchert files from
> copying similar databases (as these seem to be identical save for the
> UUID in iamchert) but can't get it to be usable without the .baseB
> files.

It is safe to create a new "donor" chert database and harvest its
"iamchert" file (the only problem would be if it was a different version
of the chert format, but that hasn't changed for ages, and is unlikely
to in the future).

Copying the .baseA or .baseB files from a different database isn't going
to work.

> Can someone tell me what is in the .baseB files and if their contents
> can be recreated from the .DB files if I were to write something that
> can read and process the files at a low level.

They can be recreated (as in, it is possible to write a tool to do this,
but no such tool currently exists AFAIK).

Essentially the base file has some header info, and a bitmap of used
blocks, and then the revision number repeated again - this format is
described by a comment in backends/chert/chert_btreebase.h.

But you probably don't need to write that yourself - my suggestion would
be to start from the Btree consistency checking code, which iterates the
tree from the root block, and compares the actually used blocks against
those marked as used in the bitmap.  Instead you could iterate and
create a new bitmap.

That code is in backends/chert/chert_check.cc.
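
The "iterate and create a new bitmap" idea can be sketched roughly as
follows. This is an illustration in Python rather than a port of
chert_check.cc. The `children_of` callback, which would have to parse a
branch block and return its child block numbers, is hypothetical and not
part of Xapian. The bit order used here (least significant bit first
within each byte) is an assumption to verify against the chert
base-file code before writing any real base file.

```python
from typing import Callable, Iterable, Set


def collect_used_blocks(root: int,
                        children_of: Callable[[int], Iterable[int]]) -> Set[int]:
    """Walk the tree from `root` and return every block number reached.

    `children_of(n)` is a hypothetical callback: it returns the child block
    numbers of branch block n, and an empty iterable for a leaf block.
    """
    used: Set[int] = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if n in used:
            # A block reachable by two paths means the tree is damaged.
            raise ValueError(f"block {n} is referenced more than once")
        used.add(n)
        stack.extend(children_of(n))
    return used


def build_bitmap(used: Iterable[int], total_blocks: int) -> bytes:
    """Pack block numbers into a bitmap with one bit per block.

    ASSUMPTION: bit (n % 8) of byte (n // 8) marks block n as in use.
    """
    bitmap = bytearray((total_blocks + 7) // 8)
    for n in used:
        if not 0 <= n < total_blocks:
            raise ValueError(f"block {n} is outside the file")
        bitmap[n // 8] |= 1 << (n % 8)
    return bytes(bitmap)
```

Here `total_blocks` would be the size of the .DB file divided by the
block size.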

You also need to find the right root block to get you started - this
isn't entirely trivial to do in general, but you can get a list of
candidates by scanning all the blocks in the .DB file looking at
GET_LEVEL() and REVISION().

Naively, the right root is the one with the highest level and revision,
but there are two complications.  If the Btree has had deletes and lost a
level, the root you want might have a lower level than an older root
block which hasn't yet been reused.  And there may be a higher revision
number (probably only one higher) on some blocks if there was a revision
which was partly or fully written but not committed.  If your databases
were produced by compaction, then these complications aren't a concern.
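
As a rough illustration of the candidate scan, the sketch below reads
every block of a .DB file and records its revision and level, then sorts
so the naive best guess comes first. It rests on assumptions that should
be checked against the macros in the chert sources: that REVISION() is a
4-byte big-endian integer at offset 0 of the block, that GET_LEVEL() is
the single byte at offset 4, and that the block size is 8192 bytes (the
real block size was recorded in the lost base files; the .DB sizes in the
listing above are at least whole multiples of 8192). The function names
are made up for this example.

```python
import struct
from typing import BinaryIO, List, Tuple

Block = Tuple[int, int, int]  # (block_number, revision, level)


def scan_blocks(db: BinaryIO, block_size: int = 8192) -> List[Block]:
    """Return (block_number, revision, level) for every whole block in `db`.

    ASSUMPTION: revision is a big-endian uint32 at offset 0 and level is
    the byte at offset 4 of each block.
    """
    found: List[Block] = []
    number = 0
    while True:
        block = db.read(block_size)
        if len(block) < block_size:
            break  # end of file (or a trailing partial block)
        revision, level = struct.unpack_from(">IB", block, 0)
        found.append((number, revision, level))
        number += 1
    return found


def rank_root_candidates(blocks: List[Block]) -> List[Block]:
    """Sort so the highest level, then the highest revision, comes first."""
    return sorted(blocks, key=lambda b: (b[2], b[1]), reverse=True)
```

Typical use would be `rank_root_candidates(scan_blocks(open(path, "rb")))`,
keeping the first few entries as candidates rather than trusting only the
top one, for the reasons given above.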

If you pick a root which isn't the latest, you'll likely fail to find
a full tree under it - you need to check that you don't hit a child
block which is newer than its parent as you iterate.  If you don't
check, then Xapian will throw an exception when it tries to use that
part of the database.
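
A minimal sketch of that check, assuming two hypothetical callbacks that
a real tool would have to implement by parsing blocks: `revision_of(n)`
returns the revision stored in block n, and `children_of(n)` returns the
child block numbers of block n (empty for a leaf). Neither is a Xapian
API.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def find_newer_child(root: int,
                     revision_of: Callable[[int], int],
                     children_of: Callable[[int], Iterable[int]]
                     ) -> Optional[Tuple[int, int]]:
    """Return the first (parent, child) pair where the child's revision is
    higher than the parent's, or None if no such pair is reachable."""
    seen: Set[int] = {root}
    stack = [root]
    while stack:
        parent = stack.pop()
        parent_rev = revision_of(parent)
        for child in children_of(parent):
            if revision_of(child) > parent_rev:
                return (parent, child)
            if child not in seen:
                # Guard against loops in a damaged tree.
                seen.add(child)
                stack.append(child)
    return None
```

A result other than None suggests the chosen root is stale and another
candidate should be tried.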

The baseA/baseB difference is just that one is the latest revision and
one the revision before.  If you're recreating, you can just create a
set with either name - so long as they're consistent, then baseA vs
baseB doesn't matter.

If you find the root blocks, you could probably just create a set of
base files with dummy bitmaps and use copydatabase, but that will be
slow for that much data, so recreating the bitmaps is probably
worthwhile.

Once you've recreated a set of base files, try xapian-check on the
database to make sure it looks consistent at both the Btree and
higher levels.

Good luck!

Cheers,
    Olly
