On Mon, Nov 15, 2021 at 09:45:15PM +0100, Adam Sjögren wrote:
> termlist:
> baseA blocksize=8K items=20885830 lastblock=10207839 revision=5434326 levels=3 root=7090
> Failed to check B-tree: DatabaseError: Key >= right dividing key in level above
This error is essentially saying that xapian-check found a branch node
where the key which is meant to partition the two nodes below doesn't
actually do so. Maybe replacing the bad dividing key with one which
actually partitions the nodes below (assuming they are actually
partitioned) would give a sensible working database, but it's hard to
know without trying it.
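To make the idea concrete, here is a minimal sketch in Python of what "the dividing key partitions the nodes below" means. It is a toy model, not Xapian's on-disk format or API: the function names and the list-of-lists layout are made up for illustration, and the convention (keys in the left child < divider <= keys in the right child) is an assumption.

```python
# Toy model only: NOT Xapian's block format. A branch node is modelled as
# a list of children (each a sorted list of keys) plus dividing keys,
# where dividers[i] sits between children[i] and children[i + 1].
# Assumed convention: left child keys < divider <= right child keys.

def check_branch(children, dividers):
    """Return a list of problems; an empty list means every divider
    really does partition the two children either side of it."""
    problems = []
    for i, divider in enumerate(dividers):
        left, right = children[i], children[i + 1]
        if left and left[-1] >= divider:
            problems.append(f"child {i}: key {left[-1]!r} >= right dividing key {divider!r}")
        if right and right[0] < divider:
            problems.append(f"child {i + 1}: key {right[0]!r} < left dividing key {divider!r}")
    return problems


def repair_dividers(children):
    """Rebuild the dividers from the children (hypothetical repair).

    Only possible if the children are non-empty and already in order,
    i.e. the nodes below are actually partitioned."""
    for left, right in zip(children, children[1:]):
        if not left or not right or left[-1] >= right[0]:
            raise ValueError("children overlap or are empty; cannot pick a divider")
    # The first key of each right-hand child is a valid divider.
    return [child[0] for child in children[1:]]
```

With children `[["a", "b"], ["c", "d"], ["e"]]`, the dividers `["b", "e"]` are reported as bad (key "b" is not less than its right dividing key), and `repair_dividers` would replace them with `["c", "e"]`.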
> position:
> baseA blocksize=8K items=13525517523 lastblock=118173891 revision=5434326 levels=4 root=6551
> Failed to check B-tree: DatabaseError: Table entry count says 13525517523 but actually counted 13525517278
This error is just that the record of how many entries there are is
wrong (this count is stored since it's useful to know in some cases, and
expensive to compute by scanning the whole table). It shouldn't get out
of step with the actual number of entries in the table, but since no
other errors are reported just fixing the metadata record seems
reasonable.
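For scale, the two numbers in the error differ by 245 entries. The sketch below shows that arithmetic and what "just fixing the metadata record" amounts to, using a toy table modelled as a Python dict. The `entries` and `item_count` names are hypothetical and this is not how chert stores the count on disk.

```python
# Numbers taken from the xapian-check output quoted above.
stored_count = 13525517523   # "Table entry count says ..."
actual_count = 13525517278   # "... but actually counted ..."
missing = stored_count - actual_count  # 245


def recount(table):
    """Toy model: `table` is a dict holding the entries and a cached count.

    Scans every entry (the expensive part), overwrites the cached
    'item_count' with the real figure, and returns True if the cached
    value was wrong."""
    actual = sum(1 for _ in table["entries"])
    was_wrong = actual != table["item_count"]
    table["item_count"] = actual
    return was_wrong
```

The point of the cached count is to avoid that full scan in normal operation, which is why it can drift if something goes wrong during a commit.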
> Is there any chance to fix this without starting over?
Unfortunately there isn't an existing tool to try to fix this sort of
thing (xapian-check has an "F" mode which can fix some chert problems,
but they're all to do with recreating missing base and iamchert files,
which is a problem sometimes seen with chert databases if a machine
loses power or hangs).
We recommend migrating off chert so perhaps reindexing is a good plan
anyway. I can see it may not be very appealing with a 1.1TB database
though.
> (We have had some problems recently with the introduction of a new kind
> of error, making the indexing program crash in "interesting" ways, so I
> am not in doubt that this is a "self-inflicted wound").
>
> I think the index is chert (there is an 'iamchert' file), it's on an
> Ubuntu 18.04 server with libxapian30 1.4.5-1ubuntu0.1.
It shouldn't really be possible for the program to cause a corrupt
database like this (except for program bugs like stray memory writes
into memory Xapian has allocated, or the program writing to file
descriptors which Xapian has open for writing on the database, etc).
However, the way a chert commit happens involves trying to stitch
together per-table atomic commits to make a per-database atomic commit,
which means we need to recover if some tables have committed and others
haven't. That's fiddly to do and we've found bugs there before, so it
could be you've hit another one.
In glass we replaced this whole mechanism with a new one which gives a
per-database atomic commit directly.
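As a rough illustration of why stitching per-table commits together needs a recovery step, here is a toy model in Python. It assumes each table can be opened at either of its two most recent revisions and that recovery means finding the newest revision every table still has. The function name and data layout are made up, and this is a sketch of the general idea rather than chert's actual recovery code.

```python
def consistent_revision(table_revisions):
    """Return the newest revision that every table can be opened at.

    table_revisions maps a table name to the set of revisions that
    table still has on disk (toy model). Returns None if the tables
    share no revision, i.e. there is no consistent database state."""
    if not table_revisions:
        return None
    common = set.intersection(*(set(revs) for revs in table_revisions.values()))
    return max(common) if common else None


# A crash part-way through committing revision 42: two tables have
# already committed it, one has not.
after_crash = {
    "postlist": {41, 42},
    "termlist": {41, 42},
    "position": {40, 41},
}
recovered = consistent_revision(after_crash)  # 41
```

In this model the database has to fall back to revision 41 because "position" never reached 42. With a single per-database commit there is only one revision to record, so there is nothing to reconcile between tables.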
Cheers,
Olly