thr3ads.net - Xapian discuss - xapian-check sorted order error [Oct 2020]

Matthew Somerville

2020-Oct-21 12:20 UTC

xapian-check sorted order error

Hi,

We were running xapian-check on one of our Xapian indexes and it
returns the following error:

position:
baseB blocksize=8K items=809896869 lastblock=2090419 revision=3161
levels=3 root=2084903
Failed to check B-tree: DatabaseError: Items not in sorted order

The other tables verify without issue. It looks like our oldest backup
of this database (a month old) has the same issue. Searching and
indexing are still working fine as far as we can see.

Running xapian-check with the F option to "attempt to fix a broken
database" doesn't seem to have any effect - the position table still
reports the same error both during the run with F enabled and on
subsequent runs.

Is this something we need to worry about? Can it be resolved? Thanks.

ATB,
Matthew

Olly Betts

2020-Oct-21 22:48 UTC

On Wed, Oct 21, 2020 at 01:20:34PM +0100, Matthew Somerville wrote:
> We were running xapian-check on one of our Xapian indexes and it
> returns the following error:
> 
> position:
> baseB blocksize=8K items=809896869 lastblock=2090419 revision=3161
> levels=3 root=2084903
> Failed to check B-tree: DatabaseError: Items not in sorted order

What this means is that walking the tree found a place where two
adjacent branches were misordered (it should always be left < right).

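
As an illustration of the property being described (this is not xapian-check's actual code, only a minimal sketch under that assumption), the check amounts to walking the keys in tree order and confirming each one is strictly greater than the one before it. The function name `first_misordered_pair` and the sample keys are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_misordered_pair(
    keys: Iterable[bytes],
) -> Optional[Tuple[int, bytes, bytes]]:
    """Return (index, left, right) for the first adjacent pair where
    left < right does not hold, or None if every pair is in order."""
    previous = None
    for index, key in enumerate(keys):
        # The invariant is strict: left must be less than right.
        if previous is not None and not (previous < key):
            return (index, previous, key)
        previous = key
    return None


# Hypothetical keys, in the order a tree walk would visit them.
ordered = [b"apple", b"banana", b"cherry"]
damaged = [b"apple", b"cherry", b"banana"]

print(first_misordered_pair(ordered))  # None
print(first_misordered_pair(damaged))  # (2, b'cherry', b'banana')
```

A lookup that assumes the ordering holds can miss an entry sitting on the wrong side of a misordered pair, which is why such damage may go unnoticed until a specific key is requested.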
> The other tables verify without issue. It looks like our oldest backup
> of this database (a month old) has the same issue. Searching and
> indexing are still working fine as far as we can see.
> 
> Running xapian-check with the F option to "attempt to fix a broken
> database" doesn't seem to have any effect - the position table still
> reports the same error both during the run with F enabled and on
> subsequent runs.

Unfortunately there's a fairly limited number of things that it
knows how to fix currently.  For chert it can regenerate the "base"
files that are sometimes lost or truncated if the power fails.  In
glass those files no longer exist and that problem is gone - it looks
like there's nothing fix can currently do for glass!  The nearest
equivalent would be regenerating a corrupted freelist.

> Is this something we need to worry about? Can it be resolved? Thanks.

Searches don't care directly about this ordering property, though
correct results do rely on it - if you look closely enough there's
probably some case where a phrase search fails to match something due
to this.

Updates might go wrong if they try to replace or remove an entry which
ends up hidden by this.  Probably you'll just end up with different
inconsistency but limited to the block in question at least.

You may be able to fix it using "xapian-compact" as that walks all the
leaf items in each table and writes them to new tables in a new
database - that will put the items back in the right order in the new
database, so if that's the only problem then that should fix it.
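
The idea can be sketched with a toy model (not Xapian's implementation; the `rebuild_table` name and the list-of-leaf-blocks layout are assumptions made for illustration): every item is read from every leaf, whatever order the leaves are reached in, and written into a fresh table that is built in key order.

```python
from typing import List, Tuple

Item = Tuple[bytes, bytes]  # (key, value)


def rebuild_table(leaf_blocks: List[List[Item]]) -> List[Item]:
    """Copy all items out of the old leaves into a new, ordered table."""
    # Gather every item, ignoring the (possibly wrong) order of the leaves.
    items = [item for block in leaf_blocks for item in block]
    # Writing the new table in key order restores the sorted invariant.
    return sorted(items, key=lambda item: item[0])


# Hypothetical damaged table: the last two leaves are swapped.
old_leaves = [
    [(b"a", b"1"), (b"b", b"2")],
    [(b"e", b"5"), (b"f", b"6")],
    [(b"c", b"3"), (b"d", b"4")],
]

new_table = rebuild_table(old_leaves)
print([key for key, _ in new_table])
```

This only helps when misordering is the sole problem: if the items themselves are corrupt, copying them carries the corruption into the new table or fails part-way through.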

We could probably allow "fix" to run a similar operation but limited
to the bad tables (so faster than compacting everything).  The tricky
part is deciding when that's a good idea, but perhaps "fix" could just
warn you to make a backup first, and then give it a go if there's a
problem detected at this level.

Cheers,
    Olly

Matthew Somerville

2020-Oct-22 16:18 UTC

On Wed, 21 Oct 2020 at 23:48, Olly Betts <olly at survex.com> wrote:
> You may be able to fix it using "xapian-compact" as that walks all the
> leaf items in each table and writes them to new tables in a new
> database - that will put the items back in the right order in the new
> database, so if that's the only problem then that should fix it.

Thanks for your reply. Sadly, xapian-compact gave the following output:

    postlist: Reduced by 29% 2026096K (6760968K -> 4734872K)
    record: Reduced by 2% 11968K (482040K -> 470072K)
    termlist: Reduced by 14% 1214720K (8408144K -> 7193424K)
    position ...terminate called after throwing an instance of 'std::length_error'
      what():  basic_string::_M_replace
    Aborted

Guess to be on the safe side we'll look at scheduling in a full
reindex, and add xapian-check to our backup script so it warns us if
something like this happens again in future.
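
A backup-script hook along these lines might look like the sketch below. It assumes xapian-check exits with a non-zero status when it finds errors, which should be confirmed against the installed version; the function name `database_passes_check` and the database path in the comment are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def database_passes_check(command: Sequence[str]) -> bool:
    """Run a checker command; return True only if it exits with status 0.

    Assumption: the checker signals detected errors via a non-zero exit
    status. On failure, its combined output goes to stderr as a warning.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        print("WARNING: database check failed:", file=sys.stderr)
        print(result.stdout, file=sys.stderr)
        return False
    return True


# Intended use in a backup script (the path is a placeholder):
#
#     if not database_passes_check(["xapian-check", "/path/to/db"]):
#         sys.exit(1)
```

Taking the command as a parameter keeps the function easy to exercise without a real database, and lets the same hook wrap any other checker.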

ATB,
Matthew
