Hi Olly,

Thanks for the detailed response. I hadn’t realized there was a new
xapian haystack backend. I’m going to try that, but I have some upgrades
to do first: Django 1.8, etc.

Thanks,
Ryan

> On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote:
>
> On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote:
>> I am trying to rebuild an index of 2+ million documents and have not
>> been successful. I am running:
>>
>> Python 2.7
>> Django 1.7
>> Haystack 2.1.1
>> Xapian 1.2.21
>>
>> The index rebuild command I’m using is:
>>
>> django-admin.py rebuild_index --noinput --batch-size=100000
>>
>> The rebuild completes but an immediate xapian-check returns this error:
> [...]
>> Trying the latest stable version, Xapian 1.4.3, it fails during the rebuild:
>>
>> All documents removed.
>> Indexing 2233651 messages
>> Traceback (most recent call last):
>> …
>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 221, in handle_label
>>     self.update_backend(label, using)
>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 266, in update_backend
>>     do_update(backend, index, qs, start, end, total, self.verbosity)
>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 89, in do_update
>>     backend.update(index, current_qs)
>>   File "/a/mailarch/current/haystack/backends/xapian_backend.py", line 286, in update
>>     database.close()
>
> What's the version of xapian-haystack? There's not a database.close() anywhere
> near line 286 in git master:
>
> https://github.com/notanumber/xapian-haystack/blob/master/xapian_backend.py#L286
>
>> xapian.DatabaseCorruptError: Expected block 615203 to be level 0, not 1
>> docdata:
>> blocksize=8K items=380000 firstunused=21983 revision=38 levels=2 root=21410
>
> Is that the full output of xapian-check?
>
>> Any suggestions for how I could get more information to troubleshoot this
>> failure would be greatly appreciated.
>
> Is the data to reproduce this something you can make available?
>
> I'd stick with Xapian 1.4.3 for trying to narrow this down (if it's a Xapian
> bug we can backport the fix once identified).
>
> The error message means that a block which was expected to be at the leaf level
> was actually marked as being one level above, which suggests either there's an
> obscure bug in the backend code which only manifests in rare circumstances, or
> something is corrupting data (could be in memory or on disk).
>
> Since this happens with both 1.2.x and 1.4.x, I would tend to suspect it's
> something external (rather than a bug in Xapian), as the default backends in 1.2
> and 1.4 have some significant differences. It's certainly possible it's a
> Xapian bug, but if so I would expect we'd be seeing other reports, though maybe
> we've actually had one or two and thought them due to #675, which was fixed in
> 1.2.21 (however nobody's yet said "no, still seeing that"):
>
> https://trac.xapian.org/ticket/675
>
> You could look at block 615203 of docdata.glass to see what it looks like -
> that might offer clues:
>
> xxd -g1 -seek $((615203*8192)) -len 8192 docdata.glass
>
> It'd also be good to eliminate possible system issues - e.g. check the disk is
> healthy (check the SMART status, run fsck on it), run a RAM test (distros often
> provide a way to run memtest86+ or similar from the boot menu).
>
> Cheers,
> Olly
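[Editor's note: the disk checks suggested above might look something like the following sketch; the device names /dev/sda and /dev/sda1 are assumptions, so substitute whatever actually holds the database.]

```shell
# Query the drive's SMART health summary (requires smartmontools;
# /dev/sda is an assumed device name).
smartctl -H /dev/sda

# Check the filesystem in read-only mode (-n makes no changes); run this
# against the unmounted partition holding the database, e.g. from a
# rescue environment.  /dev/sda1 is an assumed partition name.
fsck -n /dev/sda1
```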
Hi Olly,

After upgrades my stack is now:

Python 2.7
Django 1.8
Haystack 2.6.0
Xapian 1.4.3 (latest xapian haystack backend with some modifications)

Using the same rebuild command as below, but with --batch-size=50000.

The issue has now become one of performance. I am indexing 2.2 million
documents. Using delve I can see that performance starts off at about
100,000 records an hour. This is consistent with the roughly 24 hour
rebuild time I was experiencing with Xapian 1.2.21 (chert). However,
after 75 hours of build time, the index is about 75% complete and
records are processing at a rate of 10,000/hr. The index is 51GB in
size, 30GB of which is position.glass.

Here is a one minute strace summary:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 63.97    1.272902          13    100240           pread
 33.71    0.670733          14     48175           pwrite
  0.57    0.011253           8      1484           read
  0.45    0.008938           6      1524           fstat
  0.36    0.007098           6      1270           lseek
  0.25    0.004988          20       254           open
  0.18    0.003544          14       254           recvfrom
  0.11    0.002148           8       254           sendto
  0.10    0.002056           8       254           close
  0.10    0.001949           8       254           poll
  0.07    0.001429          11       127           munmap
  0.06    0.001111           9       127           mmap
  0.04    0.000802           6       127       127 ioctl
  0.04    0.000773           6       127           gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00    1.989724                154471       127 total

These are documents with term counts in the 10s to low 100s range.

Is there a way I can tune for better performance?

Thanks,
Ryan

> On Mar 2, 2017, at 4:48 PM, Ryan Cross <rcross at amsl.com> wrote:
>
> Hi Olly,
>
> Thanks for the detailed response. I hadn’t realized there was a new
> xapian haystack backend. I’m going to try that but I have some upgrades
> to do first. Django 1.8, etc.
>
> Thanks,
> Ryan
>
>> On Feb 28, 2017, at 3:40 PM, Olly Betts <olly at survex.com> wrote:
>>
>> On Mon, Feb 27, 2017 at 10:29:46AM -0800, Ryan Cross wrote:
>>> I am trying to rebuild an index of 2+ million documents and have not
>>> been successful.
>>> I am running:
>>>
>>> Python 2.7
>>> Django 1.7
>>> Haystack 2.1.1
>>> Xapian 1.2.21
>>>
>>> The index rebuild command I’m using is:
>>>
>>> django-admin.py rebuild_index --noinput --batch-size=100000
>>>
>>> The rebuild completes but an immediate xapian-check returns this error:
>> [...]
>>> Trying the latest stable version, Xapian 1.4.3, it fails during the rebuild:
>>>
>>> All documents removed.
>>> Indexing 2233651 messages
>>> Traceback (most recent call last):
>>> …
>>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 221, in handle_label
>>>     self.update_backend(label, using)
>>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 266, in update_backend
>>>     do_update(backend, index, qs, start, end, total, self.verbosity)
>>>   File "/a/mailarch/current/haystack/management/commands/update_index.py", line 89, in do_update
>>>     backend.update(index, current_qs)
>>>   File "/a/mailarch/current/haystack/backends/xapian_backend.py", line 286, in update
>>>     database.close()
>>
>> What's the version of xapian-haystack? There's not a database.close() anywhere
>> near line 286 in git master:
>>
>> https://github.com/notanumber/xapian-haystack/blob/master/xapian_backend.py#L286
>>
>>> xapian.DatabaseCorruptError: Expected block 615203 to be level 0, not 1
>>> docdata:
>>> blocksize=8K items=380000 firstunused=21983 revision=38 levels=2 root=21410
>>
>> Is that the full output of xapian-check?
>>
>>> Any suggestions for how I could get more information to troubleshoot this
>>> failure would be greatly appreciated.
>>
>> Is the data to reproduce this something you can make available?
>>
>> I'd stick with Xapian 1.4.3 for trying to narrow this down (if it's a Xapian
>> bug we can backport the fix once identified).
>>
>> The error message means that a block which was expected to be at the leaf level
>> was actually marked as being one level above, which suggests either there's an
>> obscure bug in the backend code which only manifests in rare circumstances, or
>> something is corrupting data (could be in memory or on disk).
>>
>> Since this happens with both 1.2.x and 1.4.x, I would tend to suspect it's
>> something external (rather than a bug in Xapian), as the default backends in 1.2
>> and 1.4 have some significant differences. It's certainly possible it's a
>> Xapian bug, but if so I would expect we'd be seeing other reports, though maybe
>> we've actually had one or two and thought them due to #675, which was fixed in
>> 1.2.21 (however nobody's yet said "no, still seeing that"):
>>
>> https://trac.xapian.org/ticket/675
>>
>> You could look at block 615203 of docdata.glass to see what it looks like -
>> that might offer clues:
>>
>> xxd -g1 -seek $((615203*8192)) -len 8192 docdata.glass
>>
>> It'd also be good to eliminate possible system issues - e.g. check the disk is
>> healthy (check the SMART status, run fsck on it), run a RAM test (distros often
>> provide a way to run memtest86+ or similar from the boot menu).
>>
>> Cheers,
>> Olly
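[Editor's note: a per-syscall summary table like the one Ryan posted above can be gathered with strace's -c (count) mode; this is a sketch, and "rebuild_index" is an assumed pattern for locating the indexing process.]

```shell
# Attach to the running indexer for one minute; on detach, strace -c
# prints a summary table of syscall counts, times, and errors like the
# one quoted in this thread.  pgrep -f matches against the full command
# line ("rebuild_index" is an assumption about the process name).
timeout 60 strace -c -p "$(pgrep -f rebuild_index)"
```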
On Sat, Mar 25, 2017 at 06:36:25PM -0500, Ryan Cross wrote:
> After upgrades my stack is now:
>
> Python 2.7
> Django 1.8
> Haystack 2.6.0
> Xapian 1.4.3 (latest xapian haystack backend with some modifications)
>
> Using the same rebuild command as below but with --batch-size=50000
>
> The issue has now become one of performance. I am indexing 2.2 million
> documents. Using delve I can see that performance starts off at about
> 100,000 records an hour. This is consistent with the roughly 24 hour
> rebuild time I was experiencing with Xapian 1.2.21 (chert). However,
> after 75 hours of build time, the index is about 75% complete and
> records are processing at a rate of 10,000/hr. The index is 51GB in
> size, 30GB of which is position.glass.

One of the big differences between chert and glass is that glass stores
positional data in a different order such that phrase searches are much
more I/O efficient. The downside is that this means extra work at index
time, and more data to batch up in memory. There's a thread discussing
this here:

https://lists.xapian.org/pipermail/xapian-discuss/2016-April/009368.html

> Here is a one minute strace summary:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  63.97    1.272902          13    100240           pread
>  33.71    0.670733          14     48175           pwrite

A one minute sample is hard to extrapolate from, as the indexing process
currently goes through phases of flushing changes, so whichever phase the
one minute is from isn't going to be representative.

But from the information you give, my guess is that the extra memory used
for batching up changes is pushing you over an I/O cliff, and you would
get better throughput by reducing the batch size (assuming the "batch
size" you specify maps to XAPIAN_FLUSH_THRESHOLD or something
equivalent). That's especially likely if you tuned that batch size for
chert.
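[Editor's note: a sketch of lowering the flush threshold via the environment when running the rebuild. XAPIAN_FLUSH_THRESHOLD is a real Xapian environment variable, but the value 10000 is illustrative, and whether haystack's --batch-size interacts with it is an assumption to verify.]

```shell
# Xapian flushes batched index changes every XAPIAN_FLUSH_THRESHOLD
# documents (the default is 10000).  A smaller value means less data
# held in memory between flushes, at the cost of more frequent commits.
XAPIAN_FLUSH_THRESHOLD=10000 \
    django-admin.py rebuild_index --noinput --batch-size=10000
```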
There are some longer term plans to rework the batching and flush
process, which should improve matters a lot (and hopefully remove the
need for manually tweaking such settings). I'm hoping that will land in
the next release series, so you could consider sticking with chert for
1.4.x, assuming the problematic phrase search cases aren't an issue for
you. There are various other improvements between chert and glass
(better tracking of free space, less on-disk overhead) which you'd lose
out on, though.

Cheers,
Olly