Running notmuch from git on Debian testing[1] with the mail and database sitting on a ZFS filesystem, adding mail to a new database:> agrajag-testing ~/s/notmuch % ./notmuch new > Found 605510 total files (that's not much mail). > add_file: A Xapian exception occurred36m 37s remaining). > A Xapian exception occurred adding message: Unexpected end of posting list for 'G0000000000014364'. > Processed 137296 total files in 1h 21m 17s (28 files/sec.). > Added 135950 new messages to the database. > Note: A fatal error was encountered: A Xapian exception occurred > agrajag-testing ~/s/notmuch %This happens consistently at this point in the process - if I remove the database and start again it will fail at the same place. After the failure the database appears to be okay:> agrajag-testing ~/s/notmuch % xapian-check ~/Maildir/.notmuch/xapian > docdata: > blocksize=8K items=0 firstunused=1 revision=1 levels=0 root=(faked) > void B-tree checked okay > docdata table structure checked OK > > termlist: > blocksize=8K items=0 firstunused=2 revision=1 levels=0 root=(faked) > void B-tree checked okay > termlist table structure checked OK > > postlist: > blocksize=8K items=2 firstunused=1 revision=1 levels=0 root=0 > B-tree checked okay > postlist table structure checked OK > > position: > blocksize=8K items=0 firstunused=1 revision=1 levels=0 root=(faked) > void B-tree checked okay > position table structure checked OK > > spelling: > Lazily created, and not yet used. > > synonym: > Lazily created, and not yet used. > > No errors found > agrajag-testing ~/s/notmuch %If I don't remove the database and start the scan again, it will eventually fail again:> agrajag-testing ~/s/notmuch % ./notmuch new > add_file: A Xapian exception occurred). > A Xapian exception occurred adding message: Unexpected end of posting list for 'G0000000000014364'. > Processed 137296 total files in 1h 22m 6s (27 files/sec.). > Added 135950 new messages to the database. > Note: A fatal error was encountered: A Xapian exception occurred > agrajag-testing ~/s/notmuch %At which point the database *sometimes* appears to be corrupted, but (sod's law), not while I was preparing this message. Unfortunately I can't share the dataset, as it contains mail belonging to my employer :-( Any suggestions on how to go about debugging this? Footnotes: [1] Actually a Debian testing Docker container running on a Debian stable host, but the same failure happens with the Debian stable-backport version of notmuch running on Debian stable, just with less useful error messages. dme. -- Tonight I think I'll walk alone, I'll find my soul as I go home.
On Fri, Dec 29, 2017 at 03:00:47PM +0000, David Edmondson wrote:> Running notmuch from git on Debian testing[1] with the mail and database > sitting on a ZFS filesystem, adding mail to a new database: > > > agrajag-testing ~/s/notmuch % ./notmuch new > > Found 605510 total files (that's not much mail). > > add_file: A Xapian exception occurred36m 37s remaining). > > A Xapian exception occurred adding message: Unexpected end of posting list for 'G0000000000014364'.Each posting list chunk has a "last chunk" flag - this exception means this flag appears not to be set for the final chunk for the term 'G0000000000014364'. The update logic for this flag is somewhat complex, but it looks OK to me. This code hasn't changed recently either, so I suspect the problem is actually elsewhere.> > Processed 137296 total files in 1h 21m 17s (28 files/sec.). > > Added 135950 new messages to the database. > > Note: A fatal error was encountered: A Xapian exception occurred > > agrajag-testing ~/s/notmuch % > > This happens consistently at this point in the process - if I remove the > database and start again it will fail at the same place. > > After the failure the database appears to be okay: > > > agrajag-testing ~/s/notmuch % xapian-check ~/Maildir/.notmuch/xapian > > docdata: > > blocksize=8K items=0 firstunused=1 revision=1 levels=0 root=(faked) > > void B-tree checked okay > > docdata table structure checked OK > > > > termlist: > > blocksize=8K items=0 firstunused=2 revision=1 levels=0 root=(faked) > > void B-tree checked okay > > termlist table structure checked OK > > > > postlist: > > blocksize=8K items=2 firstunused=1 revision=1 levels=0 root=0 > > B-tree checked okay > > postlist table structure checked OK > > > > position: > > blocksize=8K items=0 firstunused=1 revision=1 levels=0 root=(faked) > > void B-tree checked okay > > position table structure checked OKThis seems to be for an almost empty database (2 items in the postlist table and nothing anywhere else) which doesn't really seem consistent with the amount of data notmuch reports as having processed. Are you setting XAPIAN_FLUSH_THRESHOLD very high? You can look at the low level entries in the postlist table with: xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass (You'll need to build xapian-core from source to get xapian-inspect, as it's really a tool for developers). I'd guess the two entries are user metadata entries notmuch has added (keys starting \x00\xc0) - otherwise I'd expect entries in other tables too.> Unfortunately I can't share the dataset, as it contains mail belonging > to my employer :-( > > Any suggestions on how to go about debugging this?Building xapian-core configured with --enable-assertions is worth trying, as it might catch the problem earlier. You could also try running it under valgrind, though that may take some time if it normally takes 1h21m. If that doesn't help, either we need to find a way to reproduce this which isn't using sensitive data, or add more consistency checks to the code to try to pin down what's going on. Cheers, Olly
On Friday, 2017-12-29 at 22:23:01 UTC, Olly Betts wrote:>> After the failure the database appears to be okay: >> >> > agrajag-testing ~/s/notmuch % xapian-check ~/Maildir/.notmuch/xapian >> > ... >> > position table structure checked OK > > This seems to be for an almost empty database (2 items in the postlist > table and nothing anywhere else) which doesn't really seem consistent > with the amount of data notmuch reports as having processed. Are you > setting XAPIAN_FLUSH_THRESHOLD very high?No, I didn't set any specific value.> You can look at the low level entries in the postlist table with: > > xapian-inspect ~/Maildir/.notmuch/xapian/postlist.glass > > (You'll need to build xapian-core from source to get xapian-inspect, > as it's really a tool for developers). > > I'd guess the two entries are user metadata entries notmuch has added > (keys starting \x00\xc0) - otherwise I'd expect entries in other tables > too. > >> Unfortunately I can't share the dataset, as it contains mail belonging >> to my employer :-( >> >> Any suggestions on how to go about debugging this? > > Building xapian-core configured with --enable-assertions is worth > trying, as it might catch the problem earlier.I'll try these things, thanks.> You could also try running it under valgrind, though that may take some > time if it normally takes 1h21m.valgrind complained about a couple of things in debugger.cc, but they seem both innocuous and unrelated.> If that doesn't help, either we need to find a way to reproduce this > which isn't using sensitive data, or add more consistency checks to the > code to try to pin down what's going on.David Bremner had a dataset that produced this same failure, I believe. I don't know the origin or status of that dataset - David? Just for completeness, I forced notmuch to build a chert database and the initial scan ran through to completion without any problems. dme. -- I'm not the reason you're looking for redemption.