Neil Brown
2002-May-13 05:06 UTC
Assertion failure in do_get_write_access() at transaction.c:609:
Hi all (and developers in particular)

I just got bitten by this Assertion. The one that starts as in the subject, and ends with:
  "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"

Google reminds me that it was mentioned a few times earlier this year, but I couldn't find any statement saying that it has been fixed. I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.

So my question is: has this been fixed yet?

What seems to trigger it for me is reading the block device file. I have a program that runs every 10 hours and reads all the inode tables straight off the block device and checks the disc usage against what is stored in the quota file. Any difference that is found is logged. If the same difference gets logged 3 times in a row, I correct it.

The Assertion failure, which has now happened twice, corresponds with running this program.... so I might not run it quite so often any more.

NeilBrown
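[Editorial sketch] The "same difference logged 3 times in a row" rule described above could look roughly like the following. This is only an illustration of that rule, not Neil's program: the struct, the function name and the reset-after-correction behaviour are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-user bookkeeping; not taken from the original program. */
struct quota_check {
    int64_t last_diff;  /* difference seen on the previous run (0 = none) */
    int     repeat;     /* consecutive runs that reported last_diff */
};

/*
 * Record the difference (usage counted from the inode tables minus usage
 * stored in the quota file) found on this run.  Returns true once the same
 * non-zero difference has been seen on three consecutive runs, which is the
 * point at which the quota file would be corrected.
 */
bool quota_check_update(struct quota_check *qc, int64_t diff)
{
    assert(qc != NULL);

    if (diff == 0) {
        /* Usage and quota file agree: forget any earlier mismatch. */
        qc->last_diff = 0;
        qc->repeat = 0;
        return false;
    }

    if (diff == qc->last_diff) {
        qc->repeat++;
    } else {
        /* A different mismatch starts a new streak. */
        qc->last_diff = diff;
        qc->repeat = 1;
    }

    if (qc->repeat >= 3) {
        /* Caller corrects the quota file; start counting afresh. */
        qc->last_diff = 0;
        qc->repeat = 0;
        return true;
    }
    return false;
}
```

Requiring the identical difference on consecutive runs filters out transient mismatches, which are expected when the inode tables are read from a live filesystem.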
Andrew Morton
2002-May-13 07:02 UTC
Re: Assertion failure in do_get_write_access() at transaction.c:609:
Neil Brown wrote:
>
> Hi all (and developers in particular)
>
> I just got bitten by this Assertion. The one that starts as in the
> subject, and ends with:
> "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"
>
> Google reminds me that it was mentioned a few times earlier this year,
> but I couldn't find any statement saying that it has been fixed.
> I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.
>
> So my question is: has this been fixed yet?
>

Stephen has a fix in ext3 CVS for this. I've been playing with that fix in 2.5.x. I guess we'll slot it into 2.4.20-pre.

I've uploaded the diff to http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre8/ext3-cvs.patch

(It's a `patch -p0' diff).
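[Editorial sketch] For readers decoding the quoted assertion text: the expression holds only when the BH_Lock bit is clear in the buffer's b_state, so the failure means the code found the buffer locked when it expected it to be unlocked. The stand-in definitions below are simplified assumptions for illustration (in particular the bit index chosen for BH_Lock); the real ones live in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in definitions for illustration only. */
enum { BH_Lock = 2 };  /* bit index assumed for this sketch */

struct buffer_head  { unsigned long b_state; };
struct journal_head { struct buffer_head *b_bh; };

/* Map a journal head to the buffer head it describes (simplified). */
static struct buffer_head *jh2bh(struct journal_head *jh)
{
    return jh->b_bh;
}

/* True when the expression quoted in the assertion message holds. */
bool assertion_holds(struct journal_head *jh)
{
    assert(jh != NULL && jh->b_bh != NULL);
    return !(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0);
}
```

Setting only the BH_Lock bit in b_state makes assertion_holds() return false; any other bits leave it true.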
Stephen C. Tweedie
2002-May-13 10:25 UTC
Re: Assertion failure in do_get_write_access() at transaction.c:609:
Hi,

On Mon, May 13, 2002 at 03:06:48PM +1000, Neil Brown wrote:
>
> Hi all (and developers in particular)
>
> I just got bitten by this Assertion. The one that starts as in the
> subject, and ends with:
> "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"

> Google reminds me that it was mentioned a few times earlier this year,
> but I couldn't find any statement saying that it has been fixed.
> I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.

> So my question is: has this been fixed yet?

Twice. :)

> What seems to trigger it for me is reading the block device file.

Yep.

The assertion failure can trigger in two ways. One is reproducible under normal filesystem activity, and has been fixed in 2.4 for a while --- it was a missing "goto repeat" in one special case branch which meant that we could drop a lock and fail to re-test an important condition.

The second case that can cause this is the one you are seeing, involving block device IO in parallel with filesystem IO. That one has always been there for block write IO (and in fact it's arguably right for ext3 to oops if out-of-band writes are causing ext3's own metadata to be written out-of-order), but as of 2.4.11, we now have the page-cache/buffer-cache aliasing interactions which can cause ext3 to see locked buffers even if you are only reading from the buffered block device.

Current 2.4 and 2.5 don't handle that well --- in fact they can corrupt the fs when it happens (even for ext2). I posted a fix 3 or 4 weeks ago, as well as a patch which lets ext3 recover from the situation properly. It's not in the upstream kernels yet --- Al Viro raised a question over whether it's the best fix, but it's definitely the simplest one as far as I can see.

Those fixes are currently all in ext3 CVS, and are part of the patch akpm just posted. I've got one more thing to sort out --- O_SYNC behaviour in data-journaled mode --- and I'll push it all to Linus and Marcelo.

> I have a program that runs every 10 hours and reads all the inode
> tables straight of the block device and checks the disc usage against
> what is stored in the quota file. Any difference that is found is
> logged. If the same difference gets logged 3 times in a row, I
> correct it.
>
> The Assertion failure, which has now happened twice, corresponds with
> running this program.... so I might not run if quite so often any
> more.

With the current ext3 patches, it should be safe enough (or at least safe against kernel corruption --- reading a live filesystem via the bdev is always unsafe from the application point of view because you never get a consistent view of the fs.)

I saw the same thing happening with dump(8) on a live fs, and testing has shown the current patches to fix that.

Cheers,
 Stephen
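[Editorial sketch] The first failure mode above (a missing "goto repeat" after dropping a lock) follows a general pattern: any condition checked before sleeping must be checked again afterwards, because the state can change while the lock is not held. The code below is a single-threaded simulation of that pattern only; it is not the ext3 code, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer state; relocks_pending simulates other parties
 * locking the buffer again while we sleep with our own lock dropped. */
struct sim_buf {
    bool locked;
    int  relocks_pending;
};

/* Simulated "drop lock, wait for the buffer, retake lock". */
static void sim_wait_on_buffer(struct sim_buf *b)
{
    b->locked = false;              /* the I/O we waited for completes */
    if (b->relocks_pending > 0) {   /* ...but someone locks it again */
        b->relocks_pending--;
        b->locked = true;
    }
}

/* Returns how many times we had to wait before the buffer was unlocked. */
int sim_get_write_access(struct sim_buf *b)
{
    int waits = 0;

    assert(b != NULL);
repeat:
    if (b->locked) {
        sim_wait_on_buffer(b);
        waits++;
        /* Without this jump we would fall through to the assertion
         * below without re-testing b->locked. */
        goto repeat;
    }

    assert(!b->locked);  /* analogous in spirit to the failing assertion */
    return waits;
}
```

With a buffer that starts locked and is re-locked twice during waits, sim_get_write_access() loops three times before proceeding; removing the goto would let the final assertion fire on the first re-lock.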
Richard Kimber
2002-May-14 09:58 UTC
Re: Assertion failure in do_get_write_access() at transaction.c:609:
On Mon, 13 May 2002 15:06:48 +1000 (EST) Neil Brown <neilb@cse.unsw.edu.au> wrote:

> I just got bitten by this Assertion. The one that starts as in the
> subject, and ends with:
> "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"

After getting one of these, when using rsync to a PD drive, the device could not be unmounted. After re-booting I did

e2fsck /dev/sdb5

and got:

e2fsck 1.24a (02-Sep-2001)
backup: recovering journal
backup: Attempt to read block from filesystem resulted in short read while reading block 1331
JFS: Failed to read block at offset 813
JFS: IO error -5 recovering block 813 in log
e2fsck: Input/output error while recovering ext3 journal of backup

What should my next step be?

- Richard.
--
Richard Kimber
Political Science Resources
http://www.psr.keele.ac.uk/
UK-Euro FAQ
http://www.psr.keele.ac.uk/docs/efaq.htm
Stephen C. Tweedie
2002-May-14 14:46 UTC
Re: Assertion failure in do_get_write_access() at transaction.c:609:
Hi,

On Tue, May 14, 2002 at 02:46:41PM +0100, Richard Kimber wrote:
> On Tue, 14 May 2002 12:19:43 +0100
> "Stephen C. Tweedie" <sct@redhat.com> wrote:
> > debugfs -w /dev/sdb5
> > debugfs: features -needs_recovery -has_journal
> > Filesystem features: filetype sparse_super
> > debugfs: q

> but then
> >fsck -f /dev/sdb5
> fsck 1.24a (02-Sep-2001)
> e2fsck 1.24a (02-Sep-2001)
> backup: recovering journal
> backup: recovering journal
> backup: recovering journal
> backup: recovering journal
> ...... etc

What does a "tune2fs -l" show at this point?

--Stephen
Richard Kimber
2002-May-14 15:03 UTC
Re: Assertion failure in do_get_write_access() at transaction.c:609:
On Tue, 14 May 2002 15:46:58 +0100 "Stephen C. Tweedie" <sct@redhat.com> wrote:

> Hi,
>
> On Tue, May 14, 2002 at 02:46:41PM +0100, Richard Kimber wrote:
> > On Tue, 14 May 2002 12:19:43 +0100
> > "Stephen C. Tweedie" <sct@redhat.com> wrote:
> > > debugfs -w /dev/sdb5
> > > debugfs: features -needs_recovery -has_journal
> > > Filesystem features: filetype sparse_super
> > > debugfs: q
>
> > but then
> > >fsck -f /dev/sdb5
> > fsck 1.24a (02-Sep-2001)
> > e2fsck 1.24a (02-Sep-2001)
> > backup: recovering journal
> > backup: recovering journal
> > backup: recovering journal
> > backup: recovering journal
> > ...... etc
>
> What does a "tune2fs -l" show at this point?

OK. In the meantime I tried the above again. This time, it worked. Having said "y" to the requests to fix, I assume I now have an ext2 disk, and that I simply go through the normal procedure to make a journal.

Many thanks indeed for your help.

- Richard.
--
Richard Kimber
Political Science Resources
http://www.psr.keele.ac.uk/
UK-Euro FAQ
http://www.psr.keele.ac.uk/docs/efaq.htm