thr3ads.net - Ext3 users - Assertion failure in do_get_write_access() at transaction.c:609: [May 2002]

Neil Brown

2002-May-13 05:06 UTC

Assertion failure in do_get_write_access() at transaction.c:609:

Hi all (and developers in particular)

I just got bitten by this Assertion.   The one that starts as in the
subject, and ends with:
    "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"


Google reminds me that it was mentioned a few times earlier this year,
but I couldn't find any statement saying that it has been fixed.
I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.

So my question is:  has this been fixed yet?


What seems to trigger it for me is reading the block device file.

I have a program that runs every 10 hours and reads all the inode
tables straight off the block device and checks the disc usage against
what is stored in the quota file.  Any difference that is found is
logged.  If the same difference gets logged 3 times in a row, I
correct it.
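The "log every difference, correct after 3 identical ones in a row" rule can be sketched as below. This is a minimal Python illustration, not Neil's program. The `QuotaChecker` class, its per-user usage dictionaries (assumed to be already gathered from the inode tables and the quota file) and the choice to correct the quota file to the measured usage are all hypothetical.

```python
# Hypothetical sketch of the "correct after N identical differences" rule.
# Gathering the figures (inode tables, quota file) is outside this sketch.

THRESHOLD = 3  # consecutive runs with the same difference before correcting


class QuotaChecker:
    def __init__(self, threshold: int = THRESHOLD):
        self.threshold = threshold
        self.last_diff: dict[str, int] = {}  # difference seen on the previous run
        self.streak: dict[str, int] = {}     # consecutive runs with that difference
        self.log: list[str] = []

    def run(self, measured: dict[str, int], recorded: dict[str, int]) -> dict[str, int]:
        """Compare one run's figures; return {user: usage} corrections to apply."""
        corrections = {}
        differing = set()
        for user in sorted(set(measured) | set(recorded)):
            diff = measured.get(user, 0) - recorded.get(user, 0)
            if diff == 0:
                continue
            differing.add(user)
            self.log.append(f"{user}: quota file off by {diff}")
            if self.last_diff.get(user) == diff:
                self.streak[user] += 1
            else:
                # A new or changed difference restarts the count.
                self.last_diff[user] = diff
                self.streak[user] = 1
            if self.streak[user] >= self.threshold:
                corrections[user] = measured.get(user, 0)
                # Forget the history once a correction has been issued.
                del self.last_diff[user]
                del self.streak[user]
        # A difference that disappeared breaks the "in a row" sequence.
        for user in list(self.last_diff):
            if user not in differing:
                del self.last_diff[user]
                del self.streak[user]
        return corrections


checker = QuotaChecker()
for _ in range(3):
    fixes = checker.run({"alice": 1200}, {"alice": 1000})
print(fixes)  # {'alice': 1200}
```

Requiring the *same* difference on consecutive runs is what keeps a figure that merely drifted between two scans (for example because the filesystem was busy) from being "corrected".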

The Assertion failure, which has now happened twice, corresponds with
running this program.... so I might not run it quite so often any
more.

NeilBrown

Andrew Morton

2002-May-13 07:02 UTC

Re: Assertion failure in do_get_write_access() at transaction.c:609:

Neil Brown wrote:
>
> Hi all (and developers in particular)
> 
> I just got bitten by this Assertion.   The one that starts as in the
> subject, and ends with:
>     "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"
> 
> Google reminds me that it was mentioned a few times earlier this year,
> but I couldn't find any statement saying that it has been fixed.
> I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.
> 
> So my question is:  has this been fixed yet?
> 
Stephen has a fix in ext3 CVS for this.  I've been playing with that
fix in 2.5.x.  I guess we'll slot it into 2.4.20-pre.

I've uploaded the diff to
http://www.zip.com.au/~akpm/linux/patches/2.4/2.4.19-pre8/ext3-cvs.patch

(It's a `patch -p0' diff).

Stephen C. Tweedie

2002-May-13 10:25 UTC

Re: Assertion failure in do_get_write_access() at transaction.c:609:

Hi,

On Mon, May 13, 2002 at 03:06:48PM +1000, Neil Brown wrote:
>
> Hi all (and developers in particular)
> 
> I just got bitten by this Assertion.   The one that starts as in the
> subject, and ends with:
>     "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"
> 
> Google reminds me that it was mentioned a few times earlier this year,
> but I couldn't find any statement saying that it has been fixed.
> I got this in a 2.4.16 kernel, though the reports I found were 2.4.18.
> 
> So my question is:  has this been fixed yet?

Twice. :)

> What seems to trigger it for me is reading the block device file.

Yep.

The assertion failure can trigger in two ways.  One is reproducible
under normal filesystem activity, and has been fixed in 2.4 for a
while --- it was a missing "goto repeat" in one special-case branch,
which meant that we could drop a lock and fail to re-test an
important condition.
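The general pattern behind that first bug (any path that drops a lock must go back and re-test its conditions) can be illustrated with a small Python sketch. It is not the jbd code: `get_write_access`, the dict standing in for a buffer and `make_racy_wait` are hypothetical, and the `continue` plays the role of the `goto repeat`.

```python
import threading


def get_write_access(state_lock, buf, wait_for_io):
    """Hypothetical sketch of the re-test-after-dropping-a-lock pattern."""
    state_lock.acquire()
    try:
        while True:  # the loop head stands in for the "repeat:" label
            if buf["io_locked"]:
                # Model a lock that must not be held while waiting: drop it.
                # Other activity can change buf before we get the lock back.
                state_lock.release()
                try:
                    wait_for_io(buf)
                finally:
                    state_lock.acquire()
                continue  # "goto repeat": re-test everything from the top
            # Only reached with the lock held and the condition freshly tested.
            assert not buf["io_locked"]
            buf["write_access"] = True
            return
    finally:
        state_lock.release()


def make_racy_wait(relocks):
    """Hypothetical wait routine: the first `relocks` waits end with another
    writer having locked the buffer again; later waits leave it unlocked."""
    remaining = [relocks]

    def wait_for_io(buf):
        buf["waits"] += 1
        if remaining[0] > 0:
            remaining[0] -= 1
            buf["io_locked"] = True
        else:
            buf["io_locked"] = False

    return wait_for_io


lock = threading.Lock()
buf = {"io_locked": True, "write_access": False, "waits": 0}
get_write_access(lock, buf, make_racy_wait(relocks=2))
print(buf["waits"], buf["write_access"])  # 3 True
```

Deleting the `continue` lets execution fall straight through to the `assert` after the first wait, while the simulated second writer has already locked the buffer again, so the sketch then fails in a way analogous to the reported assertion.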

The second case that can cause this is the one you are seeing,
involving block device IO in parallel with filesystem IO.  That one
has always been there for block write IO (and in fact it's arguably
right for ext3 to oops if out-of-band writes are causing ext3's own
metadata to be written out-of-order), but as of 2.4.11, we now have
the page-cache/buffer-cache aliasing interactions which can cause ext3
to see locked buffers even if you are only reading from the buffered
block device.

Current 2.4 and 2.5 don't handle that well --- in fact they can
corrupt the fs when it happens (even for ext2).  I posted a fix 3 or 4
weeks ago, as well as a patch which lets ext3 recover from the
situation properly.  It's not in the upstream kernels yet --- Al Viro
raised a question over whether it's the best fix, but it's definitely
the simplest one as far as I can see.

Those fixes are currently all in ext3 CVS, and are part of the patch
akpm just posted.  I've got one more thing to sort out --- O_SYNC
behaviour in data-journaled mode --- and I'll push it all to Linus and
Marcelo.

> I have a program that runs every 10 hours and reads all the inode
> tables straight off the block device and checks the disc usage against
> what is stored in the quota file.  Any difference that is found is
> logged.  If the same difference gets logged 3 times in a row, I
> correct it.
> 
> The Assertion failure, which has now happened twice, corresponds with
> running this program.... so I might not run it quite so often any
> more.

With the current ext3 patches, it should be safe enough (or at least
safe against kernel corruption --- reading a live filesystem via the
bdev is always unsafe from the application point of view because you
never get a consistent view of the fs.)  I saw the same thing
happening with dump(8) on a live fs, and testing has shown the current
patches to fix that.

Cheers,
 Stephen

Richard Kimber

2002-May-14 09:58 UTC

Re: Assertion failure in do_get_write_access() at transaction.c:609:

On Mon, 13 May 2002 15:06:48 +1000 (EST)
Neil Brown <neilb@cse.unsw.edu.au> wrote:
> I just got bitten by this Assertion.   The one that starts as in the
> subject, and ends with:
>     "!(((jh2bh(jh))->b_state & (1UL << BH_Lock)) != 0)"

After getting one of these, when using rsync to a PD drive, the device
could not be unmounted.  After re-booting I did e2fsck /dev/sdb5 and got:

e2fsck 1.24a (02-Sep-2001)
backup: recovering journal
backup: Attempt to read block from filesystem resulted in short read while
reading block 1331

JFS: Failed to read block at offset 813
JFS: IO error -5 recovering block 813 in log
e2fsck: Input/output error while recovering ext3 journal of backup

What should my next step be?

- Richard.
-- 
Richard Kimber
Political Science Resources        http://www.psr.keele.ac.uk/

UK-Euro FAQ          http://www.psr.keele.ac.uk/docs/efaq.htm

Stephen C. Tweedie

2002-May-14 14:46 UTC

Re: Assertion failure in do_get_write_access() at transaction.c:609:

Hi,

On Tue, May 14, 2002 at 02:46:41PM +0100, Richard Kimber wrote:
> On Tue, 14 May 2002 12:19:43 +0100
> "Stephen C. Tweedie" <sct@redhat.com> wrote:
> > 	debugfs -w /dev/sdb5
> > 	debugfs:  features -needs_recovery -has_journal
> > 	Filesystem features: filetype sparse_super
> > 	debugfs:  q
>
> but then
> >fsck -f /dev/sdb5
> fsck 1.24a (02-Sep-2001)
> e2fsck 1.24a (02-Sep-2001)
> backup: recovering journal
> backup: recovering journal
> backup: recovering journal
> backup: recovering journal
> ...... etc

What does a "tune2fs -l" show at this point?

--Stephen

Richard Kimber

2002-May-14 15:03 UTC

Re: Assertion failure in do_get_write_access() at transaction.c:609:

On Tue, 14 May 2002 15:46:58 +0100
"Stephen C. Tweedie" <sct@redhat.com> wrote:
> Hi,
> 
> On Tue, May 14, 2002 at 02:46:41PM +0100, Richard Kimber wrote:
> > On Tue, 14 May 2002 12:19:43 +0100
> > "Stephen C. Tweedie" <sct@redhat.com> wrote:
> > > 	debugfs -w /dev/sdb5
> > > 	debugfs:  features -needs_recovery -has_journal
> > > 	Filesystem features: filetype sparse_super
> > > 	debugfs:  q
>  
> > but then
> > >fsck -f /dev/sdb5
> > fsck 1.24a (02-Sep-2001)
> > e2fsck 1.24a (02-Sep-2001)
> > backup: recovering journal
> > backup: recovering journal
> > backup: recovering journal
> > backup: recovering journal
> > ...... etc
> 
> What does a "tune2fs -l" show at this point?

OK. In the meantime I tried the above again.  This time, it worked. Having
said "y" to the requests to fix, I assume I now have an ext2 disk, and
that I simply go through the normal procedure to make a journal.  Many
thanks indeed for your help.

- Richard.