I previously wrote:
> Stephen writes:
> > I'd much rather fix the buffer.c code. Having journaling try to patch
> > up after somebody has deleted its buffer heads is very wrong, since we
> > risk the buffer journal lists getting badly corrupted if we allow
> > those buffers to be reused.
>
> > Does the patch below (untested, uncompiled!) work?
>
> OK, I've applied the patch (removing my old check for buffer_jdirty()),
> but leaving in the checks in jfs/commit.c for B_FREE buffer heads. This
> should tell me right away if there are still buffers being freed from
> underneath the journal, without oopsing my machine all the time.

I tested the patch, but it didn't seem to help. Basically, I was running
a kernel compile, and copying files into a different ext3 filesystem (both
on LVs), and running pvscan (i.e. invalidate_buffers) repeatedly. At one
point, I got several free buffers reported in cleanup_transaction.
After that (this is the first time it has happened) my kernel compile
deadlocked, and the kernel is now stuck in:
ext3_create
journal_start
start_this_handle
log_wait_for_space
log_do_checkpoint
cleanup_transaction*
__wake_up*
The last 2 (*) functions appear and disappear from the KDB "bt" output,
so I assume they are being called repeatedly but never finish the
transaction.
I've added more debugging to determine which device's buffers are marked
free (and the kernel compile deadlocked at the same place again).
It is noteworthy that the free buffers were reported after the copy had
finished (for the second deadlock), but the kernel compile was on a
different filesystem/journal and was started _after_ the previous free
buffers were reported, and pvscan was not run during the compile at all.
It is possible that one of the free buffers was on the kernel filesystem,
so hopefully my additional debugging will show this.
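
For anyone who wants to generate a similar load, the repeated pvscan part
of the test described above could be driven by something like the
following. This is only a sketch: the helper name, the repeat count and
the delay are illustrative assumptions, not a record of what was actually
run, and pvscan itself needs root and the LVM tools installed.

```python
import subprocess
import time


def run_repeatedly(cmd, count, delay=1.0):
    """Run cmd `count` times, sleeping `delay` seconds between runs.

    Returns the list of exit codes, one per run. The helper name and its
    parameters are hypothetical, chosen only for this illustration.
    """
    codes = []
    for i in range(count):
        # Discard the command's output; only the exit status is kept.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        codes.append(result.returncode)
        # No need to sleep after the final run.
        if i + 1 < count:
            time.sleep(delay)
    return codes
```

Calling run_repeatedly(["pvscan"], count=100, delay=2.0) while the kernel
compile and the file copy are running on the two ext3 filesystems should
approximate the scenario; the count and delay would need tuning.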
Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert