Nope, soon after I posted the email, the box died. I'm still hitting the
"Attempt to refile free buffer" assertion, which is caused by
cleanup_transaction(). I have also reverted to using
bh->b_jlist == BJ_None in my tests.
Rereading Andrea's prior thread on this makes me think I'm heading down
the same path he did. Bummer. :)
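
For reference, the b_jlist == BJ_None guard mentioned above can be modeled
in userspace roughly as follows. This is only a sketch under assumptions:
struct model_bh, may_free(), and the enum values are hypothetical stand-ins
for illustration, not the kernel's definitions, and the real check lives
inside __invalidate_buffers().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the journal list ids; values are assumed. */
enum jlist { BJ_None = 0, BJ_Metadata = 1 };

/* Simplified model of a buffer head: only the fields the guard reads. */
struct model_bh {
    int b_count;        /* reference count */
    bool dirty;         /* stands in for buffer_dirty(bh) */
    enum jlist b_jlist; /* journal list the buffer sits on, if any */
};

/*
 * Returns true when the buffer could be handed to put_last_free():
 * it is unreferenced, it is not on any journal list, and it is either
 * clean or the caller asked for dirty buffers to be destroyed too.
 */
static bool may_free(const struct model_bh *bh, bool destroy_dirty_buffers)
{
    return bh->b_count == 0 &&
           bh->b_jlist == BJ_None &&
           (destroy_dirty_buffers || !bh->dirty);
}
```

The point of the extra test is that a buffer can look clean and
unreferenced and still belong to the journal, in which case it should be
left alone.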
On Tue, 15 May 2001, Jay Weber wrote:
> Date: Tue, 15 May 2001 18:50:44 -0700 (PDT)
> From: Jay Weber <jweber@valinux.com>
> Reply-To: ext3-users@redhat.com
> To: linux-lvm@sistina.com
> Cc: ext3-users@redhat.com, Joe Thornber <thornber@btconnect.com>,
> sct@redhat.com
> Subject: Re: [linux-lvm] lvm deadlock with 2.4.x kernel?
>
> I think I have this one solved, I hope.
>
> I think what Andreas and I are running into are a few different
> assertions. One being the LVM lvm_do_pv_flush caused assertion which is
> related directly to invalidate_buffers() being called which then triggers
> refile_buffer() on a journaled buffer, which appears clean in all other
> ways according to the checks in refile_buffer().
>
> The following is what I've got in __invalidate_buffers() right now.
>
>     if (!bh->b_count &&
>         !buffer_journaled(bh) &&
>         (destroy_dirty_buffers || !buffer_dirty(bh)))
>             put_last_free(bh);
>     if (slept)
>             goto again;
>
> Stephen suggested something along the above a bit ago, except he uses
> bh->b_jlist == BJ_None. buffer_journaled() seems to be a function in
> fs.h which seems a bit more appropriate.
>
> Next, with the above we'd still see problems. My next patch included a
> suggestion from Heinz to add lock_kernel() and unlock_kernel() around the
> fsync_dev() and invalidate_buffers() in lvm.c/lvm_do_pv_flush().
> Currently I have this in my working kernel, I'm gonna try again without
> it though, it seems that it shouldn't be necessary, the other block
> devices I've looked at don't seem to lock the kernel.
>
> Lastly, I was still getting an assertion generating the "Attempt to
> refile free buffer", but this one was actually caused by an ext3
> journaling function calling refile_buffer(), not derived from
> invalidate_buffers().
>
> In fs/jfs/checkpoint.c/cleanup_transaction(), you'll note it does some
> buffer_head bit checks and then calls refile_buffer(). Mine currently
> looks like the following:
>
>     if (!buffer_dirty(bh) && !buffer_jdirty(bh) &&
>         !buffer_journaled(bh) &&
>         bh->b_list != BUF_CLEAN) {
>             unlock_journal(journal);
>             refile_buffer(bh);
>             lock_journal(journal);
>             return 1;
>     }
>
> Note the addition of the !buffer_journaled(bh) check.
>
> Okay, so using all of the above, I have now been running multiple vgscan
> loops and a pvscan loop while untarr'ing kernel, removing the kernel
> dir, and then untarring again, and building the kernel with make -j4
> (eating up my memory and cpu) for nearly an hour with no assertions.
>
> To me it appears that Stephen had it right all along (in prior thread on
> this), he stated that the b_jlist == BJ_None may be necessary elsewhere
> also, to insure that there are no journaled buffers out there before
> handing back to refile_buffer(). I think that's what we were up
> against and as far as I can tell (grepping for refile_buffer() in jfs/*
> code) I've added the checks to all the appropriate cases.
>
> Andreas can you give the above a try and see if it solves the problem on
> your end also. Stephen, does this look good as far as what I've
> changed?
>
> Sorry, no diffs just yet, the changes are rather smallish though.
>
> Thanks.
>
> On Tue, 15 May 2001, Chris Mason wrote:
>
> > Date: Tue, 15 May 2001 21:17:06 -0400
> > From: Chris Mason <mason@suse.com>
> > Reply-To: linux-lvm@sistina.com
> > To: linux-lvm@sistina.com
> > Subject: Re: [linux-lvm] lvm deadlock with 2.4.x kernel?
> >
> >
> >
> > On Tuesday, May 15, 2001 06:32:24 PM -0600 Andreas Dilger
> > <adilger@turbolinux.com> wrote:
> >
> > >> reiserfs should catch blocks that don't have the proper bits set when it
> > >> starts i/o, and then it makes sure the block hasn't been relogged while
> > >> the i/o was in progress. It sends warnings not an oops though, check
> > >> your log files. If we were losing journal bits, and the log code didn't
> > >> catch it, the result should be silent corruption.
> > >>
> > >> Since he is seeing deadlock, it seems more likely reiserfs is trying to
> > >> lock a buffer for i/o, and that is hanging for some reason....
> > >
> > > But what does PV_FLUSH do? Calls fsync_dev() to flush dirty buffers to
> > > disk, and sync_supers() and waits for buffer I/O completion. This is
> > > unlikely to be the cause of a problem, because that happens on each
> > > sync call.
> > >
> > > It then calls __invalidate_buffers(dev, 0), which destroys everything
> > > but dirty buffers (on ALL buffer lru lists).
> >
> > Unless I'm reading it wrong (2.4.4), __invalidate_buffers destroys all
> > buffers that are clean and have b_count == 0. Reiserfs keeps b_count > 0
> > for all metadata buffers that have been logged, while ext3 allows the count
> > to be zero (but keeps them in the dirty list).
> >
> > __invalidate_buffers also waits on any locked buffers. Any chance one of
> > the other LVM ioctls grabs some lvm lock before calling PV_FLUSH?
> >
> > You're right though, pv_flush certainly doesn't look like it could cause
> > any deadlocks.
> >
> > -chris
> >
> > _______________________________________________
> > linux-lvm mailing list
> > linux-lvm@sistina.com
> > http://lists.sistina.com/mailman/listinfo/linux-lvm
> >
>
>
>
>
> _______________________________________________
> Ext3-users mailing list
> Ext3-users@redhat.com
> https://listman.redhat.com/mailman/listinfo/ext3-users
>