Hi,
On Wed, Feb 07, 2001 at 05:02:07PM -0800, Peter J. Braam wrote:
>
> We had some starvation/locks happening to us under very heavy load
> in two cases:
>
> - InterMezzo asked ext3 to do a journaled file write (for 1 block),
>   essentially using ext3_write
> - similarly for truncate.
>
> These lockups went away when we started the transaction in
> InterMezzo and reserved somewhat more space than ext3 does.
>
> Any clues as to what this might be? Are the ext3 reservations big
> enough?

No clue: I've never had such lockups reported elsewhere.

The key to debugging this is to find out where the deadlock is, i.e. to
work out where each process is blocking. "kdb" is ideal for this,
since you can interrupt the kernel and run "ps" to list all processes,
then "btp <pid>" on each "D" state process to work out where it is
sleeping.
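
If the machine is still responsive enough to run user-space tools, a
rough analogue of the "ps" step is to scan /proc for tasks in "D"
state. The following is only a minimal sketch, assuming a Linux /proc
layout in which the state letter is the first field after the
parenthesised command name in /proc/<pid>/stat. The function names are
made up for illustration, and this does not replace kdb on a box that
is completely wedged, nor does it give the backtrace that "btp" does.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen].strip())
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited during the scan, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(pid, comm)
```

Each pid this prints is a candidate for a closer look at where it is
sleeping.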

I had core commit deadlocks in some early versions of ext3, and they
were easily debugged this way. You will normally find that most
processes are stuck in the same place waiting on a lock being held by
the deadlock loop, so you look for the tasks blocked on a different
path to work out what is really going on.
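
The "look for the odd ones out" step can be sketched as a small
grouping exercise. This assumes you have already noted, by hand, where
each blocked pid is sleeping (for example the innermost function of
its backtrace). The function name and the "wait_site_*" labels below
are placeholders, not real kernel symbols.

```python
from collections import defaultdict


def group_by_wait_site(blocked):
    """Group pids by where they sleep, rarest wait site first.

    `blocked` maps pid -> a label for the place the task is sleeping.
    """
    groups = defaultdict(list)
    for pid, site in blocked.items():
        groups[site].append(pid)
    # Fewest waiters first; ties broken by label so the order is stable.
    return sorted(
        ((site, sorted(pids)) for site, pids in groups.items()),
        key=lambda item: (len(item[1]), item[0]),
    )


if __name__ == "__main__":
    # Hypothetical data: three tasks pile up on one site, two are elsewhere.
    sample = {
        101: "wait_site_A",
        102: "wait_site_A",
        103: "wait_site_A",
        104: "wait_site_B",
        105: "wait_site_C",
    }
    for site, pids in group_by_wait_site(sample):
        print(site, pids)
```

The sites with the fewest waiters come out first. Those are the tasks
most likely to be part of the deadlock loop itself rather than victims
queued up behind it.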

Let me know whatever you find, and I'll do what I can to help.

Cheers,
 Stephen