thr3ads.net - Btrfs devel - [2.6.37] btrfs-transac hanging in prepare_to

Christian Schmidt

2011-Feb-23 10:40 UTC

[2.6.37] btrfs-transac hanging in prepare_to_wait

Hi,

After a few weeks of testing and preparation I commissioned a new NFS
server with btrfs for the main storage. I ran into two situations where
the btrfs file system locked up and I had to hard-reboot the machine
(sysrq-b). I end up with btrfs-transac in state D, waiting for the
pending transaction to complete, if I interpret the code correctly. On
top of that, all eight nfsds are in state D, waiting to start several
different transactions.
I have attached the sysrq-t output, taken after I had killed all the
processes I could and before rebooting.
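As a rough way to spot the symptom described above without taking a full sysrq-t dump, the Python sketch below lists tasks whose state is D (uninterruptible sleep) by reading /proc/<pid>/stat. It is an illustrative assumption, not something used in this thread, and the helper names `parse_stat` and `tasks_in_d_state` are hypothetical. It only scans top-level /proc/<pid> entries, so threads of multithreaded user processes are not listed, although kernel threads such as nfsd and btrfs-transac each have their own PID there.

```python
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the *last* ')' instead of on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]  # e.g. "R", "S", "D"
    return pid, comm, state


def tasks_in_d_state(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) of every task under proc_root in state D.

    Hypothetical helper: only top-level /proc/<pid> entries are scanned;
    per-thread entries under /proc/<pid>/task are not.
    """
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # task exited between listdir() and open(), or unreadable
        if state == "D":  # D = uninterruptible sleep
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in tasks_in_d_state():
        print(pid, comm)
```

Running it a few times in a row during a suspected hang and seeing the same PIDs reappear would suggest the tasks are genuinely stuck rather than briefly waiting on I/O.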

It only seems to happen under somewhat heavier IO load, in this case one
process md5summing large files (a few TB in total) while another process
tries to write to the NFS share. I never saw it while, e.g., copying
single files onto the file system or reading multiple files.

I'd be grateful for any hints and recommendations.

Christian

Christian Schmidt

2011-Mar-13 17:52 UTC

Re: btrfs-transac hanging in prepare_to_wait

Hi,

In the meantime I tested with 2.6.38-rc6: no hangs there, but extreme
slowness (copying at ~2MB/s) and periodic stalls with zero activity (up
to 3 minutes) for programs trying to write to the btrfs. Since I saw
very high CPU utilization in the raid6 (md) code, I suspect a problem
there.

However, because that behavior didn't seem acceptable either, I patched
a vanilla 2.6.37.3 kernel with the latest btrfs-unstable. Performance
was back, but after ~16 hours the lockup occurred and the btrfs is
inaccessible again. The usage scenario at that point was 4 threads
writing to the btrfs via NFS at ~2MB/s each.

This time btrfs-transac itself went into state D, as did all the nfsd
threads and a "touch" I ran to verify the btrfs lockup. Attached is a
dmesg of the sysrq-t output.

Does anyone have ideas on how to debug this (timeout detection,
in-memory data structure dumps, etc.)?

Regards,
Christian
