Hi,
In the meantime I tested 2.6.38-rc6: no hangs there, but extreme
slowness (copying at ~2 MB/s) and periodic stretches of zero activity
(up to 3 minutes) for programs trying to write to the btrfs. Since I saw
very high CPU utilization in the raid6 (md) code I suspect a problem
there.
However, because that behavior didn't seem acceptable either, I patched
a 2.6.37.3 vanilla kernel with the latest btrfs-unstable. The
performance was back, but after ~16 hours the lockup occurred and the
btrfs is inaccessible again. The usage scenario right at that point
was 4 threads writing to the btrfs via NFS at ~2 MB/s each.
This time, btrfs-transac itself went into D state, as did all the
nfsd threads and a "touch" I started to verify the btrfs lockup. I have
attached a dmesg of the sysrq-t output.
Does anyone have any ideas how to debug this - timeout detection,
in-memory data structure dumps, etc?
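As a lighter-weight complement to sysrq-t, the tasks stuck in D state
can also be listed from /proc. The following is only a minimal sketch,
not something taken from the btrfs tools: the function names parse_stat
and d_state_tasks are made up for illustration, and it assumes a Linux
/proc that exposes /proc/<pid>/stat and /proc/<pid>/wchan (the wchan
symbol may be empty or "0" depending on the kernel configuration).

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """Return (pid, comm, wchan) for tasks in uninterruptible sleep (D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no procfs here, nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are PIDs
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError, IndexError):
            continue  # task exited meanwhile or file not readable
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in d_state_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Run periodically (e.g. from cron), this would at least show when the
first task gets stuck and in which kernel function it is waiting. It
only walks the top-level PID entries, so threads of a multi-threaded
user process are not covered.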
Regards,
Christian
On 02/23/2011 11:40 AM, Christian Schmidt wrote:
> Hi,
>
> After a few weeks of testing and preparation I commissioned a new NFS
> server with btrfs for the main storage. I ran into two situations where
> the btrfs locked up and I had to hard reboot the machine (sysrq-b).
> I end up with btrfs-transac in state D, waiting for the pending
> transaction to be completed if I interpret the code right. On top of
> that all eight nfsds are in state D waiting to start several different
> transactions.
> I have attached the sysrq-t output after I killed all processes I could
> before rebooting.
>
> It only seems to happen with somewhat heavier IO load, in this case one
> process md5summing large files (a few TB in total) while another process
> tries to write to the NFS share. I never saw it e.g. while copying
> single files onto the file system or reading multiple files.
>
> I'll be glad for any hints and recommendations.
>
> Christian
>