I don't know yet whether this is an ext3, quota or drbd issue, but I'll ask
anyway. I am building an HA NFS server using two Dell 1750s and drbd.
I have an ext3 filesystem with quota built on a drbd device running over
a 200 GB disk partition (hardware RAID 0+1), drbd-mirrored across the servers.
The kernel is 2.4.25, so hopefully the quota deadlock should not be a
problem (it was on 2.4.24).
Now, the setup mostly works fine. But if you actively use the
filesystem for some time (e.g. an hour of copying a large tree over NFS) and
then run the 'sync' command, sync runs for a very long time (10 minutes or
more), eating 99% CPU according to top. Meanwhile the system becomes so
sluggish that it is effectively unusable, which in turn leads to stalled
replication and heartbeat misbehavior.
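To put numbers on the stall while reproducing it, here is a minimal sketch. It assumes Python 3.3+ is available on the server, which is not part of the setup described above. It times the sync(2) call, the same call the 'sync' command makes, and reports how much of that time was charged to the calling process as kernel (system) CPU. A large system-time share would be consistent with the 99% CPU seen in top. A small share would instead suggest the process was mostly sleeping while waiting on I/O.

```python
import os
import time


def time_sync():
    """Call sync(2) once; return (wall-clock seconds, system CPU seconds)."""
    cpu_before = os.times()          # process CPU times before the call
    start = time.monotonic()
    os.sync()                        # same system call the 'sync' command issues
    wall = time.monotonic() - start
    # Kernel time spent on behalf of this process during sync(2)
    sys_cpu = os.times().system - cpu_before.system
    return wall, sys_cpu


if __name__ == "__main__":
    wall, sys_cpu = time_sync()
    print(f"sync took {wall:.2f} s wall clock, {sys_cpu:.2f} s system CPU")
```

Running it right after the long NFS copy, and again on an idle filesystem, would give a before/after comparison.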
Any ideas why this happens and/or suggestions for further investigation?
Eugene