Chen Chen
2016-Apr-13 09:56 UTC
[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Hi Ashish and other Gluster Users,

When I put a heavy IO load onto my cluster (an rsync operation, ~600MB/s), one of the nodes instantly gets inode-locked and tears down the whole cluster. I've already turned on "features.lock-heal" but it didn't help.

My clients use a round-robin tactic to mount the servers, hoping to even out the load. Could this be caused by a race between the NFS servers on different nodes? Should I instead set up a dedicated NFS server with plenty of memory, no bricks, and multiple Ethernet cables?

I really appreciate any help from you guys.

Best wishes,
Chen

PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.

On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> Hi Chen,
>
> As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>
> ============================================
> [xlator.features.locks.mainvol-locks.inode]
> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> mandatory=0
> inodelk-count=4
> lock-dump.domain.domain=mainvol-disperse-0:self-heal
> lock-dump.domain.domain=mainvol-disperse-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> ============================================
>
> This could be the cause of the hang.
> Possible workaround:
> If there is no IO going on for this volume, we can restart the volume using "gluster v start <volume-name> force". This will also restart the NFS process, which will release the locks, and
> we could come out of this issue.
>
> Ashish

-- 
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
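[For reference, a minimal sketch of the workaround Ashish describes above, assuming the affected volume is named "mainvol" (as in the lock dump) and that statedumps are written to the default /var/run/gluster directory:]

    # Take a fresh statedump of the brick processes to confirm blocked inodelk calls
    gluster volume statedump mainvol

    # Look for blocked locks in the resulting dump files (default statedump path assumed)
    grep -A2 'BLOCKED' /var/run/gluster/*.dump.*

    # If no IO is going on for this volume, force-restart it; this also restarts
    # the Gluster NFS server process, releasing the stale locks
    gluster volume start mainvol force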
Ashish Pandey
2016-Apr-13 10:29 UTC
[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Hi Chen,

What do you mean by "instantly get inode locked and teared down the whole cluster"? Do you mean that the whole disperse volume became unresponsive?

I don't have much idea about features.lock-heal, so I can't comment on how it could help you.

Could you please explain the second part of your mail? What exactly are you trying to do, and what is the setup? Also, the volume info, logs, and statedumps might help.

-----
Ashish
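[The diagnostics Ashish asks for can be gathered roughly as follows; a minimal sketch, again assuming the volume name "mainvol" and default log/statedump locations:]

    # Volume configuration and current status
    gluster volume info mainvol
    gluster volume status mainvol

    # Generate statedumps for the brick processes and the NFS server process
    gluster volume statedump mainvol
    gluster volume statedump mainvol nfs

    # Default locations of the resulting statedumps and logs
    ls /var/run/gluster/*.dump.*
    ls /var/log/glusterfs/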