Ashish Pandey
2016-Apr-13 10:29 UTC
[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Hi Chen,

What do you mean by "instantly got an inode lock and tore down the whole cluster"? Do you mean that the whole disperse volume became unresponsive?

I don't have much idea about features.lock-heal, so I can't comment on how it can help you.

Could you please explain the second part of your mail? What exactly are you trying to do, and what is the setup? Volume info, logs and statedumps might also help.

-----
Ashish

----- Original Message -----
From: "Chen Chen" <chenchen at smartquerier.com>
To: "Ashish Pandey" <aspandey at redhat.com>
Cc: gluster-users at gluster.org
Sent: Wednesday, April 13, 2016 3:26:53 PM
Subject: Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish and other Gluster Users,

When I put some heavy IO load onto my cluster (an rsync operation, ~600MB/s), one of the nodes instantly got an inode lock and tore down the whole cluster. I've already turned on "features.lock-heal" but it didn't help.

My clients use a round-robin tactic to mount the servers, hoping to spread the pressure. Could this be caused by a race between the NFS servers on different nodes? Should I instead create a dedicated NFS server with lots of memory, no bricks, and multiple Ethernet cables?

I really appreciate any help from you guys.

Best wishes,
Chen

PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.

On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> Hi Chen,
>
> As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>
> =============================================
> [xlator.features.locks.mainvol-locks.inode]
> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> mandatory=0
> inodelk-count=4
> lock-dump.domain.domain=mainvol-disperse-0:self-heal
> lock-dump.domain.domain=mainvol-disperse-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> =============================================
>
> This could be the cause of the hang.
> Possible workaround -
> If there is no IO going on for this volume, we can restart the volume using "gluster v start <volume-name> force". This will restart the nfs process too, which will release the locks, and we could come out of this issue.
>
> Ashish

-- 
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
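For readers following the thread: the statedump and restart commands referred to in the quoted workaround are standard gluster CLI calls. A minimal sketch only, assuming the volume name "mainvol" from this thread and the default statedump location (governed by server.statedump-path, usually /var/run/gluster):

# Dump the state of all brick processes of the volume; the NFS server
# process can be dumped separately with the "nfs" argument.
gluster volume statedump mainvol
gluster volume statedump mainvol nfs

# Look for blocked inode locks in the resulting dump files.
grep -B6 'BLOCKED' /var/run/gluster/*.dump.*

# Workaround quoted above: force-start the volume to restart the NFS
# process and release stale locks (only when no IO is in flight).
gluster volume start mainvol force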
Chen Chen
2016-Apr-13 13:44 UTC
[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Hi Ashish,

Thank you for your quick response!

Yes, the volume became unresponsive within 3 minutes after I initiated the rsync, and the nodes went down one by one. The cluster monitor showed me the whole sequence: at first the node mounted by my rsync client (sm15) shot up to a huge load (55; it was <10 before), then within a few seconds another node's network I/O dropped to ~zero, then the 3rd, the 4th, until all of them went down.

On all NFS clients, "strace ls /data" (my mount point) got stuck at "stat("/data",". The locked node is not reachable by ssh now, but peer status says it is connected, and volume status reports its NFS server and bricks as online. My cluster monitor daemon is also still alive.

"gluster volume start <volname> force" reports a timeout. "showmount -e <nodename>" works on the other nodes but not on the locked node. If I force-shutdown the locked node or unplug its 10Gb cable, the volume returns to work in no time.

The statedump showed these lines (as you pointed out before):

=====================
[xlator.features.locks.mainvol-locks.inode]
path=/home/analyzer/workdir/NTD/bam/A1703.bam
mandatory=0
inodelk-count=11
lock-dump.domain.domain=mainvol-disperse-0:self-heal
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=3cc5a0e4627f0000, client=0x7f03b8082150, connection-id=hw10-48926-2016/04/13-07:23:01:395332-mainvol-client-0-0, blocked at 2016-04-13 08:30:11, granted at 2016-04-13 08:31:09
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=80894744477f0000, client=0x7f03b80a8ef0, connection-id=sm12-4956-2016/04/13-07:22:44:529032-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=1827fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=6c20fddb617f0000, client=0x7f03b80860a0, connection-id=sm16-4859-2016/04/13-07:22:42:791688-mainvol-client-0-0, blocked at 2016-04-13 08:31:09
...
=====================

Since the performance of the native fuse client is undesirable, I use a round-robin DNS policy to distribute the load over the cluster and provide fail-over. Clients requesting a mount are resolved to different nodes, so the load gets averaged out (NFS mounts do cause a heavy memory footprint on the server, right?). A client stays tied to its specific node until it unmounts the share.

According to the statedump, the inode locks are always granted to one node while the other nodes' requests get blocked. I was wondering whether this decentralized NFS server cluster is what causes the race.

Here's the volume info. I have tweaked a lot of options, trying to boost performance while keeping it stable. I have hit this inode lock 4 times since I sent the first e-mail in this thread asking for help.
=====================
Volume Name: mainvol
Type: Distributed-Disperse
Volume ID: 2e190c59-9e28-43a5-b22a-24f75e9a580b
Status: Started
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: sm11:/mnt/disk1/mainvol
Brick2: sm12:/mnt/disk1/mainvol
Brick3: sm13:/mnt/disk1/mainvol
Brick4: sm14:/mnt/disk2/mainvol
Brick5: sm15:/mnt/disk2/mainvol
Brick6: sm16:/mnt/disk2/mainvol
Brick7: sm11:/mnt/disk2/mainvol
Brick8: sm12:/mnt/disk2/mainvol
Brick9: sm13:/mnt/disk2/mainvol
Brick10: sm14:/mnt/disk1/mainvol
Brick11: sm15:/mnt/disk1/mainvol
Brick12: sm16:/mnt/disk1/mainvol
Options Reconfigured:
performance.nfs.quick-read: on
performance.nfs.io-cache: on
performance.nfs.io-threads: on
performance.client-io-threads: on
performance.nfs.read-ahead: on
performance.nfs.write-behind-window-size: 4MB
performance.nfs.stat-prefetch: on
performance.stat-prefetch: on
nfs.acl: off
features.lock-heal: on
features.grace-timeout: 120
server.outstanding-rpc-limit: 128
network.remote-dio: on
performance.io-cache: true
performance.readdir-ahead: on
auth.allow: 172.16.135.*
performance.cache-size: 16GB
client.event-threads: 8
server.event-threads: 8
performance.io-thread-count: 32
performance.write-behind-window-size: 4MB
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
cluster.lookup-optimize: on
cluster.readdir-optimize: on
nfs.rpc-auth-allow: 172.168.135.*,127.0.0.1,::1
=====================

The locked node cannot be reached right now. The statedumps [nfs] of the other nodes are attached, together with /var/log/gluster from one node (sm11) as a representative. The attachment is too big for the mailing list, so it is available at "https://dl.dropboxusercontent.com/u/56671522/inodelock.tar.xz".

Best wishes,
Chen

On 4/13/2016 6:29 PM, Ashish Pandey wrote:
> Hi Chen,
>
> What do you mean by "instantly got an inode lock and tore down the whole cluster"? Do you mean that the whole disperse volume became unresponsive?
>
> I don't have much idea about features.lock-heal, so I can't comment on how it can help you.
>
> Could you please explain the second part of your mail? What exactly are you trying to do, and what is the setup? Volume info, logs and statedumps might also help.
>
> -----
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Chen Chen" <chenchen at smartquerier.com>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *gluster-users at gluster.org
> *Sent: *Wednesday, April 13, 2016 3:26:53 PM
> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Ashish and other Gluster Users,
>
> When I put some heavy IO load onto my cluster (an rsync operation, ~600MB/s), one of the nodes instantly got an inode lock and tore down the whole cluster. I've already turned on "features.lock-heal" but it didn't help.
>
> My clients use a round-robin tactic to mount the servers, hoping to spread the pressure. Could this be caused by a race between the NFS servers on different nodes? Should I instead create a dedicated NFS server with lots of memory, no bricks, and multiple Ethernet cables?
>
> I really appreciate any help from you guys.
>
> Best wishes,
> Chen
>
> PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.
>
> On 4/4/2016 6:11 PM, Ashish Pandey wrote:
>> Hi Chen,
>>
>> As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>>
>> =============================================
>> [xlator.features.locks.mainvol-locks.inode]
>> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
>> mandatory=0
>> inodelk-count=4
>> lock-dump.domain.domain=mainvol-disperse-0:self-heal
>> lock-dump.domain.domain=mainvol-disperse-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
>> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
>> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
>> =============================================
>>
>> This could be the cause of the hang.
>> Possible workaround -
>> If there is no IO going on for this volume, we can restart the volume using "gluster v start <volume-name> force". This will restart the nfs process too, which will release the locks, and we could come out of this issue.
>>
>> Ashish

-- 
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
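As an aside on the mount scheme Chen describes: a round-robin setup like this is usually just one DNS name with an A record per server, and each NFS client sticks to whichever node the name resolved to at mount time. A minimal sketch under those assumptions; the hostname "gluster.example.com" and the addresses are made up for illustration, only the /data mount point and the "mainvol" volume come from this thread:

# Hypothetical round-robin DNS (zone file fragment, addresses invented):
#   gluster.example.com. 60 IN A 172.16.135.11   ; sm11
#   gluster.example.com. 60 IN A 172.16.135.12   ; sm12
#   ... one A record per remaining node ...
#
# Each client resolves the name once at mount time and then stays tied to
# that node until the share is unmounted. Gluster's built-in NFS server
# speaks NFSv3 over TCP, hence the options below.
mount -t nfs -o vers=3,proto=tcp gluster.example.com:/mainvol /data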
Joe Julian
2016-Apr-13 14:31 UTC
[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
On 04/13/2016 03:29 AM, Ashish Pandey wrote:
> Hi Chen,
>
> What do you mean by "instantly got an inode lock and tore down the whole cluster"? Do you mean that the whole disperse volume became unresponsive?
>
> I don't have much idea about features.lock-heal, so I can't comment on how it can help you.

So who should get added to this email that would have an idea? Let's get that person looped in.

> Could you please explain the second part of your mail? What exactly are you trying to do, and what is the setup? Volume info, logs and statedumps might also help.
>
> -----
> Ashish
>
> ------------------------------------------------------------------------
> *From: *"Chen Chen" <chenchen at smartquerier.com>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *gluster-users at gluster.org
> *Sent: *Wednesday, April 13, 2016 3:26:53 PM
> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
>
> Hi Ashish and other Gluster Users,
>
> When I put some heavy IO load onto my cluster (an rsync operation, ~600MB/s), one of the nodes instantly got an inode lock and tore down the whole cluster. I've already turned on "features.lock-heal" but it didn't help.
>
> My clients use a round-robin tactic to mount the servers, hoping to spread the pressure. Could this be caused by a race between the NFS servers on different nodes? Should I instead create a dedicated NFS server with lots of memory, no bricks, and multiple Ethernet cables?
>
> I really appreciate any help from you guys.
>
> Best wishes,
> Chen
>
> PS. I don't know why the native FUSE client is 5 times slower than good old NFSv3.
>
> On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> > Hi Chen,
> >
> > As I suspected, there are many blocked calls for inodelk in sm11/mnt-disk1-mainvol.31115.dump.1459760675.
> >
> > =============================================
> > [xlator.features.locks.mainvol-locks.inode]
> > path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> > mandatory=0
> > inodelk-count=4
> > lock-dump.domain.domain=mainvol-disperse-0:self-heal
> > lock-dump.domain.domain=mainvol-disperse-0
> > inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> > inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> > inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> > inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> > =============================================
> >
> > This could be the cause of the hang.
> > Possible workaround -
> > If there is no IO going on for this volume, we can restart the volume using "gluster v start <volume-name> force". This will restart the nfs process too, which will release the locks, and we could come out of this issue.
> >
> > Ashish
>
> --
> Chen Chen
> Shanghai SmartQuerier Biotechnology Co., Ltd.
> Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
> Mob: +86 15221885893
> Email: chenchen at smartquerier.com
> Web: www.smartquerier.com
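Worth noting for anyone hitting the same hang: when "gluster volume start <volname> force" times out, as Chen reports above, releases that ship the clear-locks command can drop granted or blocked locks on a single file without power-cycling the node. This is only a hedged sketch; the path comes from Chen's statedump, and the exact syntax and availability should be verified against "gluster volume help" on your release before relying on it:

# Clear the granted inode lock on the stuck file ("0,0-0" is the whole
# byte range), then the blocked ones; paths are relative to the volume root.
gluster volume clear-locks mainvol /home/analyzer/workdir/NTD/bam/A1703.bam kind granted inode 0,0-0
gluster volume clear-locks mainvol /home/analyzer/workdir/NTD/bam/A1703.bam kind blocked inode 0,0-0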