thr3ads.net - Gluster users - [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd [Apr 2016]

Joe Julian

2016-Apr-13 14:31 UTC

[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

On 04/13/2016 03:29 AM, Ashish Pandey wrote:
> Hi Chen,
>
> What do you mean by "instantly get inode locked and teared down
> the whole cluster" ? Do you mean that whole disperse volume became 
> unresponsive?
>
> I don't have much idea about features.lock-heal so can't comment how
> can it help you.
So who should get added to this email that would have an idea? Let's get 
that person looped in.
>
> Could you please explain second part of your mail? What exactly are 
> you trying to do and what is the setup?
> Also volume info, logs statedumps might help.
>
> -----
> Ashish
>
>
> ------------------------------------------------------------------------
> *From: *"Chen Chen" <chenchen at smartquerier.com>
> *To: *"Ashish Pandey" <aspandey at redhat.com>
> *Cc: *gluster-users at gluster.org
> *Sent: *Wednesday, April 13, 2016 3:26:53 PM
> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata / 
> Failed combine iatt / Too many fd
>
> Hi Ashish and other Gluster Users,
>
> When I put some heavy IO load onto my cluster (a rsync operation,
> ~600MB/s), one of the node instantly get inode locked and teared down
> the whole cluster. I've already turned on "features.lock-heal" but it
> didn't help.
>
> My clients is using a round-robin tactic to mount servers, hoping to
> average the pressure. Could it be caused by a race between NFS servers
> on different nodes? Should I instead create a dedicated NFS Server with
> huge memory, no brick, and multiple Ethernet cables?
>
> I really appreciate any help from you guys.
>
> Best wishes,
> Chen
>
> PS. Don't know why the native fuse client is 5 times inferior than the
> old good NFSv3.
>
> On 4/4/2016 6:11 PM, Ashish Pandey wrote:
> > Hi Chen,
> >
> > As I suspected, there are many blocked calls for inodelk in
> > sm11/mnt-disk1-mainvol.31115.dump.1459760675.
> >
> > ============================================
> > [xlator.features.locks.mainvol-locks.inode]
> > path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
> > mandatory=0
> > inodelk-count=4
> > lock-dump.domain.domain=mainvol-disperse-0:self-heal
> > lock-dump.domain.domain=mainvol-disperse-0
> > inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc2d3dfcc57f0000, client=0x7ff03435d5f0, connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
> > inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=1414371e1a7f0000, client=0x7ff034204490, connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, blocked at 2016-04-01 16:58:51
> > inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=a8eb14cd9b7f0000, client=0x7ff01400dbd0, connection-id=sm14-879-2016/04/01-07:51:56:133106-mainvol-client-0-0-0, blocked at 2016-04-01 17:03:41
> > inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=b41a0482867f0000, client=0x7ff01800e670, connection-id=sm15-30906-2016/04/01-07:51:45:711474-mainvol-client-0-0-0, blocked at 2016-04-01 17:05:09
> > ============================================
> >
> > This could be the cause of hang.
> > Possible Workaround -
> > If there is no IO going on for this volume, we can restart the
> > volume using - gluster v start <volume-name> force. This will restart
> > the nfs process too which will release the locks and
> > we could come out of this issue.
> >
> > Ashish
>
> -- 
> Chen Chen
> Shanghai SmartQuerier Biotechnology Co., Ltd.
> Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
> Mob: +86 15221885893
> Email: chenchen at smartquerier.com
> Web: www.smartquerier.com
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

Chen Chen

2016-Apr-22 02:58 UTC

[Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

Hi Ashish,

Are you still watching this thread? I got no response after I sent the
info you requested. Also, could anybody explain what features.lock-heal
is doing?

I got another inode lock yesterday. Only one lock occurred across all
12 bricks, yet it stopped the cluster from working again. None of my
peers' operating systems froze, and this time "start force" worked.

------
[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc3dbfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=d433bfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at 2016-04-21 11:45:33
------

I've also filed a bug report on bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=1329466

Best regards,
Chen

-- 
Chen Chen
Shanghai SmartQuerier Biotechnology Co., Ltd.
Add: 3F, 1278 Keyuan Road, Shanghai 201203, P. R. China
Mob: +86 15221885893
Email: chenchen at smartquerier.com
Web: www.smartquerier.com
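
For anyone triaging a similar hang, the statedump excerpts in this thread can be scanned for files that have BLOCKED inodelk entries queued behind an ACTIVE one. The following is only a rough sketch, not a Gluster tool: it assumes each lock entry sits on a single line (as in the raw dump file, not as wrapped by a mail client) and that the section header and field names look like the excerpts above. The helper names summarize_inodelks and stuck_paths are made up for illustration, and the sample text is the dump from the 2016-Apr-22 message.

```python
import re

# One lock entry per line, e.g. "inodelk.inodelk[1](BLOCKED)=type=WRITE, ..."
LOCK_RE = re.compile(r"^inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\)=(.*)$")
OWNER_RE = re.compile(r"owner=([0-9a-f]+)")
CONN_RE = re.compile(r"connection-id=([^,\s]+)")


def summarize_inodelks(dump_text):
    """Group inodelk entries by file path: {path: {"ACTIVE": [...], "BLOCKED": [...]}}."""
    summary = {}
    in_locks_section = False
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new section begins; only the locks translator sections matter here.
            in_locks_section = line.startswith("[xlator.features.locks.")
            current = None
        elif in_locks_section and line.startswith("path="):
            path = line[len("path="):]
            current = summary.setdefault(path, {"ACTIVE": [], "BLOCKED": []})
        elif current is not None:
            match = LOCK_RE.match(line)
            if match:
                state, fields = match.groups()
                owner = OWNER_RE.search(fields)
                conn = CONN_RE.search(fields)
                current[state].append({
                    "owner": owner.group(1) if owner else None,
                    "connection": conn.group(1) if conn else None,
                })
    return summary


def stuck_paths(summary):
    """Return the paths that have at least one BLOCKED inodelk."""
    return [path for path, locks in summary.items() if locks["BLOCKED"]]


# Sample taken from the dump quoted in this thread, with wrapped lines rejoined.
SAMPLE = """
[xlator.features.locks.mainvol-locks.inode]
path=<gfid:2092ae08-81de-4717-a7d5-6ad955e18b58>/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=dc3dbfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 1, owner=d433bfac887f0000, client=0x7f649835adb0, connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, blocked at 2016-04-21 11:45:33
"""

summary = summarize_inodelks(SAMPLE)
for path in stuck_paths(summary):
    locks = summary[path]
    print(path)
    for holder in locks["ACTIVE"]:
        print("  held by   ", holder["owner"], holder["connection"])
    for waiter in locks["BLOCKED"]:
        print("  waiting   ", waiter["owner"], waiter["connection"])
```

Running the same function over the full dump file from each brick should show which connection holds the lock that everything else is waiting on, which is the information asked for earlier in the thread.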
