Vijaikumar M
2015-Aug-07 12:57 UTC
[Gluster-users] Cascading errors and very bad write performance
On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:
> Hi Vijay,
>
> My brick-log issue and big performance problem began when I upgraded
> Gluster to version 3.7.3; before that, write throughput was good enough
> (~500MB/s), though not as good as with GlusterFS 3.5.3 (especially with
> distributed volumes), and I didn't notice these problems in the brick logs.
>
> OK, a live update:
>
> I just disabled quota for my home volume and now my performance appears
> to be somewhat better (around 300MB/s), but I still see the logs (from
> storage1 and its replica storage2) growing with only this kind of line:
>
> [2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref]
> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
> [0x7f85e9a6a410]
> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
> [0x7f85e9a6a188]
> -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]

We have root-caused the log issue; bug# 1244613 tracks it.

> After a few minutes: my write throughput now seems correct (~550MB/s),
> but the logs are still growing (not to say exploding). So one part of
> the problem looks like it originates in the quota management.
>
> ... and after a few more minutes (still with only 1 client connected),
> it is now the read operation that is very, very slow. I'm going to go
> crazy! :/
>
> # ddt -t 50g /home/
> Writing to /home/ddt.11293 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11293 ... done.
>           35840MiB    KiB/s  CPU%
> Write                568201     5
> Read                 567008     4
>
> # ddt -t 50g /home/
> Writing to /home/ddt.11397 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11397 ... done.
>           51200MiB    KiB/s  CPU%
> Write                573631     5
> Read                 164716     1
>
> ... and my logs are still exploding.
>
> After re-enabling quota on my volume:
>
> # ddt -t 50g /home/
> Writing to /home/ddt.11817 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11817 ... done.
>           51200MiB    KiB/s  CPU%
> Write                269608     3
> Read                 160219     1
>
> Thanks
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 7 August 2015 at 06:28, Vijaikumar M <vmallika at redhat.com> wrote:
>
>> Hi Geoffrey,
>>
>> Some performance improvements have been made to quota in glusterfs-3.7.3.
>> Could you upgrade to glusterfs-3.7.3 and see if this helps?
>>
>> Thanks,
>> Vijay
>>
>> On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
>>> Hi,
>>>
>>> Any idea to help me fix this issue? (big logs, write performance
>>> divided by 4, etc.)
>>>
>>> For comparison, here are two volumes:
>>>
>>> - home: distributed on 4 bricks / 2 nodes (and replicated on 4 other
>>>   bricks / 2 other nodes):
>>> # ddt -t 35g /home
>>> Writing to /home/ddt.24172 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /home/ddt.24172 ... done.
>>>           33792MiB    KiB/s  CPU%
>>> Write                103659     1
>>> Read                 391955     3
>>>
>>> - workdir: distributed on 4 bricks / 2 nodes (on the same RAID volumes
>>>   and servers as home):
>>> # ddt -t 35g /workdir
>>> Writing to /workdir/ddt.24717 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /workdir/ddt.24717 ... done.
>>>           35840MiB    KiB/s  CPU%
>>> Write                738314     4
>>> Read                 536497     4
>>>
>>> For information, previously on version 3.5.3-2 I obtained roughly
>>> 1.1GB/s for the workdir volume and ~550-600MB/s for home.
>>>
>>> All my tests (cp, rsync, etc.) give the same result (write throughput
>>> between 100MB/s and 150MB/s).
>>>
>>> Thanks.
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> Responsable informatique & ingénieur système
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>
>>> On 5 August 2015 at 10:40, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>
>>>> Hello,
>>>>
>>>> In addition, given that I reactivated the logs (brick-log-level = INFO
>>>> instead of CRITICAL) only for the duration of the file creation (i.e. a
>>>> few minutes), have you noticed the log sizes and the number of lines
>>>> inside them:
>>>>
>>>> # ls -lh storage*
>>>> -rw------- 1 letessier staff  18M  5 août 00:54 storage1__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K  5 août 00:54 storage1__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  15M  5 août 00:56 storage2__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K  5 août 00:54 storage2__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  47M  5 août 00:55 storage3__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K  5 août 00:54 storage3__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  47M  5 août 00:55 storage4__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K  5 août 00:55 storage4__export-brick_home-brick2-data.log
>>>>
>>>> # wc -l storage*
>>>>    55381 storage1__export-brick_home-brick1-data.log
>>>>       17 storage1__export-brick_home-brick2-data.log
>>>>    41636 storage2__export-brick_home-brick1-data.log
>>>>       17 storage2__export-brick_home-brick2-data.log
>>>>   270360 storage3__export-brick_home-brick1-data.log
>>>>       17 storage3__export-brick_home-brick2-data.log
>>>>   270358 storage4__export-brick_home-brick1-data.log
>>>>       17 storage4__export-brick_home-brick2-data.log
>>>>   637803 total
>>>>
>>>> If I leave brick-log-level at INFO, the brick log files on each server
>>>> will consume all of my /var partition capacity within only a few
>>>> hours/days...
>>>>
>>>> Thanks in advance,
>>>> Geoffrey
>>>> ------------------------------------------------------
>>>> Geoffrey Letessier
>>>> Responsable informatique & ingénieur système
>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>> Institut de Biologie Physico-Chimique
>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>
>>>> On 5 August 2015 at 01:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Since the problem mentioned previously (all the errors noticed in the
>>>>> brick log files), I am seeing very, very bad performance: my write
>>>>> performance is divided by 4 compared to before (and it was not that
>>>>> good to begin with).
>>>>> Now, when writing a 33GB file, my write throughput is around 150MB/s
>>>>> (over InfiniBand); before it was around 550-600MB/s, and this with
>>>>> both the RDMA and TCP protocols.
>>>>>
>>>>> During this test, more than 40,000 error lines like the following were
>>>>> added to the brick log files:
>>>>>
>>>>> [2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref]
>>>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
>>>>> [0x7f021c6f7410]
>>>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
>>>>> [0x7f021c6f7188]
>>>>> -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
>>>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
>>>>>
>>>>> All brick log files are attached.
>>>>>
>>>>> Thanks in advance for all your help and fixes,
>>>>> Best,
>>>>> Geoffrey
>>>>>
>>>>> PS: is it possible to easily downgrade GlusterFS from 3.7 to a
>>>>> previous version (for example, 3.5)?
>>>>>
>>>>> ------------------------------------------------------
>>>>> Geoffrey Letessier
>>>>> Responsable informatique & ingénieur système
>>>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>>>> Institut de Biologie Physico-Chimique
>>>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>>> <bricks-logs.tgz>
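For reference, the quota and brick-log-level changes being toggled in this thread map onto standard Gluster CLI commands. A minimal sketch, assuming the volume is named vol_home (the name that appears in the "0-vol_home-marker" log lines), run on one of the storage nodes:

Disable quota accounting on the volume, then re-enable it later:
  # gluster volume quota vol_home disable
  # gluster volume quota vol_home enable

Quiet the brick logs while debugging, then reset them to the default level (INFO):
  # gluster volume set vol_home diagnostics.brick-log-level CRITICAL
  # gluster volume reset vol_home diagnostics.brick-log-level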
Geoffrey Letessier
2015-Aug-07 21:47 UTC
[Gluster-users] Cascading errors and very bad write performance
I'm not really sure I fully understand your answer. I tried setting
inode-lru-limit to 1, and I cannot see any positive effect. When I re-run
the ddt application, I see two kinds of messages:

[2015-08-07 21:29:21.792156] W [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
[2015-08-07 21:29:21.792176] W [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn

and/or:

[2015-08-07 21:44:19.279971] E [marker-quota.c:2990:mq_start_quota_txn_v2] 0-vol_home-marker: contribution node list is empty (31d7bf88-b63a-4731-a737-a3dce73b8cd1)
[2015-08-07 21:41:26.177095] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) [0x7f85e9a6a410] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]

And concerning the bad IO performance...

[letessier at node031 ~]$ ddt -t 35g /home/admin_team/letessier/
Writing to /home/admin_team/letessier/ddt.25259 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/admin_team/letessier/ddt.25259 ... done.
          35840MiB    KiB/s  CPU%
Write                277451     3
Read                 188682     1
[letessier at node031 ~]$ logout

[root at node031 ~]# ddt -t 35g /home/
Writing to /home/ddt.25559 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.25559 ... done.
          35840MiB    KiB/s  CPU%
Write                196539     2
Read                 438944     3

Notice the read/write throughput differences when I am root and when I am
a regular user.

Thanks.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

On 7 August 2015 at 14:57, Vijaikumar M <vmallika at redhat.com> wrote:

> We have root-caused the log issue; bug# 1244613 tracks it.
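A side note on the inode-lru-limit experiment above: it is applied as a per-volume option, so a minimal sketch of setting and checking it would look like the commands below, again assuming the volume is vol_home. Its default is normally much higher than 1 (16384 in recent releases, as far as I know), so 1 is an unusually aggressive setting:

  # gluster volume set vol_home network.inode-lru-limit 16384
  # gluster volume info vol_home   (the value appears under "Options Reconfigured")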
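A closing note on the benchmark itself: the ddt figures quoted in KiB/s throughout the thread can be cross-checked with plain dd if anyone wants to reproduce them with standard tools. A minimal sketch against the same mount point (the file name and size here are arbitrary; dropping the page cache keeps the read pass from measuring RAM instead of the volume):

  # dd if=/dev/zero of=/home/ddt.check bs=1M count=35840 conv=fdatasync
  # echo 3 > /proc/sys/vm/drop_caches
  # dd if=/home/ddt.check of=/dev/null bs=1M
  # rm -f /home/ddt.check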