thr3ads.net - Gluster users - [Gluster-users] GlusterFS 3.7

Pranith Kumar Karampuri

2015-Jun-02 08:09 UTC

[Gluster-users] GlusterFS 3.7 - slow/poor performances

Hi Geoffrey,

Since you say it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume.
2) Set the options you need.
3) Enable volume profiling with "gluster volume profile <volname> start".
4) Run the workload.
5) Send the output of "gluster volume profile <volname> info".

Repeat the steps above on both the new version and the old version you are
comparing it with. That should give us insight into what could be causing
the slowness.
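
The profiling steps above can be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the volume name `testvol`, the mount point `/mnt/testvol`, the `DRY_RUN` switch, and the `run` and `profile_workload` helpers are assumptions made for illustration. Only the `gluster volume profile <volname> start` and `info` commands come from the steps above; `stop` is added to end the profiling session. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical names: VOLNAME, DRY_RUN, run, profile_workload.
VOLNAME="${VOLNAME:-testvol}"   # placeholder volume name
DRY_RUN="${DRY_RUN:-1}"         # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 3-5: start profiling, run the workload, collect the counters.
# $1 is the workload, given as a single shell command string.
profile_workload() {
    run gluster volume profile "$VOLNAME" start
    run sh -c "$1"
    run gluster volume profile "$VOLNAME" info
    run gluster volume profile "$VOLNAME" stop
}

# Example: profile the untar step (the mount point is an assumption).
profile_workload "cd /mnt/$VOLNAME && tar xJf ~/linux-4.1-rc5.tar.xz"
```

Setting `DRY_RUN=0` executes the commands; redirecting the `info` output to a file per GlusterFS version makes the two runs easy to compare.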

Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
> Dear all,
>
> I have a crash-test cluster where I've tested the new version of
> GlusterFS (v3.7) before upgrading my production HPC cluster.
> But... all my tests show very poor performance.
>
> For my benchmarks, as you can read below, I run some operations (untar, du,
> find, tar, rm) on the Linux kernel sources, dropping caches each time, on
> distributed, replicated, distributed-replicated, and single (single-brick)
> volumes, and on the native FS of one brick.
>
> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf 
> ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 
> 3 > /proc/sys/vm/drop_caches)
> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/|wc -l; 
> echo 3 > /proc/sys/vm/drop_caches)
> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz 
> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz 
> linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>
> And here are the elapsed times:
>
> ---------------------------------------------------------------
> |             |  UNTAR  |   DU   |  FIND   |   TAR   |   RM   |
> ---------------------------------------------------------------
> | single      |  ~3m45s |   ~43s |    ~47s |  ~3m10s | ~3m15s |
> ---------------------------------------------------------------
> | replicated  |  ~5m10s |   ~59s |   ~1m6s |  ~1m19s | ~1m49s |
> ---------------------------------------------------------------
> | distributed |  ~4m18s |   ~41s |    ~57s |  ~2m24s | ~1m38s |
> ---------------------------------------------------------------
> | dist-repl   |  ~8m18s |  ~1m4s |  ~1m11s |  ~1m24s | ~2m40s |
> ---------------------------------------------------------------
> | native FS   |    ~11s |    ~4s |     ~2s |    ~56s |   ~10s |
> ---------------------------------------------------------------
>
> I get the same results with both default and custom configurations.
>
> Looking at the ifstat output, I notice that my write throughput never
> exceeds 3 MB/s...
>
> The native EXT4 FS seems to be faster than the XFS one (roughly 15-20%,
> but no more).
>
> My [test] storage cluster consists of 2 identical servers
> (dual-CPU Intel Xeon X5355, 8 GB of RAM, 2x2TB HDD (no RAID), and Gb Ethernet).
>
> My volume settings:
> single: 1 server, 1 brick
> replicated: 2 servers, 1 brick each
> distributed: 2 servers, 2 bricks each
> dist-repl: 2 bricks in the same server and replica 2
>
> Everything seems to be OK in the gluster status output.
>
> Do you have any idea why I get such bad results?
> Thanks in advance.
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
>
> IT Manager & Systems Engineer
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
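
Each of the benchmark commands quoted above follows the same pattern: drop the caches, run one operation, drop the caches again, and time the whole sequence. A small wrapper can make that pattern repeatable. This is a sketch under stated assumptions: the `bench` and `drop_caches` helper names and the `DROP_CACHES_FILE` variable are invented here, writing to `/proc/sys/vm/drop_caches` requires root, timing has one-second resolution, and the wrapper calls `sync` before every cache drop, whereas the original commands do so only for the untar step.

```shell
#!/bin/sh
# Hypothetical names: DROP_CACHES_FILE, drop_caches, bench.
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# Flush dirty pages, then ask the kernel to drop the page cache,
# dentries and inodes (value 3). Requires root on a real system.
drop_caches() {
    sync
    echo 3 > "$DROP_CACHES_FILE"
}

# bench LABEL COMMAND [ARGS...]
# Times the full sequence: drop caches, run the command, drop caches again.
bench() {
    label="$1"
    shift
    start=$(date +%s)
    drop_caches
    "$@"
    status=$?
    drop_caches
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
    return "$status"
}

# Example usage (as root, from the mount point of the volume under test):
#   bench untar tar xJf ~/linux-4.1-rc5.tar.xz
#   bench du    du -sh linux-4.1-rc5/
#   bench find  sh -c 'find linux-4.1-rc5/ | wc -l'
#   bench tar   tar czf linux-4.1-rc5.tgz linux-4.1-rc5/
#   bench rm    rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/
```

Running the same five calls on each volume type and on the native FS would reproduce the rows of the table above.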

Geoffrey Letessier

2015-Jun-02 12:09 UTC

[Gluster-users] GlusterFS 3.7 - slow/poor performances

Hi Pranith,

I'm sorry, but I cannot provide a comparison: it would be distorted by the
fact that my production HPC cluster uses InfiniBand QDR networking and its
volumes are quite different (RAID6 bricks (12x2TB), 2 bricks per server, and
4 servers in the pool).

As requested, you will find the results attached. I hope they help you solve
this serious performance issue (maybe I need to tune some GlusterFS
parameters?).

Thank you very much in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT Manager & Systems Engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: client.txt
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150602/29138499/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: server.txt
URL: <http://www.gluster.org/pipermail/gluster-users/attachments/20150602/29138499/attachment-0001.txt>

Ben Turner

2015-Jun-02 19:53 UTC

[Gluster-users] GlusterFS 3.7 - slow/poor performances

I am seeing problems on 3.7 as well.  Can you check /var/log/messages on both
the clients and servers for hung tasks like:

Jun  2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun  2 15:23:14 gqac006 kernel: iozone        D 0000000000000001     0 21999     1 0x00000080
Jun  2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun  2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun  2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun  2 15:23:14 gqac006 kernel: Call Trace:
Jun  2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun  2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun  2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun  2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun  2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
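
One way to look for traces like the one above is to search the syslog for the kernel's hung-task warning. This is a sketch, assuming the warning contains the usual phrase "blocked for more than" and that the kernel log is written to /var/log/messages as mentioned above; the helper names and the 25 lines of trailing context are arbitrary choices made here.

```shell
#!/bin/sh
# Hypothetical helpers; the default log path is the one named in the message above.

# Print each hung-task warning followed by up to 25 lines (the call trace).
find_hung_tasks() {
    logfile="${1:-/var/log/messages}"
    grep -A 25 'blocked for more than' "$logfile"
}

# Print only the number of hung-task warnings found in the log.
count_hung_tasks() {
    logfile="${1:-/var/log/messages}"
    grep -c 'blocked for more than' "$logfile"
}

# Example usage, on each client and server:
#   count_hung_tasks
#   find_hung_tasks /var/log/messages | less
```

A count of zero on every node would suggest the slowdown is not caused by tasks stuck in uninterruptible I/O wait.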

Do you see a perf problem with just a simple dd, or do you need a more complex
workload to hit the issue?  I think I saw an issue with metadata performance
that I am trying to run down. Let me know if you can reproduce the problem
with simple dd reads/writes, or whether it takes some sort of directory /
metadata access as well.
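
A simple dd check of the kind asked about here might look like the following. Sketch only: it assumes GNU coreutils `dd` (for `bs=1M` and `conv=fdatasync`); the helper name `dd_write_read`, the scratch file name, and the mount point in the usage comment are hypothetical. For a meaningful read figure, caches would need to be dropped between the write and the read, which this sketch does not do.

```shell
#!/bin/sh
# dd_write_read DIR [SIZE_MB]
# Writes SIZE_MB mebibytes of zeros to a scratch file in DIR, reads it back,
# then removes it. dd reports the throughput of each pass on stderr.
dd_write_read() {
    dir="$1"
    size_mb="${2:-1024}"
    file="$dir/ddtest.$$"

    # Sequential write; conv=fdatasync makes dd flush the data to storage
    # before it reports, so the figure is not just page-cache speed.
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fdatasync || return 1

    # Sequential read of the same file.
    dd if="$file" of=/dev/null bs=1M || return 1

    rm -f "$file"
}

# Example usage (hypothetical mount point of the Gluster volume):
#   dd_write_read /mnt/testvol 1024
```

If large sequential transfers look healthy while the untar and rm steps stay slow, that would point toward small-file and metadata operations rather than raw data throughput.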

-b

