Geoffrey Letessier
2015-Jun-08 12:37 UTC
[Gluster-users] GlusterFS 3.7 - slow/poor performances
Hello,

Do you know more about it?

In addition, do you know how to « activate » RDMA for my volume with
Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type
option (both on server and client side) but I notice all streams are using
the TCP stack, and my bandwidth never exceeds 2.0-2.5Gb/s (250-300MB/s).
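For reference, I mount them roughly like this (hostname and volume name are
placeholders, not my real ones):

# mount -t glusterfs -o transport=rdma server1:/vol_home /mnt/vol_home

or, as an fstab entry:

server1:/vol_home  /mnt/vol_home  glusterfs  transport=rdma,_netdev  0 0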

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

> On June 2, 2015 at 23:45, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
>
> Hi Ben,
>
> I just checked my message log files, both on client and server, and I don't
> find any hung tasks like the ones you noticed on yours.
>
> As you can read below, I don't see the performance issue with a simple DD,
> but I think my issue concerns sets of small files (tens of thousands, or
> even more)…
>
> [root at nisus test]# ddt -t 10g /mnt/test/
> Writing to /mnt/test/ddt.8362 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /mnt/test/ddt.8362 ... done.
>  10240MiB    KiB/s   CPU%
> Write       114770      4
> Read         40675      4
>
> for info: /mnt/test is the single (v2) GlusterFS volume
>
> [root at nisus test]# ddt -t 10g /mnt/fhgfs/
> Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /mnt/fhgfs/ddt.8380 ... done.
>  10240MiB    KiB/s   CPU%
> Write       102591      1
> Read         98079      2
>
> Do you have an idea how to tune/optimize the performance settings and/or
> the TCP settings (MTU, etc.)?
>
> ---------------------------------------------------------------
> |             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
> ---------------------------------------------------------------
> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
> ---------------------------------------------------------------
> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
> ---------------------------------------------------------------
> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
> ---------------------------------------------------------------
> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
> ---------------------------------------------------------------
> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
> ---------------------------------------------------------------
> | BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
> ---------------------------------------------------------------
> | single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
> ---------------------------------------------------------------
> for info:
> - BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
> - single (v2): simple gluster volume with default settings
>
> I also note I obtain the same tar/untar performance issue with
> FhGFS/BeeGFS, but the rest (DU, FIND, RM) looks to be OK.
>
> Thank you very much for your reply and help.
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>
> On June 2, 2015 at 21:53, Ben Turner <bturner at redhat.com> wrote:
>
>> I am seeing problems on 3.7 as well. Can you check /var/log/messages on
>> both the clients and servers for hung tasks like:
>>
>> Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
>> Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
>> Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
>> Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
>> Jun 2 15:23:14 gqac006 kernel: Call Trace:
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
>> Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>
>> Do you see a perf problem with just a simple DD or do you need a more
>> complex workload to hit the issue? I think I saw an issue with metadata
>> performance that I am trying to run down; let me know if you can see the
>> problem with simple DD reads / writes or if we need to do some sort of
>> dir / metadata access as well.
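>>
>> For a quick sequential check without iozone, something like this is
>> enough (mount point and size are just examples):
>>
>> # dd if=/dev/zero of=/mnt/test/ddfile bs=1024k count=10240 conv=fdatasync
>> # echo 3 > /proc/sys/vm/drop_caches
>> # dd if=/mnt/test/ddfile of=/dev/null bs=1024k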
>>
>> -b
>>
>> ----- Original Message -----
>>> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
>>> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>>> Cc: gluster-users at gluster.org
>>> Sent: Tuesday, June 2, 2015 8:09:04 AM
>>> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>>>
>>> Hi Pranith,
>>>
>>> I'm sorry, but I cannot bring you any comparison, because it would be
>>> distorted by the fact that in my HPC cluster in production the network
>>> technology is InfiniBand QDR and my volumes are quite different (bricks
>>> in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).
>>>
>>> Concerning your request, in attachments you can find all the expected
>>> results, hoping it can help you to solve this serious performance issue
>>> (maybe I need to play with glusterfs parameters…).
>>>
>>> Thank you very much in advance,
>>> Geoffrey
>>> ------------------------------------------------------
>>> Geoffrey Letessier
>>> IT manager & systems engineer
>>> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>>>
>>> On June 2, 2015 at 10:09, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>
>>> hi Geoffrey,
>>> Since you are saying it happens on all types of volumes, let's do the
>>> following:
>>> 1) Create a dist-repl volume
>>> 2) Set the options etc. you need
>>> 3) Enable gluster volume profiling using "gluster volume profile <volname> start"
>>> 4) Run the workload
>>> 5) Give the output of "gluster volume profile <volname> info"
>>>
>>> Repeat the steps above on the new and the old version you are comparing
>>> this with. That should give us insight into what could be causing the
>>> slowness.
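>>>
>>> Concretely, one round looks like this ("<volname>" is whichever volume
>>> you are testing; saving the info output to a file is just a suggestion):
>>>
>>> # gluster volume profile <volname> start
>>> # (run the untar/du/find/tar/rm workload on the mount)
>>> # gluster volume profile <volname> info > profile-<volname>.txt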
>>>
>>> Pranith
>>> On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
>>>
>>> Dear all,
>>>
>>> I have a crash-test cluster where I've tested the new version of
>>> GlusterFS (v3.7) before upgrading my HPC cluster in production.
>>> But… all my tests show me very, very low performance.
>>>
>>> For my benches, as you can read below, I perform some actions (untar,
>>> du, find, tar, rm) on the Linux kernel sources, dropping caches, each on
>>> distributed, replicated, distributed-replicated and single (single
>>> brick) volumes, and on the native FS of one brick.
>>>
>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
>>> # time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>> # time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
>>> # time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>> # time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
>>>
>>> And here are the process times:
>>>
>>> ---------------------------------------------------------------
>>> |             | UNTAR  |  DU   |  FIND  |  TAR   |   RM   |
>>> ---------------------------------------------------------------
>>> | single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
>>> ---------------------------------------------------------------
>>> | replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
>>> ---------------------------------------------------------------
>>> | distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
>>> ---------------------------------------------------------------
>>> | dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
>>> ---------------------------------------------------------------
>>> | native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
>>> ---------------------------------------------------------------
>>>
>>> I get the same results whether with default or with custom
>>> configurations.
>>>
>>> If I look at the output of the ifstat command, I can note my IO write
>>> processes never exceed 3MB/s...
>>>
>>> The native EXT4 FS seems to be faster than the XFS one (roughly 15-20%,
>>> but no more).
>>>
>>> My [test] storage cluster is composed of 2 identical servers (bi-CPU
>>> Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb ethernet).
>>>
>>> My volume settings (see the creation sketch below):
>>> single: 1 server, 1 brick
>>> replicated: 2 servers, 1 brick each
>>> distributed: 2 servers, 2 bricks each
>>> dist-repl: 2 bricks in the same server and replica 2
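>>>
>>> Created roughly like this (server names and brick paths are
>>> placeholders):
>>>
>>> # gluster volume create vol_single server1:/export/brick1
>>> # gluster volume create vol_repl replica 2 server1:/export/brick1 server2:/export/brick1
>>> # gluster volume create vol_dist server1:/export/brick{1,2} server2:/export/brick{1,2}
>>> # gluster volume create vol_distrepl replica 2 server1:/export/brick1 server1:/export/brick2 server2:/export/brick1 server2:/export/brick2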
>>>
>>> All seems to be OK in the gluster status command line.
>>>
>>> Do you have an idea why I obtain such bad results?
>>> Thanks in advance.
>>> Geoffrey
>>> -----------------------------------------------
>>> Geoffrey Letessier
>>> IT manager & systems engineer
>>> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
>>> Institut de Biologie Physico-Chimique
>>> 13, rue Pierre et Marie Curie - 75005 Paris
>>> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr

----- Original Message -----
> From: "Geoffrey Letessier" <geoffrey.letessier at cnrs.fr>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: "Pranith Kumar Karampuri" <pkarampu at redhat.com>, gluster-users at gluster.org
> Sent: Monday, June 8, 2015 8:37:08 AM
> Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
>
> Hello,
>
> Do you know more about it?
>
> In addition, do you know how to « activate » RDMA for my volume with
> Intel/QLogic QDR? Currently, I mount my volumes with the RDMA
> transport-type option (both on server and client side) but I notice all
> streams are using the TCP stack, and my bandwidth never exceeds
> 2.0-2.5Gb/s (250-300MB/s).

That is a little slow for the HW you described. Can you check what you get
with iperf just between the clients and servers? https://iperf.fr/
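Run it on one of the servers and then point a client at it, e.g. (hostname
is a placeholder):

# iperf -s            (on one server)
# iperf -c server1    (from a client)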
With replica 2 and a 10G NW you should see ~400 MB/sec sequential writes
and ~600 MB/sec reads. Can you send me the output from "gluster v info"?
You specify RDMA volumes at create time by running "gluster v create blah
transport rdma"; did you specify RDMA when you created the volume?
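A minimal sketch with placeholder names (an existing volume's
Transport-type also shows up in "gluster v info"):

# gluster volume create myvol replica 2 transport rdma server1:/export/brick1 server2:/export/brick1
# gluster volume start myvol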
What block size are you using in your tests? 1024 KB writes perform best
with glusterfs, and as the block size gets smaller perf will drop a little
bit. I wouldn't write in anything under 4k blocks; the sweet spot is
between 64k and 1024k.
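For example (path and size are just illustrative), compare a 1024k-block
write against a 4k-block write of the same 10GiB file:

# dd if=/dev/zero of=/mnt/test/ddfile bs=1024k count=10240 conv=fdatasync
# dd if=/dev/zero of=/mnt/test/ddfile bs=4k count=2621440 conv=fdatasync

The bs=1024k run should come out well ahead.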

-b