Mathieu Chateau
2015-Jun-20 07:12 UTC
[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
Hello,

For the replicated volume: is it a new issue, or did you just not notice it before? Is the baseline the same as before?

I also see slowness with small files / many files.

For now, I could only tune things up with:

On the Gluster level:
gluster volume set myvolume performance.io-thread-count 16
gluster volume set myvolume performance.cache-size 1GB
gluster volume set myvolume nfs.disable on
gluster volume set myvolume readdir-ahead enable
gluster volume set myvolume read-ahead disable

On the network level (client and server; I don't have InfiniBand):
sysctl -w vm.swappiness=0
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
# increase Linux autotuning TCP buffer limit to 32MB
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
# increase the length of the processor input queue
sysctl -w net.core.netdev_max_backlog=30000
# recommended default congestion control is htcp
sysctl -w net.ipv4.tcp_congestion_control=htcp

But it's still really slow, even if better.

Regards,
Mathieu CHATEAU
http://www.lotp.fr
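Note that sysctl -w changes do not survive a reboot. A minimal sketch of making the settings above persistent (the file name is illustrative; the loading mechanism varies by distribution):

# /etc/sysctl.d/90-gluster-net.conf   (hypothetical file name)
# Same values as the sysctl -w commands above; applied at boot,
# or immediately with: sysctl -p /etc/sysctl.d/90-gluster-net.conf
vm.swappiness = 0
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_congestion_control = htcp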
2015-06-20 2:34 GMT+02:00 Geoffrey Letessier <geoffrey.letessier at cnrs.fr>:

> Re,
>
> For comparison, here is the output of the same script run on a
> distributed-only volume (2 of the 4 servers described previously,
> 2 bricks each):
>
> #######################################################
> ################ UNTAR time consumed  ################
> #######################################################
>
> real 1m44.698s
> user 0m8.891s
> sys  0m8.353s
>
> #######################################################
> ################# DU time consumed  ##################
> #######################################################
>
> 554M linux-4.1-rc6
>
> real 0m21.062s
> user 0m0.100s
> sys  0m1.040s
>
> #######################################################
> ################# FIND time consumed  ################
> #######################################################
>
> 52663
>
> real 0m21.325s
> user 0m0.104s
> sys  0m1.054s
>
> #######################################################
> ################# GREP time consumed  ################
> #######################################################
>
> 7952
>
> real 0m43.618s
> user 0m0.922s
> sys  0m3.626s
>
> #######################################################
> ################# TAR time consumed  #################
> #######################################################
>
> real 0m50.577s
> user 0m29.745s
> sys  0m4.086s
>
> #######################################################
> ################# RM time consumed  ##################
> #######################################################
>
> real 0m41.133s
> user 0m0.171s
> sys  0m2.522s
>
> The performances are amazingly different!
>
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>
> On 20 June 2015, at 02:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr>
> wrote:
>
> Dear all,
>
> I just noticed that the I/O operations on the main volume of my HPC
> cluster have become strikingly poor...
>
> Running some file operations on a compressed Linux kernel source archive,
> the untar operation alone can take more than half an hour for this file
> (roughly 80MB, with 52,000 files inside), as you can read below:
>
> #######################################################
> ################ UNTAR time consumed  ################
> #######################################################
>
> real 32m42.967s
> user 0m11.783s
> sys  0m15.050s
>
> #######################################################
> ################# DU time consumed  ##################
> #######################################################
>
> 557M linux-4.1-rc6
>
> real 0m25.060s
> user 0m0.068s
> sys  0m0.344s
>
> #######################################################
> ################# FIND time consumed  ################
> #######################################################
>
> 52663
>
> real 0m25.687s
> user 0m0.084s
> sys  0m0.387s
>
> #######################################################
> ################# GREP time consumed  ################
> #######################################################
>
> 7952
>
> real 2m15.890s
> user 0m0.887s
> sys  0m2.777s
>
> #######################################################
> ################# TAR time consumed  #################
> #######################################################
>
> real 1m5.551s
> user 0m26.536s
> sys  0m2.609s
>
> #######################################################
> ################# RM time consumed  ##################
> #######################################################
>
> real 2m51.485s
> user 0m0.167s
> sys  0m1.663s
>
> For information, this volume is a distributed replicated one, composed of
> 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with
> good native performance (around 1.2GB/s).
>
> In comparison, when I use dd to generate a 100GB file on the same volume,
> my write throughput is around 1GB/s (client side) and 500MB/s (server
> side) because of replication:
>
> Client side:
> [root at node056 ~]# ifstat -i ib0
>        ib0
>  KB/s in  KB/s out
>  3251.45  1.09e+06
>  3139.80  1.05e+06
>  3185.29  1.06e+06
>  3293.84  1.09e+06
>  ...
>
> Server side:
> [root at lucifer ~]# ifstat -i ib0
>        ib0
>  KB/s in   KB/s out
> 561818.1   1746.42
> 560020.3   1737.92
> 526337.1   1648.20
> 513972.7   1613.69
> ...
>
> dd command:
> [root at node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
>
> So this issue does not seem to come from the network (which is InfiniBand
> in this case).
>
> You can find a set of files in the attachments:
> - mybench.sh: the bench script
> - benches.txt: the output of my "bench"
> - profile.txt: gluster volume profile during the "bench"
> - vol_status.txt: gluster volume status
> - vol_info.txt: gluster volume info
>
> Can someone help me fix this? It is very critical, because this volume is
> on an HPC cluster in production.
>
> Thanks in advance,
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
> <benches.txt>
> <mybench.sh>
> <profile.txt>
> <vol_info.txt>
> <vol_status.txt>
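The attached mybench.sh is not reproduced in the archive. A minimal sketch of a script that would produce the timing blocks shown above, assuming a pre-downloaded kernel source tarball (the archive name, grep pattern, and banner text below are guesses, not the actual attachment):

#!/bin/bash
# Hypothetical reconstruction of a mybench.sh-style benchmark.
ARCHIVE=linux-4.1-rc6.tar.xz   # assumed: ~80MB archive, ~52,000 files
DIR=linux-4.1-rc6

banner() {
    echo '#######################################################'
    echo "################ $1 time consumed ################"
    echo '#######################################################'
}

banner UNTAR; time tar xf "$ARCHIVE"
banner DU;    time du -sh "$DIR"
banner FIND;  time find "$DIR" | wc -l
banner GREP;  time grep -r kernel "$DIR" | wc -l   # guessed pattern
banner TAR;   time tar cf "$DIR.tar" "$DIR"
banner RM;    time rm -rf "$DIR" "$DIR.tar"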
Geoffrey Letessier
2015-Jun-20 09:01 UTC
[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
Hello Mathieu,

Thanks for replying. Previously I had never noticed such throughput (around
1GB/s for one big file), but... the situation with a "big" set of small
files was never great, though not as bad as today. The problem seems to
depend exclusively on the size of each file.

"Proof":
[root at node056 tmp]# dd if=/dev/zero of=masterfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.09139 s, 501 MB/s

[root at node056 tmp]# time split -b 1000000 -a 12 masterfile    # 1MB per file

real 0m42.841s
user 0m0.004s
sys  0m1.416s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 5000000 -a 12 masterfile    # 5MB per file

real 0m17.801s
user 0m0.008s
sys  0m1.396s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 10000000 -a 12 masterfile   # 10MB per file

real 0m9.686s
user 0m0.008s
sys  0m1.451s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 20000000 -a 12 masterfile   # 20MB per file

real 0m9.717s
user 0m0.003s
sys  0m1.399s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 1000000 -a 12 masterfile    # 1MB per file, again

real 0m40.283s
user 0m0.007s
sys  0m1.390s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync

The bigger the generated files, the better the performance (I/O throughput
and run time). The ifstat output is consistent on both the client/node and
server sides.

A new test:
[root at node056 tmp]# dd if=/dev/zero of=masterfile bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 23.0044 s, 456 MB/s
[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 10000000 -a 12 masterfile   # 10MB per file

real 1m43.216s
user 0m0.038s
sys  0m13.407s

So the performance per file is the same (despite 10x more files).

So I don't understand why, to get the best performance, I need to create
files with a size of 10MB or more.

Here are my volume's reconfigured options:
performance.cache-max-file-size: 64MB
performance.read-ahead: on
performance.write-behind: on
features.quota-deem-statfs: on
performance.stat-prefetch: on
performance.flush-behind: on
features.default-soft-limit: 90%
features.quota: on
diagnostics.brick-log-level: CRITICAL
auth.allow: localhost,127.0.0.1,10.*
nfs.disable: on
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.quick-read: on
performance.io-cache: on
performance.io-thread-count: 64
nfs.enable-ino32: off

It is not a local cache problem, because:
1. it is disabled in my mount command:
   mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /home
2. I also ran my tests while playing with /proc/sys/vm/drop_caches;
3. I see the same ifstat output on both the client and server sides, which
   is consistent with computing the bandwidth as file sizes / time (taking
   replication into account).

I don't think it is an InfiniBand network problem either, but here are my
[non-default] settings: connected mode with the MTU set to 65520.

Do you confirm my feelings? If yes, do you have any other ideas?

Thanks again, and thanks in advance,
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
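For what it's worth, the split sweep above is easy to automate, with the
caches flushed between runs; a small sketch (chunk sizes as above;
drop_caches requires root):

#!/bin/bash
# Re-run the split test for several chunk sizes on the Gluster mount.
dd if=/dev/zero of=masterfile bs=1M count=1000       # 1GB source file
for size in 1000000 5000000 10000000 20000000; do
    sync && echo 3 > /proc/sys/vm/drop_caches        # flush page/dentry/inode caches
    echo "== chunk size: $size bytes =="
    time split -b "$size" -a 12 masterfile
    rm -f xaaaaaaaaa* && sync                        # remove the generated chunks
done
rm -f masterfile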
On 20 June 2015, at 09:12, Mathieu Chateau <mathieu.chateau at lotp.fr> wrote:

> [full quote of the earlier messages in this thread trimmed]