Mathieu Chateau
2015-Jun-20 07:12 UTC
[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
Hello,

For the replicated volume: is it a new issue, or did you just not notice it before? Is the baseline the same as before?

I also see slowness with small files / many files.

For now, I could only tune things up with:

On the Gluster level:
gluster volume set myvolume performance.io-thread-count 16
gluster volume set myvolume performance.cache-size 1GB
gluster volume set myvolume nfs.disable on
gluster volume set myvolume readdir-ahead enable
gluster volume set myvolume read-ahead disable

On the network level (client and server; I don't have InfiniBand):
sysctl -w vm.swappiness=0
sysctl -w net.core.rmem_max=67108864
sysctl -w net.core.wmem_max=67108864
# increase Linux autotuning TCP buffer limit to 32MB
sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
# increase the length of the processor input queue
sysctl -w net.core.netdev_max_backlog=30000
# recommended default congestion control is htcp
sysctl -w net.ipv4.tcp_congestion_control=htcp

But it's still really slow, even if better.

Regards,
Mathieu CHATEAU
http://www.lotp.fr
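Note that sysctl -w changes do not survive a reboot. A minimal sketch of making the settings above persistent (the file name is illustrative; the loading mechanism varies by distribution):

# /etc/sysctl.d/90-gluster-net.conf   (hypothetical file name)
# Same values as the sysctl -w commands above; applied at boot,
# or immediately with: sysctl -p /etc/sysctl.d/90-gluster-net.conf
vm.swappiness = 0
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_congestion_control = htcp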
2015-06-20 2:34 GMT+02:00 Geoffrey Letessier <geoffrey.letessier at cnrs.fr>:

> Re,
>
> For comparison, here is the output of the same script run on a
> distributed-only volume (2 of the 4 servers described previously,
> 2 bricks each):
>
> #######################################################
> ################ UNTAR time consumed  ################
> #######################################################
>
> real 1m44.698s
> user 0m8.891s
> sys  0m8.353s
>
> #######################################################
> ################# DU time consumed  ##################
> #######################################################
>
> 554M linux-4.1-rc6
>
> real 0m21.062s
> user 0m0.100s
> sys  0m1.040s
>
> #######################################################
> ################# FIND time consumed  ################
> #######################################################
>
> 52663
>
> real 0m21.325s
> user 0m0.104s
> sys  0m1.054s
>
> #######################################################
> ################# GREP time consumed  ################
> #######################################################
>
> 7952
>
> real 0m43.618s
> user 0m0.922s
> sys  0m3.626s
>
> #######################################################
> ################# TAR time consumed  #################
> #######################################################
>
> real 0m50.577s
> user 0m29.745s
> sys  0m4.086s
>
> #######################################################
> ################# RM time consumed  ##################
> #######################################################
>
> real 0m41.133s
> user 0m0.171s
> sys  0m2.522s
>
> The performances are amazingly different!
>
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
>
> On 20 June 2015, at 02:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr>
> wrote:
>
> Dear all,
>
> I just noticed that the I/O operations on the main volume of my HPC
> cluster have become strikingly poor...
>
> Running some file operations on a compressed Linux kernel source archive,
> the untar operation alone can take more than half an hour for this file
> (roughly 80MB, with 52,000 files inside), as you can read below:
>
> #######################################################
> ################ UNTAR time consumed  ################
> #######################################################
>
> real 32m42.967s
> user 0m11.783s
> sys  0m15.050s
>
> #######################################################
> ################# DU time consumed  ##################
> #######################################################
>
> 557M linux-4.1-rc6
>
> real 0m25.060s
> user 0m0.068s
> sys  0m0.344s
>
> #######################################################
> ################# FIND time consumed  ################
> #######################################################
>
> 52663
>
> real 0m25.687s
> user 0m0.084s
> sys  0m0.387s
>
> #######################################################
> ################# GREP time consumed  ################
> #######################################################
>
> 7952
>
> real 2m15.890s
> user 0m0.887s
> sys  0m2.777s
>
> #######################################################
> ################# TAR time consumed  #################
> #######################################################
>
> real 1m5.551s
> user 0m26.536s
> sys  0m2.609s
>
> #######################################################
> ################# RM time consumed  ##################
> #######################################################
>
> real 2m51.485s
> user 0m0.167s
> sys  0m1.663s
>
> For information, this volume is a distributed replicated one, composed of
> 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with
> good native performance (around 1.2GB/s).
>
> In comparison, when I use dd to generate a 100GB file on the same volume,
> my write throughput is around 1GB/s (client side) and 500MB/s (server
> side) because of replication:
>
> Client side:
> [root at node056 ~]# ifstat -i ib0
>        ib0
>  KB/s in  KB/s out
>  3251.45  1.09e+06
>  3139.80  1.05e+06
>  3185.29  1.06e+06
>  3293.84  1.09e+06
>  ...
>
> Server side:
> [root at lucifer ~]# ifstat -i ib0
>        ib0
>  KB/s in   KB/s out
> 561818.1   1746.42
> 560020.3   1737.92
> 526337.1   1648.20
> 513972.7   1613.69
> ...
>
> dd command:
> [root at node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
> 100000+0 records in
> 100000+0 records out
> 104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
>
> So this issue does not seem to come from the network (which is InfiniBand
> in this case).
>
> You can find a set of files in the attachments:
> - mybench.sh: the bench script
> - benches.txt: the output of my "bench"
> - profile.txt: gluster volume profile during the "bench"
> - vol_status.txt: gluster volume status
> - vol_info.txt: gluster volume info
>
> Can someone help me fix this? It is very critical, because this volume is
> on an HPC cluster in production.
>
> Thanks in advance,
> Geoffrey
> -----------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
> <benches.txt>
> <mybench.sh>
> <profile.txt>
> <vol_info.txt>
> <vol_status.txt>
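The attached mybench.sh is not reproduced in the archive. A minimal sketch of a script that would produce the timing blocks shown above, assuming a pre-downloaded kernel source tarball (the archive name, grep pattern, and banner text below are guesses, not the actual attachment):

#!/bin/bash
# Hypothetical reconstruction of a mybench.sh-style benchmark.
ARCHIVE=linux-4.1-rc6.tar.xz   # assumed: ~80MB archive, ~52,000 files
DIR=linux-4.1-rc6

banner() {
    echo '#######################################################'
    echo "################ $1 time consumed ################"
    echo '#######################################################'
}

banner UNTAR; time tar xf "$ARCHIVE"
banner DU;    time du -sh "$DIR"
banner FIND;  time find "$DIR" | wc -l
banner GREP;  time grep -r kernel "$DIR" | wc -l   # guessed pattern
banner TAR;   time tar cf "$DIR.tar" "$DIR"
banner RM;    time rm -rf "$DIR" "$DIR.tar"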
Geoffrey Letessier
2015-Jun-20 09:01 UTC
[Gluster-users] GlusterFS 3.5.3 - untar: very poor performance
Hello Mathieu,

Thanks for replying. Previously I had never noticed such throughput (around
1GB/s for one big file), but... the situation with a "big" set of small
files was never great, though not as bad as today. The problem seems to
depend exclusively on the size of each file.

"Proof":
[root at node056 tmp]# dd if=/dev/zero of=masterfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.09139 s, 501 MB/s

[root at node056 tmp]# time split -b 1000000 -a 12 masterfile    # 1MB per file

real 0m42.841s
user 0m0.004s
sys  0m1.416s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 5000000 -a 12 masterfile    # 5MB per file

real 0m17.801s
user 0m0.008s
sys  0m1.396s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 10000000 -a 12 masterfile   # 10MB per file

real 0m9.686s
user 0m0.008s
sys  0m1.451s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 20000000 -a 12 masterfile   # 20MB per file

real 0m9.717s
user 0m0.003s
sys  0m1.399s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 1000000 -a 12 masterfile    # 1MB per file, again

real 0m40.283s
user 0m0.007s
sys  0m1.390s

[root at node056 tmp]# rm -f xaaaaaaaaa* && sync

The bigger the generated files, the better the performance (I/O throughput
and run time). The ifstat output is consistent on both the client/node and
server sides.

A new test:
[root at node056 tmp]# dd if=/dev/zero of=masterfile bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 23.0044 s, 456 MB/s
[root at node056 tmp]# rm -f xaaaaaaaaa* && sync
[root at node056 tmp]# time split -b 10000000 -a 12 masterfile   # 10MB per file

real 1m43.216s
user 0m0.038s
sys  0m13.407s

So the performance per file is the same (despite 10x more files).

So I don't understand why, to get the best performance, I need to create
files with a size of 10MB or more.

Here are my volume's reconfigured options:
performance.cache-max-file-size: 64MB
performance.read-ahead: on
performance.write-behind: on
features.quota-deem-statfs: on
performance.stat-prefetch: on
performance.flush-behind: on
features.default-soft-limit: 90%
features.quota: on
diagnostics.brick-log-level: CRITICAL
auth.allow: localhost,127.0.0.1,10.*
nfs.disable: on
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.quick-read: on
performance.io-cache: on
performance.io-thread-count: 64
nfs.enable-ino32: off

It is not a local cache problem, because:
1. it is disabled in my mount command:
   mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /home
2. I also ran my tests while playing with /proc/sys/vm/drop_caches;
3. I see the same ifstat output on both the client and server sides, which
   is consistent with computing the bandwidth as file sizes / time (taking
   replication into account).

I don't think it is an InfiniBand network problem either, but here are my
[non-default] settings: connected mode with the MTU set to 65520.

Do you confirm my feelings? If yes, do you have any other ideas?

Thanks again, and thanks in advance,
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at cnrs.fr
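For what it's worth, the split sweep above is easy to automate, with the
caches flushed between runs; a small sketch (chunk sizes as above;
drop_caches requires root):

#!/bin/bash
# Re-run the split test for several chunk sizes on the Gluster mount.
dd if=/dev/zero of=masterfile bs=1M count=1000       # 1GB source file
for size in 1000000 5000000 10000000 20000000; do
    sync && echo 3 > /proc/sys/vm/drop_caches        # flush page/dentry/inode caches
    echo "== chunk size: $size bytes =="
    time split -b "$size" -a 12 masterfile
    rm -f xaaaaaaaaa* && sync                        # remove the generated chunks
done
rm -f masterfile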
On 20 June 2015, at 09:12, Mathieu Chateau <mathieu.chateau at lotp.fr> wrote:

> [full quote of the earlier messages in this thread trimmed]