Pranith Kumar Karampuri
2017-Apr-14 06:50 UTC
[Gluster-users] Slow write times to gluster disk
On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N <ravishankar at redhat.com> wrote:

> Hi Pat,
>
> I'm assuming you are using the gluster native (FUSE) mount. If it helps, you
> could try mounting the volume via gluster NFS (gnfs) and then see if there is
> an improvement in speed. FUSE mounts are slower than gnfs mounts, but you get
> the benefit of avoiding a single point of failure: unlike FUSE mounts, if the
> gluster node containing the gnfs server goes down, all mounts done using that
> node will fail. For FUSE mounts, you could try tweaking the write-behind
> xlator settings to see if that helps. See the performance.write-behind and
> performance.write-behind-window-size options in `gluster volume set help`.
> Of course, even for gnfs mounts, you can achieve fail-over by using CTDB.

Ravi,
Do you have any data that suggests fuse mounts are slower than gNFS servers?

Pat,
I see that I am late to the thread, but do you happen to have "profile info"
of the workload? You can follow
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
to get the information.

> Thanks,
> Ravi
>
> On 04/08/2017 12:07 AM, Pat Haley wrote:
>
>> Hi,
>>
>> We noticed a dramatic slowness when writing to a gluster disk compared to
>> writing to an NFS disk. Specifically, when using dd (data duplicator) to
>> write a 4.3 GB file of zeros:
>>
>>   - on the NFS disk (/home): 9.5 Gb/s
>>   - on the gluster disk (/gdata): 508 Mb/s
>>
>> The gluster disk is 2 bricks joined together, no replication or anything
>> else. The hardware is (literally) the same:
>>
>>   - one server with 70 hard disks and a hardware RAID card
>>   - 4 disks in a RAID-6 group (the NFS disk)
>>   - 32 disks in a RAID-6 group (the max allowed by the card, /mnt/brick1)
>>   - 32 disks in another RAID-6 group (/mnt/brick2)
>>   - 2 hot spares
>>
>> Some additional information and more test results (after changing the
>> log level):
>>
>> glusterfs 3.7.11 built on Apr 27 2016 14:09:22
>> CentOS release 6.8 (Final)
>> RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108
>> [Invader] (rev 02)
>>
>> Create the file on /gdata (gluster):
>> [root at mseas-data2 gdata]# dd if=/dev/zero of=/gdata/zero1 bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 1.91876 s, 546 MB/s
>>
>> Create the file on /home (ext4):
>> [root at mseas-data2 gdata]# dd if=/dev/zero of=/home/zero1 bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 0.686021 s, 1.5 GB/s - 3 times as fast
>>
>> Copy from /gdata to /gdata (gluster to gluster):
>> [root at mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 101.052 s, 10.4 MB/s - really slow
>>
>> Copy from /gdata to /gdata, 2nd time (gluster to gluster):
>> [root at mseas-data2 gdata]# dd if=/gdata/zero1 of=/gdata/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 92.4904 s, 11.3 MB/s - really slow again
>>
>> Copy from /home to /home (ext4 to ext4):
>> [root at mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero2
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 3.53263 s, 297 MB/s - 30 times as fast
>>
>> Copy from /home to /home (ext4 to ext4):
>> [root at mseas-data2 gdata]# dd if=/home/zero1 of=/home/zero3
>> 2048000+0 records in
>> 2048000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 4.1737 s, 251 MB/s - 30 times as fast
>>
>> As a test, can we copy data directly to the xfs mountpoint (/mnt/brick1)
>> and bypass gluster?
>>
>> Any help you could give us would be appreciated.
>>
>> Thanks
>>
>> --
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> Pat Haley                          Email:  phaley at mit.edu
>> Center for Ocean Engineering       Phone:  (617) 253-6824
>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
>> 77 Massachusetts Avenue
>> Cambridge, MA 02139-4301

--
Pranith
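For anyone following along, the "profile info" Pranith asks for above comes from
gluster's io-stats counters. A minimal sketch of the workflow from the Monitoring
Workload guide is below; "data-volume" is only a placeholder for the actual volume
name on this setup (check `gluster volume info`):

    # turn on the per-brick io-stats counters (small, constant overhead)
    gluster volume profile data-volume start

    # ...run the slow workload, e.g. the dd tests above...

    # dump per-FOP latencies and read/write block-size histograms for each brick
    gluster volume profile data-volume info

    # switch the counters off again when done
    gluster volume profile data-volume stop

Running `info` once records a baseline; running it again after the workload makes
the "Interval" section show only what happened in between the two calls.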
On 04/14/2017 12:20 PM, Pranith Kumar Karampuri wrote:

> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>> Hi Pat,
>>
>> I'm assuming you are using the gluster native (FUSE) mount. If it helps,
>> you could try mounting it via gluster NFS (gnfs) and then see if there is
>> an improvement in speed. [...]
>
> Ravi,
> Do you have any data that suggests fuse mounts are slower than gNFS servers?

I have heard anecdotal evidence time and again on the ML and IRC, which is why
I wanted to compare it with NFS numbers on his setup.

> Pat,
> I see that I am late to the thread, but do you happen to have "profile info"
> of the workload?
>
> You can follow
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/
> to get the information.

Yeah, let's see if the profile info shows anything interesting.
-Ravi

> [...]
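Concretely, Ravi's two suggestions would look something like the sketch below.
This is only a sketch: "data-volume" again stands in for the real volume name,
/mnt/gnfs is an arbitrary mountpoint, and the 4MB window size is just an example
value to experiment with (check the defaults with `gluster volume set help`):

    # mount the same volume over gluster NFS (NFSv3) for a side-by-side dd comparison
    mount -t nfs -o vers=3,tcp mseas-data2:/data-volume /mnt/gnfs
    dd if=/dev/zero of=/mnt/gnfs/zero-nfs bs=1M count=1000

    # write-behind tuning for the FUSE mount
    gluster volume set data-volume performance.write-behind on
    gluster volume set data-volume performance.write-behind-window-size 4MB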
Hi Pranith & Ravi,

Sorry for the delay. I have the profile info for the past couple of days just
below. Is this of any help to you, or is there additional information I can
request?

Brick: mseas-data2:/mnt/brick2
------------------------------
Cumulative Stats:

   Block Size      No. of Reads    No. of Writes
   1b+                        6        108032195
   2b+                       38          8352125
   4b+                     1144        141319922
   8b+                      689         13946933
   16b+                    1256         20694915
   32b+                    2756         57845473
   64b+                    5522        714398165
   128b+                  56492         11923303
   256b+                 149462          2537176
   512b+                  64285          5975842
   1024b+                192872        217173849
   2048b+                200488         94536339
   4096b+                300021        112481858
   8192b+                764297         53164978
   16384b+              1613672        330177486
   32768b+              5101884         35098110
   65536b+             14470916         19969017
   131072b+          4958306977       2243344759
   262144b+                   0              547

 %-latency   Avg-latency    Min-Latency    Max-Latency    No. of calls   Fop
 ---------   -----------    -----------    -----------    ------------   ----
      0.00       0.00 us        0.00 us        0.00 us         4052087   FORGET
      0.00       0.00 us        0.00 us        0.00 us         6381234   RELEASE
      0.00       0.00 us        0.00 us        0.00 us        28716633   RELEASEDIR
      0.00      92.81 us       48.00 us      130.00 us              53   READLINK
      0.00     201.22 us      112.00 us      457.00 us             188   RMDIR
      0.00     169.36 us       53.00 us    20417.00 us             347   SETXATTR
      0.00   20497.89 us      241.00 us    57505.00 us              45   SYMLINK
      0.00     116.97 us       42.00 us    39168.00 us            9172   SETATTR
      0.00     380.06 us       76.00 us   198427.00 us            3133   LINK
      0.00     149.60 us       14.00 us   601941.00 us           14426   INODELK
      0.00     387.81 us       69.00 us   161114.00 us            6617   RENAME
      0.01      96.47 us       14.00 us  1224734.00 us           63599   STATFS
      0.01   25041.48 us      299.00 us    93211.00 us             348   MKDIR
      0.01     380.41 us       31.00 us   561724.00 us           31452   OPEN
      0.02    1346.42 us       64.00 us   226741.00 us           18306   UNLINK
      0.02    2123.19 us       42.00 us   802398.00 us           12370   FTRUNCATE
      0.04   12161.88 us      175.00 us   158072.00 us            3244   MKNOD
      0.07  132801.87 us       39.00 us  3144448.00 us             532   FSYNC
      0.13      89.98 us        4.00 us  5550246.00 us         1492793   FLUSH
      0.45      65.89 us        6.00 us  3608035.00 us         7194229   FSTAT
      0.57   14538.33 us      162.00 us  4577282.00 us           41466   CREATE
      0.70    3183.52 us       16.00 us  4358324.00 us          231728   OPENDIR
      1.67    7559.32 us        8.00 us  4193443.00 us          234012   STAT
      2.26     119.27 us       11.00 us  4491219.00 us        20093638   WRITE
      2.51     207.00 us       10.00 us  4993074.00 us        12884466   READ
      4.17     246.12 us       13.00 us  8857354.00 us        17952607   GETXATTR
     23.72   48775.51 us       14.00 us  5022445.00 us          515770   READDIRP
     63.65    1238.53 us       25.00 us  4483760.00 us        54507520   LOOKUP

    Duration: 9810315 seconds
   Data Read: 651660783328883 bytes
Data Written: 305412177327433 bytes

Interval 0 Stats: identical to the cumulative stats above, except for the call
counts of RELEASE (6381233) and RELEASEDIR (28716630).

Brick: mseas-data2:/mnt/brick1
------------------------------
Cumulative Stats:

   Block Size      No. of Reads    No. of Writes
   1b+                        4        643631512
   2b+                       38         59055444
   4b+                     1482        235532859
   8b+                     1171         31816870
   16b+                    2138         23602175
   32b+                    4748         50161322
   64b+                    9461        711114605
   128b+                  65360         11760241
   256b+                 165954          4078907
   512b+                  94563          6366990
   1024b+                226053        211643393
   2048b+                258803         95831137
   4096b+                383871        155833532
   8192b+               1032345         57850303
   16384b+              2244921        339892660
   32768b+              7588068         38588368
   65536b+             22368398         25195605
   131072b+          5387488199       2463004132
   262144b+                   0              489

 %-latency   Avg-latency    Min-Latency    Max-Latency    No. of calls   Fop
 ---------   -----------    -----------    -----------    ------------   ----
      0.00       0.00 us        0.00 us        0.00 us         4060396   FORGET
      0.00       0.00 us        0.00 us        0.00 us         6244016   RELEASE
      0.00       0.00 us        0.00 us        0.00 us        28716852   RELEASEDIR
      0.00      96.42 us       61.00 us      148.00 us              40   READLINK
      0.00     208.36 us      114.00 us      322.00 us             188   RMDIR
      0.00    2231.61 us       57.00 us   716342.00 us             347   SETXATTR
      0.00   20821.92 us      758.00 us    57852.00 us              38   SYMLINK
      0.00     519.11 us       76.00 us   952378.00 us            3149   LINK
      0.00     196.97 us       50.00 us   736928.00 us            9055   SETATTR
      0.00     164.34 us       18.00 us   736161.00 us           13460   INODELK
      0.00     375.54 us       73.00 us   198362.00 us            6274   RENAME
      0.01   20913.10 us      351.00 us   102696.00 us             348   MKDIR
      0.01     151.39 us       17.00 us   782025.00 us           63598   STATFS
      0.03    1103.67 us       34.00 us   618187.00 us           29597   OPEN
      0.03    2833.17 us       43.00 us  1069257.00 us           11693   FTRUNCATE
      0.04    2267.87 us       61.00 us  3746134.00 us           17859   UNLINK
      0.04   13105.16 us      254.00 us   179505.00 us            3177   MKNOD
      0.05   88496.76 us       21.00 us  1718559.00 us             613   FSYNC
      0.58      73.42 us        6.00 us  1917794.00 us         7848483   FSTAT
      0.71   17177.23 us      177.00 us  7077794.00 us           40554   CREATE
      0.79     585.79 us        3.00 us 11107703.00 us         1322036   FLUSH
      1.72    7459.40 us        9.00 us  2764285.00 us          228033   STAT
      1.96    8350.73 us       19.00 us  2235725.00 us          231728   OPENDIR
      2.60     115.35 us       12.00 us  4196355.00 us        22239110   WRITE
      4.60     313.20 us       10.00 us  6211594.00 us        14494253   READ
      5.98     307.95 us       13.00 us  9885480.00 us        19163193   GETXATTR
     25.68   48514.34 us       17.00 us  4734636.00 us          522162   READDIRP
     55.15    1075.93 us       26.00 us  4291535.00 us        50562855   LOOKUP

    Duration: 9810315 seconds
   Data Read: 708869551853133 bytes
Data Written: 335305857076797 bytes

Interval 0 Stats: identical to the cumulative stats above, except for the call
counts of FORGET (4060397), RELEASE (6244015) and RELEASEDIR (28716850).

On 04/14/2017 02:50 AM, Pranith Kumar Karampuri wrote:

> [...]

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley at mit.edu
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
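On the question from the original mail about bypassing gluster: writing a
brand-new scratch file straight to the brick filesystem is a reasonable way to
get a raw-XFS baseline, as long as you only ever create and delete your own
test file there and never touch files gluster already manages. A hedged sketch
(the file names are arbitrary; conv=fdatasync makes dd report a flushed-to-disk
rate):

    # raw XFS speed on the brick, gluster bypassed
    dd if=/dev/zero of=/mnt/brick1/__ddtest bs=1M count=1000 conv=fdatasync

    # same write through the FUSE mount, for comparison
    dd if=/dev/zero of=/gdata/__ddtest bs=1M count=1000 conv=fdatasync

    # clean up (remove the brick-side file directly on the brick, not via the mount)
    rm -f /mnt/brick1/__ddtest /gdata/__ddtest

Comparing the two numbers separates gluster/FUSE overhead from whatever the
RAID/XFS layer itself can deliver.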
On 04/13/17 23:50, Pranith Kumar Karampuri wrote:

> On Sat, Apr 8, 2017 at 10:28 AM, Ravishankar N <ravishankar at redhat.com>
> wrote:
>
>> Hi Pat,
>>
>> I'm assuming you are using the gluster native (FUSE) mount. If it helps,
>> you could try mounting it via gluster NFS (gnfs) and then see if there is
>> an improvement in speed. [...]
>
> Ravi,
> Do you have any data that suggests fuse mounts are slower than gNFS servers?
>
> Pat,
> I see that I am late to the thread, but do you happen to have "profile info"
> of the workload?

I have done actual testing. For directory ops, NFS is faster because of the
default cache settings in the kernel. For raw throughput, or ops on an open
file, fuse is faster. I have yet to test this, but I expect that with the
newer caching features in 3.8+, even directory-op performance should be
similar to NFS, and more accurate.

> [...]
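For reference, the "newer caching features in 3.8+" mentioned above are the
md-cache/upcall improvements. Below is a sketch from memory rather than
anything tested on this setup: the option names and value ranges should be
verified with `gluster volume set help` on the installed release (older
releases cap performance.md-cache-timeout at 60), and "data-volume" is again a
placeholder for the real volume name:

    # have bricks notify clients when cached metadata becomes stale
    gluster volume set data-volume features.cache-invalidation on
    gluster volume set data-volume features.cache-invalidation-timeout 600

    # keep stat/xattr metadata cached on the client side for longer
    gluster volume set data-volume performance.stat-prefetch on
    gluster volume set data-volume performance.cache-invalidation on
    gluster volume set data-volume performance.md-cache-timeout 600

These mainly help directory and metadata-heavy workloads; they would not be
expected to change raw dd write throughput much.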