Schlick Rupert
2021-May-14 19:07 UTC
[Gluster-users] Performance Questions - not only small files
Dear list,

we have replicated gluster volumes that are much slower than the brick disks, and I wonder whether this is a configuration issue, a conceptual issue of our setup, or really just how slow gluster is.

The setup:
* Three servers in a ring, connected via IP over Infiniband, 100 Gb/s on each link
* 2 x U.2 2TB SSDs as RAID1 on a MegaRAID controller, connected via NVMe links on each node
* A 3x replicated volume on a thin-pool LVM on the SSD RAIDs
* 2 x 26 cores, lots of RAM
* The volumes are locally mounted and used as shared disks for computation jobs (CPU and GPU) on the nodes
* The LVM thin pool is shared with other volumes and with a cache pool for the hard disks

The measurement:
* iozone in automated mode, on the gluster volume (Set1) and on the mounted brick disk (Baseline)
* Compared with iozone_results_comparator.py

The issue:
* Smaller files are around a factor of 2 slower than larger files on the SSD, but a factor of 6-10 slower than larger files on the gluster volume (somewhat expected)
* Larger files are, except for certain read accesses, still a factor of 3-10 slower on the gluster volume than on the SSD RAID directly, depending on the operation
* Operations with many, admittedly smaller, files (checkouts, copying, rsync and unpacking) can extend into hours, where they take tens of seconds to a few minutes directly on disk
* atop sometimes shows 90+% busy on some of the LVM block devices, and sometimes the avio value increases from 4-26 ms to 3000-5000 ms. Otherwise, nothing notable is observed regarding system load.

This does not seem to be network bound. Accessing from a remote VM via a glusterfs mount is not much slower, despite connecting via 10 GbE instead of 100 Gb/s. nethogs shows sent traffic on the Infiniband interface going up to 30000 kB/s for large records - 240 Mb/s over a 100 Gb/s interface... The machine itself is mostly idle: no iowaits, a load average of 6 on 104 logical CPUs. glusterfsd sometimes pushes up to 10% CPU.
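In case anyone wants to reproduce the many-small-files symptom independently of iozone, here is a minimal sketch I would use (the directory path, file count and file size are placeholders, not part of the original measurement): time how many small files per second a directory can absorb, once on the brick filesystem and once on the gluster mount.

```python
import os
import tempfile
import time


def small_file_bench(dirpath, n=500, size=4096):
    """Create, fsync and unlink n small files in dirpath; return files/second."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(n):
        path = os.path.join(dirpath, f"bench_{i}.tmp")
        with open(path, "wb") as f:
            f.write(payload)
            os.fsync(f.fileno())  # force each file through the page cache
        os.unlink(path)
    return n / (time.perf_counter() - start)


if __name__ == "__main__":
    # Point dirpath at the brick mount and at the gluster mount to compare;
    # a temporary directory is used here only so the sketch runs as-is.
    with tempfile.TemporaryDirectory() as d:
        print(f"{small_file_bench(d):.0f} files/s")
```

The fsync per file is deliberate: it makes the per-file round trip (which replication multiplies) visible instead of letting the page cache absorb everything.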
I just do not see what the bottleneck should be. Below is a summary from the compared iozone reports. Any hints on whether it is worth trying to optimize this at all, and where to start looking? I believe I have checked all the "standard" hints, but might have missed something. The alternative would be a single storage node connected via NFS, sacrificing the (probably not needed) redundancy/high availability of the replicated filesystem.

Thanks a lot for any comments in advance,
Best
Rupert

[summary - per-operation throughput statistics from iozone_results_comparator.py]

Baseline (brick SSD RAID, direct):

Operation      | 1st qu. | median  | 3rd qu. | minimum | maximum  | mean    | std dev | ci min 90% | ci max 90% | geom. mean
Write          | 1260.16 | 1474.31 | 1595.17 | 194.64  | 2041.55  | 1408.31 | 306.93  | 1363.0     | 1453.62    | 1362.58
Re-write       | 1894.11 | 2145.7  | 2318.0  | 960.92  | 2819.84  | 2070.62 | 366.43  | 2016.52    | 2124.71    | 2034.2
Read           | 3171.44 | 3637.6  | 3992.78 | 1785.65 | 5628.61  | 3663.03 | 652.73  | 3566.67    | 3759.39    | 3604.92
Re-read        | 3002.55 | 3383.24 | 3840.0  | 2152.15 | 7342.68  | 3505.74 | 816.6   | 3385.19    | 3626.3     | 3422.21
Random read    | 3679.12 | 4167.16 | 4803.77 | 1840.86 | 7142.61  | 4231.31 | 1048.38 | 4076.54    | 4386.08    | 4094.96
Random write   | 2214.07 | 2570.24 | 3013.13 | 947.03  | 4776.16  | 2633.07 | 781.84  | 2517.65    | 2748.5     | 2511.93
Backwards read | 2987.15 | 3435.44 | 3864.87 | 1491.1  | 6091.17  | 3444.43 | 796.13  | 3326.9     | 3561.96    | 3346.74
Record rewrite | 2701.4  | 3751.41 | 4420.33 | 1135.78 | 15908.02 | 4102.42 | 2277.33 | 3766.21    | 4438.62    | 3637.05
Strided Read   | 3361.62 | 3967.55 | 4673.13 | 1732.88 | 6922.3   | 4029.73 | 1041.12 | 3876.03    | 4183.44    | 3892.76
Fwrite         | 1783.45 | 2059.25 | 2354.6  | 999.01  | 3540.22  | 2087.25 | 511.69  | 2011.71    | 2162.79    | 2024.11
Frewrite       | 1739.6  | 1992.64 | 2258.15 | 933.54  | 3787.35  | 2027.23 | 464.33  | 1958.68    | 2095.78    | 1975.46
Fread          | 3231.97 | 3611.78 | 4036.73 | 1745.15 | 5933.0   | 3641.8  | 687.27  | 3540.34    | 3743.27    | 3576.11
Freread        | 3405.8  | 3936.96 | 4570.02 | 2083.72 | 7510.19  | 4113.02 | 1012.13 | 3963.6     | 4262.44    | 3996.51
ALL            | 2152.28 | 3125.35 | 3915.54 | 194.64  | 15908.02 | 3150.61 | 1335.36 | 3096.31    | 3204.91    | 2883.44

Set1 (gluster volume):

Operation      | 1st qu. | median  | 3rd qu. | minimum | maximum | mean    | std dev | ci min 90% | ci max 90% | geom. mean
Write          | 173.4   | 312.05  | 414.07  | 44.11   | 606.95  | 308.37  | 154.66  | 285.54     | 331.2      | 260.27
Re-write       | 174.43  | 293.47  | 410.15  | 45.6    | 610.49  | 309.17  | 157.37  | 285.93     | 332.4      | 260.16
Read           | 208.58  | 267.53  | 401.42  | 59.07   | 4457.8  | 544.27  | 892.64  | 412.49     | 676.05     | 318.88
Re-read        | 2813.71 | 3134.22 | 3446.69 | 2089.07 | 6248.75 | 3181.18 | 635.03  | 3087.43    | 3274.93    | 3127.01
Random read    | 3498.77 | 4005.72 | 4429.1  | 2004.07 | 6153.85 | 3964.96 | 825.98  | 3843.02    | 4086.9     | 3872.62
Random write   | 139.63  | 332.09  | 436.45  | 9.39    | 666.55  | 305.76  | 183.62  | 278.65     | 332.87     | 220.54
Backwards read | 51.51   | 110.38  | 178.94  | 8.67    | 374.16  | 123.49  | 87.03   | 110.65     | 136.34     | 86.71
Record rewrite | 91.49   | 302.02  | 429.6   | 2.77    | 696.91  | 282.89  | 191.43  | 254.63     | 311.15     | 174.2
Strided Read   | 124.34  | 216.41  | 397.15  | 32.58   | 2593.43 | 402.01  | 493.51  | 329.15     | 474.86     | 243.43
Fwrite         | 201.55  | 359.67  | 463.62  | 38.63   | 647.26  | 337.11  | 160.15  | 313.47     | 360.76     | 285.24
Frewrite       | 199.05  | 355.33  | 480.29  | 41.09   | 700.02  | 342.11  | 173.1   | 316.56     | 367.67     | 286.02
Fread          | 187.72  | 234.42  | 317.98  | 26.0    | 451.39  | 248.76  | 100.31  | 233.95     | 263.57     | 223.44
Freread        | 2509.21 | 2742.35 | 3092.73 | 1359.63 | 4353.18 | 2780.37 | 484.33  | 2708.87    | 2851.87    | 2737.93
ALL            | 189.36  | 357.31  | 686.35  | 2.77    | 6248.75 | 1010.03 | 1358.34 | 954.8      | 1065.27    | 414.26

Baseline vs. Set1:

Operation      | lin. reg. slope 90% | ttest equality | difference | ttest p-value
Write          | 0.17 - 0.28         | DIFF           | -78.1 %    | 0.0
Re-write       | 0.1 - 0.21          | DIFF           | -85.07 %   | 0.0
Read           | -0.05 - 0.35        | DIFF           | -85.14 %   | 0.0
Re-read        | 0.83 - 0.97         | DIFF           | -9.26 %    | 0.0005
Random read    | 0.85 - 1.0          | DIFF           | -6.29 %    | 0.026
Random write   | 0.08 - 0.15         | DIFF           | -88.39 %   | 0.0
Backwards read | 0.02 - 0.05         | DIFF           | -96.41 %   | 0.0
Record rewrite | 0.04 - 0.07         | DIFF           | -93.1 %    | 0.0
Strided Read   | 0.04 - 0.17         | DIFF           | -90.02 %   | 0.0
Fwrite         | 0.12 - 0.2          | DIFF           | -83.85 %   | 0.0
Frewrite       | 0.13 - 0.22         | DIFF           | -83.12 %   | 0.0
Fread          | 0.05 - 0.09         | DIFF           | -93.17 %   | 0.0
Freread        | 0.6 - 0.72          | DIFF           | -32.4 %    | 0.0
ALL            | 0.28 - 0.35         | DIFF           | -67.94 %   | 0.0
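For reference, the "difference" row is simply the relative change of the mean values, set1 against baseline. A short sketch using the Write and Freread means from the summary above (pct_diff is my own helper, not part of the comparator):

```python
# Mean values taken from the summary tables above.
baseline_mean = {"Write": 1408.31, "Freread": 4113.02}
set1_mean = {"Write": 308.37, "Freread": 2780.37}


def pct_diff(base, new):
    """Percentage change of new relative to base."""
    return (new - base) / base * 100.0


for op in baseline_mean:
    print(f"{op}: {pct_diff(baseline_mean[op], set1_mean[op]):.2f} %")
# Write: -78.10 %
# Freread: -32.40 %
```

These match the reported -78.1 % and -32.4 %, so the comparison row is based on means, not medians.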