Gionatan Danti
2020-Nov-26 17:14 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
On 2020-11-26 09:47, Dmitry Antipov wrote:
> On 11/26/20 11:29 AM, Gionatan Danti wrote:
>
>> Can you detail your exact client and server CPU model?
>
> Desktop is 8x of:
> model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
>
> Server is 32x of:
> model name : Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz

Your desktop CPU's single-thread performance is significantly higher than the server CPU's: the former turbos to 3.5 GHz, while the latter only reaches 3.0 GHz. Moreover, for single-thread workloads Skylake client is 3-5% faster than Skylake server at the same frequency.

So I think you are simply CPU limited. I remember doing some tests with loopback RAM disks and finding that Gluster used 100% CPU (i.e. full load on an entire core) when doing 4K random writes.

Side note: using synchronized (i.e. fsync) 4K writes, I only get ~600 IOPS even when running both bricks on the same machine and backing them with RAM disks (in other words, with no network or disk bottleneck).

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8
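For anyone who wants to reproduce the synchronized-write figure without fio, here is a minimal Python sketch. It is not the test used in this thread: it writes sequential (not random) 4K blocks and calls fsync after each one, and the function name, the block count and the target path are all illustrative assumptions. Point the path at a file on the Gluster mount to measure the volume rather than the local filesystem.

```python
import os
import tempfile
import time


def measure_sync_write_iops(path, block_size=4096, count=200):
    """Write `count` blocks of `block_size` bytes, fsync-ing after each write.

    Returns the number of completed write+fsync pairs per second.
    Hypothetical helper, not part of Gluster or fio.
    """
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)  # sequential append, one block at a time
            os.fsync(fd)         # force the block to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


if __name__ == "__main__":
    # Assumption: a temporary local directory stands in for the real target.
    with tempfile.TemporaryDirectory() as tmp:
        iops = measure_sync_write_iops(os.path.join(tmp, "probe.bin"))
        print(f"{iops:.0f} synchronous 4K writes per second")
```

Because each write waits for the previous fsync to complete, the result is bounded by per-operation latency rather than by bandwidth, which is the situation described above.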
Dmitry Antipov
2020-Nov-27 05:53 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
On 11/26/20 8:14 PM, Gionatan Danti wrote:
> So I think you simply are CPU limited. I remember doing some tests with
> loopback RAM disks and finding that Gluster used 100% CPU (ie: full load
> on an entire core) when doing 4K random writes. Side note: using
> synchronized (ie: fsync) 4k writes, I only get ~600 IOPs even when
> running both bricks on the same machine and backing them with RAM disks
> (in other words, with no network or disk bottleneck).

Thanks, it seems you're right. Running a local replica 3 volume on 3x1Gb ramdisks, I'm seeing:

top - 08:44:35 up 1 day, 11:51,  1 user,  load average: 2.34, 1.94, 1.00
Tasks: 237 total,   2 running, 235 sleeping,   0 stopped,   0 zombie
%Cpu(s): 38.7 us, 29.4 sy,  0.0 ni, 23.6 id,  0.0 wa,  0.4 hi,  7.9 si,  0.0 st
MiB Mem :  15889.8 total,   1085.7 free,   1986.3 used,  12817.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  12307.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  63651 root      20   0  664124  41676   9600 R 166.7   0.3   0:24.20 fio
  63282 root      20   0 1235336  21484   8768 S 120.4   0.1   2:43.73 glusterfsd
  63298 root      20   0 1235368  20512   8856 S 120.0   0.1   2:42.43 glusterfsd
  63314 root      20   0 1236392  21396   8684 S 119.8   0.1   2:41.94 glusterfsd

So, a 32-core server-class system with a lot of RAM can't perform much faster for an individual I/O client - it just scales better if there are a lot of clients, right?

Dmitry
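As a rough back-of-the-envelope illustration of the single-client versus many-clients question, the sketch below applies Little's law (throughput = outstanding requests / per-request latency) to the ~600 IOPS figure quoted above. The function names and the choice of 8 concurrent clients are hypothetical, and the projection assumes per-request latency stays constant as clients are added, which only holds until the CPU cores are saturated.

```python
def per_op_latency_ms(iops, outstanding=1):
    """Per-request latency implied by a measured IOPS figure (Little's law)."""
    return outstanding / iops * 1000.0


def projected_iops(latency_ms, outstanding):
    """Idealized IOPS if `outstanding` requests are in flight at a fixed latency."""
    return outstanding / (latency_ms / 1000.0)


if __name__ == "__main__":
    # ~600 IOPS with one synchronous writer is the figure quoted in the thread.
    latency = per_op_latency_ms(600)
    print(f"latency per fsync'd write: {latency:.2f} ms")

    # Assumption: 8 independent clients, each with one request in flight,
    # and no CPU contention between them.
    print(f"idealized aggregate: {projected_iops(latency, 8):.0f} IOPS")
```

Under these assumptions a single synchronous client is limited by latency (how fast one core can push one request through the stack), while additional cores only help when several requests are in flight at once.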