Dmitry Antipov
2020-Nov-26 09:00 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
On 11/26/20 11:42 AM, Strahil Nikolov wrote:
> And your gluster bricks are localhost:/brick1, localhost:/brick2 and
> localhost:/brick3?
> If not, add the hostname used for the bricks on the line starting with
> 127.0.0.1 and try again.

Same thing with:

127.0.0.1   trick trick.localdomain trick4 trick4.localdomain4
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

and:

Volume Name: test0
Type: Replicate
Volume ID: 2699e6fd-3898-4912-b4de-2d3850c53fb9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: trick:/glusterfs/test0-000
Brick2: trick:/glusterfs/test0-001
Brick3: trick:/glusterfs/test0-002

When running the workload, per-interface RX/TX counters (as shown by ifconfig)
grow rapidly on 'lo' but remain nearly the same on the other interfaces.
So I'm pretty sure that loopback is in action and the problem is somewhere else.

Dmitry
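P.S. To watch those counters in real time, something like this also works
(plain iproute2/procps tools, nothing Gluster-specific):

   ip -s link show lo             # RX/TX byte and packet totals for loopback only
   watch -d 'cat /proc/net/dev'   # highlights per-interface counter changes between updates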
Dmitry Antipov
2020-Nov-26 09:43 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
BTW, did someone try to profile the brick process? I did, and got this for the
default replica 3 volume ('perf record -F 2500 -g -p [PID]'):

+   3.29%  0.02%  glfs_epoll001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.17%  0.01%  glfs_epoll001  [kernel.kallsyms]   [k] do_syscall_64
+   3.17%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.06%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] do_syscall_64
+   2.75%  0.01%  glfs_iotwr00f  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.74%  0.01%  glfs_iotwr00b  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.74%  0.01%  glfs_iotwr001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.73%  0.00%  glfs_iotwr003  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.72%  0.00%  glfs_iotwr000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.72%  0.01%  glfs_iotwr00c  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.70%  0.01%  glfs_iotwr003  [kernel.kallsyms]   [k] do_syscall_64
+   2.69%  0.00%  glfs_iotwr001  [kernel.kallsyms]   [k] do_syscall_64
+   2.69%  0.01%  glfs_iotwr008  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.68%  0.00%  glfs_iotwr00b  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.00%  glfs_iotwr00c  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.00%  glfs_iotwr00f  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.01%  glfs_iotwr000  [kernel.kallsyms]   [k] do_syscall_64
+   2.67%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.65%  0.00%  glfs_iotwr008  [kernel.kallsyms]   [k] do_syscall_64
+   2.64%  0.00%  glfs_iotwr00e  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.64%  0.01%  glfs_iotwr00d  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] do_syscall_64
+   2.63%  0.01%  glfs_iotwr007  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.00%  glfs_iotwr005  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.01%  glfs_iotwr006  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.00%  glfs_iotwr009  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.61%  0.01%  glfs_iotwr004  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.61%  0.01%  glfs_iotwr00e  [kernel.kallsyms]   [k] do_syscall_64
+   2.60%  0.00%  glfs_iotwr006  [kernel.kallsyms]   [k] do_syscall_64
+   2.59%  0.00%  glfs_iotwr005  [kernel.kallsyms]   [k] do_syscall_64
+   2.59%  0.00%  glfs_iotwr00d  [kernel.kallsyms]   [k] do_syscall_64
+   2.58%  0.00%  glfs_iotwr002  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.58%  0.01%  glfs_iotwr007  [kernel.kallsyms]   [k] do_syscall_64
+   2.58%  0.00%  glfs_iotwr004  [kernel.kallsyms]   [k] do_syscall_64
+   2.57%  0.00%  glfs_iotwr009  [kernel.kallsyms]   [k] do_syscall_64
+   2.54%  0.00%  glfs_iotwr002  [kernel.kallsyms]   [k] do_syscall_64
+   1.65%  0.00%  glfs_epoll000  [unknown]           [k] 0x0000000000000001
+   1.65%  0.00%  glfs_epoll001  [unknown]           [k] 0x0000000000000001
+   1.48%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   1.44%  0.08%  glfs_rpcrqhnd  libpthread-2.32.so  [.] pthread_cond_wait@@GLIBC_2.3.2
+   1.40%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_syscall_64
+   1.36%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] __x64_sys_futex
+   1.35%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_futex
+   1.34%  0.01%  glfs_iotwr00a  libpthread-2.32.so  [.] __libc_pwrite64
+   1.32%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] __x64_sys_pwrite64
+   1.32%  0.00%  glfs_iotwr001  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.01%  glfs_iotwr002  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.00%  glfs_iotwr00b  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] vfs_write
+   1.30%  0.00%  glfs_iotwr001  [kernel.kallsyms]   [k] __x64_sys_pwrite64
+   1.30%  0.00%  glfs_iotwr008  libpthread-2.32.so  [.] __libc_pwrite64
+   1.30%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] new_sync_write
+   1.30%  0.00%  glfs_iotwr00c  libpthread-2.32.so  [.] __libc_pwrite64
+   1.29%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] xfs_file_write_iter
+   1.29%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] xfs_file_dio_aio_write

And on replica 3 with storage.linux-aio enabled:

+  11.76%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+  11.42%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] do_syscall_64
+   8.81%  0.00%  glfs_posixaio  [unknown]           [k] 0x00000000baadf00d
+   8.81%  0.00%  glfs_posixaio  [unknown]           [k] 0x0000000000000004
+   8.74%  0.06%  glfs_posixaio  libc-2.32.so        [.] __GI___writev
+   8.33%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] do_writev
+   8.23%  0.03%  glfs_posixaio  [kernel.kallsyms]   [k] vfs_writev
+   8.12%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_iter_write
+   8.02%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_iter_readv_writev
+   7.96%  0.04%  glfs_posixaio  [kernel.kallsyms]   [k] sock_write_iter
+   7.92%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] sock_sendmsg
+   7.86%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_sendmsg
+   7.28%  0.15%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_sendmsg_locked
+   6.49%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] __tcp_push_pending_frames
+   6.48%  0.10%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_write_xmit
+   6.31%  0.02%  glfs_posixaio  [unknown]           [k] 0000000000000000
+   6.05%  0.13%  glfs_posixaio  [kernel.kallsyms]   [k] __tcp_transmit_skb
+   5.71%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __ip_queue_xmit
+   4.15%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   4.07%  0.08%  glfs_posixaio  [kernel.kallsyms]   [k] ip_finish_output2
+   3.75%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] asm_call_sysvec_on_stack
+   3.75%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_syscall_64
+   3.70%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] __x64_sys_futex
+   3.68%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __local_bh_enable_ip
+   3.67%  0.07%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_futex
+   3.62%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_softirq
+   3.61%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] do_softirq_own_stack
+   3.59%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __softirqentry_text_start
+   3.44%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] net_rx_action
+   3.34%  0.04%  glfs_posixaio  [kernel.kallsyms]   [k] process_backlog
+   3.28%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] __netif_receive_skb_one_core
+   3.08%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.02%  0.03%  glfs_epoll001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.97%  0.01%  glfs_epoll000  [kernel.kallsyms]   [k] do_syscall_64
+   2.89%  0.01%  glfs_epoll001  [kernel.kallsyms]   [k] do_syscall_64
+   2.73%  0.08%  glfs_posixaio  [kernel.kallsyms]   [k] nf_hook_slow
+   2.25%  0.04%  glfs_posixaio  libc-2.32.so        [.] fgetxattr
+   2.16%  0.14%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] futex_wake

According to these tables, the brick process is mostly a thin wrapper around
the system calls and the kernel network subsystem behind them.
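For anyone who wants to reproduce this, the workflow is roughly as follows
(the pgrep pattern assumes the brick process is named glusterfsd; with several
bricks per host, pick the PID of the one you care about instead):

   perf record -F 2500 -g -p $(pgrep -of glusterfsd)   # sample one brick while the workload runs
   perf report                                         # the '+' rows above are its collapsed call chains
   gluster volume set test0 storage.linux-aio on       # enables the glfs_posixaio code path profiled above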
For whoever may find it interesting: the following replica 3 volume options:

performance.io-cache-pass-through: on
performance.iot-pass-through: on
performance.md-cache-pass-through: on
performance.nl-cache-pass-through: on
performance.open-behind-pass-through: on
performance.read-ahead-pass-through: on
performance.readdir-ahead-pass-through: on
performance.strict-o-direct: on
features.ctime: off
features.selinux: off
performance.write-behind: off
performance.open-behind: off
performance.quick-read: off
storage.linux-aio: on
storage.fips-mode-rchecksum: off

are likely to improve the I/O performance of GFAPI clients (fio with the gfapi
and gfapi_async engines, qemu -drive file=gluster://XXX, etc.) by ~20%. But
beware: the same options may kill the I/O performance of FUSE clients.

Dmitry
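P.S. To apply the whole set in one go, a loop along these lines should do
(volume name test0 assumed, as earlier in this thread):

   for opt in performance.io-cache-pass-through=on \
              performance.iot-pass-through=on \
              performance.md-cache-pass-through=on \
              performance.nl-cache-pass-through=on \
              performance.open-behind-pass-through=on \
              performance.read-ahead-pass-through=on \
              performance.readdir-ahead-pass-through=on \
              performance.strict-o-direct=on \
              features.ctime=off \
              features.selinux=off \
              performance.write-behind=off \
              performance.open-behind=off \
              performance.quick-read=off \
              storage.linux-aio=on \
              storage.fips-mode-rchecksum=off; do
       # split key=value and pass both to 'gluster volume set'
       gluster volume set test0 "${opt%=*}" "${opt#*=}"
   done

And the ~20% figure can be checked with something like fio's gfapi engine;
host and volume names below are placeholders for my setup:

   fio --name=gfapi-write --ioengine=gfapi --volume=test0 --brick=trick \
       --rw=write --bs=64k --size=1g --direct=1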
Strahil Nikolov
2020-Nov-26 12:58 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
Erm... that's not correct. Put them on the same line:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 trick ...

Best Regards,
Strahil Nikolov

On 26.11.2020 (Thu) at 12:00 +0300, Dmitry Antipov wrote:
> On 11/26/20 11:42 AM, Strahil Nikolov wrote:
> > And your gluster bricks are localhost:/brick1, localhost:/brick2
> > and localhost:/brick3?
> > If not, add the hostname used for the bricks on the line starting
> > with 127.0.0.1 and try again.
>
> Same thing with:
>
> 127.0.0.1   trick trick.localdomain trick4 trick4.localdomain4
> 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
> ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
>
> and:
>
> Volume Name: test0
> Type: Replicate
> Volume ID: 2699e6fd-3898-4912-b4de-2d3850c53fb9
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: trick:/glusterfs/test0-000
> Brick2: trick:/glusterfs/test0-001
> Brick3: trick:/glusterfs/test0-002
>
> When running the workload, per-interface RX/TX counters (as shown by
> ifconfig) grow rapidly on 'lo' but remain nearly the same on the other
> interfaces, so I'm pretty sure that loopback is in action and the
> problem is somewhere else.
>
> Dmitry
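In other words, the whole /etc/hosts would end up looking roughly like this
(keeping the extra aliases from your original file):

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 trick trick.localdomain
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 trick trick.localdomain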