Dmitry Antipov
2020-Nov-26 09:43 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
BTW, did someone try to profile the brick process? I did, and got this
for the default replica 3 volume ('perf record -F 2500 -g -p [PID]'):

+   3.29%  0.02%  glfs_epoll001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.17%  0.01%  glfs_epoll001  [kernel.kallsyms]   [k] do_syscall_64
+   3.17%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.06%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] do_syscall_64
+   2.75%  0.01%  glfs_iotwr00f  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.74%  0.01%  glfs_iotwr00b  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.74%  0.01%  glfs_iotwr001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.73%  0.00%  glfs_iotwr003  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.72%  0.00%  glfs_iotwr000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.72%  0.01%  glfs_iotwr00c  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.70%  0.01%  glfs_iotwr003  [kernel.kallsyms]   [k] do_syscall_64
+   2.69%  0.00%  glfs_iotwr001  [kernel.kallsyms]   [k] do_syscall_64
+   2.69%  0.01%  glfs_iotwr008  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.68%  0.00%  glfs_iotwr00b  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.00%  glfs_iotwr00c  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.00%  glfs_iotwr00f  [kernel.kallsyms]   [k] do_syscall_64
+   2.68%  0.01%  glfs_iotwr000  [kernel.kallsyms]   [k] do_syscall_64
+   2.67%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.65%  0.00%  glfs_iotwr008  [kernel.kallsyms]   [k] do_syscall_64
+   2.64%  0.00%  glfs_iotwr00e  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.64%  0.01%  glfs_iotwr00d  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] do_syscall_64
+   2.63%  0.01%  glfs_iotwr007  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.00%  glfs_iotwr005  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.01%  glfs_iotwr006  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.63%  0.00%  glfs_iotwr009  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.61%  0.01%  glfs_iotwr004  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.61%  0.01%  glfs_iotwr00e  [kernel.kallsyms]   [k] do_syscall_64
+   2.60%  0.00%  glfs_iotwr006  [kernel.kallsyms]   [k] do_syscall_64
+   2.59%  0.00%  glfs_iotwr005  [kernel.kallsyms]   [k] do_syscall_64
+   2.59%  0.00%  glfs_iotwr00d  [kernel.kallsyms]   [k] do_syscall_64
+   2.58%  0.00%  glfs_iotwr002  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.58%  0.01%  glfs_iotwr007  [kernel.kallsyms]   [k] do_syscall_64
+   2.58%  0.00%  glfs_iotwr004  [kernel.kallsyms]   [k] do_syscall_64
+   2.57%  0.00%  glfs_iotwr009  [kernel.kallsyms]   [k] do_syscall_64
+   2.54%  0.00%  glfs_iotwr002  [kernel.kallsyms]   [k] do_syscall_64
+   1.65%  0.00%  glfs_epoll000  [unknown]           [k] 0x0000000000000001
+   1.65%  0.00%  glfs_epoll001  [unknown]           [k] 0x0000000000000001
+   1.48%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   1.44%  0.08%  glfs_rpcrqhnd  libpthread-2.32.so  [.] pthread_cond_wait@@GLIBC_2.3.2
+   1.40%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_syscall_64
+   1.36%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] __x64_sys_futex
+   1.35%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_futex
+   1.34%  0.01%  glfs_iotwr00a  libpthread-2.32.so  [.] __libc_pwrite64
+   1.32%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] __x64_sys_pwrite64
+   1.32%  0.00%  glfs_iotwr001  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.01%  glfs_iotwr002  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.00%  glfs_iotwr00b  libpthread-2.32.so  [.] __libc_pwrite64
+   1.31%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] vfs_write
+   1.30%  0.00%  glfs_iotwr001  [kernel.kallsyms]   [k] __x64_sys_pwrite64
+   1.30%  0.00%  glfs_iotwr008  libpthread-2.32.so  [.] __libc_pwrite64
+   1.30%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] new_sync_write
+   1.30%  0.00%  glfs_iotwr00c  libpthread-2.32.so  [.] __libc_pwrite64
+   1.29%  0.00%  glfs_iotwr00a  [kernel.kallsyms]   [k] xfs_file_write_iter
+   1.29%  0.01%  glfs_iotwr00a  [kernel.kallsyms]   [k] xfs_file_dio_aio_write

And on replica 3 with storage.linux-aio enabled:

+  11.76%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+  11.42%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] do_syscall_64
+   8.81%  0.00%  glfs_posixaio  [unknown]           [k] 0x00000000baadf00d
+   8.81%  0.00%  glfs_posixaio  [unknown]           [k] 0x0000000000000004
+   8.74%  0.06%  glfs_posixaio  libc-2.32.so        [.] __GI___writev
+   8.33%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] do_writev
+   8.23%  0.03%  glfs_posixaio  [kernel.kallsyms]   [k] vfs_writev
+   8.12%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_iter_write
+   8.02%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_iter_readv_writev
+   7.96%  0.04%  glfs_posixaio  [kernel.kallsyms]   [k] sock_write_iter
+   7.92%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] sock_sendmsg
+   7.86%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_sendmsg
+   7.28%  0.15%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_sendmsg_locked
+   6.49%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] __tcp_push_pending_frames
+   6.48%  0.10%  glfs_posixaio  [kernel.kallsyms]   [k] tcp_write_xmit
+   6.31%  0.02%  glfs_posixaio  [unknown]           [k] 0000000000000000
+   6.05%  0.13%  glfs_posixaio  [kernel.kallsyms]   [k] __tcp_transmit_skb
+   5.71%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __ip_queue_xmit
+   4.15%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   4.07%  0.08%  glfs_posixaio  [kernel.kallsyms]   [k] ip_finish_output2
+   3.75%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] asm_call_sysvec_on_stack
+   3.75%  0.01%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_syscall_64
+   3.70%  0.03%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] __x64_sys_futex
+   3.68%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __local_bh_enable_ip
+   3.67%  0.07%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] do_futex
+   3.62%  0.05%  glfs_posixaio  [kernel.kallsyms]   [k] do_softirq
+   3.61%  0.01%  glfs_posixaio  [kernel.kallsyms]   [k] do_softirq_own_stack
+   3.59%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] __softirqentry_text_start
+   3.44%  0.06%  glfs_posixaio  [kernel.kallsyms]   [k] net_rx_action
+   3.34%  0.04%  glfs_posixaio  [kernel.kallsyms]   [k] process_backlog
+   3.28%  0.02%  glfs_posixaio  [kernel.kallsyms]   [k] __netif_receive_skb_one_core
+   3.08%  0.02%  glfs_epoll000  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   3.02%  0.03%  glfs_epoll001  [kernel.kallsyms]   [k] entry_SYSCALL_64_after_hwframe
+   2.97%  0.01%  glfs_epoll000  [kernel.kallsyms]   [k] do_syscall_64
+   2.89%  0.01%  glfs_epoll001  [kernel.kallsyms]   [k] do_syscall_64
+   2.73%  0.08%  glfs_posixaio  [kernel.kallsyms]   [k] nf_hook_slow
+   2.25%  0.04%  glfs_posixaio  libc-2.32.so        [.] fgetxattr
+   2.16%  0.14%  glfs_rpcrqhnd  [kernel.kallsyms]   [k] futex_wake

According to these tables, the brick process is just a thin wrapper around
the system calls and the kernel network subsystem behind them.
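If anyone wants to reproduce this, the whole sequence boils down to a few
commands. A minimal sketch, assuming a single glusterfsd brick process on
the node (adjust the pgrep pattern otherwise):

  # find the brick process and sample it for 30 seconds
  BRICK_PID=$(pgrep -x glusterfsd)
  sudo perf record -F 2500 -g -p "$BRICK_PID" -- sleep 30
  # summarize the recorded call graphs (perf.data is the default output file)
  sudo perf report --stdio | head -60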
To whom it may be interesting, the following replica 3 volume options:

performance.io-cache-pass-through: on
performance.iot-pass-through: on
performance.md-cache-pass-through: on
performance.nl-cache-pass-through: on
performance.open-behind-pass-through: on
performance.read-ahead-pass-through: on
performance.readdir-ahead-pass-through: on
performance.strict-o-direct: on
features.ctime: off
features.selinux: off
performance.write-behind: off
performance.open-behind: off
performance.quick-read: off
storage.linux-aio: on
storage.fips-mode-rchecksum: off

are likely to improve the I/O performance of GFAPI clients (fio with the
gfapi and gfapi_async engines, qemu -drive file=gluster://XXX, etc.) by
~20%. But beware of killing the I/O performance of FUSE clients.

Dmitry
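P.S. To save some typing, a sketch of applying all of the above with the
CLI; 'test' is a placeholder volume name, substitute your own:

  # enable the pass-through / aio options
  for opt in performance.io-cache-pass-through performance.iot-pass-through \
             performance.md-cache-pass-through performance.nl-cache-pass-through \
             performance.open-behind-pass-through performance.read-ahead-pass-through \
             performance.readdir-ahead-pass-through performance.strict-o-direct \
             storage.linux-aio; do
      gluster volume set test "$opt" on
  done
  # disable the caching / checksum features
  for opt in features.ctime features.selinux performance.write-behind \
             performance.open-behind performance.quick-read \
             storage.fips-mode-rchecksum; do
      gluster volume set test "$opt" off
  done
  # verify what actually got applied
  gluster volume info test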
Yaniv Kaul
2020-Nov-26 09:49 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
On Thu, Nov 26, 2020 at 11:44 AM Dmitry Antipov <dmantipov at yandex.ru> wrote:

> BTW, did someone try to profile the brick process? I did, and got this
> for the default replica 3 volume ('perf record -F 2500 -g -p [PID]'):

I run a slightly different command, which hides the kernel stuff and
focuses on the user-mode functions:

  sudo perf record --call-graph dwarf -j any --buildid-all --all-user -p `pgrep -d\, gluster` -F 2000 -ag

Y.

> [...]
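P.S. Once the capture is done, something like the following shows the
user-mode hot spots; a sketch, reading perf's default perf.data output:

  # show self-cost per function, hiding entries below 1%
  sudo perf report --no-children --stdio --percent-limit 1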
Xavi Hernandez
2020-Nov-27 08:37 UTC
[Gluster-users] Poor performance on a server-class system vs. desktop
Hi Dmitry,

On Thu, Nov 26, 2020 at 10:44 AM Dmitry Antipov <dmantipov at yandex.ru> wrote:

> [...]
> According to these tables, the brick process is just a thin wrapper
> around the system calls and the kernel network subsystem behind them.

Mostly. However, there's one issue that isn't so obvious in the perf
capture but that we have identified in other setups: when the system
calls are processed very fast (as should be the case when NVMe is used),
the io-threads thread pool is constantly processing the request queue.
This queue is currently synchronized with a mutex, and the small latency
per request makes contention on that mutex quite high. This means the
thread pool tends to be serialized by the lock, which kills most of the
parallelism and also causes a lot of additional system calls (increased
CPU utilization and higher latencies).

For now, the only way I know to minimize this effect is to reduce the
number of threads in the io-threads pool. It's hard to tell what a good
number would be; it depends on many things. But you can run some tests
with different values to find the best one (after changing the number of
threads, it's better to restart the volume). Reducing the number of
threads reduces the CPU power that Gluster can use, but it also reduces
the contention, so it's expected (though not guaranteed) that at some
point, even with fewer threads, performance could be a bit better.

Regards,

Xavi
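P.S. A sketch of the kind of test loop I mean, assuming a volume named
'test'; performance.io-thread-count is the option that controls the pool
size (the default is 16):

  for n in 4 8 16 32; do
      gluster volume set test performance.io-thread-count $n
      # restart the volume so the bricks pick up the new thread count
      gluster --mode=script volume stop test
      gluster volume start test
      # ... run the benchmark (e.g. the same fio job) and record the results ...
  done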