Mingfan Lu
2014-Feb-08 06:51 UTC
[Gluster-users] Very high CPU load on brick servers while write performance is very slow
The CPU load on some of the brick servers is very high and write performance is very slow: dd'ing one file to the volume gives only 10+ KB/sec. Any comments?

More information:

Volume Name: prodvolume
Type: Distributed-Replicate
Volume ID: f3fc24b3-23c7-430d-8ab1-81a646b1ce06
Status: Started
Number of Bricks: 17 x 3 = 51 (I have 51 servers)
Transport-type: tcp
Bricks: ....
Options Reconfigured:
performance.io-thread-count: 32
auth.allow: *,10.121.48.244,10.121.48.82
features.limit-usage: /:400TB
features.quota: on
server.allow-insecure: on
features.quota-timeout: 5

Most of the CPU utilization is in system/kernel mode:

top - 14:47:13 up 219 days, 23:36, 2 users, load average: 17.76, 20.98, 24.74
Tasks: 493 total, 1 running, 491 sleeping, 0 stopped, 1 zombie
Cpu(s): 8.2%us, 49.0%sy, 0.0%ni, 42.2%id, 0.1%wa, 0.0%hi, 0.4%si, 0.0%st
Mem:  132112276k total, 131170760k used, 941516k free, 71224k buffers
Swap: 4194296k total, 867216k used, 3327080k free, 110888216k cached

   PID USER PR NI  VIRT  RES  SHR S   %CPU %MEM     TIME+ COMMAND
* 6226 root 20  0 2677m 496m 2268 S 1183.4  0.4  89252:09 glusterfsd *
 27994 root 20  0 1691m  77m 2000 S  111.6  0.1 324333:47 glusterfsd
 14169 root 20  0 14.9g  23m 1984 S   51.3  0.0   3700:30 glusterfsd
 20582 root 20  0 2129m 1.4g 1708 S   12.6  1.1 198:03.53 glusterfs
 24528 root 20  0     0    0    0 S    6.3  0.0  14:18.60 flush-8:16
 17717 root 20  0 21416  11m 8268 S    5.0  0.0  14:51.18 oprofiled

perf top -p 6226 shows the cycles are mostly caused by _spin_lock:

Events: 49K cycles
 72.51%  [kernel]             [k] _spin_lock
  4.00%  libpthread-2.12.so   [.] pthread_mutex_lock
  2.63%  [kernel]             [k] _spin_unlock_irqrestore
  1.61%  libpthread-2.12.so   [.] pthread_mutex_unlock
  1.59%  [unknown]            [.] 0xffffffffff600157
  1.57%  [xfs]                [k] xfs_inobt_get_rec
  1.41%  [xfs]                [k] xfs_btree_increment
  1.27%  [xfs]                [k] xfs_btree_get_rec
  1.17%  libpthread-2.12.so   [.] __lll_lock_wait
  0.96%  [xfs]                [k] _xfs_buf_find
  0.95%  [xfs]                [k] xfs_btree_get_block
  0.88%  [kernel]             [k] copy_user_generic_string
  0.50%  [xfs]                [k] xfs_dialloc
  0.48%  [xfs]                [k] xfs_btree_rec_offset
  0.47%  [xfs]                [k] xfs_btree_readahead
  0.41%  [kernel]             [k] futex_wait_setup
  0.41%  [kernel]             [k] futex_wake
  0.35%  [kernel]             [k] system_call_after_swapgs
  0.33%  [xfs]                [k] xfs_btree_rec_addr
  0.30%  [kernel]             [k] __link_path_walk
  0.29%  io-threads.so.0.0.0  [.] __iot_dequeue
  0.29%  io-threads.so.0.0.0  [.] iot_worker
  0.25%  [kernel]             [k] __d_lookup
  0.21%  libpthread-2.12.so   [.] __lll_unlock_wake
  0.20%  [kernel]             [k] get_futex_key
  0.18%  [kernel]             [k] hash_futex
  0.17%  [kernel]             [k] do_futex
  0.15%  [kernel]             [k] thread_return
  0.15%  libpthread-2.12.so   [.] pthread_spin_lock
  0.14%  libc-2.12.so         [.] _int_malloc
  0.14%  [kernel]             [k] sys_futex
  0.14%  [kernel]             [k] wake_futex
  0.14%  [kernel]             [k] _atomic_dec_and_lock
  0.12%  [kernel]             [k] kmem_cache_free
  0.12%  [xfs]                [k] xfs_trans_buf_item_match
  0.12%  [xfs]                [k] xfs_btree_check_sblock
  0.11%  libc-2.12.so         [.] vfprintf
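To see which code paths are actually taking that _spin_lock, a call-graph profile of the busy brick process could be captured with perf. A minimal sketch, assuming PID 6226 (the hot glusterfsd from the top output above) and an arbitrary 30-second sampling window:

    # sample the busy glusterfsd with call chains for ~30 seconds
    perf record -g -p 6226 -- sleep 30
    # then inspect the callers of _spin_lock
    perf report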
  0.11%  [kernel]             [k] futex_wait
  0.11%  [kernel]             [k] kmem_cache_alloc
  0.09%  [kernel]             [k] acl_permission_check

Using oprofile, I found the CPU time breaks down roughly as follows:

CPU: Intel Sandy Bridge microarchitecture, speed 2000.02 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000

samples    %        linenr info                image name          app name            symbol name
-------------------------------------------------------------------------------
288683303  41.2321  clocksource.c:828          vmlinux             vmlinux             *sysfs_show_available_clocksources*
  288683303 100.000 clocksource.c:828          vmlinux             vmlinux             sysfs_show_available_clocksources [self]
-------------------------------------------------------------------------------
203797076  29.1079  clocksource.c:236          vmlinux             vmlinux             *clocksource_mark_unstable*
  203797076 100.000 clocksource.c:236          vmlinux             vmlinux             clocksource_mark_unstable [self]
-------------------------------------------------------------------------------
 42321053   6.0446  (no location information)  xfs                 xfs                 /xfs
   42321053 100.000 (no location information)  xfs                 xfs                 /xfs [self]
-------------------------------------------------------------------------------
 23662768   3.3797  (no location information)  libpthread-2.12.so  libpthread-2.12.so  pthread_mutex_lock
   23662768 100.000 (no location information)  libpthread-2.12.so  libpthread-2.12.so  pthread_mutex_lock [self]
-------------------------------------------------------------------------------
 10867915   1.5522  (no location information)  libpthread-2.12.so  libpthread-2.12.so  pthread_mutex_unlock
   10867915 100.000 (no location information)  libpthread-2.12.so  libpthread-2.12.so  pthread_mutex_unlock [self]
-------------------------------------------------------------------------------
  7727828   1.1038  (no location information)  libpthread-2.12.so  libpthread-2.12.so  __lll_lock_wait
    7727828 100.000 (no location information)  libpthread-2.12.so  libpthread-2.12.so  __lll_lock_wait [self]
-------------------------------------------------------------------------------
  6296394   0.8993  blk-sysfs.c:260            vmlinux             vmlinux             queue_rq_affinity_store
    6296394 100.000 blk-sysfs.c:260            vmlinux             vmlinux             queue_rq_affinity_store [self]
-------------------------------------------------------------------------------
  3543413   0.5061  sched.h:293                vmlinux             vmlinux             ftrace_profile_templ_sched_stat_template
    3543413 100.000 sched.h:293                vmlinux             vmlinux             ftrace_profile_templ_sched_stat_template [self]
-------------------------------------------------------------------------------
  2960958   0.4229  msi.c:82                   vmlinux             vmlinux             msi_set_enable
    2960958 100.000 msi.c:82                   vmlinux             vmlinux             msi_set_enable [self]
-------------------------------------------------------------------------------
  2814515   0.4020  clocksource.c:249          vmlinux             vmlinux             clocksource_watchdog
    2814515 100.000 clocksource.c:249          vmlinux             vmlinux             clocksource_watchdog [self]
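If an oprofile call graph would help show who is hitting the clocksource and locking paths above, here is a minimal sketch of how one could be collected, assuming the legacy opcontrol interface (matching the oprofiled daemon visible in top); the vmlinux path and call-graph depth below are placeholders, not values from this setup:

    # enable call-graph sampling against the matching uncompressed kernel image
    opcontrol --setup --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux --callgraph=16
    opcontrol --start
    sleep 60            # sample while the slow dd write test is running
    opcontrol --dump
    opcontrol --shutdown
    # symbol-level report including callers/callees
    opreport -l --callgraph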