Kelly Lesperance
2016-May-25 16:54 UTC
[CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2
I've posted this on the forums at https://www.centos.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614 - posting to the list in the hopes of getting more eyeballs on it.

We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:

2x E5-2650
128 GB RAM
12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
Dual-port 10 Gb NIC

The drives are configured as one large RAID-10 volume with mdadm, and the filesystem is XFS. The OS is not installed on the drives - we PXE boot a CentOS image we've built with minimal packages installed, and do the OS configuration via Puppet. Originally, the hosts were running CentOS 6.5 with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and Kafka 0.9, and that's when the trouble started.

What we're seeing is that when the weekly raid-check script executes, performance nose dives and I/O wait skyrockets. The RAID check starts out fairly fast (20000K/sec - the limit that's been set), but then quickly drops to about 4000K/sec. The dev.raid.speed_limit_* sysctls are at their defaults:

dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000

Here are 10 seconds of iostat output, which illustrate the issue:

[root@r1k1log] # iostat 1 10
Linux 3.10.0-327.18.2.el7.x86_64 (r1k1)  05/24/16  _x86_64_  (32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.80    0.06    1.89   14.79    0.00   74.46

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              52.59      2033.16     10682.78 1210398902  6359779847
sdb              52.46      2031.25     10682.78 1209265338  6359779847
sdc              52.40      2033.21     10683.53 1210433924  6360229587
sdd              52.22      2031.16     10683.53 1209212513  6360229587
sdf              52.20      2031.17     10682.06 1209216701  6359354331
sdg              52.62      2033.22     10684.17 1210437080  6360606756
sdh              52.57      2031.21     10684.17 1209242746  6360606756
sde              51.67      2033.17     10682.06 1210408935  6359354331
sdj              51.90      2031.13     10684.48 1209191501  6360795559
sdi              52.47      2033.16     10684.48 1210399262  6360795559
sdk              52.09      2033.15     10684.36 1210396915  6360724971
sdl              51.95      2031.20     10684.36 1209235241  6360724971
md127           138.20        74.49     64101.35   44348810 38161468777

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.57    0.09    1.33   26.19    0.00   63.81

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              28.00       512.00      8416.00        512        8416
sdb              28.00       512.00      8416.00        512        8416
sdc              25.00       448.00      8876.00        448        8876
sdd              24.00       448.00      8364.00        448        8364
sdf              23.00       448.00      8192.00        448        8192
sdg              24.00       512.00      7680.00        512        7680
sdh              24.00       512.00      7680.00        512        7680
sde              23.00       448.00      8192.00        448        8192
sdj              23.00       512.00      7680.00        512        7680
sdi              23.00       512.00      7680.00        512        7680
sdk              23.00       512.00      7680.00        512        7680
sdl              23.00       512.00      7680.00        512        7680
md127           101.00         0.00     48012.00          0       48012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.50    0.00    1.04   24.27    0.00   68.19

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              26.00       512.00      9216.00        512        9216
sdb              26.00       512.00      9216.00        512        9216
sdc              27.00       576.00      9204.00        576        9204
sdd              28.00       576.00      9716.00        576        9716
sdf              31.00       768.00      9728.00        768        9728
sdg              28.00       512.00     10240.00        512       10240
sdh              28.00       512.00     10240.00        512       10240
sde              31.00       768.00      9728.00        768        9728
sdj              28.00       512.00      9744.00        512        9744
sdi              28.00       512.00      9744.00        512        9744
sdk              27.00       512.00      9728.00        512        9728
sdl              27.00       512.00      9728.00        512        9728
md127           114.00         0.00     57860.00          0       57860

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.24    0.00    1.32   20.02    0.00   69.42

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              50.00       512.00     20408.00        512       20408
sdb              50.00       512.00     20408.00        512       20408
sdc              48.00       512.00     19984.00        512       19984
sdd              48.00       512.00     19984.00        512       19984
sdf              50.00       704.00     19968.00        704       19968
sdg              47.00       512.00     19968.00        512       19968
sdh              47.00       512.00     19968.00        512       19968
sde              50.00       704.00     19968.00        704       19968
sdj              48.00       512.00     19972.00        512       19972
sdi              48.00       512.00     19972.00        512       19972
sdk              48.00       512.00     19980.00        512       19980
sdl              48.00       512.00     19980.00        512       19980
md127           241.00         0.00    120280.00          0      120280

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.98    0.00    0.98   18.42    0.00   72.63

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              39.00       640.00     14076.00        640       14076
sdb              39.00       640.00     14076.00        640       14076
sdc              36.00       512.00     14324.00        512       14324
sdd              36.00       512.00     14324.00        512       14324
sdf              36.00       576.00     13824.00        576       13824
sdg              43.00      1024.00     13824.00       1024       13824
sdh              43.00      1024.00     13824.00       1024       13824
sde              36.00       576.00     13824.00        576       13824
sdj              44.00      1024.00     14104.00       1024       14104
sdi              44.00      1024.00     14104.00       1024       14104
sdk              45.00      1024.00     14336.00       1024       14336
sdl              45.00      1024.00     14336.00       1024       14336
md127           168.00         0.00     84488.00          0       84488

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.39    0.00    1.01   19.48    0.00   72.13

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              22.00       896.00      4096.00        896        4096
sdb              22.00       896.00      4096.00        896        4096
sdc              19.00       640.00      4344.00        640        4344
sdd              19.00       640.00      4344.00        640        4344
sdf              18.00       512.00      5120.00        512        5120
sdg              18.00       512.00      5120.00        512        5120
sdh              18.00       512.00      5120.00        512        5120
sde              18.00       512.00      5120.00        512        5120
sdj              18.00       512.00      4624.00        512        4624
sdi              18.00       512.00      4624.00        512        4624
sdk              18.00       512.00      4608.00        512        4608
sdl              18.00       512.00      4608.00        512        4608
md127            57.00         0.00     27912.00          0       27912

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.92    0.00    1.58   21.84    0.00   65.66

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              23.00       576.00      7168.00        576        7168
sdb              23.00       576.00      7168.00        576        7168
sdc              29.00       896.00      7680.00        896        7680
sdd              29.00       896.00      7680.00        896        7680
sdf              31.00      1024.00      7680.00       1024        7680
sdg              31.00      1024.00      7680.00       1024        7680
sdh              31.00      1024.00      7680.00       1024        7680
sde              31.00      1024.00      7680.00       1024        7680
sdj              30.00      1024.00      7168.00       1024        7168
sdi              31.00      1024.00      7680.00       1024        7680
sdk              32.00      1024.00      7424.00       1024        7424
sdl              32.00      1024.00      7424.00       1024        7424
md127            89.00         0.00     44800.00          0       44800

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.89    0.03    2.63   21.54    0.00   61.91

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              30.00       960.00      7680.00        960        7680
sdb              30.00       960.00      7680.00        960        7680
sdc              32.00      1024.00      7684.00       1024        7684
sdd              32.00      1024.00      7684.00       1024        7684
sdf              31.00      1024.00      7680.00       1024        7680
sdg              31.00      1024.00      7680.00       1024        7680
sdh              31.00      1024.00      7680.00       1024        7680
sde              31.00      1024.00      7680.00       1024        7680
sdj              32.00      1024.00      8192.00       1024        8192
sdi              31.00      1024.00      7680.00       1024        7680
sdk              26.00       704.00      7680.00        704        7680
sdl              26.00       704.00      7680.00        704        7680
md127            92.00         0.00     46596.00          0       46596

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.24    0.00    2.22   19.89    0.00   63.65

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              33.00      1024.00      7244.00       1024        7244
sdb              33.00      1024.00      7244.00       1024        7244
sdc              31.00      1024.00      7668.00       1024        7668
sdd              31.00      1024.00      7668.00       1024        7668
sdf              31.00      1024.00      7680.00       1024        7680
sdg              26.00       768.00      6672.00        768        6672
sdh              26.00       768.00      6672.00        768        6672
sde              31.00      1024.00      7680.00       1024        7680
sdj              21.00       512.00      6656.00        512        6656
sdi              21.00       512.00      6656.00        512        6656
sdk              27.00       832.00      7168.00        832        7168
sdl              27.00       832.00      7168.00        832        7168
md127            88.00         0.00     43088.00          0       43088

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.02    0.13    1.42   23.90    0.00   66.53

Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
sda              30.00      1024.00      7168.00       1024        7168
sdb              30.00      1024.00      7168.00       1024        7168
sdc              29.00       960.00      7168.00        960        7168
sdd              29.00       960.00      7168.00        960        7168
sdf              23.00       512.00      7668.00        512        7668
sdg              28.00       768.00      7680.00        768        7680
sdh              28.00       768.00      7680.00        768        7680
sde              23.00       512.00      7668.00        512        7668
sdj              30.00      1024.00      6672.00       1024        6672
sdi              30.00      1024.00      6672.00       1024        6672
sdk              30.00      1024.00      7168.00       1024        7168
sdl              30.00      1024.00      7168.00       1024        7168
md127            87.00         0.00     43524.00          0       43524

Details of the array:

[root@r1k1] # cat /proc/mdstat
Personalities : [raid10]
md127 : active raid10 sdf[5] sdi[8] sdh[7] sdk[10] sdb[1] sdj[9] sdc[2] sdd[3] sdl[11] sde[13] sdg[12] sda[0]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      [======>..............]  check = 30.8% (7237496960/23441323008) finish=62944.5min speed=4290K/sec

unused devices: <none>

[root@r1k1] # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Sep 18 09:57:57 2014
     Raid Level : raid10
     Array Size : 23441323008 (22355.39 GiB 24003.91 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Tue May 24 15:32:56 2016
          State : active, checking
 Active Devices : 12
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

   Check Status : 30% complete

           Name : localhost:kafka
           UUID : b6b98e3e:65ee06c3:3599d781:98908041
         Events : 2459193

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync set-A   /dev/sda
       1       8       16        1      active sync set-B   /dev/sdb
       2       8       32        2      active sync set-A   /dev/sdc
       3       8       48        3      active sync set-B   /dev/sdd
      13       8       64        4      active sync set-A   /dev/sde
       5       8       80        5      active sync set-B   /dev/sdf
      12       8       96        6      active sync set-A   /dev/sdg
       7       8      112        7      active sync set-B   /dev/sdh
       8       8      128        8      active sync set-A   /dev/sdi
       9       8      144        9      active sync set-B   /dev/sdj
      10       8      160       10      active sync set-A   /dev/sdk
      11       8      176       11      active sync set-B   /dev/sdl

We've tried changing the I/O scheduler, queue_depth, queue_type, read-ahead, etc., but nothing has helped. We've also upgraded all of the firmware and installed HP's mpt2sas driver.

We have 4 other Kafka clusters; however, they're HP DL180 G6 servers. We completed the same CentOS 6.5 -> 7.2 / Kafka 0.8 -> 0.9 upgrade on those clusters, and there has been no impact on their performance.

We've been banging our heads against the wall for a few weeks now, and are really hoping someone from the community can point us in the right direction.

Thanks,

Kelly Lesperance
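A quick arithmetic check on the /proc/mdstat progress line above shows why the slowdown matters: if the check stayed at 4290K/sec, the remaining portion could not finish before the next weekly run. The Python sketch below is illustrative only; the regex and the `estimate_remaining` helper are hypothetical, and it assumes (as /proc/mdstat does for the array size) that the progress counts are 1 KiB blocks and the speed is in KiB/s.

```python
import re

# Hypothetical helper: parses the progress line /proc/mdstat shows while md is
# checking or resyncing, e.g.
#   check = 30.8% (7237496960/23441323008) finish=62944.5min speed=4290K/sec
PROGRESS_RE = re.compile(
    r"(?P<action>check|resync|recovery|reshape)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def estimate_remaining(line):
    """Recompute the time left, assuming 1 KiB blocks and a speed in KiB/s."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError("no md progress information found")
    remaining_kib = int(m["total"]) - int(m["done"])
    seconds = remaining_kib / int(m["speed"])
    return {
        "action": m["action"],
        "percent": float(m["pct"]),
        "remaining_kib": remaining_kib,
        "minutes": seconds / 60,
        "days": seconds / 86400,
        "reported_minutes": float(m["finish"]),
    }


if __name__ == "__main__":
    line = ("[======>..............]  check = 30.8% "
            "(7237496960/23441323008) finish=62944.5min speed=4290K/sec")
    est = estimate_remaining(line)
    print(f"{est['action']}: {est['remaining_kib']} KiB left, "
          f"~{est['minutes']:.0f} min (~{est['days']:.1f} days); "
          f"md reports {est['reported_minutes']} min")
```

For the r1k1 line this works out to 16203826048 KiB remaining and roughly 62,950 minutes (about 43.7 days), close to md's own finish=62944.5min estimate and far longer than the one-week interval between scheduled checks.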
m.roth at 5-cent.us
2016-May-25 17:21 UTC
Kelly Lesperance wrote:
<SNIP>

Really stupid question: are the drives in there the ones that came with the unit?

mark, who, a few years ago, found serious issues with green drives in a server....
m.roth at 5-cent.us
2016-May-25 17:23 UTC
Kelly Lesperance wrote:
<SNIP>

One more stupid question: could the card's configuration for how the drives are accessed have been accidentally changed?

mark
Kelly Lesperance
2016-May-25 17:25 UTC
They are:

[root@r1k1 ~] # hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       MB4000GCWDC
        Serial Number:      S1Z06RW9
        Firmware Revision:  HPGD
        Transport:          Serial, SATA Rev 3.0

Thanks,

Kelly
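Since the same question applies to all 12 drives in each of the 23 hosts, it may be worth comparing the identity fields in bulk rather than one hdparm -I at a time. This is a minimal Python sketch, assuming the hdparm -I output has already been collected as text per drive (for example over ssh); the helper names and the "host:/dev/sdX" labels are hypothetical, and the sketch does not run hdparm itself.

```python
# Identity fields of interest in `hdparm -I` output.
IDENTITY_KEYS = ("Model Number", "Serial Number", "Firmware Revision", "Transport")


def parse_hdparm_identity(text):
    """Return the 'Key: value' identity fields found in `hdparm -I` output."""
    fields = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if sep and key in IDENTITY_KEYS:
            fields[key] = value.strip()
    return fields


def model_firmware_groups(outputs):
    """Group drive labels by (model, firmware) so outliers stand out.

    `outputs` maps a label such as "r1k1:/dev/sda" to its raw hdparm -I text.
    """
    groups = {}
    for label, text in outputs.items():
        ident = parse_hdparm_identity(text)
        key = (ident.get("Model Number"), ident.get("Firmware Revision"))
        groups.setdefault(key, []).append(label)
    return groups


# The identity section quoted above for /dev/sda on r1k1.
SAMPLE = """\
/dev/sda:

ATA device, with non-removable media
        Model Number:       MB4000GCWDC
        Serial Number:      S1Z06RW9
        Firmware Revision:  HPGD
        Transport:          Serial, SATA Rev 3.0
"""

if __name__ == "__main__":
    print(parse_hdparm_identity(SAMPLE))
    print(model_firmware_groups({"r1k1:/dev/sda": SAMPLE}))
```

A (model, firmware) group containing only a handful of drives would be the first place to look; if every drive matches, the result is a single group.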
Kelly Lesperance
2016-May-25 17:26 UTC
The HBA the drives are attached to has no configuration that I'm aware of. We would have had to accidentally change 23 of them.

Thanks,

Kelly
Gordon Messmer
2016-May-27 03:50 UTC
On 05/25/2016 09:54 AM, Kelly Lesperance wrote:
> What we're seeing is that when the weekly raid-check script executes, performance nose dives, and I/O wait skyrockets. The raid check starts out fairly fast (20000K/sec - the limit that's been set), but then quickly drops down to about 4000K/Sec. dev.raid.speed sysctls are at the defaults:

It looks like some pretty heavy writes are going on at the time. I'm not sure what you mean by "nose dives", but I'd expect *some* performance impact of running a read-intensive process like a RAID check at the same time you're running a write-intensive process.

Do the same write-heavy processes run on the other clusters, where you aren't seeing performance issues?

> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            9.24    0.00    1.32   20.02    0.00   69.42
>
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read     kB_wrtn
> sda              50.00       512.00     20408.00        512       20408
> sdb              50.00       512.00     20408.00        512       20408
> sdc              48.00       512.00     19984.00        512       19984
> sdd              48.00       512.00     19984.00        512       19984
> sdf              50.00       704.00     19968.00        704       19968
> sdg              47.00       512.00     19968.00        512       19968
> sdh              47.00       512.00     19968.00        512       19968
> sde              50.00       704.00     19968.00        704       19968
> sdj              48.00       512.00     19972.00        512       19972
> sdi              48.00       512.00     19972.00        512       19972
> sdk              48.00       512.00     19980.00        512       19980
> sdl              48.00       512.00     19980.00        512       19980
> md127           241.00         0.00    120280.00          0      120280
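Gordon's point can be made concrete with the interval he quoted. In an md RAID-10 array with layout near=2 and an even number of members, each chunk is written to two adjacent raid slots, so (using the RaidDevice order from mdadm --detail) sda mirrors sdb, sdc mirrors sdd, and so on, and the members together should write about twice what md127 reports. The Python sketch below checks both properties against the quoted numbers; the function names are hypothetical and the mirror-pair rule is stated only for the even-member near=2 case.

```python
def near2_mirror_pairs(slots):
    """Mirror pairs for md RAID-10 with layout near=2 and an even member count:
    each chunk goes to two adjacent raid slots (0+1, 2+3, ...)."""
    if len(slots) % 2:
        raise ValueError("this sketch assumes an even number of members")
    return [(slots[i], slots[i + 1]) for i in range(0, len(slots), 2)]


# Member disks in RaidDevice order 0..11, from `mdadm --detail /dev/md127`.
SLOTS = ["sda", "sdb", "sdc", "sdd", "sde", "sdf",
         "sdg", "sdh", "sdi", "sdj", "sdk", "sdl"]

# kB_wrtn/s per member for the 1-second interval quoted above (r1k1).
MEMBER_WRITES = {
    "sda": 20408, "sdb": 20408, "sdc": 19984, "sdd": 19984,
    "sde": 19968, "sdf": 19968, "sdg": 19968, "sdh": 19968,
    "sdi": 19972, "sdj": 19972, "sdk": 19980, "sdl": 19980,
}
ARRAY_WRITES = 120280  # md127 kB_wrtn/s in the same interval


def write_amplification(member_writes, array_writes):
    """Ratio of data written to member disks vs. data written to the array."""
    return sum(member_writes.values()) / array_writes


def unbalanced_pairs(pairs, member_writes):
    """Mirror pairs whose two members did not write the same amount."""
    return [p for p in pairs if member_writes[p[0]] != member_writes[p[1]]]


if __name__ == "__main__":
    pairs = near2_mirror_pairs(SLOTS)
    print(pairs)
    print(write_amplification(MEMBER_WRITES, ARRAY_WRITES))
    print(unbalanced_pairs(pairs, MEMBER_WRITES))
```

For this interval the members wrote 240560 kB/s against 120280 kB/s for md127, a ratio of exactly 2.0, and every mirror pair wrote the same amount. In other words, roughly 20 MB/s of writes per disk were competing with the check's reads.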
Kelly Lesperance
2016-May-27 13:21 UTC
All of our Kafka clusters are fairly write-heavy. The cluster in question is our second-heaviest ? we haven?t yet upgraded the heaviest, due to the issues we?ve been experiencing in this one. Here is an iostat example from a host within the same cluster, but without the RAID check running: [root at r2k1 ~] # iostat -xdmc 1 10 Linux 3.10.0-327.13.1.el7.x86_64 (r2k1) 05/27/16 _x86_64_ (32 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 8.87 0.02 1.28 0.21 0.00 89.62 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdd 0.02 0.55 0.15 27.06 0.03 11.40 859.89 1.02 37.40 36.13 37.41 6.86 18.65 sdf 0.02 0.48 0.15 26.99 0.03 11.40 862.17 0.15 5.56 40.94 5.37 7.27 19.73 sdk 0.03 0.58 0.22 27.10 0.03 11.40 857.01 1.60 58.49 36.20 58.67 7.17 19.58 sdb 0.02 0.52 0.15 27.43 0.03 11.40 848.37 0.02 0.78 42.84 0.55 7.07 19.50 sdj 0.02 0.55 0.15 27.11 0.03 11.40 858.28 0.62 22.70 41.97 22.59 7.43 20.27 sdg 0.03 0.68 0.22 27.76 0.03 11.40 836.98 0.76 27.10 34.36 27.04 7.33 20.51 sde 0.03 0.48 0.22 26.99 0.03 11.40 860.43 0.33 12.07 33.16 11.90 7.34 19.98 sda 0.03 0.52 0.22 27.43 0.03 11.40 846.65 0.57 20.48 36.42 20.35 7.34 20.31 sdh 0.02 0.68 0.15 27.76 0.03 11.40 838.63 0.47 16.66 40.96 16.53 7.20 20.09 sdc 0.03 0.55 0.22 27.06 0.03 11.40 858.19 0.74 27.30 36.96 27.22 7.55 20.58 sdi 0.03 0.53 0.22 27.13 0.03 11.40 856.04 1.60 58.50 27.43 58.75 5.21 14.24 sdl 0.02 0.56 0.15 27.11 0.03 11.40 858.27 1.12 41.09 27.89 41.16 5.00 13.63 md127 0.00 0.00 2.53 161.84 0.36 68.39 856.56 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 13.11 0.00 1.82 1.07 0.00 84.01 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdd 0.00 0.00 0.00 81.00 0.00 38.48 972.95 51.00 219.06 0.00 219.06 6.37 51.60 sdf 0.00 1.00 0.00 73.00 0.00 33.70 945.33 55.02 235.86 0.00 235.86 7.12 52.00 sdk 0.00 1.00 0.00 56.00 0.00 25.70 939.73 60.45 223.79 0.00 223.79 9.29 52.00 sdb 0.00 2.00 0.00 
70.00 0.00 34.48 1008.70 58.88 292.81 0.00 292.81 7.37 51.60 sdj 0.00 3.00 0.00 62.00 0.00 29.87 986.60 59.32 243.48 0.00 243.48 8.26 51.20 sdg 0.00 1.00 0.00 49.00 0.00 23.43 979.45 60.37 234.98 0.00 234.98 10.53 51.60 sde 0.00 1.00 0.00 61.00 0.00 27.95 938.38 58.17 239.57 0.00 239.57 8.52 52.00 sda 0.00 2.00 0.00 56.00 0.00 27.48 1004.88 56.27 202.88 0.00 202.88 9.27 51.90 sdh 0.00 1.00 0.00 70.00 0.00 33.57 982.19 59.00 277.84 0.00 277.84 7.43 52.00 sdc 0.00 0.00 0.00 64.00 0.00 30.06 961.89 58.20 268.30 0.00 268.30 8.08 51.70 sdi 0.00 3.00 0.00 116.00 0.00 55.62 981.94 44.54 199.72 0.00 199.72 4.56 52.90 sdl 0.00 1.00 0.00 128.00 0.00 60.31 964.88 43.91 215.94 0.00 215.94 4.11 52.60 md127 0.00 0.00 0.00 1143.00 0.00 538.90 965.59 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 15.70 0.00 1.97 0.44 0.00 81.89 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdd 0.00 0.00 0.00 119.00 0.00 56.39 970.42 42.84 639.45 0.00 639.45 6.66 79.20 sdf 0.00 1.00 0.00 129.00 0.00 61.21 971.84 48.89 672.04 0.00 672.04 6.34 81.80 sdk 0.00 0.00 0.00 152.00 0.00 72.62 978.53 61.02 716.76 0.00 716.76 5.74 87.20 sdb 0.00 1.00 0.00 133.00 0.00 62.86 967.88 54.10 695.35 0.00 695.35 6.45 85.80 sdj 0.00 0.00 0.00 146.00 0.00 68.36 958.85 69.22 767.12 0.00 767.12 6.85 100.00 sdg 0.00 0.00 0.00 146.00 0.00 69.87 980.11 77.99 789.53 0.00 789.53 6.85 100.00 sde 0.00 1.00 0.00 141.00 0.00 66.96 972.60 56.21 707.61 0.00 707.61 6.21 87.60 sda 0.00 1.00 0.00 147.00 0.00 69.86 973.22 62.21 728.76 0.00 728.76 6.32 92.90 sdh 0.00 0.00 0.00 134.00 0.00 62.61 956.90 55.79 711.49 0.00 711.49 6.63 88.90 sdc 0.00 0.00 0.00 136.00 0.00 64.81 975.94 61.46 753.57 0.00 753.57 6.93 94.20 sdi 0.00 0.00 0.00 93.00 0.00 42.67 939.61 17.60 419.10 0.00 419.10 4.63 43.10 sdl 0.00 0.00 0.00 80.00 0.00 38.02 973.20 11.00 340.79 0.00 340.79 4.25 34.00 md127 0.00 0.00 0.00 87.00 0.00 40.99 964.97 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: 
%user %nice %system %iowait %steal %idle 12.11 0.00 1.35 0.00 0.00 86.54 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdd 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 15.00 0.00 15.00 15.00 1.50 sdf 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10 sdk 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10 sdb 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 7.00 0.00 7.00 7.00 0.70 sdj 0.00 0.00 0.00 2.00 0.00 0.06 64.50 0.01 733.50 0.00 733.50 7.50 1.50 sdg 0.00 0.00 0.00 10.00 0.00 2.88 588.90 0.55 1212.80 0.00 1212.80 15.50 15.50 sde 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 12.00 0.00 12.00 12.00 1.20 sda 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 11.00 0.00 11.00 11.00 1.10 sdh 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 20.00 0.00 20.00 20.00 2.00 sdc 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 17.00 0.00 17.00 17.00 1.70 sdi 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.01 12.00 0.00 12.00 12.00 1.20 sdl 0.00 0.00 0.00 1.00 0.00 0.00 1.00 0.02 17.00 0.00 17.00 17.00 1.70 md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 avg-cpu: %user %nice %system %iowait %steal %idle 15.22 0.00 1.50 0.00 0.00 83.28 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 sdi 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
16.96 0.09 1.63 0.16 0.00 81.16

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 0.00 0.00 8.00 0.00 0.66 168.25 0.09 11.50 0.00 11.50 8.75 7.00
sdf 0.00 0.00 0.00 5.00 0.00 0.52 213.20 0.08 16.20 0.00 16.20 16.20 8.10
sdk 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.06 20.33 0.00 20.33 20.33 6.10
sdb 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.05 16.67 0.00 16.67 16.67 5.00
sdj 0.00 0.00 0.00 4.00 0.00 0.98 500.50 0.06 14.50 0.00 14.50 11.00 4.40
sdg 0.00 1.00 0.00 4.00 0.00 0.63 322.50 0.14 36.00 0.00 36.00 32.75 13.10
sde 0.00 0.00 0.00 5.00 0.00 0.52 213.20 0.07 13.60 0.00 13.60 13.60 6.80
sda 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.05 15.67 0.00 15.67 15.67 4.70
sdh 0.00 1.00 0.00 4.00 0.00 0.63 322.50 0.06 14.50 0.00 14.50 11.50 4.60
sdc 0.00 0.00 0.00 8.00 0.00 0.66 168.25 0.11 13.25 0.00 13.25 10.62 8.50
sdi 0.00 0.00 0.00 4.00 0.00 0.98 500.50 0.06 15.50 0.00 15.50 12.00 4.80
sdl 0.00 0.00 0.00 3.00 0.00 0.50 342.00 0.04 13.67 0.00 13.67 13.67 4.10
md127 0.00 0.00 0.00 17.00 0.00 3.78 455.53 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
14.08 0.00 1.50 0.00 0.00 84.42

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
14.89 0.00 1.98 0.00 0.00 83.13

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 0.00 0.00 90.00 0.00 41.31 940.01 27.25 302.80 0.00 302.80 7.07 63.60
sdf 0.00 0.00 0.00 87.00 0.00 41.35 973.44 22.73 261.30 0.00 261.30 6.92 60.20
sdk 0.00 2.00 0.00 97.00 0.00 42.08 888.42 39.86 410.94 0.00 410.94 8.10 78.60
sdb 0.00 0.00 0.00 87.00 0.00 41.07 966.82 24.39 280.30 0.00 280.30 7.14 62.10
sdj 0.00 1.00 0.00 91.00 0.00 41.94 943.92 36.37 399.62 0.00 399.62 8.44 76.80
sdg 0.00 0.00 0.00 86.00 0.00 40.67 968.48 31.76 369.33 0.00 369.33 8.81 75.80
sde 0.00 0.00 0.00 87.00 0.00 41.35 973.44 30.80 354.05 0.00 354.05 9.01 78.40
sda 0.00 0.00 0.00 87.00 0.00 41.07 966.82 32.61 374.80 0.00 374.80 8.57 74.60
sdh 0.00 0.00 0.00 86.00 0.00 40.67 968.48 29.52 343.23 0.00 343.23 8.56 73.60
sdc 0.00 0.00 0.00 89.00 0.00 40.81 939.07 32.80 360.15 0.00 360.15 8.91 79.30
sdi 0.00 1.00 0.00 91.00 0.00 41.94 943.92 19.60 215.34 0.00 215.34 5.62 51.10
sdl 0.00 2.00 0.00 97.00 0.00 42.08 888.42 19.59 201.93 0.00 201.93 4.69 45.50
md127 0.00 0.00 0.00 535.00 0.00 248.42 950.95 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
11.08 0.00 1.41 0.00 0.00 87.51

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 5.00 0.00 42.00 0.00 0.38 18.55 2.25 53.52 0.00 53.52 4.93 20.70
sdf 0.00 0.00 0.00 35.00 0.00 0.21 12.43 1.62 46.17 0.00 46.17 5.29 18.50
sdk 0.00 23.00 0.00 42.00 0.00 0.44 21.40 1.99 47.29 0.00 47.29 4.64 19.50
sdb 0.00 9.00 0.00 58.00 0.00 0.34 12.02 2.77 47.78 0.00 47.78 4.12 23.90
sdj 0.00 1.00 0.00 39.00 0.00 0.24 12.79 1.79 45.97 0.00 45.97 5.21 20.30
sdg 0.00 11.00 0.00 66.00 0.00 0.40 12.45 3.60 54.47 0.00 54.47 3.42 22.60
sde 0.00 0.00 0.00 35.00 0.00 0.21 12.43 2.13 61.00 0.00 61.00 8.89 31.10
sda 0.00 9.00 0.00 58.00 0.00 0.34 12.02 2.48 42.81 0.00 42.81 3.71 21.50
sdh 0.00 11.00 0.00 66.00 0.00 0.40 12.45 4.81 72.83 0.00 72.83 3.80 25.10
sdc 0.00 5.00 0.00 43.00 0.00 0.88 41.93 1.99 63.81 0.00 63.81 5.00 21.50
sdi 0.00 1.00 0.00 39.00 0.00 0.24 12.79 1.31 33.69 0.00 33.69 4.03 15.70
sdl 0.00 23.00 0.00 42.00 0.00 0.44 21.40 1.23 29.33 0.00 29.33 3.71 15.60
md127 0.00 0.00 0.00 313.00 0.00 2.01 13.14 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
16.16 0.03 1.66 0.00 0.00 82.15

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdj 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdg 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdi 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdl 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md127 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

On 2016-05-26, 11:50 PM, "centos-bounces at centos.org on behalf of Gordon Messmer" <centos-bounces at centos.org on behalf of gordon.messmer at gmail.com> wrote:

>On 05/25/2016 09:54 AM, Kelly Lesperance wrote:
>> What we're seeing is that when the weekly raid-check script executes, performance nose dives, and I/O wait skyrockets. The raid check starts out fairly fast (20000K/sec - the limit that's been set), but then quickly drops down to about 4000K/Sec. dev.raid.speed sysctls are at the defaults:
>
>It looks like some pretty heavy writes are going on at the time. I'm not
>sure what you mean by "nose dives", but I'd expect *some* performance
>impact of running a read-intensive process like a RAID check at the same
>time you're running a write-intensive process.
>
>Do the same write-heavy processes run on the other clusters, where you
>aren't seeing performance issues?
>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 9.24 0.00 1.32 20.02 0.00 69.42
>>
>> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
>> sda 50.00 512.00 20408.00 512 20408
>> sdb 50.00 512.00 20408.00 512 20408
>> sdc 48.00 512.00 19984.00 512 19984
>> sdd 48.00 512.00 19984.00 512 19984
>> sdf 50.00 704.00 19968.00 704 19968
>> sdg 47.00 512.00 19968.00 512 19968
>> sdh 47.00 512.00 19968.00 512 19968
>> sde 50.00 704.00 19968.00 704 19968
>> sdj 48.00 512.00 19972.00 512 19972
>> sdi 48.00 512.00 19972.00 512 19972
>> sdk 48.00 512.00 19980.00 512 19980
>> sdl 48.00 512.00 19980.00 512 19980
>> md127 241.00 0.00 120280.00 0 120280
>
>_______________________________________________
>CentOS mailing list
>CentOS at centos.org
>https://lists.centos.org/mailman/listinfo/centos
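Gordon's "pretty heavy writes" reading of the quoted `iostat` sample can be checked with a little arithmetic. Assuming the array uses a two-copy RAID-10 layout (mdadm's default is `n2`, but the thread does not say which layout was used), every kilobyte written to md127 lands on two member disks, so the members' kB_wrtn/s should sum to about twice md127's. The sketch below copies the numbers from the quoted output; the helper name `member_write_ratio` is hypothetical, not part of any tool.

```python
# kB_wrtn/s per member disk, copied from the quoted iostat sample.
member_kb_wrtn = {
    "sda": 20408.00, "sdb": 20408.00, "sdc": 19984.00, "sdd": 19984.00,
    "sdf": 19968.00, "sdg": 19968.00, "sdh": 19968.00, "sde": 19968.00,
    "sdj": 19972.00, "sdi": 19972.00, "sdk": 19980.00, "sdl": 19980.00,
}
# kB_wrtn/s for the md array itself, from the same sample.
md127_kb_wrtn = 120280.00


def member_write_ratio(members, array_rate):
    """Total member-disk write rate divided by the array's write rate.

    With a two-copy RAID-10 (an assumption here), this should be close to 2.0.
    """
    return sum(members.values()) / array_rate


ratio = member_write_ratio(member_kb_wrtn, md127_kb_wrtn)
print(f"members wrote {sum(member_kb_wrtn.values()):.0f} kB/s, "
      f"md127 {md127_kb_wrtn:.0f} kB/s, ratio {ratio:.2f}")
```

The ratio comes out at 2.0, so in that second the member disks were each absorbing roughly 20 MB/s of mirrored application writes on top of the check's reads. The members' kB_read/s (512 to 704 each) never shows up on md127, which fits those reads coming from the check itself rather than from applications.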
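In the `iostat -x` samples at the top of this message, the burst interval (md127 at 535 w/s, 248.42 wMB/s) shows member queues of about 20 to 40 requests and waits of 200 to 400 ms. Two relations help read those columns: avgqu-sz is roughly the request rate times await (Little's law, approximate because await only counts requests that completed in the interval), and %util equals the request rate times svctm, because sysstat derives svctm from %util and the request rate. The sketch below recomputes both for a few rows copied from that interval; `check_row` is a hypothetical helper.

```python
# Rows copied from the burst interval of the iostat -x output (md127 at 535 w/s).
# Fields: device, r/s, w/s, avgqu-sz, await (ms), svctm (ms), %util
ROWS = [
    ("sdd", 0.00, 90.00, 27.25, 302.80, 7.07, 63.60),
    ("sdk", 0.00, 97.00, 39.86, 410.94, 8.10, 78.60),
    ("sdc", 0.00, 89.00, 32.80, 360.15, 8.91, 79.30),
    ("sdl", 0.00, 97.00, 19.59, 201.93, 4.69, 45.50),
]


def check_row(r_s, w_s, await_ms, svctm_ms):
    """Estimate (avgqu-sz, %util) for one iostat -x row.

    Queue: requests/s * average time per request in seconds (Little's law).
    Util: requests/s * service time in seconds, expressed as a percentage.
    """
    iops = r_s + w_s
    est_queue = iops * await_ms / 1000.0
    est_util = iops * svctm_ms / 1000.0 * 100.0
    return est_queue, est_util


results = {}
for dev, r_s, w_s, avgqu, await_ms, svctm_ms, util in ROWS:
    est_queue, est_util = check_row(r_s, w_s, await_ms, svctm_ms)
    results[dev] = (est_queue, est_util)
    print(f"{dev}: avgqu-sz {avgqu:6.2f} vs est {est_queue:6.2f}; "
          f"%util {util:5.2f} vs est {est_util:5.2f}")
```

The %util estimates match to rounding, and the queue estimates agree within a few percent (sdc is the loosest). Read this way, each disk had dozens of large writes outstanding for hundreds of milliseconds during the burst, so any check read issued in that window queued behind them.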