Patrick Bervoets
2015-Jan-30 09:21 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 29-01-15 at 21:21, Gordon Messmer wrote:
>
> I haven't seen delays anywhere near that long before, even with heavy swapping. But I guess I'd look at that sort of thing first.
>
> Run "iostat -x 2" and see if your disks are being fully utilized during the pauses. Run "top" and see if there's anything useful there. Check swap use with "free". Try decreasing swappiness with "echo 10 >/proc/sys/vm/swappiness"

iostat random sample

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,77    0,00    1,45    0,00    0,00   94,78

Device:  rrqm/s wrqm/s   r/s    w/s rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sda        0,00   0,50  0,00  11,00   0,00  136,00    12,36     0,00  0,00  0,00  0,00
sdb        0,00   0,00  0,00  11,50   0,00  148,00    12,87     0,00  0,09  0,09  0,10
sdc        0,00   0,00  0,00   0,00   0,00    0,00     0,00     0,00  0,00  0,00  0,00
dm-0       0,00   0,00  0,00   4,00   0,00   32,00     8,00     0,00  0,00  0,00  0,00
dm-1       0,00   0,00  0,00   0,00   0,00    0,00     0,00     0,00  0,00  0,00  0,00
dm-2       0,00   0,00  0,00   0,00   0,00    0,00     0,00     0,00  0,00  0,00  0,00
dm-3       0,00   0,00  0,00  11,50   0,00  148,00    12,87     0,00  0,13  0,13  0,15
dm-4       0,00   0,00  0,00   0,00   0,00    0,00     0,00     0,00  0,00  0,00  0,00
dm-5       0,00   0,00  0,00   7,50   0,00  104,00    13,87     0,00  0,07  0,07  0,05

atop

ATOP - 2015/01/30 10:18:14 --------- 10s elapsed
PRC | sys 3.87s | user 14.93s | #proc 197 | #zombie 0 | #exit 0 |
CPU | sys 30% | user 119% | irq 1% | idle 533% | wait 0% |
cpu | sys 2% | user 21% | irq 0% | idle 56% | cpu000 w 0% |
cpu | sys 3% | user 19% | irq 0% | idle 59% | cpu001 w 0% |
cpu | sys 8% | user 15% | irq 0% | idle 62% | cpu003 w 0% |
cpu | sys 3% | user 13% | irq 0% | idle 73% | cpu002 w 0% |
cpu | sys 3% | user 14% | irq 0% | idle 70% | cpu006 w 0% |
cpu | sys 4% | user 15% | irq 0% | idle 66% | cpu005 w 0% |
cpu | sys 2% | user 11% | irq 0% | idle 77% | cpu007 w 0% |
cpu | sys 5% | user 11% | irq 0% | idle 73% | cpu004 w 0% |
CPL | avg1 1.92 | avg5 1.97 | avg15 1.61 | csw 229508 | intr 191786 |
MEM | tot 47.1G | free 15.9G | cache 519.3M | buff 109.3M | slab 353.3M |
SWP | tot 7.8G | free 7.3G | | vmcom 31.8G | vmlim 31.3G |
LVM | g_15k-lv_15k | busy 0% | read 1 | write 98 | avio 0.15 ms |
LVM | to-lv_oracle | busy 0% | read 0 | write 66 | avio 0.06 ms |
LVM | v_oracletest | busy 0% | read 0 | write 79 | avio 0.05 ms |
LVM | uito-lv_root | busy 0% | read 0 | write 1 | avio 3.00 ms |
DSK | sdb | busy 0% | read 1 | write 98 | avio 0.16 ms |
DSK | sda | busy 0% | read 0 | write 146 | avio 0.08 ms |
NET | transport | tcpi 12 | tcpo 12 | udpi 0 | udpo 0 |
NET | network | ipi 13 | ipo 12 | ipfrw 0 | deliv 12 |
NET | vnet0 8% | pcki 2273 | pcko 2581 | si 850 Kbps | so 458 Kbps |
NET | vnet1 4% | pcki 2186 | pcko 2075 | si 391 Kbps | so 422 Kbps |
NET | eth0 0% | pcki 1330 | pcko 1432 | si 159 Kbps | so 537 Kbps |
NET | br0 ---- | pcki 43 | pcko 22 | si 1 Kbps | so 4 Kbps |

  PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S  CPU CMD
 1960  2.37s  9.23s    0K    0K    8K 2520K --   - S 101% qemu-kvm
 1990  0.69s  5.65s    0K    0K    0K 1196K --   - S  55% qemu-kvm
 1975  0.50s  0.00s    0K    0K    0K    0K --   - S   4% kvm-pit-wq
 2009  0.20s  0.00s    0K    0K    0K    0K --   - S   2% kvm-pit-wq
23321  0.05s  0.02s    0K    0K    0K    0K --   - R   1% atop
18384  0.05s  0.01s    0K    0K    0K    0K --   - S   1% atop
 1719  0.00s  0.01s    0K    0K    0K    0K --   - S   0% hpasmlited
 1746  0.00s  0.01s    0K    0K    0K    0K --   - S   0% hp-asrd
   35  0.01s  0.00s    0K    0K    0K    0K --   - D   0% events/0
10707  0.00s  0.00s    0K    0K    0K    0K --   - S   0% arping
10740  0.00s  0.00s    0K    0K    0K    0K --   - S   0% arping
   58  0.00s  0.00s    0K    0K    0K    0K --   - S   0% kblockd/0
18425  0.00s  0.00s    0K    0K    0K    0K --   - S   0% flush-253:0

free

             total       used       free     shared    buffers     cached
Mem:         48218      31895      16323          0        108        519
-/+ buffers/cache:      31267      16951
Swap:         7951        476       7475

But I had the same pauses when free showed zero swap in use.

If swap is the problem: would it matter whether a command is run with ssh (ssh @ "command") or in a shell?

When running atop in a shell I observed pauses between screen updates longer than 10 seconds, but atop displayed the time as "10 seconds later", so the displayed time was drifting behind. A date command sent at the same time gave the correct date. So it seems like the screens are buffered and are being displayed with a delay.
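
One way to check the buffering idea would be to record timestamps on the host itself, independent of the terminal. The following is only a minimal sketch: the function names `sample` and `find_stalls` and the 0.5 second tolerance are made up for illustration and are not from any tool mentioned in this thread. It uses Python's monotonic clock, which is not affected by wall-clock adjustments.

```python
import time


def sample(count, interval=1.0):
    """Record `count` monotonic timestamps, sleeping `interval` seconds between them."""
    stamps = []
    for _ in range(count):
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps


def find_stalls(timestamps, interval, tolerance=0.5):
    """Return (index, gap) pairs where two consecutive timestamps are more than
    interval + tolerance seconds apart (tolerance is an arbitrary assumption)."""
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > interval + tolerance:
            stalls.append((i, gap))
    return stalls
```

If something like `find_stalls(sample(600), 1.0)` were run on the host with its result written to a file, gaps in that list would suggest the process itself is being stalled. An empty list during a visible hang would point more towards the terminal or ssh path delaying the output.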
John R Pierce
2015-Jan-30 09:29 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 1/30/2015 1:21 AM, Patrick Bervoets wrote:
> free
>              total       used       free     shared    buffers     cached
> Mem:         48218      31895      16323          0        108        519
> -/+ buffers/cache:      31267      16951
> Swap:         7951        476       7475

That's an unusually small amount of 'cached'... I usually see the disk cache as 30-50% of the total memory. Does this system not use much disk IO?

-- 
john r pierce 37N 122W somewhere on the middle of the left coast
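
For scale, the 'cached' figure in the quoted output can be put next to the total. This is just a worked-arithmetic sketch: it assumes the numbers are in megabytes (they match atop's 47.1G total) and that the "Mem:" line uses the older six-column `free` layout shown in the quote; `parse_free_mem` is a hypothetical helper name.

```python
def parse_free_mem(line):
    """Parse the 'Mem:' line of the older six-column `free` layout into a dict."""
    names = ("total", "used", "free", "shared", "buffers", "cached")
    fields = line.split()
    return dict(zip(names, (int(value) for value in fields[1:7])))


# Values copied from the quoted output; units assumed to be MB.
mem = parse_free_mem("Mem: 48218 31895 16323 0 108 519")

# Share of total memory used as page cache.
cached_pct = 100.0 * mem["cached"] / mem["total"]

# Memory used once buffers and cache are subtracted.
app_used = mem["used"] - mem["buffers"] - mem["cached"]

print(f"cached: {cached_pct:.1f}% of total")
print(f"used minus buffers/cache: {app_used} MB")
```

That works out to roughly 1% of total memory in the page cache, against the 30-50% mentioned above. The subtraction gives 31268 MB, one off from the 31267 that free printed, presumably because free rounds each figure separately.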
Patrick Bervoets
2015-Jan-30 09:39 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 30-01-15 at 10:29, John R Pierce wrote:
> On 1/30/2015 1:21 AM, Patrick Bervoets wrote:
>> free
>>              total       used       free     shared    buffers     cached
>> Mem:         48218      31895      16323          0        108        519
>> -/+ buffers/cache:      31267      16951
>> Swap:         7951        476       7475
>
> that's an unusually small amount of 'cached'... I usually see the disk cache as 30-50% of the total memory. does this system not use much disk IO

It's a KVM host with LVM; the VMs all have their own LVs (some on a different PV). Would that explain the small cache?
Gordon Messmer
2015-Jan-30 18:40 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 01/30/2015 01:21 AM, Patrick Bervoets wrote:
> iostat random sample

"Random" is difficult to evaluate. Is that representative? Are sda, sdb, and sdc typically less than 1% utilized? Or are there large utilization values right after a hang?

> If swap is the problem: would it matter if a command is run with ssh
> (ssh @ "command") or in a shell?

Let's assume it's not, but I would say "no" to the question. I'd expect the same delays regardless, if the system were swapping heavily.

> When running atop in a shell I observed pauses between screen updates
> longer than 10 seconds but atop displayed the time as "10 seconds
> later". So drifting away in time.
> While a date command sent at the same time gave the correct date.

That's really weird. Does the time displayed by "atop" eventually catch up?

Does the problem persist across reboots?

Is this system running ntpd?

Does the problem persist if you turn ntpd off and reboot?
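
To judge whether a sample is representative, the %util column could be pulled out of each "iostat -x" report and compared over time. A rough sketch follows, assuming the 12-column device rows shown earlier in the thread and a locale that prints decimal commas; `parse_iostat_util` and the 1% threshold are illustrative choices, not part of iostat.

```python
def parse_iostat_util(lines):
    """Map device name -> %util for `iostat -x` device rows.

    Assumes 12 whitespace-separated fields per device row (name plus 11
    values) and accepts either ',' or '.' as the decimal separator.
    """
    util = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, the avg-cpu rows, and the "Device:" header row.
        if len(fields) != 12 or fields[0].startswith("Device"):
            continue
        util[fields[0]] = float(fields[-1].replace(",", "."))
    return util


# Three rows copied from the sample posted earlier in the thread.
sample_rows = [
    "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util",
    "sda 0,00 0,50 0,00 11,00 0,00 136,00 12,36 0,00 0,00 0,00 0,00",
    "sdb 0,00 0,00 0,00 11,50 0,00 148,00 12,87 0,00 0,09 0,09 0,10",
    "dm-3 0,00 0,00 0,00 11,50 0,00 148,00 12,87 0,00 0,13 0,13 0,15",
]

util = parse_iostat_util(sample_rows)

# Devices at or above 1% utilization (threshold chosen for illustration).
busy = {device: value for device, value in util.items() if value >= 1.0}
print(util)
print(busy)
```

Applied to every report in a longer capture, the same filter would show whether any device climbs well above 1% around the time of a hang.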
Patrick Bervoets
2015-Jan-30 20:32 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 30-01-15 at 19:40, Gordon Messmer wrote:
> On 01/30/2015 01:21 AM, Patrick Bervoets wrote:
>> iostat random sample
>
> "Random" is difficult to evaluate. Is that representative? Are sda, sdb, and sdc typically less than 1% utilized? Or are there large utilization values right after a hang?

All the output was on the same scale, and it was taken during a hang in another shell.

> Does the time displayed by "atop" eventually catch up?

Not that I know of. But I gave up :-)

> Does the problem persist across reboots?

Alas, one of the VMs is our production database. My next update/reboot window is next Saturday. But I had the problem just before the last reboot (mid-January). I hadn't closely monitored it afterwards, though. Before that, in December, I never experienced it. But it's a server I tend to leave alone, so I'm never very busy in a shell.

> Is this system running ntpd?

Yes.

> Does the problem persist if you turn ntpd off and reboot?

I'll check that next week.