Patrick Bervoets
2015-Jan-29 07:28 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 29-01-15 at 00:00, Gordon Messmer wrote:
> On 01/28/2015 12:12 PM, Patrick Bervoets wrote:
>>
>> ARPING 192.168.1.15 from 0.0.0.0 br0
>> Unicast reply from 192.168.1.15 [AC:16:2D:72:67:D4] 0.723ms
>> Sent 1 probes (1 broadcast(s))
>> Received 1 response(s)
>>
>> Thanks anyway
>
> I'm not sure what you mean by "thanks anyway".
>
> You got a response. There's an IPv4 conflict on your network. That's why you're seeing those delays. If there's no conflict, you should see 0 responses.
>

Gordon,

I'm sorry, I misunderstood you (and arping -D).
This was the result of arping on another host; I thought I should see 2 responses in case of an IP conflict.
Arping on the troublesome server gives 0 responses.

I just tried with a physical console on that server and there I got the same unresponsive behaviour.
Does this rule out network-related problems?

Mark (m.roth) suggested the VMs might be eating up the video bus (2 VMs with an Oracle database), but I'm not sure how I could test that.

Patrick
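As a rough aid for reading duplicate-address checks like the one discussed above, the reply count can be pulled out of the arping output. This is only a sketch: the helper name `arping_replies` is made up for illustration, and it assumes the iputils arping summary line quoted in this thread ("Received N response(s)"). As the exchange above shows, the count only says something about a conflict when the probe is run on the host that owns the address; from another host, one reply is simply the owner answering.

```shell
# Hypothetical helper: read arping output on stdin and print the number
# of replies taken from the "Received N response(s)" summary line.
arping_replies() {
    sed -n 's/^Received \([0-9][0-9]*\) response.*/\1/p'
}

# Example use on the host that owns the address (interface and address
# taken from the thread; adjust to your setup):
#   arping -D -c 1 -I br0 192.168.1.15 | arping_replies
```

A result of 0 from the owning host suggests that no other machine is answering for that address.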
Gordon Messmer
2015-Jan-29 20:21 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 01/28/2015 11:28 PM, Patrick Bervoets wrote:
>
> Arping on the troublesome server gives 0 responses.
>
> I just tried with a physical console on that server and there I got the
> same unresponsive behaviour.

Well, that's a different story, then. :)

I haven't seen delays anywhere near that long before, even with heavy swapping. But I guess I'd look at that sort of thing first.

Run "iostat -x 2" and see if your disks are being fully utilized during the pauses. Run "top" and see if there's anything useful there. Check swap use with "free". Try decreasing swappiness with "echo 10 >/proc/sys/vm/swappiness"
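The swap check suggested above can also be scripted, so it can be logged repeatedly while waiting for a pause. A minimal sketch, assuming the standard `SwapTotal` and `SwapFree` fields of /proc/meminfo (reported in kB); the function name `swap_used_mb` is hypothetical.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the amount of swap in use, in whole MB (SwapTotal - SwapFree).
swap_used_mb() {
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END           { printf "%d\n", (t - f) / 1024 }'
}

# Example use:
#   swap_used_mb < /proc/meminfo
#   cat /proc/sys/vm/swappiness    # current swappiness value
```

Sampling this every few seconds alongside "iostat -x 2" would show whether swap use changes around the time of a pause.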
Patrick Bervoets
2015-Jan-30 09:21 UTC
[CentOS] C6 server responding extremely slow on ssh interactive
On 29-01-15 at 21:21, Gordon Messmer wrote:
>
> I haven't seen delays anywhere near that long before, even with heavy swapping. But I guess I'd look at that sort of thing first.
>
> Run "iostat -x 2" and see if your disks are being fully utilized during the pauses. Run "top" and see if there's anything useful there. Check swap use with "free". Try decreasing swappiness with "echo 10 >/proc/sys/vm/swappiness"
>

iostat random sample

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3,77    0,00    1,45    0,00    0,00   94,78

Device:    rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda          0,00     0,50     0,00    11,00     0,00   136,00    12,36     0,00     0,00     0,00     0,00
sdb          0,00     0,00     0,00    11,50     0,00   148,00    12,87     0,00     0,09     0,09     0,10
sdc          0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-0         0,00     0,00     0,00     4,00     0,00    32,00     8,00     0,00     0,00     0,00     0,00
dm-1         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-2         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-3         0,00     0,00     0,00    11,50     0,00   148,00    12,87     0,00     0,13     0,13     0,15
dm-4         0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00     0,00
dm-5         0,00     0,00     0,00     7,50     0,00   104,00    13,87     0,00     0,07     0,07     0,05

atop

ATOP - 2015/01/30 10:18:14 --------- 10s elapsed
PRC | sys 3.87s | user 14.93s | #proc 197 | #zombie 0 | #exit 0 |
CPU | sys 30% | user 119% | irq 1% | idle 533% | wait 0% |
cpu | sys 2% | user 21% | irq 0% | idle 56% | cpu000 w 0% |
cpu | sys 3% | user 19% | irq 0% | idle 59% | cpu001 w 0% |
cpu | sys 8% | user 15% | irq 0% | idle 62% | cpu003 w 0% |
cpu | sys 3% | user 13% | irq 0% | idle 73% | cpu002 w 0% |
cpu | sys 3% | user 14% | irq 0% | idle 70% | cpu006 w 0% |
cpu | sys 4% | user 15% | irq 0% | idle 66% | cpu005 w 0% |
cpu | sys 2% | user 11% | irq 0% | idle 77% | cpu007 w 0% |
cpu | sys 5% | user 11% | irq 0% | idle 73% | cpu004 w 0% |
CPL | avg1 1.92 | avg5 1.97 | avg15 1.61 | csw 229508 | intr 191786 |
MEM | tot 47.1G | free 15.9G | cache 519.3M | buff 109.3M | slab 353.3M |
SWP | tot 7.8G | free 7.3G | | vmcom 31.8G | vmlim 31.3G |
LVM | g_15k-lv_15k | busy 0% | read 1 | write 98 | avio 0.15 ms |
LVM | to-lv_oracle | busy 0% | read 0 | write 66 | avio 0.06 ms |
LVM | v_oracletest | busy 0% | read 0 | write 79 | avio 0.05 ms |
LVM | uito-lv_root | busy 0% | read 0 | write 1 | avio 3.00 ms |
DSK | sdb | busy 0% | read 1 | write 98 | avio 0.16 ms |
DSK | sda | busy 0% | read 0 | write 146 | avio 0.08 ms |
NET | transport | tcpi 12 | tcpo 12 | udpi 0 | udpo 0 |
NET | network | ipi 13 | ipo 12 | ipfrw 0 | deliv 12 |
NET | vnet0 8% | pcki 2273 | pcko 2581 | si 850 Kbps | so 458 Kbps |
NET | vnet1 4% | pcki 2186 | pcko 2075 | si 391 Kbps | so 422 Kbps |
NET | eth0 0% | pcki 1330 | pcko 1432 | si 159 Kbps | so 537 Kbps |
NET | br0 ---- | pcki 43 | pcko 22 | si 1 Kbps | so 4 Kbps |

  PID SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S  CPU CMD
 1960  2.37s  9.23s    0K    0K    8K 2520K --   - S 101% qemu-kvm
 1990  0.69s  5.65s    0K    0K    0K 1196K --   - S  55% qemu-kvm
 1975  0.50s  0.00s    0K    0K    0K    0K --   - S   4% kvm-pit-wq
 2009  0.20s  0.00s    0K    0K    0K    0K --   - S   2% kvm-pit-wq
23321  0.05s  0.02s    0K    0K    0K    0K --   - R   1% atop
18384  0.05s  0.01s    0K    0K    0K    0K --   - S   1% atop
 1719  0.00s  0.01s    0K    0K    0K    0K --   - S   0% hpasmlited
 1746  0.00s  0.01s    0K    0K    0K    0K --   - S   0% hp-asrd
   35  0.01s  0.00s    0K    0K    0K    0K --   - D   0% events/0
10707  0.00s  0.00s    0K    0K    0K    0K --   - S   0% arping
10740  0.00s  0.00s    0K    0K    0K    0K --   - S   0% arping
   58  0.00s  0.00s    0K    0K    0K    0K --   - S   0% kblockd/0
18425  0.00s  0.00s    0K    0K    0K    0K --   - S   0% flush-253:0

free

             total       used       free     shared    buffers     cached
Mem:         48218      31895      16323          0        108        519
-/+ buffers/cache:      31267      16951
Swap:         7951        476       7475

But I had the same pauses when free showed zero swap in use.
If swap is the problem: would it matter if a command is run with ssh (ssh @ "command") or in a shell?

When running atop in a shell, I observed pauses between screen updates longer than 10 seconds, but atop displayed the time as "10 seconds later", so its displayed time drifted away from real time, while a date command sent at the same time gave the correct date.
So it seems like the screens are buffered and are being displayed with a delay.
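One way to test the buffering theory above would be to record timestamps to a file on the server, away from the terminal, and then look for gaps. If the file shows steady ticks while the screen stalls, the delay is more likely in the display path; if the file itself has gaps, the process was not running during the pause. A minimal sketch under those assumptions; the function name `find_gaps` and the log path are hypothetical.

```shell
# Hypothetical helper: read epoch timestamps (one per line) on stdin and
# report every gap between consecutive samples larger than $1 seconds.
find_gaps() {
    awk -v max="$1" '
        NR > 1 && $1 - prev > max {
            printf "gap of %d s before %d\n", $1 - prev, $1
        }
        { prev = $1 }'
}

# Example use (log path is only an illustration):
#   while sleep 1; do date +%s; done > /tmp/ticks.log    # leave running
#   find_gaps 3 < /tmp/ticks.log                         # inspect later
```

The threshold of 3 seconds is arbitrary; anything comfortably above the 1 second sampling interval would do.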