Killian De Volder
2012-Apr-08 21:14 UTC
DomU network stalling when Dom0 generates a lot of TX
Hello,
I'm experiencing the following problem:
When Dom0 generates a lot of TX traffic, the DomUs often stall (about once
every 2 minutes, but fairly irregularly) for anywhere from 300ms up to
1800ms.
I wrote a small Python script to try to figure out whether the machine was
getting CPU time (assuming this is a valid way to test it).
The elapsed time printed by the script below is fairly consistent, which
suggests to me that the machine is not stalling (unless the clock is also
stalled, of course, but ntpq -p looks fine).
"""
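import time

# Time a fixed amount of busy work; a CPU stall should show up as an outlier.
while True:
    ts = time.time()
    for i in range(10**6):
        i = i + 1 - 1  # dummy arithmetic to burn CPU
    te = time.time()
    print(te - ts)  # elapsed seconds for this iteration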
"""
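A variation that might catch short stalls more directly is to sample a clock in a tight loop and report any gap between two consecutive samples that exceeds a threshold. This is only a minimal sketch: the function name `find_stalls`, its parameters, and the 100 ms default threshold are assumptions for illustration, and `time.monotonic` needs Python 3.3 or newer. Like the script above, it cannot see a stall during which the guest clock itself stops.

```python
import time

def find_stalls(duration_s=1.0, threshold_s=0.1, clock=time.monotonic):
    """Spin for duration_s and return (offset, gap) pairs with gap > threshold_s.

    offset is the time since start of the last sample before the gap;
    gap is the time between two consecutive clock samples, in seconds.
    """
    stalls = []
    start = prev = clock()
    while prev - start < duration_s:
        now = clock()
        gap = now - prev
        if gap > threshold_s:
            # The loop did not get to run for `gap` seconds.
            stalls.append((prev - start, gap))
        prev = now
    return stalls

# Example use (runs for 10 minutes):
#   for offset, gap in find_stalls(duration_s=600.0):
#       print("gap of %.0f ms at +%.1f s" % (gap * 1000, offset))
```

A reported gap only means the loop was not scheduled for that long; ordinary preemption inside the guest can cause one too, so the threshold should sit well above normal scheduling jitter.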
It's also safe (I think) to exclude the network driver from the
equation, as the same problem occurs on a bridge without a physical network
device (bridge: dmz).
"""
# brctl show
bridge name     bridge id               STP enabled     interfaces
dmz             8000.feffffffffff       no              vif1.2
                                                        vif2.0
lan             8000.00c049593d25       no              eth_lan
                                                        vif1.0
                                                        vif11.0
wan             8000.00c049593e3f       no              eth_wan
                                                        vif1.1
"""
Example of a bad ping:
64 bytes from doc (172.17.0.2): icmp_req=3288 ttl=64 time=1130 ms
64 bytes from doc (172.17.0.2): icmp_req=3289 ttl=64 time=920 ms
64 bytes from doc (172.17.0.2): icmp_req=3290 ttl=64 time=710 ms
64 bytes from doc (172.17.0.2): icmp_req=3291 ttl=64 time=500 ms
64 bytes from doc (172.17.0.2): icmp_req=3292 ttl=64 time=300 ms
64 bytes from doc (172.17.0.2): icmp_req=3293 ttl=64 time=91.0 ms
64 bytes from doc (172.17.0.2): icmp_req=3294 ttl=64 time=0.147 ms
64 bytes from doc (172.17.0.2): icmp_req=3295 ttl=64 time=0.180 ms
(However, during the same time Dom0 is quite responsive.)
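To make the pattern in the ping output easier to see, here is a small sketch that extracts the sequence numbers and round-trip times and computes the drop between consecutive replies. The helper names `parse_rtts` and `rtt_steps` are made up for this example, and the regular expression assumes the `icmp_req=... time=... ms` format shown above.

```python
import re

# Matches the sequence number and RTT in lines such as:
#   64 bytes from doc (172.17.0.2): icmp_req=3288 ttl=64 time=1130 ms
PING_RTT = re.compile(r"icmp_req=(\d+) .*time=([\d.]+) ms")

def parse_rtts(lines):
    """Return a list of (sequence number, RTT in ms) for matching lines."""
    samples = []
    for line in lines:
        m = PING_RTT.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples

def rtt_steps(samples):
    """Return how much the RTT dropped from each reply to the next, in ms."""
    return [round(a[1] - b[1], 3) for a, b in zip(samples, samples[1:])]

sample = """\
64 bytes from doc (172.17.0.2): icmp_req=3288 ttl=64 time=1130 ms
64 bytes from doc (172.17.0.2): icmp_req=3289 ttl=64 time=920 ms
64 bytes from doc (172.17.0.2): icmp_req=3290 ttl=64 time=710 ms
64 bytes from doc (172.17.0.2): icmp_req=3291 ttl=64 time=500 ms
64 bytes from doc (172.17.0.2): icmp_req=3292 ttl=64 time=300 ms
64 bytes from doc (172.17.0.2): icmp_req=3293 ttl=64 time=91.0 ms
""".splitlines()

print(rtt_steps(parse_rtts(sample)))  # [210.0, 210.0, 210.0, 200.0, 209.0]
```

The RTT falls by roughly 200 to 210 ms per reply. If the ping interval were about 0.2 s (an assumption, the interval is not stated here), that would be consistent with the delayed replies all being released at nearly the same moment after a single stall.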
I also tried turning off offloading (TX, TO, GPO, ...).
The txqueuelen is 1000.
Dom0 is under CPU load during this time, but manually generating CPU load
does not seem to trigger the problem.
Version info:
Kernel: xen 3.2.1-gentoo-r2
Xen-Hyp: Xen version 4.1.2
Does anyone have any ideas about what could cause this?
Kind regards,
Killian De Volder