Killian De Volder
2012-Apr-08 21:14 UTC
DomU network stalling when Dom0 generates a lot of TX
Hello,

I'm experiencing the following problem: when Dom0 generates a lot of TX traffic, the DomUs often stall (roughly once every 2 minutes, but fairly irregularly) for anywhere from 300 ms up to 1800 ms.

I made a little Python script to try to figure out whether the machine was still getting CPU time (if this is a correct way to test it). Its timing results are fairly consistent, which indicates to me that the machine itself is not stalling (unless the clock is stalled as well, of course, but ntpq -p looks fine).

"""
import time

# Time a fixed amount of busy work over and over. If the domain were
# descheduled during a run, that run's duration would jump.
while True:
    ts = time.time()
    for i in range(0, 10**6):
        i = i + 1 - 1  # busy work only
    te = time.time()
    print(te - ts)
"""

It is also safe (I think) to exclude the network driver from the equation, as the same problem occurs on a bridge without a physical network device (bridge: dmz):

"""
# brctl show
bridge name     bridge id               STP enabled     interfaces
dmz             8000.feffffffffff       no              vif1.2
                                                        vif2.0
lan             8000.00c049593d25       no              eth_lan
                                                        vif1.0
                                                        vif11.0
wan             8000.00c049593e3f       no              eth_wan
                                                        vif1.1
"""

Example of a bad ping:

"""
64 bytes from doc (172.17.0.2): icmp_req=3288 ttl=64 time=1130 ms
64 bytes from doc (172.17.0.2): icmp_req=3289 ttl=64 time=920 ms
64 bytes from doc (172.17.0.2): icmp_req=3290 ttl=64 time=710 ms
64 bytes from doc (172.17.0.2): icmp_req=3291 ttl=64 time=500 ms
64 bytes from doc (172.17.0.2): icmp_req=3292 ttl=64 time=300 ms
64 bytes from doc (172.17.0.2): icmp_req=3293 ttl=64 time=91.0 ms
64 bytes from doc (172.17.0.2): icmp_req=3294 ttl=64 time=0.147 ms
64 bytes from doc (172.17.0.2): icmp_req=3295 ttl=64 time=0.180 ms
"""

(During the same period, however, Dom0 remains quite responsive.)

I also tried turning off offloading (TX, TO, GPO, ...). The txqueuelen is 1000. Dom0 is under heavy CPU load during this time, but manually generating CPU load does not seem to trigger the problem.

Version info:
Kernel: xen 3.2.1-gentoo-r2
Xen-Hyp: Xen version 4.1.2

Does anyone have any ideas left about what could cause this?

Kind regards,
Killian De Volder
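The script in the post times a fixed block of work, so a stall shows up only as one longer run among many. A finer-grained check (a sketch, not something from the original report; the names `stall_ms` and `watch`, the 1 ms sampling interval and the 50 ms threshold are arbitrary choices) is to sample a monotonic clock in a tight sleep loop and print any gap far larger than the sampling interval, together with the wall-clock time, so it can be lined up with the slow pings. It is written for Python 3 (`time.monotonic` does not exist in Python 2), and like the original script it assumes the DomU's clock keeps advancing during a stall.

```python
import time


def stall_ms(prev, now, threshold):
    """Return the gap between two clock samples in milliseconds if it is
    longer than `threshold` seconds, otherwise None."""
    gap = now - prev
    return gap * 1000 if gap > threshold else None


def watch(threshold=0.05, sample_interval=0.001):
    """Sample the monotonic clock until interrupted (Ctrl-C) and print
    every gap longer than `threshold` seconds, stamped with the local
    wall-clock time so it can be compared with ping output."""
    prev = time.monotonic()
    while True:
        time.sleep(sample_interval)
        now = time.monotonic()
        # A gap far above sample_interval means this process was not run
        # on time; if the whole DomU is stalled, nothing in it runs.
        ms = stall_ms(prev, now, threshold)
        if ms is not None:
            print(f"{time.strftime('%H:%M:%S')} stall of {ms:.0f} ms")
        prev = now
```

To use it, call `watch()` inside the DomU while the ping from Dom0 is running. If stall lines appear at the same moments as the slow pings, that would suggest the DomU as a whole was not being scheduled; if the pings are slow while no stalls are printed, that would point more towards the network path between Dom0 and the DomU. The 50 ms threshold is well above normal `sleep` wake-up jitter but far below the 300 to 1800 ms stalls reported.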