I'm playing with nbd+raid1, and am finding that during a resync the network in xenU is dying and simply not sending packets anymore. At first I thought this was a bridging problem, but in xen0 I have removed the vif from the bridge and given it its own IP address, and given the eth interface in xenU a similar IP address, but no traffic is passing anymore.

After a while, though, it seemed to come good again, and I was able to add the interface to the bridge again and it started working.

The only strange thing in the kernel logs was this in xenU:

    eth0: full queue wasn't stopped!

but I'm not sure at what point this was logged.

James
"full queue wasn't stopped" should never be printed. If you can reproduce this message, then it is worth printing some more information at the same point, for example np->tx->req_prod, np->tx->resp_prod and np->tx_resp_cons. This will let us see whether the ring is indeed full. If the network stack/driver has got itself into a state where it can print this message, I'm not surprised it hangs for a while.

For example, add this where that message gets printed in netfront.c:

    {
        unsigned long flags;
        /* Stop this CPU's interrupt handler from updating the indices mid-print. */
        local_irq_save(flags);
        printk(KERN_ALERT "full=%d req_prod=%08x rsp_prod=%08x "
               "rsp_cons=%08x\n",
               np->tx_full, np->tx->req_prod, np->tx->resp_prod,
               np->tx_resp_cons);
        local_irq_restore(flags);
    }

 -- Keir
-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
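What Keir wants to learn from those three printed values is whether the transmit ring really is full. A minimal sketch of that test, assuming Xen-style free-running unsigned 32-bit indices (masked only when used as slot numbers) and a hypothetical ring size of 256; `struct tx_snapshot`, `tx_outstanding()` and `tx_ring_full()` are illustrative names, not netfront.c's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring size; a power of two so that (idx & (TX_RING_SIZE - 1))
 * maps a free-running index onto a slot. */
#define TX_RING_SIZE 256u
static_assert((TX_RING_SIZE & (TX_RING_SIZE - 1u)) == 0u,
              "TX_RING_SIZE must be a power of two");

/* Hypothetical snapshot of the values Keir suggests printing. */
struct tx_snapshot {
    int      tx_full;   /* the driver's own "ring is full" flag */
    uint32_t req_prod;  /* requests produced by the frontend */
    uint32_t resp_prod; /* responses produced by the backend */
    uint32_t resp_cons; /* responses consumed by the frontend */
};

/* Slots in use: requests produced whose responses are not yet consumed.
 * Unsigned subtraction stays correct after the indices wrap past 2^32. */
static uint32_t tx_outstanding(const struct tx_snapshot *s)
{
    return s->req_prod - s->resp_cons;
}

/* Non-zero when every slot is in use and the queue ought to be stopped. */
static int tx_ring_full(const struct tx_snapshot *s)
{
    return tx_outstanding(s) >= TX_RING_SIZE;
}
```

For example, with req_prod = 0x00000010 and resp_cons = 0xFFFFFF10 the indices have wrapped, yet `tx_outstanding()` still yields 256, so `tx_ring_full()` reports a full ring; comparing that result with the printed tx_full flag shows whether the driver's own bookkeeping agrees.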
I saw something go by on the list a week or so ago about network hangs, and I may be observing something similar.

The basic setup is: two guest domains running apache, and a different box running httperf against them, 100 requests per second for the same 100 KB file.

This runs OK for a time, then suddenly chokes and all traffic comes to a stop. Then, a few seconds later, traffic seems to pick up again.

This behavior is not observed with a workload of 40 requests/second. At 80/second, the problem starts appearing, but not very frequently.

We can provide sufficient detail if anyone wants to try to reproduce this. Have there been any fixes relating to this lately? We are using Xen bits that are a few weeks old right now.

Rob Gardner
HP
Just to check I understand your setup: you have a domain 0 implementing bridging, then a domain 1 and a domain 2 each running apache.

When the domain chokes, do you see any drops or errors in the stats as reported by ifconfig?

It would be good to enable the debugging printfs in both the netfront and netback drivers.

Can you repeat this with a single non-0 domain? Can you repeat it more easily by generating a background network load in dom0?

BTW: Have you got CONNECTION_TRACKING compiled into the dom0 kernel? This seems to cripple Linux performance, hence it was recently made a module in our config.

Ian
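Ian's first question, whether drops or errors show up, can also be answered by reading the kernel's per-interface counters in /proc/net/dev, which is where ifconfig gets its errors and dropped figures on Linux. A minimal parser for one line of that file, assuming the usual layout of eight receive fields (bytes, packets, errs, drop, fifo, frame, compressed, multicast) followed by eight transmit fields (bytes, packets, errs, drop, fifo, colls, carrier, compressed); `struct if_drops` and `parse_netdev_line()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Per-interface error and drop counters. */
struct if_drops {
    char name[32];
    unsigned long long rx_errs, rx_drop, tx_errs, tx_drop;
};

/* Parse one interface line of /proc/net/dev, e.g. "  vif1.0: 1000 10 0 2 ...".
 * Returns 0 on success, -1 for header lines or malformed input. */
static int parse_netdev_line(const char *line, struct if_drops *out)
{
    const char *start = line;
    const char *colon;
    unsigned long long f[16];
    size_t len;

    assert(line != NULL && out != NULL);
    while (*start == ' ' || *start == '\t') /* names are right-aligned */
        start++;
    colon = strchr(start, ':');
    if (colon == NULL) /* the two header lines contain no ':' */
        return -1;
    len = (size_t)(colon - start);
    if (len == 0 || len >= sizeof out->name)
        return -1;
    memcpy(out->name, start, len);
    out->name[len] = '\0';

    /* 8 receive fields followed by 8 transmit fields. */
    if (sscanf(colon + 1,
               "%llu %llu %llu %llu %llu %llu %llu %llu "
               "%llu %llu %llu %llu %llu %llu %llu %llu",
               &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7],
               &f[8], &f[9], &f[10], &f[11], &f[12], &f[13], &f[14],
               &f[15]) != 16)
        return -1;

    out->rx_errs = f[2];
    out->rx_drop = f[3];
    out->tx_errs = f[10];
    out->tx_drop = f[11];
    return 0;
}
```

Reading the vif lines before and after a stall and comparing the drop counters would show which interface, if any, discarded packets during it.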
Ian Pratt wrote:
> Just to check I understand your setup: You have a domain 0
> implementing bridging, then a domain 1 and a domain 2 each
> running apache.

That's correct.

> When the domain chokes, do you see any drops or errors in the
> stats as reported by ifconfig?

In domain 0, ifconfig reports 0 errors, 0 dropped on eth0; 0 errors, 2 dropped on vif1.0; 0 errors, 5 dropped on vif2.0; and 0 errors, 0 dropped on xen-br0. In domain 1, ifconfig reports 0 for errors and dropped. For some reason I can't get a console onto the other domain right now, but I suspect it will report the same thing as domain 1.

> It would be good to enable the debugging printfs in both the
> netfront and netback drivers.

Can do.

> Can you repeat this with a single non-0 domain? Can you repeat it more
> easily by generating a background network load in dom0?

Can try these.

> BTW: Have you got CONNECTION_TRACKING compiled into the dom0
> kernel? This seems to cripple Linux performance, hence it was
> recently made a module in our config.

We are using only default options.

Rob Gardner
HP
On Thu, 2004-09-09 at 16:06, Ian Pratt wrote:
> It would be good to enable the debugging printfs in both the
> netfront and netback drivers.
>
> Can you repeat this with a single non-0 domain? Can you repeat it more
> easily by generating a background network load in dom0?

The network hang behavior does occur with just a single non-0 domain. We got the following output in dmesg that looks interesting:

    ...
    Freeing unused kernel memory: 116k freed
    EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
    Adding Swap: 1020592k swap-space (priority -1)
    ioperm not fully supported - set iopl to 3
    ioperm not fully supported - set iopl to 3
    ioperm not fully supported - set iopl to 3
    ioperm not fully supported - set iopl to 3
    device eth0 entered promiscuous mode
    xen-br0: port 1(eth0) entering learning state
    xen-br0: port 1(eth0) entering forwarding state
    xen-br0: topology change detected, propagating
    (file=interface.c, line=140) Successfully created netif
    device vif1.0 entered promiscuous mode
    xen-br0: port 2(vif1.0) entering learning state
    xen-br0: port 2(vif1.0) entering forwarding state
    xen-br0: topology change detected, propagating
    (file=interface.c, line=140) Successfully created netif
    device vif2.0 entered promiscuous mode
    xen-br0: port 3(vif2.0) entering learning state
    xen-br0: port 3(vif2.0) entering forwarding state
    xen-br0: topology change detected, propagating
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    ip_conntrack: table full, dropping packet.
    NET: 481 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 532 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 547 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 393 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 25 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 23 messages suppressed.
    ip_conntrack: table full, dropping packet.
    NET: 33 messages suppressed.
    ip_conntrack: table full, dropping packet.
Looks as though perhaps the connection-tracking table is full. :-)

If you are churning through a lot of TCP connections, then the conntrack table may be full of defunct TCBs in TIME_WAIT (2MSL) state. Not sure what the best solution is: when we were doing evaluation tests for our paper we disabled connection tracking (which means that things like NAT are unavailable). I haven't looked around, but there may well be a way to tell Linux to reuse TCBs in TIME_WAIT state.

This would explain the networking drop-outs: no more connections can be made until some old TCBs are garbage collected, after a 120 s timeout.

 -- Keir
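Keir's explanation is essentially a capacity argument, which Little's law makes concrete: if each new connection holds a tracking entry for the whole TIME_WAIT timeout, the table in steady state holds about (new connections per second) × (timeout in seconds) such entries. A small sketch of that arithmetic, assuming one new TCP connection per HTTP request; the function names are hypothetical, and the table limit is an input rather than a value taken from this thread.

```c
#include <assert.h>

/* Approximate steady-state number of conntrack entries sitting in TIME_WAIT
 * (Little's law: arrival rate times holding time). This is a lower bound on
 * table usage, since entries in other states are ignored. */
static unsigned long time_wait_entries(unsigned long new_conns_per_sec,
                                       unsigned long time_wait_secs)
{
    return new_conns_per_sec * time_wait_secs;
}

/* Highest new-connection rate a table of table_max entries can sustain
 * before TIME_WAIT entries alone fill it. */
static unsigned long max_sustainable_rate(unsigned long table_max,
                                          unsigned long time_wait_secs)
{
    assert(time_wait_secs > 0);
    return table_max / time_wait_secs;
}
```

For example, `time_wait_entries(100, 120)` gives 12,000 entries, whereas a 5 s timeout brings that down to 500. Feeding whatever ip_conntrack_max the kernel reports into `max_sustainable_rate()` gives a rate threshold, which would be consistent with Rob's observation that the stalls only appear above some request rate.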
> Not sure what the best solution is: when we were doing evaluation
> tests for our paper we disabled connection tracking (which means that
> things like NAT are unavailable). I haven't looked around, but there
> may well be a way to tell Linux to reuse TCBs in TIME_WAIT state.

More detail: look in /proc/net/ip_conntrack. Most likely you'll see lots of connections in TIME_WAIT. You can adjust the maximum number of tracked connections by echoing to /proc/sys/net/ipv4/ip_conntrack_max. A better solution, however, is probably to modify the individual timeout values for each state. For example:

    echo "5" >/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait

[Disclaimer: I haven't tried this myself, but Google and the source suggest this is the most promising approach.]

 -- Keir
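Both of Keir's suggestions, counting TIME_WAIT entries in /proc/net/ip_conntrack and lowering the TIME_WAIT timeout, can be scripted. A minimal C sketch, assuming that /proc/net/ip_conntrack lists one tracked connection per line with the TCP state as a separate whitespace-delimited word, and that the timeout file accepts a plain integer number of seconds; `count_state()`, `line_has_word()` and `write_sysctl_value()` are hypothetical names, and none of this has been tried on the setup discussed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `word` appears in `line` as a whitespace-delimited token.
 * Modifies `line` (strtok writes NUL terminators into it). */
static int line_has_word(char *line, const char *word)
{
    for (char *tok = strtok(line, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, word) == 0)
            return 1;
    }
    return 0;
}

/* Count lines of a conntrack listing whose state token equals `state`,
 * e.g. "TIME_WAIT". Whole-word matching keeps "CLOSE" from also counting
 * "CLOSE_WAIT". Assumes each line fits in the buffer.
 * Returns -1 if `f` is NULL. */
static long count_state(FILE *f, const char *state)
{
    char line[1024];
    long n = 0;

    assert(state != NULL);
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (line_has_word(line, state))
            n++;
    }
    return n;
}

/* Write one integer to a sysctl-style file, like `echo "5" > path`.
 * Writing under /proc/sys needs root. Returns 0 on success, -1 on failure. */
static int write_sysctl_value(const char *path, long value)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "%ld\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

In use, one would open /proc/net/ip_conntrack with `fopen`, pass the stream to `count_state(f, "TIME_WAIT")` and close it, then call `write_sysctl_value("/proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait", 5)` and compare later counts against the first one.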
> xen-br0: port 3(vif2.0) entering learning state
> xen-br0: port 3(vif2.0) entering forwarding state
> xen-br0: topology change detected, propagating
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.

As I suspected in a previous email, you're using an old configuration file where Linux's connection tracking was enabled by default. Linux's connection tracking code seems to remain active and slow things down even if you're not using it. That's why I changed the config option into 'module' rather than 'yes' several weeks back.

Ian