thr3ads.net - Xen devel - [Xen-devel] network hang [Sep 2004]


James Harper

2004-Sep-02 00:55 UTC

[Xen-devel] network hang

I'm playing with nbd+raid1, and am finding that during a resync the
network in xenU dies and simply stops sending packets. At first I
thought this was a bridging problem, but in xen0 I removed the vif from the
bridge and gave it its own IP address, and gave the eth interface in xenU a
similar IP address, and still no traffic passes.

After a while though, it seemed to come good again and I was able to add the
interface to the bridge again and it started working.

The only strange thing in the kernel logs was this in xenU:

eth0: full queue wasn't stopped!

but I'm not sure at what point this was logged.

James

Keir Fraser

2004-Sep-02 02:23 UTC

Re: [Xen-devel] network hang

"full queue wasn't stopped" should never be printed. If you can
reproduce this message then it is worth printing some more info at the
same point -- for example, np->tx->req_prod, np->tx->resp_prod, and
np->tx_resp_cons. This will let us see whether the ring is indeed full.
If the network stack/driver has got itself into a state where it can print
this message, I'm not surprised it hangs for a while.

e.g., add this where that message gets printed in netfront.c:

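 {
   unsigned long flags;
   /* Keep local interrupts off so the indexes are read as one snapshot. */
   local_irq_save(flags);
   /* Note the trailing space before "rsp_cons": the two literals are joined. */
   printk(KERN_ALERT "full=%d req_prod=%08x rsp_prod=%08x "
          "rsp_cons=%08x\n", np->tx_full, np->tx->req_prod,
          np->tx->resp_prod, np->tx_resp_cons);
   local_irq_restore(flags);
 }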

 -- Keir
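
To read the values that debug line prints, it may help to turn the three indexes into counts. What follows is only a sketch in shell, not part of netfront: the name tx_ring_state and the default ring size of 256 are assumptions made for illustration, and it assumes the indexes are free-running 32-bit counters (as the %08x format suggests), so differences are taken modulo 2^32.

```shell
# Hypothetical helper: decode the hex indexes from one debug line.
# Usage: tx_ring_state REQ_PROD RSP_PROD RSP_CONS [RING_SIZE]
# RING_SIZE defaults to 256, which is an assumption; check your netif headers.
tx_ring_state() {
    req=$((0x$1))
    rsp=$((0x$2))
    cons=$((0x$3))
    size=${4:-256}
    mask=$((0xFFFFFFFF))                     # indexes are assumed to wrap at 32 bits
    in_flight=$(( (req - cons) & mask ))     # requests queued but not yet reaped by the frontend
    unconsumed=$(( (rsp - cons) & mask ))    # responses posted but not yet consumed
    echo "in_flight=$in_flight unconsumed_responses=$unconsumed ring_size=$size"
}
```

For example, `tx_ring_state 00000105 00000100 00000005` reports in_flight=256, as many outstanding requests as the assumed ring has slots. If in_flight sits at or near the ring size when the message is printed, the ring really is full; if it is small, the tx_full flag and the queue state have probably got out of step.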

> I'm playing with nbd+raid1, and am finding that during a resync the
> network in xenU is dying and simply not sending packets anymore. At first I
> thought this was a bridging problem, but in xen0 I have removed the vif from the
> bridge and given it its own ip address, and given the eth interface in xenU a
> similar ip address, but no traffic is passing anymore.
>
> After a while though, it seemed to come good again and I was able to add
> the interface to the bridge again and it started working.
>
> The only strange thing in the kernel logs was this in xenU:
>
> eth0: full queue wasn't stopped!
>
> but i'm not sure at what point this was logged though.
>
> James


-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047&alloc_id=10808&op=click
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel




Rob Gardner

2004-Sep-09 21:43 UTC

[Xen-devel] More on networking hang

I saw something go by on the list a week or so ago about network hangs, 
and I may be observing something similar.

The basic setup is: two guest domains running apache, and a different
box running httperf against them, at 100 requests per second for the same
100 KB file.

This runs ok for a time, then suddenly chokes and all traffic comes to a 
stop. Then a few seconds later traffic seems to pick up again.

This behavior is not observed with a workload of 40 requests/second. At 
80/second, the problem starts appearing, but not very frequently.

We can provide sufficient detail if anyone wants to try to reproduce this.

Have there been any fixes relating to this lately? We are using xen bits 
that are a few weeks old right now.


Rob Gardner
HP





Ian Pratt

2004-Sep-09 22:06 UTC

Re: [Xen-devel] More on networking hang

> I saw something go by on the list a week or so ago about network hangs, 
> and I may be observing something similar.
> 
> The basic setup is: two guest domains running apache, and a different 
> box running httperf against them, 100 requests per second for the same
> 100kbyte file.
> 
> This runs ok for a time, then suddenly chokes and all traffic comes to a 
> stop. Then a few seconds later traffic seems to pick up again.
> 
> This behavior is not observed with a workload of 40 requests/second. At 
> 80/second, the problem starts appearing, but not very frequently.
> 
> We can provide sufficient detail if anyone wants to try to reproduce this.

Just to check I understand your setup: you have a domain 0
implementing bridging, then a domain 1 and a domain 2 each
running apache.

When the domain chokes, do you see any drops or errors in the
stats as reported by ifconfig?
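
One way to capture those counters for every interface at once, so they can be compared before and after a stall, is to read /proc/net/dev, which carries the same per-interface counters. This is a sketch under assumptions: the function name iface_drops is made up here, and the column positions follow the usual /proc/net/dev layout (two header lines, then receive and transmit columns), which is worth checking against the header on the kernel in use.

```shell
# Hypothetical helper: print error and drop counters per interface.
# Usage: iface_drops [FILE]   (FILE defaults to /proc/net/dev)
iface_drops() {
    awk 'NR > 2 {
        sub(/:/, " ")    # some kernels print "eth0:1234" with no space after the colon
        printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
    }' "${1:-/proc/net/dev}"
}
```

Running it in domain 0 and in the guest before and after a hang, and diffing the output, shows which interface (eth0, a vif, or the bridge) is accumulating drops.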

It would be good to enable the debugging printf's in both the
netfront and netback drivers.

Can you repeat this with a single non-0 domain? Can you repeat it more
easily by generating a background network load in dom0?

BTW: Have you got CONNECTION_TRACKING compiled into the dom0
kernel?  This seems to cripple Linux performance, hence it was
recently made a module in our config.

Ian





Rob Gardner

2004-Sep-09 22:33 UTC

Re: [Xen-devel] More on networking hang

Ian Pratt wrote:
> Just to check I understand your setup: You have a domain 0
> implementing bridging, then a domain 1 and a domain 2 each
> running apache.

That's correct.

> When the domain chokes, do you see any drops or errors in the
> stats as reported by ifconfig?

In domain 0, ifconfig reports 0 errors, 0 dropped on eth0; 0 errors, 2
dropped on vif1.0; 0 errors, 5 dropped on vif2.0; and 0 errors, 0
dropped on xen-br0.

In domain 1, ifconfig reports 0 for errors and dropped.

For some reason I can't get a console onto the other domain right now,
but I suspect it will report the same thing as domain 1.

> It would be good to enable the debugging printf's in both the
> netfront and netback drivers.

Can do.

> Can you repeat this with a single non-0 domain? Can you repeat it more
> easily by generating a background network load in dom0?

Can try these.

> BTW: Have you got CONNECTION_TRACKING compiled into the dom0
> kernel?  This seems to cripple Linux performance, hence it was
> recently made a module in our config.

We are using only default options.



Rob Gardner
HP





Rob Gardner

2004-Sep-10 22:36 UTC

Re: [Xen-devel] More on networking hang

On Thu, 2004-09-09 at 16:06, Ian Pratt wrote:
> It would be good to enable the debugging printf's in both the
> netfront and netback drivers.
>
> Can you repeat this with a single non-0 domain? Can you repeat it more
> easily by generating a background network load in dom0?

The network hang behavior does occur with just a single non-0 domain.

We got the following output in dmesg that looks interesting:

...
Freeing unused kernel memory: 116k freed
EXT3 FS 2.4-0.9.19, 19 August 2002 on ide0(3,1), internal journal
Adding Swap: 1020592k swap-space (priority -1)
ioperm not fully supported - set iopl to 3
ioperm not fully supported - set iopl to 3
ioperm not fully supported - set iopl to 3
ioperm not fully supported - set iopl to 3
device eth0 entered promiscuous mode
xen-br0: port 1(eth0) entering learning state
xen-br0: port 1(eth0) entering forwarding state
xen-br0: topology change detected, propagating
(file=interface.c, line=140) Successfully created netif
device vif1.0 entered promiscuous mode
xen-br0: port 2(vif1.0) entering learning state
xen-br0: port 2(vif1.0) entering forwarding state
xen-br0: topology change detected, propagating
(file=interface.c, line=140) Successfully created netif
device vif2.0 entered promiscuous mode
xen-br0: port 3(vif2.0) entering learning state
xen-br0: port 3(vif2.0) entering forwarding state
xen-br0: topology change detected, propagating
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
NET: 481 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 532 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 547 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 393 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 25 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 23 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 33 messages suppressed.
ip_conntrack: table full, dropping packet.


Keir Fraser

2004-Sep-11 03:59 UTC

Re: [Xen-devel] More on networking hang

Looks as though perhaps the connection-tracking table is full. :-)

If you are churning through a lot of TCP connections then the
conntrack table may be full of defunct TCBs in TIME_WAIT (2MSL)
state.

Not sure what the best solution is: when we were doing evaluation
tests for our paper we disabled connection tracking (which means that
things like NAT are unavailable). I haven't looked around, but there
may well be a way to tell Linux to reuse TCBs in TIME_WAIT state.

This would explain the networking drop-outs. No more connections can be
made until some old TCBs are garbage collected, after a 120s timeout.
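
A back-of-envelope check of this explanation: if every request opens a new TCP connection and its conntrack entry lingers for the full timeout, the table settles at roughly rate times timeout entries. The sketch below just does that multiplication; the function name conntrack_steady_state is made up, and the model is an assumption that ignores entries in other states and any connection reuse.

```shell
# Hypothetical estimate: steady-state conntrack entries = rate * timeout.
# Usage: conntrack_steady_state CONNECTIONS_PER_SECOND [TIMEOUT_SECONDS]
# TIMEOUT_SECONDS defaults to 120, the timeout quoted above.
conntrack_steady_state() {
    rate=$1
    timeout=${2:-120}
    echo $((rate * timeout))
}
```

At 100 requests/s this gives 12000 entries and at 40 requests/s it gives 4800, so a table limit somewhere between the two would be consistent with the load threshold reported earlier in the thread. The actual ip_conntrack_max on the test machine is not given, so this is only a plausibility check.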

 -- Keir
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.
> NET: 481 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 532 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 547 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 393 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 25 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 23 messages suppressed.
> ip_conntrack: table full, dropping packet.
> NET: 33 messages suppressed.
> ip_conntrack: table full, dropping packet.


Keir Fraser

2004-Sep-11 04:29 UTC

Re: [Xen-devel] More on networking hang

> Not sure what the best solution is: when we were doing evaluation
> tests for our paper we disabled connection tracking (which means that
> things like NAT are unavailable). I haven't looked around, but there
> may well be a way to tell Linux to reuse TCBs in TIME WAIT state.

More detail:

Look in /proc/net/ip_conntrack. Most likely you'll see lots of
connections in TIME_WAIT.
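
Rather than eyeballing the file, the TCP entries can be tallied by state. This is a sketch that assumes the usual ip_conntrack line layout, where a TCP entry starts with "tcp", the protocol number, the seconds remaining, and then the state name; the function name conntrack_states is made up for illustration.

```shell
# Hypothetical helper: count tracked TCP connections per state, busiest first.
# Usage: conntrack_states [FILE]   (FILE defaults to /proc/net/ip_conntrack)
conntrack_states() {
    awk '$1 == "tcp" { n[$4]++ }                 # field 4 is assumed to be the TCP state
         END { for (s in n) print n[s], s }' \
        "${1:-/proc/net/ip_conntrack}" | sort -rn
}
```

If TIME_WAIT dominates the output during a stall, that supports the theory that defunct connections are filling the table.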

You can adjust the maximum number of tracked connections by echoing a
new value to /proc/sys/net/ipv4/ip_conntrack_max. A better solution,
however, is probably to modify the individual timeout values for each
state. For example:

 echo "5" > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait

[disclaimer: I haven't tried this myself, but google + src indicates
 this is the most promising approach.]
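
Before changing either setting, it may be worth measuring how full the table actually is. The sketch below compares the number of entries with the configured maximum; the function name conntrack_usage is made up, and the default paths are simply the ones mentioned in this thread, which may differ on other kernel versions.

```shell
# Hypothetical helper: report conntrack table occupancy.
# Usage: conntrack_usage [TABLE_FILE] [MAX_FILE]
conntrack_usage() {
    table=${1:-/proc/net/ip_conntrack}
    max_file=${2:-/proc/sys/net/ipv4/ip_conntrack_max}
    entries=$(awk 'END { print NR }' "$table")   # one line per tracked connection
    max=$(cat "$max_file")
    echo "entries=$entries max=$max percent=$((entries * 100 / max))"
}
```

Sampling this once a second during the benchmark should show whether the stalls line up with the table reaching its limit, and whether a shorter TIME_WAIT timeout keeps it below that limit.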

 -- Keir



Ian Pratt

2004-Sep-11 07:45 UTC

Re: [Xen-devel] More on networking hang

> xen-br0: port 3(vif2.0) entering learning state
> xen-br0: port 3(vif2.0) entering forwarding state
> xen-br0: topology change detected, propagating
> ip_conntrack: table full, dropping packet.
> ip_conntrack: table full, dropping packet.

As I suspected in a previous email, you're using an old
configuration file where Linux's connection tracking was enabled
by default.

Linux's connection tracking code seems to remain active and slow
things down even if you're not using it. That's why I changed the
config option to 'module' rather than 'yes' several weeks back.

Ian

