thr3ads.net - Xen users - Pls help: netfront tx ring frozen (any clues appreciated) [Feb 2012]


Vijay Chander

2012-Feb-23 16:29 UTC

Pls help: netfront tx ring frozen (any clues appreciated)

Hi,

    We are running into a situation where the rsp_prod index in the shared ring
is not being updated by netback for the netfront tx ring.

    We see that rsp_cons has the same value as rsp_prod, with req_prod 236
slots ahead (the tx ring is full). From looking at the netfront driver code, it
looks as if xennet_tx_buf_gc processing only happens when rsp_prod is greater
than rsp_cons.
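
The numbers reported above can be checked with a little arithmetic. The sketch below is a Python model, not driver code. It assumes a 256-entry TX ring and a MAX_SKB_FRAGS of 18 (plausible for 4 KiB pages on kernels of that period, but not stated in this thread), and it assumes netfront stops its queue once the in-flight count reaches ring size minus MAX_SKB_FRAGS minus 2. Under those assumptions the stop threshold is exactly 236, and with rsp_prod equal to rsp_cons there is nothing for xennet_tx_buf_gc to reap.

```python
RING_SIZE = 256       # assumption: TX ring entries with 4 KiB pages
MAX_SKB_FRAGS = 18    # assumption: value on kernels of that era
MASK32 = 0xFFFFFFFF   # ring indices are free-running 32-bit counters


def in_flight(req_prod, rsp_cons):
    """Requests posted by netfront whose responses it has not yet consumed."""
    return (req_prod - rsp_cons) & MASK32


def tx_slot_available(req_prod, rsp_cons):
    """Assumed shape of netfront's queue-stop check (threshold is a guess)."""
    return in_flight(req_prod, rsp_cons) < RING_SIZE - MAX_SKB_FRAGS - 2


def gc_has_work(rsp_prod, rsp_cons):
    """Per the post above, gc only runs when rsp_prod is ahead of rsp_cons."""
    return rsp_prod != rsp_cons


# The state described in the post: rsp_cons == rsp_prod, req_prod 236 ahead.
rsp_cons = rsp_prod = 1000
req_prod = rsp_cons + 236
print(tx_slot_available(req_prod, rsp_cons))  # False: queue stays stopped
print(gc_has_work(rsp_prod, rsp_cons))        # False: nothing to reap
```

If both checks stay false, nothing on the netfront side can make progress until netback advances rsp_prod, which is consistent with the frozen ring being reported.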

   Our understanding is that netfront sets rsp_cons to tell netback to start
processing transmits from the rsp_cons index onwards up to req_prod. Once
netback has finished processing X requests, it increments rsp_prod by X. This
causes netfront to look at the status of each individual response in the slots
from rsp_cons up to rsp_prod (with rsp_prod - rsp_cons = X in this case).
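
For comparison, here is a toy Python model of how the indices move in the generic Xen shared-ring protocol as it is commonly described: the frontend publishes requests by advancing req_prod, the backend keeps its own private req_cons and publishes responses by advancing rsp_prod, and rsp_cons is the frontend's private record of how many responses it has reaped. The class and method names are made up for illustration, and 32-bit wraparound of the indices is ignored.

```python
class TxRing:
    """Toy model of one Xen shared ring; a sketch, not driver code."""

    def __init__(self, size=256):
        self.size = size
        # Shared indices, visible to both ends.
        self.req_prod = 0  # advanced by the frontend (netfront)
        self.rsp_prod = 0  # advanced by the backend (netback)
        # Private indices.
        self.rsp_cons = 0  # frontend: next response to reap
        self.req_cons = 0  # backend: next request to process

    def front_post(self, n):
        """Frontend queues up to n requests by advancing req_prod."""
        free = self.size - (self.req_prod - self.rsp_cons)
        n = min(n, free)
        self.req_prod += n
        return n

    def back_process(self, n):
        """Backend consumes up to n requests and publishes one response each."""
        n = min(n, self.req_prod - self.req_cons)
        self.req_cons += n
        self.rsp_prod += n
        return n

    def front_gc(self):
        """Frontend reaps responses in [rsp_cons, rsp_prod); returns the count."""
        reaped = self.rsp_prod - self.rsp_cons
        self.rsp_cons = self.rsp_prod
        return reaped


ring = TxRing()
ring.front_post(5)
ring.back_process(3)
print(ring.front_gc())  # 3
print(ring.front_gc())  # 0: nothing new until the backend advances rsp_prod
```

In this model a stall looks exactly like the second front_gc() call: requests are outstanding, but the frontend has nothing to reap because rsp_prod has not moved.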

   Is there any way to work around this? Will calling xennet_disconnect_backend()
followed by xennet_connect() on the netfront let us recover from this stuck
situation? We are OK with pending TX packets being dropped, since we have TCP
running on top.

   Thanks,
-vijay


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
lists.xen.org/xen-devel

Vijay Chander

2012-Feb-25 15:46 UTC

Re: Pls help: netfront tx ring frozen (any clues appreciated)

If anybody has encountered a situation similar to the one described in my
original message, where the netfront TX ring is stuck, can you please provide
some pointers on how to get around this problem?

This typically happens after about 2 days of overnight traffic tests.

Thanks,
-vijay


Vijay Chander

2012-Feb-25 15:47 UTC

Fwd: Pls help: netfront tx ring frozen (any clues appreciated)

---------- Forwarded message ----------
From: Vijay Chander <vijay.chander@gmail.com>
Date: Sat, Feb 25, 2012 at 7:46 AM
Subject: Re: Pls help: netfront tx ring frozen (any clues appreciated)
To: xen-devel@lists.xensource.com



_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
lists.xen.org/xen-users

Konrad Rzeszutek Wilk

2012-Apr-06 20:31 UTC

Re: Pls help: netfront tx ring frozen (any clues appreciated)

On Sat, Feb 25, 2012 at 07:46:36AM -0800, Vijay Chander wrote:
> If anybody encountered a similar situation as below where the netfront TX
> ring is stuck ,
> can you pls provide some pointers on how to get around this problem ?
>
> This typically happens after about 2days of overnight traffic tests.

What kind of traffic? As in netperf for 48 hours? Is this guest-to-guest
traffic, or traffic from an outside host to the guest?

Steve Prochniak

2012-Apr-09 19:09 UTC

Re: Pls help: netfront tx ring frozen (any clues appreciated)

I recall running into this problem while developing a network PV driver,
though I don't recall whether it was the TX or the RX ring that would stall
(maybe it was both or either).  During longevity testing, after days of nonstop
traffic, something would go wrong and the interrupt would fail to clear.  This
seemed to be an "after so many interrupts" bug, since halving the
traffic would double the time necessary to reproduce.  At the time, we figured
that we never saw this with the disk because it would have taken weeks to repro.

Mainly because of the length of time required to reproduce this, we never found
out whether the problem was on the Dom0 or DomU side.  I worked around the
problem by adding code that would detect that the condition was occurring and
then trigger a reset of the event channel or interrupt.

Steve
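
No code for this workaround was posted, so the following is only a guess at its general shape, written as a small Python model rather than driver code. It treats the ring as stalled when requests are outstanding and rsp_prod has not moved for a configurable number of periodic checks, then calls a reset hook. The class name, the tick-based polling, and reset_fn are all hypothetical; in a real driver the hook would be whatever re-arms the event channel or interrupt.

```python
class TxStallWatchdog:
    """Hypothetical stall detector: fires reset_fn when the ring stops moving."""

    def __init__(self, stall_ticks, reset_fn):
        self.stall_ticks = stall_ticks  # idle checks tolerated before a reset
        self.reset_fn = reset_fn        # placeholder for the real reset action
        self._last_rsp_prod = None
        self._idle_ticks = 0

    def tick(self, req_prod, rsp_prod):
        """Call periodically; returns True if a reset was triggered."""
        outstanding = req_prod != rsp_prod
        if outstanding and rsp_prod == self._last_rsp_prod:
            self._idle_ticks += 1
        else:
            # Either the backend made progress or nothing is pending.
            self._idle_ticks = 0
        self._last_rsp_prod = rsp_prod

        if self._idle_ticks >= self.stall_ticks:
            self._idle_ticks = 0
            self.reset_fn()
            return True
        return False


resets = []
wd = TxStallWatchdog(stall_ticks=3, reset_fn=lambda: resets.append("reset"))
results = [wd.tick(req_prod=236, rsp_prod=0) for _ in range(4)]
print(results)  # [False, False, False, True]
```

The first check only records a baseline, so the reset fires on the fourth check here. Choosing stall_ticks is a trade-off: too small and a merely slow backend triggers spurious resets, too large and the interface stays frozen longer than necessary.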


Steve Prochniak

2012-Apr-09 19:21 UTC

Re: Pls help: netfront tx ring frozen (any clues appreciated)

After digging up the code: when we observed this issue, it was specific to the
RX ring, and it took about 4 days of nonstop traffic to reproduce.  So perhaps
the issues are not related.

