Vijay Chander
2012-Feb-23 16:29 UTC
Pls help: netfront tx ring frozen (any clues appreciated)
Hi,

We are running into a situation where the rsp_prod index in the shared ring is not being updated by netback for the netfront TX ring.

We see that rsp_cons has the same value as rsp_prod, with req_prod 236 slots ahead (the TX ring is full). From reading the netfront driver code, it looks as if xennet_tx_buf_gc processing only happens when rsp_prod is ahead of rsp_cons.

Our understanding is that netfront sets rsp_cons to tell netback to start processing transmits from the rsp_cons index onwards, up to req_prod. Once netback has finished processing X requests, it increments rsp_prod by X. This causes netfront to look at the status of each individual response for the slots from rsp_cons up to rsp_prod (with rsp_prod - rsp_cons = X in this case).

Is there any way to work around this? Will calling xennet_disconnect_backend() followed by xennet_connect() on the netfront side let us recover from this stuck state? We are OK with pending TX packets being dropped, since we have TCP running on top.

Thanks,
-vijay

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
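The index arithmetic described in this message can be sketched as a small stand-alone C model. This is only an illustration, not the real Xen ring structures or macros: the struct, the function names, the concrete index values, and the 256-slot ring size are all assumptions made for the example. Only the relationships (rsp_prod equal to rsp_cons, req_prod 236 slots ahead) come from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of the TX ring indices (not Xen's own types). */
struct tx_ring_model {
    uint32_t req_prod;  /* requests published by the frontend */
    uint32_t rsp_prod;  /* responses published by the backend */
    uint32_t rsp_cons;  /* responses already consumed by the frontend */
};

/* Responses the frontend could still garbage-collect.
 * Indices are free-running, so unsigned subtraction handles wraparound. */
static uint32_t unconsumed_responses(const struct tx_ring_model *r)
{
    return r->rsp_prod - r->rsp_cons;
}

/* Requests queued by the frontend whose responses have not been consumed. */
static uint32_t requests_in_flight(const struct tx_ring_model *r)
{
    return r->req_prod - r->rsp_cons;
}

/* Slots nominally left in a ring of ring_size entries (assumed power of two). */
static uint32_t free_slots(const struct tx_ring_model *r, uint32_t ring_size)
{
    uint32_t in_flight = requests_in_flight(r);

    assert(ring_size != 0 && (ring_size & (ring_size - 1)) == 0);
    assert(in_flight <= ring_size);
    return ring_size - in_flight;
}
```

With rsp_prod equal to rsp_cons, unconsumed_responses() returns 0, so a garbage-collection pass that only runs when responses are pending has nothing to do, while requests_in_flight() still reports 236 outstanding requests. That matches the stuck state described above: the frontend is waiting for the backend to advance rsp_prod.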
Vijay Chander
2012-Feb-25 15:46 UTC
Re: Pls help: netfront tx ring frozen (any clues appreciated)
If anybody has encountered a situation similar to the one described in my original message, where the netfront TX ring is stuck, could you please provide some pointers on how to get around this problem?

This typically happens after about 2 days of overnight traffic tests.

Thanks,
-vijay
Vijay Chander
2012-Feb-25 15:47 UTC
Fwd: Pls help: netfront tx ring frozen (any clues appreciated)
---------- Forwarded message ----------
From: Vijay Chander <vijay.chander@gmail.com>
Date: Sat, Feb 25, 2012 at 7:46 AM
Subject: Re: Pls help: netfront tx ring frozen (any clues appreciated)
To: xen-devel@lists.xensource.com

If anybody has encountered a situation similar to the one described in my original message, where the netfront TX ring is stuck, could you please provide some pointers on how to get around this problem?

This typically happens after about 2 days of overnight traffic tests.

Thanks,
-vijay

_______________________________________________
Xen-users mailing list
Xen-users@lists.xen.org
http://lists.xen.org/xen-users
Konrad Rzeszutek Wilk
2012-Apr-06 20:31 UTC
Re: Pls help: netfront tx ring frozen (any clues appreciated)
On Sat, Feb 25, 2012 at 07:46:36AM -0800, Vijay Chander wrote:
> If anybody encountered a similar situation as below where the netfront TX
> ring is stuck,
> can you pls provide some pointers on how to get around this problem?
>
> This typically happens after about 2days of overnight traffic tests.

What kind of traffic? As in netperf for 48 hours? Is this guest-to-guest traffic, or traffic from an outside host to the guest?
Steve Prochniak
2012-Apr-09 19:09 UTC
Re: Pls help: netfront tx ring frozen (any clues appreciated)
I recall running into this problem while developing a network PV driver, though I don't recall whether it was the TX or the RX ring that would stall (maybe it was both, or either).

During longevity testing, after days of nonstop traffic, something would go wrong and the interrupt would fail to clear. This seemed to be an "after so many interrupts" bug, since halving the traffic would double the time needed to reproduce it. At the time, we figured that we never saw this with the disk because it would have taken weeks to reproduce.

Mainly because of the length of time required to reproduce this, we never found out whether the problem was on the Dom0 or the DomU side. I worked around the problem by adding code that would detect that the condition was occurring and then trigger a reset of the event channel or interrupt.

Steve
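A detection step of the kind described above might look roughly like the following C sketch. It is not the code from that driver: the watchdog structure, the function names, and the "N consecutive checks without progress" rule are all hypothetical. The caller is assumed to poll periodically, for example from a timer, and to perform the actual event-channel or interrupt reset itself when the check returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stall detector for one ring direction. */
struct stall_watchdog {
    uint32_t last_rsp_prod;      /* rsp_prod seen at the previous check */
    unsigned int stalled_checks; /* consecutive checks without progress */
    unsigned int threshold;      /* checks to tolerate before reporting a stall */
};

static void watchdog_init(struct stall_watchdog *w, uint32_t rsp_prod,
                          unsigned int threshold)
{
    assert(threshold > 0);
    w->last_rsp_prod = rsp_prod;
    w->stalled_checks = 0;
    w->threshold = threshold;
}

/* Call periodically with the current shared indices.
 * Returns true when requests are outstanding but rsp_prod has not moved
 * for `threshold` consecutive checks; the caller then does the reset. */
static bool watchdog_check(struct stall_watchdog *w, uint32_t req_prod,
                           uint32_t rsp_prod)
{
    bool work_pending = (req_prod != rsp_prod);

    if (!work_pending || rsp_prod != w->last_rsp_prod) {
        /* Ring is idle or the backend made progress: start counting again. */
        w->last_rsp_prod = rsp_prod;
        w->stalled_checks = 0;
        return false;
    }

    if (++w->stalled_checks < w->threshold)
        return false;

    w->stalled_checks = 0;  /* re-arm so the caller is not flooded with reports */
    return true;
}
```

Counting consecutive checks rather than reacting to a single one is a deliberate choice in this sketch: a backend that is merely slow should not trigger a reset, so the threshold and polling interval would need tuning against real traffic.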
Steve Prochniak
2012-Apr-09 19:21 UTC
Re: Pls help: netfront tx ring frozen (any clues appreciated)
After digging up the code, when we observed this issue it was specific to the RX ring and it took about 4 days of nonstop traffic to reproduce. So perhaps the issues are not related.