netback seemed to be somewhat confused about the napi budget parameter and
basically ignored it. This patch fixes that, properly limiting the work done
in each poll.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/netback.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 43341b8..83b4e5b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1351,14 +1351,15 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xenvif_tx_build_gops(struct xenvif *vif)
+static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 {
 	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
 	while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
-		< MAX_PENDING_REQS)) {
+		< MAX_PENDING_REQS) &&
+	       (skb_queue_len(&vif->tx_queue) < budget)) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct page *page;
@@ -1520,14 +1521,13 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
 }
 
 
-static int xenvif_tx_submit(struct xenvif *vif, int budget)
+static int xenvif_tx_submit(struct xenvif *vif)
 {
 	struct gnttab_copy *gop = vif->tx_copy_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
-	while (work_done < budget &&
-	       (skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
+	while ((skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
@@ -1602,14 +1602,14 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
 
-	nr_gops = xenvif_tx_build_gops(vif);
+	nr_gops = xenvif_tx_build_gops(vif, budget);
 
 	if (nr_gops == 0)
 		return 0;
 
 	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
 
-	work_done = xenvif_tx_submit(vif, nr_gops);
+	work_done = xenvif_tx_submit(vif);
 
 	return work_done;
 }
-- 
1.7.10.4
On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> netback seemed to be somewhat confused about the napi budget parameter and
> basically ignored it. This patch fixes that, properly limiting the work done
> in each poll.

What do you mean "ignored", xenvif_tx_submit seems to be tracking and
testing work_done against the budget.

I suspect this change is probably worthwhile but it would be good to get
an accurate description of why, which I presume is because the tx
process is xenvif_tx_build_gops followed by gnttab_batch_copy then
xenvif_tx_submit and that it is better to do the budget enforcement
earlier on.

How does this change impact the batching in gnttab_batch_copy and
therefore performance? Do we need to tweak the NAPI budget to ensure
we are getting good batching? I suspect that netback is a bit unusual
among NIC drivers in that the rx path contains a fair bit of actual work
to do, so perhaps the NAPI defaults are not necessarily going to be the
best for it.

>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 43341b8..83b4e5b 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1351,14 +1351,15 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
>  	return false;
>  }
>
> -static unsigned xenvif_tx_build_gops(struct xenvif *vif)
> +static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
>  {
>  	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
>  	struct sk_buff *skb;
>  	int ret;
>
>  	while ((nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
> -		< MAX_PENDING_REQS)) {
> +		< MAX_PENDING_REQS) &&
> +	       (skb_queue_len(&vif->tx_queue) < budget)) {
>  		struct xen_netif_tx_request txreq;
>  		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
>  		struct page *page;
> @@ -1520,14 +1521,13 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif)
>  }
>
>
> -static int xenvif_tx_submit(struct xenvif *vif, int budget)
> +static int xenvif_tx_submit(struct xenvif *vif)
>  {
>  	struct gnttab_copy *gop = vif->tx_copy_ops;
>  	struct sk_buff *skb;
>  	int work_done = 0;
>
> -	while (work_done < budget &&
> -	       (skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
> +	while ((skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
>  		struct xen_netif_tx_request *txp;
>  		u16 pending_idx;
>  		unsigned data_len;
> @@ -1602,14 +1602,14 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>  	if (unlikely(!tx_work_todo(vif)))
>  		return 0;
>
> -	nr_gops = xenvif_tx_build_gops(vif);
> +	nr_gops = xenvif_tx_build_gops(vif, budget);
>
>  	if (nr_gops == 0)
>  		return 0;
>
>  	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
>
> -	work_done = xenvif_tx_submit(vif, nr_gops);
> +	work_done = xenvif_tx_submit(vif);
>
>  	return work_done;
>  }
On 10/12/13 10:25, Ian Campbell wrote:
> On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
>> netback seemed to be somewhat confused about the napi budget parameter and
>> basically ignored it. This patch fixes that, properly limiting the work done
>> in each poll.
>
> What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> testing work_done against the budget.

I have seen this warning in net_rx_action() trigger.

WARN_ON_ONCE(work > weight);

Which means netback wasn't limiting the work done.

David
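The contract behind that warning can be sketched in userspace C. This is only a toy model, not kernel code: `toy_poll_fn`, `poll_respects_budget`, `poll_ignores_budget` and `toy_budget_violated` are hypothetical names invented for illustration, and the only thing taken from the thread is the comparison `work > weight`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver poll callback: it is handed a
 * budget and returns how many packets it claims to have processed. */
typedef int (*toy_poll_fn)(int backlog, int budget);

/* Well-behaved poll: never reports more work than the budget. */
static int poll_respects_budget(int backlog, int budget)
{
	return backlog < budget ? backlog : budget;
}

/* Misbehaving poll: processes the whole backlog whatever the budget. */
static int poll_ignores_budget(int backlog, int budget)
{
	(void)budget;
	return backlog;
}

/* Models the caller's check: returns true when the equivalent of
 * WARN_ON_ONCE(work > weight) would fire for one poll invocation. */
static bool toy_budget_violated(toy_poll_fn poll, int backlog, int weight)
{
	int work;

	assert(weight > 0);
	work = poll(backlog, weight);
	return work > weight;
}
```

With a weight of 64 and a backlog of 100 packets, only the poll that ignores its budget trips the check; with a backlog below the weight neither does, which may be why such a bug only shows up under load.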
On Tue, Dec 10, 2013 at 10:16:40AM +0000, Paul Durrant wrote:
> netback seemed to be somewhat confused about the napi budget parameter and
> basically ignored it. This patch fixes that, properly limiting the work done
> in each poll.
>

After reading the code I think your "basically ignored it" means netback
will process the ring as much as possible, right? But overall the number of
packets passed to the network stack is still limited by budget, if I'm not
mistaken.

What's the impact on flow control if you move the check earlier?

Wei.
On Tue, Dec 10, 2013 at 10:30:13AM +0000, David Vrabel wrote:
> On 10/12/13 10:25, Ian Campbell wrote:
> > On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> >> netback seemed to be somewhat confused about the napi budget parameter and
> >> basically ignored it. This patch fixes that, properly limiting the work done
> >> in each poll.
> >
> > What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> > testing work_done against the budget.
>
> I have seen this warning in net_rx_action() trigger.
>
> WARN_ON_ONCE(work > weight);
>
> Which means netback wasn't limiting the work done.
>

But in the original code work_done is returned by xenvif_tx_submit, which
has a guard against that situation, right?

Wei.

> David
On Tue, Dec 10, 2013 at 10:37:36AM +0000, Wei Liu wrote:
> On Tue, Dec 10, 2013 at 10:30:13AM +0000, David Vrabel wrote:
> > On 10/12/13 10:25, Ian Campbell wrote:
> > > On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> > >> netback seemed to be somewhat confused about the napi budget parameter and
> > >> basically ignored it. This patch fixes that, properly limiting the work done
> > >> in each poll.
> > >
> > > What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> > > testing work_done against the budget.
> >
> > I have seen this warning in net_rx_action() trigger.
> >
> > WARN_ON_ONCE(work > weight);
> >
> > Which means netback wasn't limiting the work done.
> >
>
> But in the original code work_done is returned by xenvif_tx_submit, which
> has a guard against that situation, right?
>

And now I think I spot a bug...

work_done = xenvif_tx_submit(vif, nr_gops);

The second argument should really be "budget". :-(

Wei.

> Wei.
>
> > David
> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 10 December 2013 10:45
> To: David Vrabel
> Cc: Ian Campbell; Paul Durrant; xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu
> Subject: Re: [PATCH net] xen-netback: fix abuse of napi budget
>
> On Tue, Dec 10, 2013 at 10:37:36AM +0000, Wei Liu wrote:
> > On Tue, Dec 10, 2013 at 10:30:13AM +0000, David Vrabel wrote:
> > > On 10/12/13 10:25, Ian Campbell wrote:
> > > > On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> > > >> netback seemed to be somewhat confused about the napi budget parameter and
> > > >> basically ignored it. This patch fixes that, properly limiting the work done
> > > >> in each poll.
> > > >
> > > > What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> > > > testing work_done against the budget.
> > >
> > > I have seen this warning in net_rx_action() trigger.
> > >
> > > WARN_ON_ONCE(work > weight);
> > >
> > > Which means netback wasn't limiting the work done.
> > >
> >
> > But in the original code work_done is returned by xenvif_tx_submit, which
> > has a guard against that situation, right?
> >
>
> And now I think I spot a bug...
>
> work_done = xenvif_tx_submit(vif, nr_gops);
>
> The second argument should really be "budget". :-(
>

Yep - that's basically the problem.

Paul

> Wei.
>
> > Wei.
> >
> > > David
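Why passing nr_gops as the limit defeats the guard can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the driver: `struct toy_vif`, `toy_build_gops` and `toy_submit` are hypothetical names, and the model assumes each queued packet needs at least one grant copy, so the number of grant copies is never smaller than the number of queued packets.

```c
#include <assert.h>

/* Toy model of the interface: only the queue length matters here. */
struct toy_vif {
	int queued;	/* packets waiting on the internal tx queue */
};

/* Models a build step with no budget: all `packets` requests are
 * queued, each needing `gops_per_packet` grant copies (assumed >= 1).
 * Returns the total number of grant copy operations. */
static int toy_build_gops(struct toy_vif *vif, int packets, int gops_per_packet)
{
	assert(packets >= 0 && gops_per_packet >= 1);
	vif->queued += packets;
	return packets * gops_per_packet;
}

/* Models the submit loop: dequeue until `limit` packets have been
 * handled or the queue is empty, and report the work done. */
static int toy_submit(struct toy_vif *vif, int limit)
{
	int work_done = 0;

	while (work_done < limit && vif->queued > 0) {
		vif->queued--;
		work_done++;
	}
	return work_done;
}
```

In this model, building 100 packets at 2 grant copies each gives nr_gops = 200, so `toy_submit(&vif, nr_gops)` reports 100 packets of work against a budget of 64, whereas `toy_submit(&vif, 64)` stops at 64 and leaves 36 packets queued.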
> -----Original Message-----
> From: Ian Campbell
> Sent: 10 December 2013 10:26
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel
> Subject: Re: [PATCH net] xen-netback: fix abuse of napi budget
>
> On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> > netback seemed to be somewhat confused about the napi budget parameter and
> > basically ignored it. This patch fixes that, properly limiting the work done
> > in each poll.
>
> What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> testing work_done against the budget.
>
> I suspect this change is probably worthwhile but it would be good to get
> an accurate description of why, which I presume is because the tx
> process is xenvif_tx_build_gops followed by gnttab_batch_copy then
> xenvif_tx_submit and that it is better to do the budget enforcement
> earlier on.
>

Yes, the budget needs to limit what we process from the shared ring because
otherwise we risk tx_queue growing without bound.

> How does this change impact the batching in gnttab_batch_copy and
> therefore performance? Do we need to tweak the NAPI budget to ensure
> we are getting good batching? I suspect that netback is a bit unusual
> among NIC drivers in that the rx path contains a fair bit of actual work
> to do, so perhaps the NAPI defaults are not necessarily going to be the
> best for it.
>

We have a budget of 64 at the moment, which I think is big enough and
actually possibly too big. We need a value that gives us a reasonable amount
of grant op batching but isn't so big that we're forever bouncing in and
out of interrupt mode, which I suspect is what's happening now. It would be
really useful if the napi budget could be tuned dynamically so we could run
some perf tests without having to reload, but Google has drawn a blank so far.

Paul
On Tue, Dec 10, 2013 at 10:48:13AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > Sent: 10 December 2013 10:45
> > To: David Vrabel
> > Cc: Ian Campbell; Paul Durrant; xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu
> > Subject: Re: [PATCH net] xen-netback: fix abuse of napi budget
> >
> > On Tue, Dec 10, 2013 at 10:37:36AM +0000, Wei Liu wrote:
> > > On Tue, Dec 10, 2013 at 10:30:13AM +0000, David Vrabel wrote:
> > > > On 10/12/13 10:25, Ian Campbell wrote:
> > > > > On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> > > > >> netback seemed to be somewhat confused about the napi budget parameter and
> > > > >> basically ignored it. This patch fixes that, properly limiting the work done
> > > > >> in each poll.
> > > > >
> > > > > What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> > > > > testing work_done against the budget.
> > > >
> > > > I have seen this warning in net_rx_action() trigger.
> > > >
> > > > WARN_ON_ONCE(work > weight);
> > > >
> > > > Which means netback wasn't limiting the work done.
> > > >
> > >
> > > But in the original code work_done is returned by xenvif_tx_submit, which
> > > has a guard against that situation, right?
> > >
> >
> > And now I think I spot a bug...
> >
> > work_done = xenvif_tx_submit(vif, nr_gops);
> >
> > The second argument should really be "budget". :-(
> >
>
> Yep - that's basically the problem.
>

So size-wise the attached patch is smaller. Now the only question is whether
it is better to move flow control earlier.

Wei.

---8<---
From 11db4a9cd7267a621725c48f0e0a99c1d6d31866 Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Tue, 10 Dec 2013 10:49:59 +0000
Subject: [PATCH] xen-netback: correct typo nr_gops -> budget

work_done should be limited by budget, not nr_gops. Otherwise we trigger
"WARN_ON_ONCE(work > weight)" in net/core/dev.c:net_rx_action.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index acf1392..b11f65d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1707,7 +1707,7 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 
 	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
 
-	work_done = xenvif_tx_submit(vif, nr_gops);
+	work_done = xenvif_tx_submit(vif, budget);
 
 	return work_done;
 }
-- 
1.7.10.4
> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 10 December 2013 10:55
> To: Paul Durrant
> Cc: Wei Liu; David Vrabel; Ian Campbell; xen-devel@lists.xen.org; netdev@vger.kernel.org
> Subject: Re: [PATCH net] xen-netback: fix abuse of napi budget
>
> On Tue, Dec 10, 2013 at 10:48:13AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > > Sent: 10 December 2013 10:45
> > > To: David Vrabel
> > > Cc: Ian Campbell; Paul Durrant; xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu
> > > Subject: Re: [PATCH net] xen-netback: fix abuse of napi budget
> > >
> > > On Tue, Dec 10, 2013 at 10:37:36AM +0000, Wei Liu wrote:
> > > > On Tue, Dec 10, 2013 at 10:30:13AM +0000, David Vrabel wrote:
> > > > > On 10/12/13 10:25, Ian Campbell wrote:
> > > > > > On Tue, 2013-12-10 at 10:16 +0000, Paul Durrant wrote:
> > > > > >> netback seemed to be somewhat confused about the napi budget parameter and
> > > > > >> basically ignored it. This patch fixes that, properly limiting the work done
> > > > > >> in each poll.
> > > > > >
> > > > > > What do you mean "ignored", xenvif_tx_submit seems to be tracking and
> > > > > > testing work_done against the budget.
> > > > >
> > > > > I have seen this warning in net_rx_action() trigger.
> > > > >
> > > > > WARN_ON_ONCE(work > weight);
> > > > >
> > > > > Which means netback wasn't limiting the work done.
> > > > >
> > > >
> > > > But in the original code work_done is returned by xenvif_tx_submit, which
> > > > has a guard against that situation, right?
> > > >
> > >
> > > And now I think I spot a bug...
> > >
> > > work_done = xenvif_tx_submit(vif, nr_gops);
> > >
> > > The second argument should really be "budget". :-(
> > >
> >
> > Yep - that's basically the problem.
> >
>
> So size-wise the attached patch is smaller. Now the only question is whether
> it is better to move flow control earlier.
>

Yes, but I think that patch is dangerous, as I explained to Ian. If we don't
limit early then tx_queue can grow uncontrollably if the frontend continues
to throw more data into the ring than we actually ship out on each napi poll.
I will re-submit with a more elaborate description.

Paul

> Wei.
>
> ---8<---
> From 11db4a9cd7267a621725c48f0e0a99c1d6d31866 Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Tue, 10 Dec 2013 10:49:59 +0000
> Subject: [PATCH] xen-netback: correct typo nr_gops -> budget
>
> work_done should be limited by budget, not nr_gops. Otherwise we trigger
> "WARN_ON_ONCE(work > weight)" in net/core/dev.c:net_rx_action.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index acf1392..b11f65d 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1707,7 +1707,7 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>
>  	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
>
> -	work_done = xenvif_tx_submit(vif, nr_gops);
> +	work_done = xenvif_tx_submit(vif, budget);
>
>  	return work_done;
>  }
> --
> 1.7.10.4
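The concern about queue growth can be made concrete with a rough userspace simulation. It is a sketch under simplifying assumptions, not a model of the real driver: the frontend is assumed to post a fixed number of packets before every poll, the limit on pending requests in the build loop is ignored, the shared ring is treated as unbounded, and `toy_peak_late_limit` and `toy_peak_early_limit` are hypothetical names. Each function returns the largest internal queue length seen over a number of polls.

```c
#include <assert.h>

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Limit applied only at submit time: the build step queues everything
 * the frontend posted, but at most `budget` packets leave per poll. */
static int toy_peak_late_limit(int arrivals_per_poll, int budget, int polls)
{
	int queued = 0, peak = 0, i;

	assert(budget > 0);
	for (i = 0; i < polls; i++) {
		queued += arrivals_per_poll;		/* build takes everything */
		if (queued > peak)
			peak = queued;
		queued -= min_int(queued, budget);	/* submit is capped */
	}
	return peak;
}

/* Limit applied at build time: at most `budget` packets are moved from
 * the ring to the queue per poll, and submit then drains the queue.
 * Excess requests simply wait on the ring. */
static int toy_peak_early_limit(int arrivals_per_poll, int budget, int polls)
{
	int on_ring = 0, queued = 0, peak = 0, i;

	assert(budget > 0);
	for (i = 0; i < polls; i++) {
		int take;

		on_ring += arrivals_per_poll;
		take = min_int(on_ring, budget - queued);
		on_ring -= take;
		queued += take;
		if (queued > peak)
			peak = queued;
		queued = 0;				/* submit drains the queue */
	}
	return peak;
}
```

With 100 arrivals per poll and a budget of 64, the late-limit model reaches a queue of 424 packets after ten polls and keeps growing by 36 per poll, while the early-limit model never holds more than 64. In the early-limit case the backlog stays on the shared ring instead, which in practice is of fixed size, so a frontend that outpaces the backend has to wait.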