Palagummi, Siva
2012-Aug-29 12:21 UTC
[PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
This patch contains the modifications discussed in the thread http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html

Ian,

Instead of using max_required_rx_slots, I used the count that we already have in hand to verify whether we have enough room in the batch queue for the next skb. Please let me know if that is not appropriate. Things worked fine in my environment. Under heavy load we now seem to be consuming most of the slots in the queue, and no BUG_ON :-)

Thanks
Siva
Ian Campbell
2012-Aug-30 08:07 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote:> This patch contains the modifications that are discussed in thread > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.htmlThanks. Please can you find a way to include your patches inline rather than as attachments, it makes reply for review much easier. Documentation/email-clients.txt has some hints for various clients or you can just use the "git send-email" command (perhaps with "git format-patch"). You should also CC the netdev@vger.kernel.org list.> Instead of using max_required_rx_slots, I used the count that we > already have in hand to verify if we have enough room in the batch > queue for next skb. Please let me know if that is not appropriate. > Things worked fine in my environment. Under heavy load now we seems to > be consuming most of the slots in the queue and no BUG_ON :-)> From: Siva Palagummi <Siva.Palagummi@ca.com> > > count variable in xen_netbk_rx_action need to be incremented > correctly to take into account of extra slots required when skb_headlen is > greater than PAGE_SIZE when larger MTU values are used. Without this change > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback thread > to exit. > > The fix is to stash the counting already done in xen_netbk_count_skb_slots > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > Also improved the checking for filling the batch queue. > > Also merged a change from a patch created for xen_netbk_count_skb_slots > function as per thread > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > --- > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 -0500 > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 -0400 > @@ -80,6 +80,11 @@ union page_ext { > void *mapping; > }; > > +struct skb_cb_overlay { > + int meta_slots_used; > + int count; > +}; > + > struct xen_netbk { > wait_queue_head_t wq; > struct task_struct *task; > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > { > unsigned int count; > int i, copy_off; > + struct skb_cb_overlay *sco; > > - count = DIV_ROUND_UP( > - offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE); > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);This hunk appears to be upstream already (see e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working against? You should either base patches on Linus'' branch or on the net-next branch. Other than this the patch looks good, thanks.> > copy_off = skb_headlen(skb) % PAGE_SIZE; > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > size -= bytes; > } > } > + sco = (struct skb_cb_overlay *)skb->cb; > + sco->count = count; > return count; > } > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > } > } > > -struct skb_cb_overlay { > - int meta_slots_used; > -}; > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > { > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > sco = (struct skb_cb_overlay *)skb->cb; > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > - count += nr_frags + 1; > + count += sco->count; > > __skb_queue_tail(&rxq, skb); > > + skb = skb_peek(&netbk->rx_queue); > + if (skb == NULL) > + break; > + sco = (struct skb_cb_overlay *)skb->cb; > /* Filled the batch queue? 
*/ > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > break; > } >
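As background on the skb_cb_overlay pattern used in the patch above: skb->cb is the 48-byte per-layer scratch area in struct sk_buff, so an overlay struct like this is safe only as long as it fits inside that area. A compile-time guard (not part of the patch, shown here purely as a defensive illustration) could be placed where the overlay is first used:

    /* hypothetical guard: the overlay must fit inside skb->cb */
    BUILD_BUG_ON(sizeof(struct skb_cb_overlay) > FIELD_SIZEOF(struct sk_buff, cb));

With two ints the overlay is well inside the 48-byte cb area, so the check would be purely documentation.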
Palagummi, Siva
2012-Aug-30 10:26 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Ian, I was using ms exchange client which is converting tabs into spaces and so on. That''s why I sent it as an attachment. I will find a way to send the patch inline and resubmit. Thanks for reviewing it. --Siva> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Thursday, August 30, 2012 1:37 PM > To: Palagummi, Siva > Cc: xen-devel@lists.xen.org > Subject: Re: [PATCH RFC V2] xen/netback: Count ring slots properly when > larger MTU sizes are used > > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > This patch contains the modifications that are discussed in thread > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html > > Thanks. > > Please can you find a way to include your patches inline rather than as > attachments, it makes reply for review much easier. > Documentation/email-clients.txt has some hints for various clients or > you can just use the "git send-email" command (perhaps with "git > format-patch"). > > You should also CC the netdev@vger.kernel.org list. > > > Instead of using max_required_rx_slots, I used the count that we > > already have in hand to verify if we have enough room in the batch > > queue for next skb. Please let me know if that is not appropriate. > > Things worked fine in my environment. Under heavy load now we seems > to > > be consuming most of the slots in the queue and no BUG_ON :-) > > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > count variable in xen_netbk_rx_action need to be incremented > > correctly to take into account of extra slots required when > skb_headlen is > > greater than PAGE_SIZE when larger MTU values are used. Without this > change > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback > thread > > to exit. > > > > The fix is to stash the counting already done in > xen_netbk_count_skb_slots > > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > > > Also improved the checking for filling the batch queue. > > > > Also merged a change from a patch created for > xen_netbk_count_skb_slots > > function as per thread > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > --- > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > netback/netback.c > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 > -0500 > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 > -0400 > > @@ -80,6 +80,11 @@ union page_ext { > > void *mapping; > > }; > > > > +struct skb_cb_overlay { > > + int meta_slots_used; > > + int count; > > +}; > > + > > struct xen_netbk { > > wait_queue_head_t wq; > > struct task_struct *task; > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > { > > unsigned int count; > > int i, copy_off; > > + struct skb_cb_overlay *sco; > > > > - count = DIV_ROUND_UP( > > - offset_in_page(skb->data)+skb_headlen(skb), > PAGE_SIZE); > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > This hunk appears to be upstream already (see > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > against? You should either base patches on Linus'' branch or on the > net-next branch. > > Other than this the patch looks good, thanks. 
> > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > > size -= bytes; > > } > > } > > + sco = (struct skb_cb_overlay *)skb->cb; > > + sco->count = count; > > return count; > > } > > > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > > } > > } > > > > -struct skb_cb_overlay { > > - int meta_slots_used; > > -}; > > > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > > { > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > > sco = (struct skb_cb_overlay *)skb->cb; > > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > > > - count += nr_frags + 1; > > + count += sco->count; > > > > __skb_queue_tail(&rxq, skb); > > > > + skb = skb_peek(&netbk->rx_queue); > > + if (skb == NULL) > > + break; > > + sco = (struct skb_cb_overlay *)skb->cb; > > /* Filled the batch queue? */ > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > break; > > } > > >
Matt Wilson
2012-Dec-04 23:23 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote:> On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > This patch contains the modifications that are discussed in thread > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html[...]> > Instead of using max_required_rx_slots, I used the count that we > > already have in hand to verify if we have enough room in the batch > > queue for next skb. Please let me know if that is not appropriate. > > Things worked fine in my environment. Under heavy load now we seems to > > be consuming most of the slots in the queue and no BUG_ON :-) > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > count variable in xen_netbk_rx_action need to be incremented > > correctly to take into account of extra slots required when skb_headlen is > > greater than PAGE_SIZE when larger MTU values are used. Without this change > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback thread > > to exit. > > > > The fix is to stash the counting already done in xen_netbk_count_skb_slots > > in skb_cb_overlay and use it directly in xen_netbk_rx_action.You don''t need to stash the estimated value to use it in xen_netbk_rx_action() - you have the actual number of slots consumed in hand from the return value of netbk_gop_skb(). See below.> > Also improved the checking for filling the batch queue. > > > > Also merged a change from a patch created for xen_netbk_count_skb_slots > > function as per thread > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > --- > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 -0500 > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 -0400 > > @@ -80,6 +80,11 @@ union page_ext { > > void *mapping; > > }; > > > > +struct skb_cb_overlay { > > + int meta_slots_used; > > + int count; > > +};We don''t actually need a separate variable for the estimate. We could rename meta_slots_used to meta_slots. It could hold the estimate before netbk_gop_skb() is called and the actual number of slots consumed after. That might be confusing, though, so maybe it''s better off left as two variables.> > struct xen_netbk { > > wait_queue_head_t wq; > > struct task_struct *task; > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > { > > unsigned int count; > > int i, copy_off; > > + struct skb_cb_overlay *sco; > > > > - count = DIV_ROUND_UP( > > - offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE); > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > This hunk appears to be upstream already (see > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > against? You should either base patches on Linus'' branch or on the > net-next branch. > > Other than this the patch looks good, thanks.I don''t think that this patch completely addresses problems calculating the number of slots required when large MTUs are used. For example: if netback is handling a skb with a large linear data portion, say 8157 bytes, that begins at 64 bytes from the start of the page. Assume that GSO is disabled and there are no frags. xen_netbk_count_skb_slots() will calculate that two slots are needed: count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); but netbk_gop_frag_copy() will actually use three. 
Let's walk through the loop:

	data = skb->data;
	while (data < skb_tail_pointer(skb)) {
		unsigned int offset = offset_in_page(data);
		unsigned int len = PAGE_SIZE - offset;

		if (data + len > skb_tail_pointer(skb))
			len = skb_tail_pointer(skb) - data;

		netbk_gop_frag_copy(vif, skb, npo,
				    virt_to_page(data), len, offset, &head);
		data += len;
	}

The first pass will call netbk_gop_frag_copy() with len=4032, offset=64, and head=1. After the call, head will be set to 0. Inside netbk_gop_frag_copy(), start_new_rx_buffer() will be called with offset=0, size=4032, head=1. We'll return false due to the checks for "offset" and "!head".

The second pass will call netbk_gop_frag_copy() with len=4096, offset=0, head=0. start_new_rx_buffer() will be called with offset=4032, bytes=4096, head=0. We'll return true here, which we shouldn't since it's just going to lead to buffer waste for the last bit.

The third pass will call with len=29 and offset=0. start_new_rx_buffer() will be called with offset=4096, bytes=29, head=0. We'll start a new buffer for the last bit.

So you can see that we underestimate the buffers / meta slots required to handle a skb with a large linear buffer, as we commonly have to handle with large MTU sizes. This can lead to problems later on.

I'm not as familiar with the new compound page frag handling code, but I can imagine that the same problem could exist there. But since the calculation logic in xen_netbk_count_skb_slots() directly models the code setting up the copy operation, at least it will be estimated properly.

> >
> > 	copy_off = skb_headlen(skb) % PAGE_SIZE;
> >
> > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s
> > 		size -= bytes;
> > 	}
> > 	}
> > +	sco = (struct skb_cb_overlay *)skb->cb;
> > +	sco->count = count;
> > 	return count;
> > }
> >
> > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str
> > 	}
> > }
> >
> > -struct skb_cb_overlay {
> > -	int meta_slots_used;
> > -};
> >
> > static void xen_netbk_rx_action(struct xen_netbk *netbk)
> > {
> > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x
> > 		sco = (struct skb_cb_overlay *)skb->cb;
> > 		sco->meta_slots_used = netbk_gop_skb(skb, &npo);
> >
> > -		count += nr_frags + 1;
> > +		count += sco->count;

Why increment count by the /estimated/ count instead of the actual number of slots used? We have the number of slots in the line just above, in sco->meta_slots_used.

> > 		__skb_queue_tail(&rxq, skb);
> >
> > +		skb = skb_peek(&netbk->rx_queue);
> > +		if (skb == NULL)
> > +			break;
> > +		sco = (struct skb_cb_overlay *)skb->cb;
> > 		/* Filled the batch queue? */
> > -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > +		if (count + sco->count >= XEN_NETIF_RX_RING_SIZE)
> > 			break;
> > 	}
> >

This change I like.

We're working on a patch to improve the buffer efficiency and the miscalculation problem. Siva, I'd be happy to re-base and re-submit this patch (with minor adjustments) as part of that work, unless you want to handle that.

Matt
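One way to remove the mismatch described above is to have the slot estimate replay exactly the decisions the copy path makes. The helper below is only a sketch of that idea against the 3.2-era netback code (count_head_slots() is a hypothetical name, not a function in the driver); it assumes MAX_BUFFER_OFFSET == PAGE_SIZE and that netbk_gop_skb() claims one slot for the head, with copy_off reset to 0, before its copy loop runs, as in the code quoted earlier:

    static unsigned int count_head_slots(struct sk_buff *skb)
    {
            unsigned char *data = skb->data;
            unsigned int copy_off = 0;  /* netbk_gop_skb() starts the head at offset 0... */
            unsigned int count = 1;     /* ...in a slot it has already claimed */
            int head = 1;

            while (data < skb_tail_pointer(skb)) {
                    unsigned int offset = offset_in_page(data);
                    unsigned int len = PAGE_SIZE - offset;
                    unsigned int left;

                    if (data + len > skb_tail_pointer(skb))
                            len = skb_tail_pointer(skb) - data;

                    /* replay the inner loop of netbk_gop_frag_copy() */
                    for (left = len; left > 0; head = 0) {
                            unsigned int bytes = left;

                            if (start_new_rx_buffer(copy_off, bytes, head)) {
                                    count++;
                                    copy_off = 0;
                            }
                            if (copy_off + bytes > MAX_BUFFER_OFFSET)
                                    bytes = MAX_BUFFER_OFFSET - copy_off;

                            copy_off += bytes;
                            left -= bytes;
                    }
                    data += len;
            }
            return count;
    }

For the 8157-byte / offset-64 example in the walkthrough this returns 3, matching what netbk_gop_skb() consumes, where DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE) predicts 2.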
Palagummi, Siva
2012-Dec-05 11:56 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Matt,> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Wednesday, December 05, 2012 4:53 AM > To: Ian Campbell; Palagummi, Siva > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote: > > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > > This patch contains the modifications that are discussed in thread > > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html > > [...] > > > > Instead of using max_required_rx_slots, I used the count that we > > > already have in hand to verify if we have enough room in the batch > > > queue for next skb. Please let me know if that is not appropriate. > > > Things worked fine in my environment. Under heavy load now we seems > to > > > be consuming most of the slots in the queue and no BUG_ON :-) > > > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > > > count variable in xen_netbk_rx_action need to be incremented > > > correctly to take into account of extra slots required when > skb_headlen is > > > greater than PAGE_SIZE when larger MTU values are used. Without > this change > > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback > thread > > > to exit. > > > > > > The fix is to stash the counting already done in > xen_netbk_count_skb_slots > > > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > You don''t need to stash the estimated value to use it in > xen_netbk_rx_action() - you have the actual number of slots consumed > in hand from the return value of netbk_gop_skb(). See below. > > > > Also improved the checking for filling the batch queue. > > > > > > Also merged a change from a patch created for > xen_netbk_count_skb_slots > > > function as per thread > > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > > --- > > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > netback/netback.c > > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 > 19:39:32.000000000 -0500 > > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 > 17:31:22.000000000 -0400 > > > @@ -80,6 +80,11 @@ union page_ext { > > > void *mapping; > > > }; > > > > > > +struct skb_cb_overlay { > > > + int meta_slots_used; > > > + int count; > > > +}; > > We don''t actually need a separate variable for the estimate. We could > rename meta_slots_used to meta_slots. It could hold the estimate > before netbk_gop_skb() is called and the actual number of slots > consumed after. That might be confusing, though, so maybe it''s better > off left as two variables. > > > > struct xen_netbk { > > > wait_queue_head_t wq; > > > struct task_struct *task; > > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > > { > > > unsigned int count; > > > int i, copy_off; > > > + struct skb_cb_overlay *sco; > > > > > > - count = DIV_ROUND_UP( > > > - offset_in_page(skb->data)+skb_headlen(skb), > PAGE_SIZE); > > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > > > This hunk appears to be upstream already (see > > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > > against? You should either base patches on Linus'' branch or on the > > net-next branch. > > > > Other than this the patch looks good, thanks. 
> > I don''t think that this patch completely addresses problems > calculating the number of slots required when large MTUs are used. > > For example: if netback is handling a skb with a large linear data > portion, say 8157 bytes, that begins at 64 bytes from the start of the > page. Assume that GSO is disabled and there are no frags. > xen_netbk_count_skb_slots() will calculate that two slots are needed: > > count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > but netbk_gop_frag_copy() will actually use three. Let''s walk through > the loop:You are right. The above chunk which is already part of the upstream is unfortunately incorrect for some cases. We also ran into issues in our environment around a week back and found this problem. The count will be different based on head len because of the optimization that start_new_rx_buffer is trying to do for large buffers. A hole of size "offset_in_page" will be left in first page during copy if the remaining buffer size is >=PAG_SIZE. This subsequently affects the copy_off as well. So xen_netbk_count_skb_slots actually needs a fix to calculate the count correctly based on head len. And also a fix to calculate the copy_off properly to which the data from fragments gets copied. max_required_rx_slots also may require a fix to account the additional slot that may be required in case mtu >= PAG_SIZE. For worst case scenario atleast another +1. One thing that is still puzzling here is, max_required_rx_slots seems to be assuming that linear length in head will never be greater than mtu size. But that doesn''t seem to be the case all the time. I wonder if it requires some kind of fix there or special handling when count_skb_slots exceeds max_required_rx_slots.> > data = skb->data; > while (data < skb_tail_pointer(skb)) { > unsigned int offset = offset_in_page(data); > unsigned int len = PAGE_SIZE - offset; > > if (data + len > skb_tail_pointer(skb)) > len = skb_tail_pointer(skb) - data; > > netbk_gop_frag_copy(vif, skb, npo, > virt_to_page(data), len, offset, > &head); > data += len; > } > > The first pass will call netbk_gop_frag_copy() with len=4032, > offset=64, and head=1. After the call, head will be set to 0. Inside > netbk_gop_frag_copy(), start_new_rx_buffer() will be called with > offset=0, size=4032, head=1. We''ll return false due to the checks for > "offset" and "!head". > > The second pass will call netbk_gop_frag_copy() with len=4096, > offset=0, head=0. netbk_gop_frag_copy() will get called with > offset=4032, bytes=4096, head=0. We''ll return true here, which we > shouldn''t since it''s just going to lead to buffer waste for the last > bit. > > The third pass will call with len=29 and offset=0. > start_new_rx_buffer() > will be called with offset=4096, bytes=29, head=0. We''ll start a new > buffer for the last bit. > > So you can see that we underestimate the buffers / meta slots required > to handle a skb with a large linear buffer, as we commonly have to > handle with large MTU sizes. This can lead to problems later on. > > I''m not as familiar with the new compound page frag handling code, but > I can imagine that the same problem could exist there. But since the > calculation logic in xen_netbk_count_skb_slots() directly models the > code setting up the copy operation, at least it will be estimated > properly. 
> > > > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > > > size -= bytes; > > > } > > > } > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > + sco->count = count; > > > return count; > > > } > > > > > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > > > } > > > } > > > > > > -struct skb_cb_overlay { > > > - int meta_slots_used; > > > -}; > > > > > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > > > { > > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > > > sco = (struct skb_cb_overlay *)skb->cb; > > > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > > > > > - count += nr_frags + 1; > > > + count += sco->count; > > Why increment count by the /estimated/ count instead of the actual > number of slots used? We have the number of slots in the line just > above, in sco->meta_slots_used. >Count actually refers to ring slots consumed rather than meta_slots used. Count can be different from meta_slots_used.> > > __skb_queue_tail(&rxq, skb); > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > + if (skb == NULL) > > > + break; > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > /* Filled the batch queue? */ > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > break; > > > } > > > > > This change I like. > > We''re working on a patch to improve the buffer efficiency and the > miscalculation problem. Siva, I''d be happy to re-base and re-submit > this patch (with minor adjustments) as part of that work, unless you > want to handle that. > > MattThanks!! Please feel free to re-base and re-submit :-) Siva
Matt Wilson
2012-Dec-06 05:35 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote:> Matt,[...]> You are right. The above chunk which is already part of the upstream > is unfortunately incorrect for some cases. We also ran into issues > in our environment around a week back and found this problem. The > count will be different based on head len because of the > optimization that start_new_rx_buffer is trying to do for large > buffers. A hole of size "offset_in_page" will be left in first page > during copy if the remaining buffer size is >=PAG_SIZE. This > subsequently affects the copy_off as well. > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > count correctly based on head len. And also a fix to calculate the > copy_off properly to which the data from fragments gets copied.Can you explain more about the copy_off problem? I''m not seeing it.> max_required_rx_slots also may require a fix to account the > additional slot that may be required in case mtu >= PAG_SIZE. For > worst case scenario atleast another +1. One thing that is still > puzzling here is, max_required_rx_slots seems to be assuming that > linear length in head will never be greater than mtu size. But that > doesn''t seem to be the case all the time. I wonder if it requires > some kind of fix there or special handling when count_skb_slots > exceeds max_required_rx_slots.We should only be using the number of pages required to copy the data. The fix shouldn''t be to anticipate wasting ring space by increasing the return value of max_required_rx_slots(). [...]> > Why increment count by the /estimated/ count instead of the actual > > number of slots used? We have the number of slots in the line just > > above, in sco->meta_slots_used. > > > > Count actually refers to ring slots consumed rather than meta_slots > used. Count can be different from meta_slots_used.Aah, indeed. This can end up being too pessimistic if you have lots of frags that require multiple copy operations. I still think that it would be better to calculate the actual number of ring slots consumed by netbk_gop_skb() to avoid other bugs like the one you originally fixed.> > > > __skb_queue_tail(&rxq, skb); > > > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > > + if (skb == NULL) > > > > + break; > > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > > /* Filled the batch queue? */ > > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > > break; > > > > } > > > > > > > > This change I like. > > > > We''re working on a patch to improve the buffer efficiency and the > > miscalculation problem. Siva, I''d be happy to re-base and re-submit > > this patch (with minor adjustments) as part of that work, unless you > > want to handle that. > > > > Matt > > Thanks!! Please feel free to re-base and re-submit :-)OK, thanks! Matt
Palagummi, Siva
2012-Dec-11 10:25 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Thursday, December 06, 2012 11:05 AM > To: Palagummi, Siva > Cc: Ian Campbell; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > Matt, > [...] > > You are right. The above chunk which is already part of the upstream > > is unfortunately incorrect for some cases. We also ran into issues > > in our environment around a week back and found this problem. The > > count will be different based on head len because of the > > optimization that start_new_rx_buffer is trying to do for large > > buffers. A hole of size "offset_in_page" will be left in first page > > during copy if the remaining buffer size is >=PAG_SIZE. This > > subsequently affects the copy_off as well. > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > > count correctly based on head len. And also a fix to calculate the > > copy_off properly to which the data from fragments gets copied. > > Can you explain more about the copy_off problem? I''m not seeing it.You can clearly see below that copy_off is input to start_new_rx_buffer while copying frags. So if the buggy "count" calculation below is fixed based on offset_in_page value then copy_off value also will change accordingly. count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); copy_off = skb_headlen(skb) % PAGE_SIZE; if (skb_shinfo(skb)->gso_size) count++; for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]); unsigned long bytes; while (size > 0) { BUG_ON(copy_off > MAX_BUFFER_OFFSET); if (start_new_rx_buffer(copy_off, size, 0)) { count++; copy_off = 0; } So a correct calculation should be somewhat like below because of the optimization in start_new_rx_buffer for larger sizes. linear_len = skb_headlen(skb) count = (linear_len <= PAGE_SIZE) ? 1 :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, PAGE_SIZE)); copy_off = ((offset_in_page(skb->data)+linear_len) < 2*PAGE_SIZE) ? linear_len % PAGE_SIZE; : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE;> > > max_required_rx_slots also may require a fix to account the > > additional slot that may be required in case mtu >= PAG_SIZE. For > > worst case scenario atleast another +1. One thing that is still > > puzzling here is, max_required_rx_slots seems to be assuming that > > linear length in head will never be greater than mtu size. But that > > doesn''t seem to be the case all the time. I wonder if it requires > > some kind of fix there or special handling when count_skb_slots > > exceeds max_required_rx_slots. > > We should only be using the number of pages required to copy the > data. The fix shouldn''t be to anticipate wasting ring space by > increasing the return value of max_required_rx_slots(). >I do not think we are wasting any ring space. But just ensuring that we have enough before proceeding ahead.> [...] > > > > Why increment count by the /estimated/ count instead of the actual > > > number of slots used? We have the number of slots in the line just > > > above, in sco->meta_slots_used. > > > > > > > Count actually refers to ring slots consumed rather than meta_slots > > used. Count can be different from meta_slots_used. > > Aah, indeed. This can end up being too pessimistic if you have lots of > frags that require multiple copy operations. 
I still think that it > would be better to calculate the actual number of ring slots consumed > by netbk_gop_skb() to avoid other bugs like the one you originally > fixed. >counting done in count_skb_slots is what exactly it is. The fix done above is to make it same so that no need to re calculate again. Thanks Siva> > > > > __skb_queue_tail(&rxq, skb); > > > > > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > > > + if (skb == NULL) > > > > > + break; > > > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > > > /* Filled the batch queue? */ > > > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > > > break; > > > > > } > > > > > > > > > > > This change I like. > > > > > > We''re working on a patch to improve the buffer efficiency and the > > > miscalculation problem. Siva, I''d be happy to re-base and re-submit > > > this patch (with minor adjustments) as part of that work, unless > you > > > want to handle that. > > > > > > Matt > > > > Thanks!! Please feel free to re-base and re-submit :-) > > OK, thanks! > > Matt
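To put numbers on the disagreement being discussed, here is the example from earlier in the thread (8157 bytes of linear data starting 64 bytes into a page, PAGE_SIZE = 4096) worked both ways; the figures follow directly from the walkthrough quoted above:

    Estimate in xen_netbk_count_skb_slots():
            count    = DIV_ROUND_UP(8157, 4096) = 2 slots
            copy_off = 8157 % 4096              = 4061

    What netbk_gop_skb() actually produces:
            slot 1: 4032 bytes (a 64-byte hole is left at the end)
            slot 2: 4096 bytes
            slot 3: the final 29 bytes          -> 3 slots in total
            frag data, if any, starts at copy_off 29 in slot 3

So both the slot count and the copy_off the frag loop starts from diverge from the estimate once the head spans more than one buffer.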
Matt Wilson
2012-Dec-11 21:34 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:> > -----Original Message----- > > From: Matt Wilson [mailto:msw@amazon.com] > > Sent: Thursday, December 06, 2012 11:05 AM > > To: Palagummi, Siva > > Cc: Ian Campbell; xen-devel@lists.xen.org > > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > > properly when larger MTU sizes are used > > > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > > Matt, > > [...] > > > You are right. The above chunk which is already part of the upstream > > > is unfortunately incorrect for some cases. We also ran into issues > > > in our environment around a week back and found this problem. The > > > count will be different based on head len because of the > > > optimization that start_new_rx_buffer is trying to do for large > > > buffers. A hole of size "offset_in_page" will be left in first page > > > during copy if the remaining buffer size is >=PAG_SIZE. This > > > subsequently affects the copy_off as well. > > > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > > > count correctly based on head len. And also a fix to calculate the > > > copy_off properly to which the data from fragments gets copied. > > > > Can you explain more about the copy_off problem? I''m not seeing it. > > You can clearly see below that copy_off is input to > start_new_rx_buffer while copying frags.Yes, but that''s the right thing to do. copy_off should be set to the destination offset after copying the last byte of linear data, which means "skb_headlen(skb) % PAGE_SIZE" is correct.> So if the buggy "count" calculation below is fixed based on > offset_in_page value then copy_off value also will change > accordingly.This calculation is not incorrect. You should only need as many PAGE_SIZE buffers as you have linear data to fill.> count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > if (skb_shinfo(skb)->gso_size) > count++; > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]); > unsigned long bytes; > while (size > 0) { > BUG_ON(copy_off > MAX_BUFFER_OFFSET); > > if (start_new_rx_buffer(copy_off, size, 0)) { > count++; > copy_off = 0; > } > > > So a correct calculation should be somewhat like below because of > the optimization in start_new_rx_buffer for larger sizes.start_new_rx_buffer() should not be starting a new buffer after the first pass copying the linear data.> linear_len = skb_headlen(skb) > count = (linear_len <= PAGE_SIZE) > ? 1 > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, PAGE_SIZE)); > > copy_off = ((offset_in_page(skb->data)+linear_len) < 2*PAGE_SIZE) > ? linear_len % PAGE_SIZE; > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE;A change like this makes the code much more difficult to understand.> > > max_required_rx_slots also may require a fix to account the > > > additional slot that may be required in case mtu >= PAG_SIZE. For > > > worst case scenario atleast another +1. One thing that is still > > > puzzling here is, max_required_rx_slots seems to be assuming that > > > linear length in head will never be greater than mtu size. But that > > > doesn''t seem to be the case all the time. I wonder if it requires > > > some kind of fix there or special handling when count_skb_slots > > > exceeds max_required_rx_slots. > > > > We should only be using the number of pages required to copy the > > data. 
The fix shouldn''t be to anticipate wasting ring space by > > increasing the return value of max_required_rx_slots(). > > > > I do not think we are wasting any ring space. But just ensuring that > we have enough before proceeding ahead.For some SKBs with large linear buffers, we certainly are wasting space. Go back and read the explanation in http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html> > [...] > > > > > > Why increment count by the /estimated/ count instead of the actual > > > > number of slots used? We have the number of slots in the line just > > > > above, in sco->meta_slots_used. > > > > > > > > > > Count actually refers to ring slots consumed rather than meta_slots > > > used. Count can be different from meta_slots_used. > > > > Aah, indeed. This can end up being too pessimistic if you have lots of > > frags that require multiple copy operations. I still think that it > > would be better to calculate the actual number of ring slots consumed > > by netbk_gop_skb() to avoid other bugs like the one you originally > > fixed. > > > > counting done in count_skb_slots is what exactly it is. The fix done > above is to make it same so that no need to re calculate again.Today, the counting done in count_skb_slots() *does not* match the number of buffer slots consumed by netbk_gop_skb(). Matt
Palagummi, Siva
2012-Dec-13 23:12 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Wednesday, December 12, 2012 3:05 AM > To: Palagummi, Siva > Cc: Ian Campbell; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > -----Original Message----- > > > From: Matt Wilson [mailto:msw@amazon.com] > > > Sent: Thursday, December 06, 2012 11:05 AM > > > To: Palagummi, Siva > > > Cc: Ian Campbell; xen-devel@lists.xen.org > > > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring > slots > > > properly when larger MTU sizes are used > > > > > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > > > Matt, > > > [...] > > > > You are right. The above chunk which is already part of the > upstream > > > > is unfortunately incorrect for some cases. We also ran into > issues > > > > in our environment around a week back and found this problem. The > > > > count will be different based on head len because of the > > > > optimization that start_new_rx_buffer is trying to do for large > > > > buffers. A hole of size "offset_in_page" will be left in first > page > > > > during copy if the remaining buffer size is >=PAG_SIZE. This > > > > subsequently affects the copy_off as well. > > > > > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate > the > > > > count correctly based on head len. And also a fix to calculate > the > > > > copy_off properly to which the data from fragments gets copied. > > > > > > Can you explain more about the copy_off problem? I''m not seeing it. > > > > You can clearly see below that copy_off is input to > > start_new_rx_buffer while copying frags. > > Yes, but that''s the right thing to do. copy_off should be set to the > destination offset after copying the last byte of linear data, which > means "skb_headlen(skb) % PAGE_SIZE" is correct. >No. It is not correct for two reasons. For example what if skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would be set to ZERO. And now if there exists some data in frags, ZERO will be passed in as copy_off value and start_new_rx_buffer will return FALSE. And second reason is the obvious case from the current code where "offset_in_page(skb->data)" size hole will be left in the first buffer after first pass in case remaining data that need to be copied is going to overflow the first buffer.> > So if the buggy "count" calculation below is fixed based on > > offset_in_page value then copy_off value also will change > > accordingly. > > This calculation is not incorrect. You should only need as many > PAGE_SIZE buffers as you have linear data to fill. >This calculation is incorrect and do not match actual slots used as it is now unless some new change is done either in nebk_gop_skb or in start_new_rx_buffer.> > count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > if (skb_shinfo(skb)->gso_size) > > count++; > > > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > > unsigned long size = skb_frag_size(&skb_shinfo(skb)- > >frags[i]); > > unsigned long bytes; > > while (size > 0) { > > BUG_ON(copy_off > MAX_BUFFER_OFFSET); > > > > if (start_new_rx_buffer(copy_off, size, 0)) { > > count++; > > copy_off = 0; > > } > > > > > > So a correct calculation should be somewhat like below because of > > the optimization in start_new_rx_buffer for larger sizes. 
> > start_new_rx_buffer() should not be starting a new buffer after the > first pass copying the linear data. > > > linear_len = skb_headlen(skb) > > count = (linear_len <= PAGE_SIZE) > > ? 1 > > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, > PAGE_SIZE)); > > > > copy_off = ((offset_in_page(skb->data)+linear_len) < > 2*PAGE_SIZE) > > ? linear_len % PAGE_SIZE; > > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE; > > A change like this makes the code much more difficult to understand. >:-) It would have been easier had we written logic using a for loop similar to how the counting is done for data in frags. In fact I did do mistake in above calculations :-( . A proper logic probably should look somewhat like below. linear_len=skb_headlen(skb); count = (linear_len <= PAGE_SIZE) ? 1 :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len,PAGE_SIZE); copy_off = (linear_len <= PAGE_SIZE) ?linear_len :( offset_in_page(skb->data)+linear_len -1)%PAGE_SIZE+1;> > > > max_required_rx_slots also may require a fix to account the > > > > additional slot that may be required in case mtu >= PAG_SIZE. For > > > > worst case scenario atleast another +1. One thing that is still > > > > puzzling here is, max_required_rx_slots seems to be assuming that > > > > linear length in head will never be greater than mtu size. But > that > > > > doesn''t seem to be the case all the time. I wonder if it requires > > > > some kind of fix there or special handling when count_skb_slots > > > > exceeds max_required_rx_slots. > > > > > > We should only be using the number of pages required to copy the > > > data. The fix shouldn''t be to anticipate wasting ring space by > > > increasing the return value of max_required_rx_slots(). > > > > > > > I do not think we are wasting any ring space. But just ensuring that > > we have enough before proceeding ahead. > > For some SKBs with large linear buffers, we certainly are wasting > space. Go back and read the explanation in > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html >I think I probably did not put my point clearly to make it understandable. Xen_netbk_rx_ring_full uses max_required_rx_slots value. Xen_netbk_rx_ring_full is called to decide wither a vif is schedulable or not. So in case the mtu value is >=PAGE_SIZE, for a worst case scenario additional buffer would be required which is not taken care by current calculations. Ofcourse in your new fix if you do a code change not to leave a hole in first buffer then this correction may not be required. But I am not the right person to decide the implications of the fix you are proposing. The current start_new_rx_buffer seems to be trying to make the copies PAGE aligned and also reduce number of copy operations. For example let us say SKB_HEAD_LEN is for whatever reason 4*PAGE_SIZE and offset_in_page is 32. As per existing logic of start_new_rx_buffer and with the fix I am proposing for count and copy_off, we will calculate and occupy 5 ring buffers and will use 5 copy operations. If we fix it the way you are proposing, not to leave a hole in the first buffer by modifying start_new_rx_buffer then it will occupy 4 ring buffers but will require 8 copy operations as per existing logic in netbk_gop_skb while copying head!! Thanks Siva> > > [...] > > > > > > > > Why increment count by the /estimated/ count instead of the > actual > > > > > number of slots used? We have the number of slots in the line > just > > > > > above, in sco->meta_slots_used. 
> > > > > > > > > > > > > Count actually refers to ring slots consumed rather than > meta_slots > > > > used. Count can be different from meta_slots_used. > > > > > > Aah, indeed. This can end up being too pessimistic if you have lots > of > > > frags that require multiple copy operations. I still think that it > > > would be better to calculate the actual number of ring slots > consumed > > > by netbk_gop_skb() to avoid other bugs like the one you originally > > > fixed. > > > > > > > counting done in count_skb_slots is what exactly it is. The fix done > > above is to make it same so that no need to re calculate again. > > Today, the counting done in count_skb_slots() *does not* match the > number of buffer slots consumed by netbk_gop_skb(). > > Matt
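For reference, the 4*PAGE_SIZE example above plays out as follows (head length 16384, offset_in_page 32, PAGE_SIZE 4096). This is only an expansion of the figures already given in the thread, assuming the per-page chunking in netbk_gop_skb() stays as it is:

    Existing start_new_rx_buffer() (a 32-byte hole stays in slot 1):
            chunk 4064 -> slot 1                              (1 copy)
            chunk 4096 -> slot 2                              (1 copy)
            chunk 4096 -> slot 3                              (1 copy)
            chunk 4096 -> slot 4                              (1 copy)
            chunk   32 -> slot 5                              (1 copy)
            total: 5 slots, 5 copy operations

    "Fill the buffer" variant (no holes):
            chunk 4064 -> slot 1                              (1 copy)
            chunk 4096 -> 32 into slot 1, 4064 into slot 2    (2 copies)
            chunk 4096 -> 32 into slot 2, 4064 into slot 3    (2 copies)
            chunk 4096 -> 32 into slot 3, 4064 into slot 4    (2 copies)
            chunk   32 -> slot 4                              (1 copy)
            total: 4 slots, 8 copy operations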
Matt Wilson
2012-Dec-14 18:53 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:> > -----Original Message----- > > From: Matt Wilson [mailto:msw@amazon.com] > > Sent: Wednesday, December 12, 2012 3:05 AM > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > > > > You can clearly see below that copy_off is input to > > > start_new_rx_buffer while copying frags. > > > > Yes, but that''s the right thing to do. copy_off should be set to the > > destination offset after copying the last byte of linear data, which > > means "skb_headlen(skb) % PAGE_SIZE" is correct. > > > > No. It is not correct for two reasons. For example what if > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would > be set to ZERO. And now if there exists some data in frags, ZERO > will be passed in as copy_off value and start_new_rx_buffer will > return FALSE. And second reason is the obvious case from the current > code where "offset_in_page(skb->data)" size hole will be left in the > first buffer after first pass in case remaining data that need to be > copied is going to overflow the first buffer.Right, and I''m arguing that having the code leave a hole is less desirable than potentially increasing the number of copy operations. I''d like to hear from Ian and others if using the buffers efficiently is more important than reducing copy operations. Intuitively, I think it''s more important to use the ring efficiently.> > > So if the buggy "count" calculation below is fixed based on > > > offset_in_page value then copy_off value also will change > > > accordingly. > > > > This calculation is not incorrect. You should only need as many > > PAGE_SIZE buffers as you have linear data to fill. > > > > This calculation is incorrect and do not match actual slots used as > it is now unless some new change is done either in nebk_gop_skb or > in start_new_rx_buffer.Yes, and I''m proposing to change netbk_gop_skb() and start_new_rx_buffer().> > > linear_len = skb_headlen(skb) > > > count = (linear_len <= PAGE_SIZE) > > > ? 1 > > > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, > > PAGE_SIZE)); > > > > > > copy_off = ((offset_in_page(skb->data)+linear_len) < > > 2*PAGE_SIZE) > > > ? linear_len % PAGE_SIZE; > > > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE; > > > > A change like this makes the code much more difficult to understand. > >> :-) It would have been easier had we written logic using a for loop > similar to how the counting is done for data in frags. In fact I did > do mistake in above calculations :-( . A proper logic probably > should look somewhat like below.Yes, we tested one version of a fix that modeled the algorithm used in count_skb_slots() more directly after the copy loop in netbk_gop_skb() to avoid such errors.> linear_len=skb_headlen(skb); > count = (linear_len <= PAGE_SIZE) > ? 1 > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len,PAGE_SIZE); > > copy_off = (linear_len <= PAGE_SIZE) > ?linear_len > :( offset_in_page(skb->data)+linear_len -1)%PAGE_SIZE+1;If we can agree that utilizing buffers efficiently is more important than reducing copy operations, we can avoid this complexity.> > For some SKBs with large linear buffers, we certainly are wasting > > space. Go back and read the explanation in > > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html > > I think I probably did not put my point clearly to make it > understandable. Xen_netbk_rx_ring_full uses max_required_rx_slots > value. 
Xen_netbk_rx_ring_full is called to decide wither a vif is > schedulable or not. So in case the mtu value is >=PAGE_SIZE, for a > worst case scenario additional buffer would be required which is not > taken care by current calculations.Correct, I agree with your description of the current problem.> Ofcourse in your new fix if you do a code change not to leave a hole > in first buffer then this correction may not be required. But I am > not the right person to decide the implications of the fix you are > proposing. The current start_new_rx_buffer seems to be trying to > make the copies PAGE aligned and also reduce number of copy > operations.Indeed, that is what the change that we''re testing now does. Changing start_new_rx_buffer() to avoid leaving a hole in the first buffer is performing quite well. Unfortunately I don''t have comparison numbers just now.> For example let us say SKB_HEAD_LEN is for whatever reason > 4*PAGE_SIZE and offset_in_page is 32. > > As per existing logic of start_new_rx_buffer and with the fix I am > proposing for count and copy_off, we will calculate and occupy 5 > ring buffers and will use 5 copy operations.I agree.> If we fix it the way you are proposing, not to leave a hole in the > first buffer by modifying start_new_rx_buffer then it will occupy 4 > ring buffers but will require 8 copy operations as per existing > logic in netbk_gop_skb while copying head!!I think that we should optimize for more commonly seen lengths, which I''d think would be 2-3 pages max. The delta in copy operations is smaller in these cases, where we would use 4 copy operations for 2 pages (as opposed to 3 with the current algorithm) and 6 copy operations for 3 pages (as opposed to 4 with the current algorithm). IanC, Konrad - do you have any opinions on the best approach here? Matt
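The copy-operation counts quoted above assume a head that starts at a non-zero offset within its first page, so an N-page head is copied as chunks of (PAGE_SIZE - offset), PAGE_SIZE, ..., offset. Under that assumption the trade-off works out as:

    2 pages of head data: 3 copies / 3 slots today  vs  4 copies / 2 slots when filling each buffer
    3 pages of head data: 4 copies / 4 slots today  vs  6 copies / 3 slots when filling each buffer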
Ian Campbell
2012-Dec-17 11:26 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:> On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote: > > > -----Original Message----- > > > From: Matt Wilson [mailto:msw@amazon.com] > > > Sent: Wednesday, December 12, 2012 3:05 AM > > > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > > > > > > You can clearly see below that copy_off is input to > > > > start_new_rx_buffer while copying frags. > > > > > > Yes, but that''s the right thing to do. copy_off should be set to the > > > destination offset after copying the last byte of linear data, which > > > means "skb_headlen(skb) % PAGE_SIZE" is correct. > > > > > > > No. It is not correct for two reasons. For example what if > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would > > be set to ZERO. And now if there exists some data in frags, ZERO > > will be passed in as copy_off value and start_new_rx_buffer will > > return FALSE. And second reason is the obvious case from the current > > code where "offset_in_page(skb->data)" size hole will be left in the > > first buffer after first pass in case remaining data that need to be > > copied is going to overflow the first buffer. > > Right, and I''m arguing that having the code leave a hole is less > desirable than potentially increasing the number of copy > operations. I''d like to hear from Ian and others if using the buffers > efficiently is more important than reducing copy operations. Intuitively, > I think it''s more important to use the ring efficiently.Do you mean the ring or the actual buffers? The current code tries to coalesce multiple small frags/heads because it is usually trivial but doesn''t try too hard with multiple larger frags, since they take up most of a page by themselves anyway. I suppose this does waste a bit of buffer space and therefore could take more ring slots, but it''s not clear to me how much this matters in practice (it might be tricky to measure this with any realistic workload). The cost of splitting a copy into two should be low though, the copies are already batched into a single hypercall and I''d expect things to be mostly dominated by the data copy itself rather than the setup of each individual op, which would argue for splitting a copy in two if that helps fill the buffers. The flip side is that once you get past the headers etc the paged frags likely tend to either bits and bobs (fine) or mostly whole pages. In the whole pages case trying to fill the buffers will result in every copy getting split. My gut tells me that the whole pages case probably dominates, but I''m not sure what the real world impact of splitting all the copies would be.> > > If we fix it the way you are proposing, not to leave a hole in the > > first buffer by modifying start_new_rx_buffer then it will occupy 4 > > ring buffers but will require 8 copy operations as per existing > > logic in netbk_gop_skb while copying head!! > > I think that we should optimize for more commonly seen lengths, which > I''d think would be 2-3 pages max. The delta in copy operations is > smaller in these cases, where we would use 4 copy operations for 2 > pages (as opposed to 3 with the current algorithm) and 6 copy > operations for 3 pages (as opposed to 4 with the current algorithm). > > IanC, Konrad - do you have any opinions on the best approach here? > > Matt
Matt Wilson
2012-Dec-17 20:09 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> > On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > > -----Original Message-----
> > > > From: Matt Wilson [mailto:msw@amazon.com]
> > > > Sent: Wednesday, December 12, 2012 3:05 AM
> > > >
> > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > > >
> > > > > You can clearly see below that copy_off is input to
> > > > > start_new_rx_buffer while copying frags.
> > > >
> > > > Yes, but that's the right thing to do. copy_off should be set to the
> > > > destination offset after copying the last byte of linear data, which
> > > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > >
> > >
> > > No. It is not correct for two reasons. For example what if
> > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would
> > > be set to ZERO. And now if there exists some data in frags, ZERO
> > > will be passed in as copy_off value and start_new_rx_buffer will
> > > return FALSE. And second reason is the obvious case from the current
> > > code where "offset_in_page(skb->data)" size hole will be left in the
> > > first buffer after first pass in case remaining data that need to be
> > > copied is going to overflow the first buffer.
> >
> > Right, and I'm arguing that having the code leave a hole is less
> > desirable than potentially increasing the number of copy
> > operations. I'd like to hear from Ian and others if using the buffers
> > efficiently is more important than reducing copy operations. Intuitively,
> > I think it's more important to use the ring efficiently.
>
> Do you mean the ring or the actual buffers?

Sorry, the actual buffers.

> The current code tries to coalesce multiple small frags/heads because it
> is usually trivial but doesn't try too hard with multiple larger frags,
> since they take up most of a page by themselves anyway. I suppose this
> does waste a bit of buffer space and therefore could take more ring
> slots, but it's not clear to me how much this matters in practice (it
> might be tricky to measure this with any realistic workload).

In the case where we're consistently handling large heads (like when
using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
the available buffers.

> The cost of splitting a copy into two should be low though, the copies
> are already batched into a single hypercall and I'd expect things to be
> mostly dominated by the data copy itself rather than the setup of each
> individual op, which would argue for splitting a copy in two if that
> helps fill the buffers.

That was my thought as well. We're testing a patch that does just this
now.

> The flip side is that once you get past the headers etc the paged frags
> likely tend to either bits and bobs (fine) or mostly whole pages. In the
> whole pages case trying to fill the buffers will result in every copy
> getting split. My gut tells me that the whole pages case probably
> dominates, but I'm not sure what the real world impact of splitting all
> the copies would be.

Right, I'm less concerned about the paged frags. It might make sense
to skip some space so that the copying can be page aligned. I suppose
it depends on how many different pages are in the list, and what the
total size is.

In practice I'd think it would be rare to see a paged SKB for ingress
traffic to domUs unless there is significant intra-host communication
(dom0->domU, domU->domU). When domU ingress traffic is originating
from an Ethernet device it shouldn't be paged. Paged SKBs would come
into play when a SKB is formed for transmit on an egress device that
is SG-capable. Or am I misunderstanding how paged SKBs are used these
days?

Matt
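For readers following the discussion without the source handy, the test
being argued about is start_new_rx_buffer(). Below is a simplified
paraphrase of it as it stands in this era's
drivers/net/xen-netback/netback.c -- an illustrative sketch rather than a
verbatim copy of the in-tree code:

/*
 * Paraphrased sketch of the buffer-break test under discussion.
 * MAX_BUFFER_OFFSET == PAGE_SIZE in current netback.  Treat this as an
 * illustration, not the exact in-tree function.
 */
static bool start_new_rx_buffer(int offset, unsigned long size, int head)
{
        /* The current destination buffer is completely full. */
        if (offset == MAX_BUFFER_OFFSET)
                return true;

        /*
         * The chunk does not fit in what is left of this buffer, but it
         * would fit in an empty one: skip the remaining space and start
         * a fresh buffer rather than splitting the copy.  Never done for
         * the head slot and never when the current buffer is empty.
         */
        if ((offset + size > MAX_BUFFER_OFFSET) &&
            (size <= MAX_BUFFER_OFFSET) && offset && !head)
                return true;

        return false;
}

The hole Siva describes comes from the second test: once the first chunk
of a large linear area has been copied, head is cleared, so the next
chunk jumps to a fresh buffer and the last offset_in_page(skb->data)
bytes of the first buffer are never used.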
Ian Campbell
2012-Dec-18 10:02 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> > On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> > > On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > > > -----Original Message-----
> > > > > From: Matt Wilson [mailto:msw@amazon.com]
> > > > > Sent: Wednesday, December 12, 2012 3:05 AM
> > > > >
> > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > > > >
> > > > > > You can clearly see below that copy_off is input to
> > > > > > start_new_rx_buffer while copying frags.
> > > > >
> > > > > Yes, but that's the right thing to do. copy_off should be set to the
> > > > > destination offset after copying the last byte of linear data, which
> > > > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > > >
> > > >
> > > > No. It is not correct for two reasons. For example what if
> > > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would
> > > > be set to ZERO. And now if there exists some data in frags, ZERO
> > > > will be passed in as copy_off value and start_new_rx_buffer will
> > > > return FALSE. And second reason is the obvious case from the current
> > > > code where "offset_in_page(skb->data)" size hole will be left in the
> > > > first buffer after first pass in case remaining data that need to be
> > > > copied is going to overflow the first buffer.
> > >
> > > Right, and I'm arguing that having the code leave a hole is less
> > > desirable than potentially increasing the number of copy
> > > operations. I'd like to hear from Ian and others if using the buffers
> > > efficiently is more important than reducing copy operations. Intuitively,
> > > I think it's more important to use the ring efficiently.
> >
> > Do you mean the ring or the actual buffers?
>
> Sorry, the actual buffers.
>
> > The current code tries to coalesce multiple small frags/heads because it
> > is usually trivial but doesn't try too hard with multiple larger frags,
> > since they take up most of a page by themselves anyway. I suppose this
> > does waste a bit of buffer space and therefore could take more ring
> > slots, but it's not clear to me how much this matters in practice (it
> > might be tricky to measure this with any realistic workload).
>
> In the case where we're consistently handling large heads (like when
> using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> the available buffers.

Sorry if I missed this earlier in the thread, but how do we end up
wasting so much?

For an skb with 9000 bytes in the linear area, which must necessarily be
contiguous, do we not fill the first two page sized buffers completely?
The remaining 808 bytes must then have its own buffer. Hrm, I suppose
that's about 27% wasted over the three pages. If we are doing something
worse than that though then we have a bug somewhere (nb: older netbacks
would only fill the first 2048 bytes of each buffer, the wastage is
presumably phenomenal in that case ;-), MAX_BUFFER_OFFSET is now
== PAGE_SIZE though).

Unless I've misunderstood this thread and we are considering packing
data from multiple skbs into a single buffer? (i.e. the remaining
4096-808=3288 bytes in the third buffer above would contain data for the
next skb). Does the ring protocol even allow for that possibility? It
seems like a path to complexity to me.

> > The cost of splitting a copy into two should be low though, the copies
> > are already batched into a single hypercall and I'd expect things to be
> > mostly dominated by the data copy itself rather than the setup of each
> > individual op, which would argue for splitting a copy in two if that
> > helps fill the buffers.
>
> That was my thought as well. We're testing a patch that does just this
> now.
>
> > The flip side is that once you get past the headers etc the paged frags
> > likely tend to either bits and bobs (fine) or mostly whole pages. In the
> > whole pages case trying to fill the buffers will result in every copy
> > getting split. My gut tells me that the whole pages case probably
> > dominates, but I'm not sure what the real world impact of splitting all
> > the copies would be.
>
> Right, I'm less concerned about the paged frags. It might make sense
> to skip some space so that the copying can be page aligned. I suppose
> it depends on how many different pages are in the list, and what the
> total size is.
>
> In practice I'd think it would be rare to see a paged SKB for ingress
> traffic to domUs unless there is significant intra-host communication
> (dom0->domU, domU->domU). When domU ingress traffic is originating
> from an Ethernet device it shouldn't be paged. Paged SKBs would come
> into play when a SKB is formed for transmit on an egress device that
> is SG-capable. Or am I misunderstanding how paged SKBs are used these
> days?

I think it depends on the hardware and/or driver. IIRC some devices push
down frag zero into the device for RX DMA and then share it with the
linear area (I think this might have something to do with making LRO or
GRO easier/workable).

Also things such as GRO can commonly cause received skbs being passed up
the stack to contain several frags.

I'm not quite sure how this works but in the case of s/w GRO I wouldn't
be surprised if this resulted in a skb with lots of 1500 byte (i.e. wire
MTU) frags. I think we would end up at least coalescing those two per
buffer on transmit (3000/4096 = 73% filling the page).

Doing better would either need start_new_rx_buffer to always completely
fill each buffer or to take a much more global view of the frags (e.g.
taking the size of the next frag and how it fits into consideration
too).

Ian.
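As a back-of-envelope check of the figures above (an illustration, not
code from the thread), the tight-packing waste for a 9000-byte head and
the fill ratio for two wire-MTU frags coalesced into one page work out
as follows:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Back-of-envelope numbers for the discussion above (illustration only). */
int main(void)
{
        unsigned long headlen = 9000;                                /* Ian's example */
        unsigned long pages   = (headlen + PAGE_SIZE - 1) / PAGE_SIZE;         /* 3 */
        unsigned long waste   = pages * PAGE_SIZE - headlen;                /* 3288 */

        /* Packing the head tightly: 4096 + 4096 + 808 over three buffers. */
        printf("tight packing: %lu buffers, %.0f%% wasted\n",
               pages, 100.0 * waste / (pages * PAGE_SIZE));               /* ~27% */

        /* Two 1500-byte GRO frags coalesced per buffer, as mentioned above. */
        printf("two MTU frags per page: %.0f%% filled\n",
               100.0 * 3000 / PAGE_SIZE);                                 /* ~73% */
        return 0;
}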
Matt Wilson
2012-Dec-18 19:43 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, Dec 18, 2012 at 10:02:48AM +0000, Ian Campbell wrote:
> On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> > On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
[...]
> > > Do you mean the ring or the actual buffers?
> >
> > Sorry, the actual buffers.
> >
> > > The current code tries to coalesce multiple small frags/heads because it
> > > is usually trivial but doesn't try too hard with multiple larger frags,
> > > since they take up most of a page by themselves anyway. I suppose this
> > > does waste a bit of buffer space and therefore could take more ring
> > > slots, but it's not clear to me how much this matters in practice (it
> > > might be tricky to measure this with any realistic workload).
> >
> > In the case where we're consistently handling large heads (like when
> > using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> > the available buffers.
>
> Sorry if I missed this earlier in the thread, but how do we end up
> wasting so much?

I see SKBs with:

  skb_headlen(skb) == 8157
  offset_in_page(skb->data) == 64

when handling long streaming ingress flows from ixgbe with MTU (on the
NIC and both sides of the VIF) set to 9000. When all the SKBs making
up the flow have the above property, xen-netback uses three pages instead
of two. The first buffer gets 4032 bytes copied into it. The next
buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
copied into it. See this post in the archives for a more detailed
walk through netbk_gop_frag_copy():

http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html

> For an skb with 9000 bytes in the linear area, which must necessarily be
> contiguous, do we not fill the first two page sized buffers completely?
> The remaining 808 bytes must then have its own buffer. Hrm, I suppose
> that's about 27% wasted over the three pages. If we are doing something
> worse than that though then we have a bug somewhere (nb: older netbacks
> would only fill the first 2048 bytes of each buffer, the wastage is
> presumably phenomenal in that case ;-), MAX_BUFFER_OFFSET is now
> == PAGE_SIZE though).

Sorry, I should have said 8157 bytes for my example. :-)

> Unless I've misunderstood this thread and we are considering packing
> data from multiple skbs into a single buffer? (i.e. the remaining
> 4096-808=3288 bytes in the third buffer above would contain data for the
> next skb). Does the ring protocol even allow for that possibility? It
> seems like a path to complexity to me.

No, I'm not suggesting that we come up with an extension to pack the
next skb into any remaining space left over from the current one. I
agree that would make for a lot of complexity managing the ratio of
meta slots to buffers, etc.

> > > The cost of splitting a copy into two should be low though, the copies
> > > are already batched into a single hypercall and I'd expect things to be
> > > mostly dominated by the data copy itself rather than the setup of each
> > > individual op, which would argue for splitting a copy in two if that
> > > helps fill the buffers.
> >
> > That was my thought as well. We're testing a patch that does just this
> > now.
> >
> > > The flip side is that once you get past the headers etc the paged frags
> > > likely tend to either bits and bobs (fine) or mostly whole pages. In the
> > > whole pages case trying to fill the buffers will result in every copy
> > > getting split. My gut tells me that the whole pages case probably
> > > dominates, but I'm not sure what the real world impact of splitting all
> > > the copies would be.
> >
> > Right, I'm less concerned about the paged frags. It might make sense
> > to skip some space so that the copying can be page aligned. I suppose
> > it depends on how many different pages are in the list, and what the
> > total size is.
> >
> > In practice I'd think it would be rare to see a paged SKB for ingress
> > traffic to domUs unless there is significant intra-host communication
> > (dom0->domU, domU->domU). When domU ingress traffic is originating
> > from an Ethernet device it shouldn't be paged. Paged SKBs would come
> > into play when a SKB is formed for transmit on an egress device that
> > is SG-capable. Or am I misunderstanding how paged SKBs are used these
> > days?
>
> I think it depends on the hardware and/or driver. IIRC some devices push
> down frag zero into the device for RX DMA and then share it with the
> linear area (I think this might have something to do with making LRO or
> GRO easier/workable).
>
> Also things such as GRO can commonly cause received skbs being passed up
> the stack to contain several frags.
>
> I'm not quite sure how this works but in the case of s/w GRO I wouldn't
> be surprised if this resulted in a skb with lots of 1500 byte (i.e. wire
> MTU) frags. I think we would end up at least coalescing those two per
> buffer on transmit (3000/4096 = 73% filling the page).
>
> Doing better would either need start_new_rx_buffer to always completely
> fill each buffer or to take a much more global view of the frags (e.g.
> taking the size of the next frag and how it fits into consideration
> too).

What's the down side to making start_new_rx_buffer() always try to
fill each buffer?

Matt
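To make the two behaviours concrete, here is a small userspace model of
the copy loop (the break test is paraphrased from netback, not copied
verbatim; everything else is illustrative scaffolding). It walks an
8157-byte head that starts at page offset 64 through per-page chunking
and prints how many bytes land in each destination buffer under the
current rule and under an "always fill" rule:

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE         4096UL
#define MAX_BUFFER_OFFSET PAGE_SIZE

/* Paraphrase of the current buffer-break rule (illustrative, not verbatim). */
static bool start_new_rx_buffer(unsigned long off, unsigned long size,
                                bool head, bool always_fill)
{
        if (off == MAX_BUFFER_OFFSET)
                return true;
        if (always_fill)
                return false;   /* only break when the buffer is truly full */
        return off + size > MAX_BUFFER_OFFSET &&
               size <= MAX_BUFFER_OFFSET && off && !head;
}

/* Model the head of an skb being fed to the copy loop one source page
 * at a time, as the linear area is handled today. */
static void simulate(unsigned long headlen, unsigned long data_off,
                     bool always_fill)
{
        unsigned long copy_off = 0, used[8] = { 0 };
        int buf = 0;
        bool head = true;

        while (headlen) {
                /* Chunk = what is left of the current source page. */
                unsigned long chunk = PAGE_SIZE - data_off;
                if (chunk > headlen)
                        chunk = headlen;
                data_off = 0;
                headlen -= chunk;

                while (chunk) {
                        unsigned long bytes = chunk;

                        if (start_new_rx_buffer(copy_off, chunk, head, always_fill)) {
                                buf++;
                                copy_off = 0;
                        }
                        if (copy_off + bytes > MAX_BUFFER_OFFSET)
                                bytes = MAX_BUFFER_OFFSET - copy_off;

                        used[buf] += bytes;
                        copy_off += bytes;
                        chunk -= bytes;
                        head = false;
                }
        }

        printf("%s:", always_fill ? "always fill " : "current rule");
        for (int i = 0; i <= buf; i++)
                printf(" buf%d=%lu", i, used[i]);
        printf("  (%d buffers)\n", buf + 1);
}

int main(void)
{
        simulate(8157, 64, false);      /* -> 4032 / 4096 / 29, three buffers */
        simulate(8157, 64, true);       /* -> 4096 / 4061, two buffers        */
        return 0;
}

Under these assumptions the model prints 4032/4096/29 (three buffers)
for the current rule and 4096/4061 (two buffers) when buffers are always
filled, matching the numbers quoted above.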
Ian Campbell
2012-Dec-20 10:05 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, 2012-12-18 at 19:43 +0000, Matt Wilson wrote:
> On Tue, Dec 18, 2012 at 10:02:48AM +0000, Ian Campbell wrote:
> > On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> > > On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> [...]
> > > > Do you mean the ring or the actual buffers?
> > >
> > > Sorry, the actual buffers.
> > >
> > > > The current code tries to coalesce multiple small frags/heads because it
> > > > is usually trivial but doesn't try too hard with multiple larger frags,
> > > > since they take up most of a page by themselves anyway. I suppose this
> > > > does waste a bit of buffer space and therefore could take more ring
> > > > slots, but it's not clear to me how much this matters in practice (it
> > > > might be tricky to measure this with any realistic workload).
> > >
> > > In the case where we're consistently handling large heads (like when
> > > using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> > > the available buffers.
> >
> > Sorry if I missed this earlier in the thread, but how do we end up
> > wasting so much?
>
> I see SKBs with:
>
>   skb_headlen(skb) == 8157
>   offset_in_page(skb->data) == 64
>
> when handling long streaming ingress flows from ixgbe with MTU (on the
> NIC and both sides of the VIF) set to 9000. When all the SKBs making
> up the flow have the above property, xen-netback uses three pages instead
> of two. The first buffer gets 4032 bytes copied into it. The next
> buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
> copied into it. See this post in the archives for a more detailed
> walk through netbk_gop_frag_copy():
> http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html

Thanks. This certainly seems wrong for the head bit.

> What's the down side to making start_new_rx_buffer() always try to
> fill each buffer?

As we discussed earlier in the thread, it doubles the number of copy ops
per frag under some circumstances; my gut is that this isn't going to
hurt, but that's just my gut.

It seems obviously right that the linear part of the SKB should always
fill entire buffers though. Perhaps the answer is to differentiate
between the skb->data and the frags?

Ian.
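One possible shape of the skb->data/frags differentiation suggested
here, sketched for illustration only -- the extra "linear" argument does
not exist in netback, callers would have to pass it, and the actual
patch referred to in the next message may differ:

/*
 * Illustrative sketch only, not the patch from this thread.  "linear"
 * would be true while copying the head and false for frags.
 */
static bool start_new_rx_buffer(int offset, unsigned long size,
                                int head, bool linear)
{
        /* Buffer completely full: always move on. */
        if (offset == MAX_BUFFER_OFFSET)
                return true;

        /*
         * Linear data packs tightly: split the copy across buffers
         * instead of skipping the tail of the current one, so a large
         * head never leaves a hole.
         */
        if (linear)
                return false;

        /* Existing behaviour for frags: jump to a fresh buffer when the
         * frag would not fit here but could fill an empty buffer. */
        if ((offset + size > MAX_BUFFER_OFFSET) &&
            (size <= MAX_BUFFER_OFFSET) && offset && !head)
                return true;

        return false;
}

With a change along these lines the 8157-byte head above would occupy
two buffers (4096 + 4061) while frag copies keep the current
skip-to-a-fresh-buffer behaviour.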
Matt Wilson
2012-Dec-20 21:42 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Dec 20, 2012 at 10:05:29AM +0000, Ian Campbell wrote:
> On Tue, 2012-12-18 at 19:43 +0000, Matt Wilson wrote:
[...]
> > I see SKBs with:
> >
> >   skb_headlen(skb) == 8157
> >   offset_in_page(skb->data) == 64
> >
> > when handling long streaming ingress flows from ixgbe with MTU (on the
> > NIC and both sides of the VIF) set to 9000. When all the SKBs making
> > up the flow have the above property, xen-netback uses three pages instead
> > of two. The first buffer gets 4032 bytes copied into it. The next
> > buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
> > copied into it. See this post in the archives for a more detailed
> > walk through netbk_gop_frag_copy():
> > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html
>
> Thanks. This certainly seems wrong for the head bit.
>
> > What's the down side to making start_new_rx_buffer() always try to
> > fill each buffer?
>
> As we discussed earlier in the thread, it doubles the number of copy ops
> per frag under some circumstances; my gut is that this isn't going to
> hurt, but that's just my gut.
>
> It seems obviously right that the linear part of the SKB should always
> fill entire buffers though. Perhaps the answer is to differentiate
> between the skb->data and the frags?

We've written a patch that does exactly that. It's stable and performs
well in our testing so far. I'll need to forward port it to the latest
Linux tree, test it there, and post.

Matt