Palagummi, Siva
2012-Aug-29 12:21 UTC
[PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
This patch contains the modifications discussed in the thread http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html

Ian,

Instead of using max_required_rx_slots, I used the count that we already have in hand to verify whether we have enough room in the batch queue for the next skb. Please let me know if that is not appropriate. Things worked fine in my environment. Under heavy load we now seem to be consuming most of the slots in the queue, and no BUG_ON :-)

Thanks
Siva
Ian Campbell
2012-Aug-30 08:07 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote:> This patch contains the modifications that are discussed in thread > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.htmlThanks. Please can you find a way to include your patches inline rather than as attachments, it makes reply for review much easier. Documentation/email-clients.txt has some hints for various clients or you can just use the "git send-email" command (perhaps with "git format-patch"). You should also CC the netdev@vger.kernel.org list.> Instead of using max_required_rx_slots, I used the count that we > already have in hand to verify if we have enough room in the batch > queue for next skb. Please let me know if that is not appropriate. > Things worked fine in my environment. Under heavy load now we seems to > be consuming most of the slots in the queue and no BUG_ON :-)> From: Siva Palagummi <Siva.Palagummi@ca.com> > > count variable in xen_netbk_rx_action need to be incremented > correctly to take into account of extra slots required when skb_headlen is > greater than PAGE_SIZE when larger MTU values are used. Without this change > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback thread > to exit. > > The fix is to stash the counting already done in xen_netbk_count_skb_slots > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > Also improved the checking for filling the batch queue. > > Also merged a change from a patch created for xen_netbk_count_skb_slots > function as per thread > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > --- > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 -0500 > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 -0400 > @@ -80,6 +80,11 @@ union page_ext { > void *mapping; > }; > > +struct skb_cb_overlay { > + int meta_slots_used; > + int count; > +}; > + > struct xen_netbk { > wait_queue_head_t wq; > struct task_struct *task; > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > { > unsigned int count; > int i, copy_off; > + struct skb_cb_overlay *sco; > > - count = DIV_ROUND_UP( > - offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE); > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);This hunk appears to be upstream already (see e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working against? You should either base patches on Linus'' branch or on the net-next branch. Other than this the patch looks good, thanks.> > copy_off = skb_headlen(skb) % PAGE_SIZE; > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > size -= bytes; > } > } > + sco = (struct skb_cb_overlay *)skb->cb; > + sco->count = count; > return count; > } > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > } > } > > -struct skb_cb_overlay { > - int meta_slots_used; > -}; > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > { > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > sco = (struct skb_cb_overlay *)skb->cb; > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > - count += nr_frags + 1; > + count += sco->count; > > __skb_queue_tail(&rxq, skb); > > + skb = skb_peek(&netbk->rx_queue); > + if (skb == NULL) > + break; > + sco = (struct skb_cb_overlay *)skb->cb; > /* Filled the batch queue? 
*/ > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > break; > } >
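As background on the skb_cb_overlay pattern used in the patch above: skb->cb is the 48-byte per-layer scratch area in struct sk_buff, so an overlay struct like this is safe only as long as it fits inside that area. A compile-time guard (not part of the patch, shown here purely as a defensive illustration) could be placed where the overlay is first used:

    /* hypothetical guard: the overlay must fit inside skb->cb */
    BUILD_BUG_ON(sizeof(struct skb_cb_overlay) > FIELD_SIZEOF(struct sk_buff, cb));

With two ints the overlay is well inside the 48-byte cb area, so the check would be purely documentation.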
Palagummi, Siva
2012-Aug-30 10:26 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Ian, I was using ms exchange client which is converting tabs into spaces and so on. That''s why I sent it as an attachment. I will find a way to send the patch inline and resubmit. Thanks for reviewing it. --Siva> -----Original Message----- > From: Ian Campbell [mailto:Ian.Campbell@citrix.com] > Sent: Thursday, August 30, 2012 1:37 PM > To: Palagummi, Siva > Cc: xen-devel@lists.xen.org > Subject: Re: [PATCH RFC V2] xen/netback: Count ring slots properly when > larger MTU sizes are used > > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > This patch contains the modifications that are discussed in thread > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html > > Thanks. > > Please can you find a way to include your patches inline rather than as > attachments, it makes reply for review much easier. > Documentation/email-clients.txt has some hints for various clients or > you can just use the "git send-email" command (perhaps with "git > format-patch"). > > You should also CC the netdev@vger.kernel.org list. > > > Instead of using max_required_rx_slots, I used the count that we > > already have in hand to verify if we have enough room in the batch > > queue for next skb. Please let me know if that is not appropriate. > > Things worked fine in my environment. Under heavy load now we seems > to > > be consuming most of the slots in the queue and no BUG_ON :-) > > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > count variable in xen_netbk_rx_action need to be incremented > > correctly to take into account of extra slots required when > skb_headlen is > > greater than PAGE_SIZE when larger MTU values are used. Without this > change > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback > thread > > to exit. > > > > The fix is to stash the counting already done in > xen_netbk_count_skb_slots > > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > > > Also improved the checking for filling the batch queue. > > > > Also merged a change from a patch created for > xen_netbk_count_skb_slots > > function as per thread > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > --- > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > netback/netback.c > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 > -0500 > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 > -0400 > > @@ -80,6 +80,11 @@ union page_ext { > > void *mapping; > > }; > > > > +struct skb_cb_overlay { > > + int meta_slots_used; > > + int count; > > +}; > > + > > struct xen_netbk { > > wait_queue_head_t wq; > > struct task_struct *task; > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > { > > unsigned int count; > > int i, copy_off; > > + struct skb_cb_overlay *sco; > > > > - count = DIV_ROUND_UP( > > - offset_in_page(skb->data)+skb_headlen(skb), > PAGE_SIZE); > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > This hunk appears to be upstream already (see > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > against? You should either base patches on Linus'' branch or on the > net-next branch. > > Other than this the patch looks good, thanks. 
> > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > > size -= bytes; > > } > > } > > + sco = (struct skb_cb_overlay *)skb->cb; > > + sco->count = count; > > return count; > > } > > > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > > } > > } > > > > -struct skb_cb_overlay { > > - int meta_slots_used; > > -}; > > > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > > { > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > > sco = (struct skb_cb_overlay *)skb->cb; > > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > > > - count += nr_frags + 1; > > + count += sco->count; > > > > __skb_queue_tail(&rxq, skb); > > > > + skb = skb_peek(&netbk->rx_queue); > > + if (skb == NULL) > > + break; > > + sco = (struct skb_cb_overlay *)skb->cb; > > /* Filled the batch queue? */ > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > break; > > } > > >
Matt Wilson
2012-Dec-04 23:23 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote:> On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > This patch contains the modifications that are discussed in thread > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html[...]> > Instead of using max_required_rx_slots, I used the count that we > > already have in hand to verify if we have enough room in the batch > > queue for next skb. Please let me know if that is not appropriate. > > Things worked fine in my environment. Under heavy load now we seems to > > be consuming most of the slots in the queue and no BUG_ON :-) > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > count variable in xen_netbk_rx_action need to be incremented > > correctly to take into account of extra slots required when skb_headlen is > > greater than PAGE_SIZE when larger MTU values are used. Without this change > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback thread > > to exit. > > > > The fix is to stash the counting already done in xen_netbk_count_skb_slots > > in skb_cb_overlay and use it directly in xen_netbk_rx_action.You don''t need to stash the estimated value to use it in xen_netbk_rx_action() - you have the actual number of slots consumed in hand from the return value of netbk_gop_skb(). See below.> > Also improved the checking for filling the batch queue. > > > > Also merged a change from a patch created for xen_netbk_count_skb_slots > > function as per thread > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > --- > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 19:39:32.000000000 -0500 > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 17:31:22.000000000 -0400 > > @@ -80,6 +80,11 @@ union page_ext { > > void *mapping; > > }; > > > > +struct skb_cb_overlay { > > + int meta_slots_used; > > + int count; > > +};We don''t actually need a separate variable for the estimate. We could rename meta_slots_used to meta_slots. It could hold the estimate before netbk_gop_skb() is called and the actual number of slots consumed after. That might be confusing, though, so maybe it''s better off left as two variables.> > struct xen_netbk { > > wait_queue_head_t wq; > > struct task_struct *task; > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > { > > unsigned int count; > > int i, copy_off; > > + struct skb_cb_overlay *sco; > > > > - count = DIV_ROUND_UP( > > - offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE); > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > This hunk appears to be upstream already (see > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > against? You should either base patches on Linus'' branch or on the > net-next branch. > > Other than this the patch looks good, thanks.I don''t think that this patch completely addresses problems calculating the number of slots required when large MTUs are used. For example: if netback is handling a skb with a large linear data portion, say 8157 bytes, that begins at 64 bytes from the start of the page. Assume that GSO is disabled and there are no frags. xen_netbk_count_skb_slots() will calculate that two slots are needed: count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); but netbk_gop_frag_copy() will actually use three. 
Let's walk through the loop:

	data = skb->data;
	while (data < skb_tail_pointer(skb)) {
		unsigned int offset = offset_in_page(data);
		unsigned int len = PAGE_SIZE - offset;

		if (data + len > skb_tail_pointer(skb))
			len = skb_tail_pointer(skb) - data;

		netbk_gop_frag_copy(vif, skb, npo,
				    virt_to_page(data), len, offset, &head);
		data += len;
	}

The first pass will call netbk_gop_frag_copy() with len=4032, offset=64, and head=1. After the call, head will be set to 0. Inside netbk_gop_frag_copy(), start_new_rx_buffer() will be called with offset=0, size=4032, head=1. We'll return false due to the checks for "offset" and "!head".

The second pass will call netbk_gop_frag_copy() with len=4096, offset=0, head=0. start_new_rx_buffer() will be called with offset=4032, bytes=4096, head=0. We'll return true here, which we shouldn't since it's just going to lead to buffer waste for the last bit.

The third pass will call with len=29 and offset=0. start_new_rx_buffer() will be called with offset=4096, bytes=29, head=0. We'll start a new buffer for the last bit.

So you can see that we underestimate the buffers / meta slots required to handle a skb with a large linear buffer, as we commonly have to handle with large MTU sizes. This can lead to problems later on.

I'm not as familiar with the new compound page frag handling code, but I can imagine that the same problem could exist there. But since the calculation logic in xen_netbk_count_skb_slots() directly models the code setting up the copy operation, at least it will be estimated properly.

> >
> > 	copy_off = skb_headlen(skb) % PAGE_SIZE;
> >
> > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s
> > 		size -= bytes;
> > 	}
> > 	}
> > +	sco = (struct skb_cb_overlay *)skb->cb;
> > +	sco->count = count;
> > 	return count;
> > }
> >
> > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str
> > 	}
> > }
> >
> > -struct skb_cb_overlay {
> > -	int meta_slots_used;
> > -};
> >
> > static void xen_netbk_rx_action(struct xen_netbk *netbk)
> > {
> > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x
> > 		sco = (struct skb_cb_overlay *)skb->cb;
> > 		sco->meta_slots_used = netbk_gop_skb(skb, &npo);
> >
> > -		count += nr_frags + 1;
> > +		count += sco->count;

Why increment count by the /estimated/ count instead of the actual number of slots used? We have the number of slots in the line just above, in sco->meta_slots_used.

> > 		__skb_queue_tail(&rxq, skb);
> >
> > +		skb = skb_peek(&netbk->rx_queue);
> > +		if (skb == NULL)
> > +			break;
> > +		sco = (struct skb_cb_overlay *)skb->cb;
> > 		/* Filled the batch queue? */
> > -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> > +		if (count + sco->count >= XEN_NETIF_RX_RING_SIZE)
> > 			break;
> > 	}
> >

This change I like.

We're working on a patch to improve the buffer efficiency and the miscalculation problem. Siva, I'd be happy to re-base and re-submit this patch (with minor adjustments) as part of that work, unless you want to handle that.

Matt
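One way to remove the mismatch described above is to have the slot estimate replay exactly the decisions the copy path makes. The helper below is only a sketch of that idea against the 3.2-era netback code (count_head_slots() is a hypothetical name, not a function in the driver); it assumes MAX_BUFFER_OFFSET == PAGE_SIZE and that netbk_gop_skb() claims one slot for the head, with copy_off reset to 0, before its copy loop runs, as in the code quoted earlier:

    static unsigned int count_head_slots(struct sk_buff *skb)
    {
            unsigned char *data = skb->data;
            unsigned int copy_off = 0;  /* netbk_gop_skb() starts the head at offset 0... */
            unsigned int count = 1;     /* ...in a slot it has already claimed */
            int head = 1;

            while (data < skb_tail_pointer(skb)) {
                    unsigned int offset = offset_in_page(data);
                    unsigned int len = PAGE_SIZE - offset;
                    unsigned int left;

                    if (data + len > skb_tail_pointer(skb))
                            len = skb_tail_pointer(skb) - data;

                    /* replay the inner loop of netbk_gop_frag_copy() */
                    for (left = len; left > 0; head = 0) {
                            unsigned int bytes = left;

                            if (start_new_rx_buffer(copy_off, bytes, head)) {
                                    count++;
                                    copy_off = 0;
                            }
                            if (copy_off + bytes > MAX_BUFFER_OFFSET)
                                    bytes = MAX_BUFFER_OFFSET - copy_off;

                            copy_off += bytes;
                            left -= bytes;
                    }
                    data += len;
            }
            return count;
    }

For the 8157-byte / offset-64 example in the walkthrough this returns 3, matching what netbk_gop_skb() consumes, where DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE) predicts 2.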
Palagummi, Siva
2012-Dec-05 11:56 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
Matt,> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Wednesday, December 05, 2012 4:53 AM > To: Ian Campbell; Palagummi, Siva > Cc: xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Thu, Aug 30, 2012 at 09:07:11AM +0100, Ian Campbell wrote: > > On Wed, 2012-08-29 at 13:21 +0100, Palagummi, Siva wrote: > > > This patch contains the modifications that are discussed in thread > > > http://lists.xen.org/archives/html/xen-devel/2012-08/msg01730.html > > [...] > > > > Instead of using max_required_rx_slots, I used the count that we > > > already have in hand to verify if we have enough room in the batch > > > queue for next skb. Please let me know if that is not appropriate. > > > Things worked fine in my environment. Under heavy load now we seems > to > > > be consuming most of the slots in the queue and no BUG_ON :-) > > > > > From: Siva Palagummi <Siva.Palagummi@ca.com> > > > > > > count variable in xen_netbk_rx_action need to be incremented > > > correctly to take into account of extra slots required when > skb_headlen is > > > greater than PAGE_SIZE when larger MTU values are used. Without > this change > > > BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta)) is causing netback > thread > > > to exit. > > > > > > The fix is to stash the counting already done in > xen_netbk_count_skb_slots > > > in skb_cb_overlay and use it directly in xen_netbk_rx_action. > > You don''t need to stash the estimated value to use it in > xen_netbk_rx_action() - you have the actual number of slots consumed > in hand from the return value of netbk_gop_skb(). See below. > > > > Also improved the checking for filling the batch queue. > > > > > > Also merged a change from a patch created for > xen_netbk_count_skb_slots > > > function as per thread > > > http://lists.xen.org/archives/html/xen-devel/2012-05/msg01864.html > > > > > > The problem is seen with linux 3.2.2 kernel on Intel 10Gbps network > > > > > > > > > Signed-off-by: Siva Palagummi <Siva.Palagummi@ca.com> > > > --- > > > diff -uprN a/drivers/net/xen-netback/netback.c b/drivers/net/xen- > netback/netback.c > > > --- a/drivers/net/xen-netback/netback.c 2012-01-25 > 19:39:32.000000000 -0500 > > > +++ b/drivers/net/xen-netback/netback.c 2012-08-28 > 17:31:22.000000000 -0400 > > > @@ -80,6 +80,11 @@ union page_ext { > > > void *mapping; > > > }; > > > > > > +struct skb_cb_overlay { > > > + int meta_slots_used; > > > + int count; > > > +}; > > We don''t actually need a separate variable for the estimate. We could > rename meta_slots_used to meta_slots. It could hold the estimate > before netbk_gop_skb() is called and the actual number of slots > consumed after. That might be confusing, though, so maybe it''s better > off left as two variables. > > > > struct xen_netbk { > > > wait_queue_head_t wq; > > > struct task_struct *task; > > > @@ -324,9 +329,9 @@ unsigned int xen_netbk_count_skb_slots(s > > > { > > > unsigned int count; > > > int i, copy_off; > > > + struct skb_cb_overlay *sco; > > > > > > - count = DIV_ROUND_UP( > > > - offset_in_page(skb->data)+skb_headlen(skb), > PAGE_SIZE); > > > + count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > > > This hunk appears to be upstream already (see > > e26b203ede31fffd52571a5ba607a26c79dc5c0d). Which tree are you working > > against? You should either base patches on Linus'' branch or on the > > net-next branch. > > > > Other than this the patch looks good, thanks. 
> > I don''t think that this patch completely addresses problems > calculating the number of slots required when large MTUs are used. > > For example: if netback is handling a skb with a large linear data > portion, say 8157 bytes, that begins at 64 bytes from the start of the > page. Assume that GSO is disabled and there are no frags. > xen_netbk_count_skb_slots() will calculate that two slots are needed: > > count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > but netbk_gop_frag_copy() will actually use three. Let''s walk through > the loop:You are right. The above chunk which is already part of the upstream is unfortunately incorrect for some cases. We also ran into issues in our environment around a week back and found this problem. The count will be different based on head len because of the optimization that start_new_rx_buffer is trying to do for large buffers. A hole of size "offset_in_page" will be left in first page during copy if the remaining buffer size is >=PAG_SIZE. This subsequently affects the copy_off as well. So xen_netbk_count_skb_slots actually needs a fix to calculate the count correctly based on head len. And also a fix to calculate the copy_off properly to which the data from fragments gets copied. max_required_rx_slots also may require a fix to account the additional slot that may be required in case mtu >= PAG_SIZE. For worst case scenario atleast another +1. One thing that is still puzzling here is, max_required_rx_slots seems to be assuming that linear length in head will never be greater than mtu size. But that doesn''t seem to be the case all the time. I wonder if it requires some kind of fix there or special handling when count_skb_slots exceeds max_required_rx_slots.> > data = skb->data; > while (data < skb_tail_pointer(skb)) { > unsigned int offset = offset_in_page(data); > unsigned int len = PAGE_SIZE - offset; > > if (data + len > skb_tail_pointer(skb)) > len = skb_tail_pointer(skb) - data; > > netbk_gop_frag_copy(vif, skb, npo, > virt_to_page(data), len, offset, > &head); > data += len; > } > > The first pass will call netbk_gop_frag_copy() with len=4032, > offset=64, and head=1. After the call, head will be set to 0. Inside > netbk_gop_frag_copy(), start_new_rx_buffer() will be called with > offset=0, size=4032, head=1. We''ll return false due to the checks for > "offset" and "!head". > > The second pass will call netbk_gop_frag_copy() with len=4096, > offset=0, head=0. netbk_gop_frag_copy() will get called with > offset=4032, bytes=4096, head=0. We''ll return true here, which we > shouldn''t since it''s just going to lead to buffer waste for the last > bit. > > The third pass will call with len=29 and offset=0. > start_new_rx_buffer() > will be called with offset=4096, bytes=29, head=0. We''ll start a new > buffer for the last bit. > > So you can see that we underestimate the buffers / meta slots required > to handle a skb with a large linear buffer, as we commonly have to > handle with large MTU sizes. This can lead to problems later on. > > I''m not as familiar with the new compound page frag handling code, but > I can imagine that the same problem could exist there. But since the > calculation logic in xen_netbk_count_skb_slots() directly models the > code setting up the copy operation, at least it will be estimated > properly. 
> > > > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > > > @@ -352,6 +357,8 @@ unsigned int xen_netbk_count_skb_slots(s > > > size -= bytes; > > > } > > > } > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > + sco->count = count; > > > return count; > > > } > > > > > > @@ -586,9 +593,6 @@ static void netbk_add_frag_responses(str > > > } > > > } > > > > > > -struct skb_cb_overlay { > > > - int meta_slots_used; > > > -}; > > > > > > static void xen_netbk_rx_action(struct xen_netbk *netbk) > > > { > > > @@ -621,12 +625,16 @@ static void xen_netbk_rx_action(struct x > > > sco = (struct skb_cb_overlay *)skb->cb; > > > sco->meta_slots_used = netbk_gop_skb(skb, &npo); > > > > > > - count += nr_frags + 1; > > > + count += sco->count; > > Why increment count by the /estimated/ count instead of the actual > number of slots used? We have the number of slots in the line just > above, in sco->meta_slots_used. >Count actually refers to ring slots consumed rather than meta_slots used. Count can be different from meta_slots_used.> > > __skb_queue_tail(&rxq, skb); > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > + if (skb == NULL) > > > + break; > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > /* Filled the batch queue? */ > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > break; > > > } > > > > > This change I like. > > We''re working on a patch to improve the buffer efficiency and the > miscalculation problem. Siva, I''d be happy to re-base and re-submit > this patch (with minor adjustments) as part of that work, unless you > want to handle that. > > MattThanks!! Please feel free to re-base and re-submit :-) Siva
Matt Wilson
2012-Dec-06 05:35 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote:> Matt,[...]> You are right. The above chunk which is already part of the upstream > is unfortunately incorrect for some cases. We also ran into issues > in our environment around a week back and found this problem. The > count will be different based on head len because of the > optimization that start_new_rx_buffer is trying to do for large > buffers. A hole of size "offset_in_page" will be left in first page > during copy if the remaining buffer size is >=PAG_SIZE. This > subsequently affects the copy_off as well. > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > count correctly based on head len. And also a fix to calculate the > copy_off properly to which the data from fragments gets copied.Can you explain more about the copy_off problem? I''m not seeing it.> max_required_rx_slots also may require a fix to account the > additional slot that may be required in case mtu >= PAG_SIZE. For > worst case scenario atleast another +1. One thing that is still > puzzling here is, max_required_rx_slots seems to be assuming that > linear length in head will never be greater than mtu size. But that > doesn''t seem to be the case all the time. I wonder if it requires > some kind of fix there or special handling when count_skb_slots > exceeds max_required_rx_slots.We should only be using the number of pages required to copy the data. The fix shouldn''t be to anticipate wasting ring space by increasing the return value of max_required_rx_slots(). [...]> > Why increment count by the /estimated/ count instead of the actual > > number of slots used? We have the number of slots in the line just > > above, in sco->meta_slots_used. > > > > Count actually refers to ring slots consumed rather than meta_slots > used. Count can be different from meta_slots_used.Aah, indeed. This can end up being too pessimistic if you have lots of frags that require multiple copy operations. I still think that it would be better to calculate the actual number of ring slots consumed by netbk_gop_skb() to avoid other bugs like the one you originally fixed.> > > > __skb_queue_tail(&rxq, skb); > > > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > > + if (skb == NULL) > > > > + break; > > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > > /* Filled the batch queue? */ > > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > > break; > > > > } > > > > > > > > This change I like. > > > > We''re working on a patch to improve the buffer efficiency and the > > miscalculation problem. Siva, I''d be happy to re-base and re-submit > > this patch (with minor adjustments) as part of that work, unless you > > want to handle that. > > > > Matt > > Thanks!! Please feel free to re-base and re-submit :-)OK, thanks! Matt
Palagummi, Siva
2012-Dec-11 10:25 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Thursday, December 06, 2012 11:05 AM > To: Palagummi, Siva > Cc: Ian Campbell; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > Matt, > [...] > > You are right. The above chunk which is already part of the upstream > > is unfortunately incorrect for some cases. We also ran into issues > > in our environment around a week back and found this problem. The > > count will be different based on head len because of the > > optimization that start_new_rx_buffer is trying to do for large > > buffers. A hole of size "offset_in_page" will be left in first page > > during copy if the remaining buffer size is >=PAG_SIZE. This > > subsequently affects the copy_off as well. > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > > count correctly based on head len. And also a fix to calculate the > > copy_off properly to which the data from fragments gets copied. > > Can you explain more about the copy_off problem? I''m not seeing it.You can clearly see below that copy_off is input to start_new_rx_buffer while copying frags. So if the buggy "count" calculation below is fixed based on offset_in_page value then copy_off value also will change accordingly. count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); copy_off = skb_headlen(skb) % PAGE_SIZE; if (skb_shinfo(skb)->gso_size) count++; for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]); unsigned long bytes; while (size > 0) { BUG_ON(copy_off > MAX_BUFFER_OFFSET); if (start_new_rx_buffer(copy_off, size, 0)) { count++; copy_off = 0; } So a correct calculation should be somewhat like below because of the optimization in start_new_rx_buffer for larger sizes. linear_len = skb_headlen(skb) count = (linear_len <= PAGE_SIZE) ? 1 :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, PAGE_SIZE)); copy_off = ((offset_in_page(skb->data)+linear_len) < 2*PAGE_SIZE) ? linear_len % PAGE_SIZE; : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE;> > > max_required_rx_slots also may require a fix to account the > > additional slot that may be required in case mtu >= PAG_SIZE. For > > worst case scenario atleast another +1. One thing that is still > > puzzling here is, max_required_rx_slots seems to be assuming that > > linear length in head will never be greater than mtu size. But that > > doesn''t seem to be the case all the time. I wonder if it requires > > some kind of fix there or special handling when count_skb_slots > > exceeds max_required_rx_slots. > > We should only be using the number of pages required to copy the > data. The fix shouldn''t be to anticipate wasting ring space by > increasing the return value of max_required_rx_slots(). >I do not think we are wasting any ring space. But just ensuring that we have enough before proceeding ahead.> [...] > > > > Why increment count by the /estimated/ count instead of the actual > > > number of slots used? We have the number of slots in the line just > > > above, in sco->meta_slots_used. > > > > > > > Count actually refers to ring slots consumed rather than meta_slots > > used. Count can be different from meta_slots_used. > > Aah, indeed. This can end up being too pessimistic if you have lots of > frags that require multiple copy operations. 
I still think that it > would be better to calculate the actual number of ring slots consumed > by netbk_gop_skb() to avoid other bugs like the one you originally > fixed. >counting done in count_skb_slots is what exactly it is. The fix done above is to make it same so that no need to re calculate again. Thanks Siva> > > > > __skb_queue_tail(&rxq, skb); > > > > > > > > > > + skb = skb_peek(&netbk->rx_queue); > > > > > + if (skb == NULL) > > > > > + break; > > > > > + sco = (struct skb_cb_overlay *)skb->cb; > > > > > /* Filled the batch queue? */ > > > > > - if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE) > > > > > + if (count + sco->count >= XEN_NETIF_RX_RING_SIZE) > > > > > break; > > > > > } > > > > > > > > > > > This change I like. > > > > > > We''re working on a patch to improve the buffer efficiency and the > > > miscalculation problem. Siva, I''d be happy to re-base and re-submit > > > this patch (with minor adjustments) as part of that work, unless > you > > > want to handle that. > > > > > > Matt > > > > Thanks!! Please feel free to re-base and re-submit :-) > > OK, thanks! > > Matt
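To put numbers on the disagreement being discussed, here is the example from earlier in the thread (8157 bytes of linear data starting 64 bytes into a page, PAGE_SIZE = 4096) worked both ways; the figures follow directly from the walkthrough quoted above:

    Estimate in xen_netbk_count_skb_slots():
            count    = DIV_ROUND_UP(8157, 4096) = 2 slots
            copy_off = 8157 % 4096              = 4061

    What netbk_gop_skb() actually produces:
            slot 1: 4032 bytes (a 64-byte hole is left at the end)
            slot 2: 4096 bytes
            slot 3: the final 29 bytes          -> 3 slots in total
            frag data, if any, starts at copy_off 29 in slot 3

So both the slot count and the copy_off the frag loop starts from diverge from the estimate once the head spans more than one buffer.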
Matt Wilson
2012-Dec-11 21:34 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:> > -----Original Message----- > > From: Matt Wilson [mailto:msw@amazon.com] > > Sent: Thursday, December 06, 2012 11:05 AM > > To: Palagummi, Siva > > Cc: Ian Campbell; xen-devel@lists.xen.org > > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > > properly when larger MTU sizes are used > > > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > > Matt, > > [...] > > > You are right. The above chunk which is already part of the upstream > > > is unfortunately incorrect for some cases. We also ran into issues > > > in our environment around a week back and found this problem. The > > > count will be different based on head len because of the > > > optimization that start_new_rx_buffer is trying to do for large > > > buffers. A hole of size "offset_in_page" will be left in first page > > > during copy if the remaining buffer size is >=PAG_SIZE. This > > > subsequently affects the copy_off as well. > > > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate the > > > count correctly based on head len. And also a fix to calculate the > > > copy_off properly to which the data from fragments gets copied. > > > > Can you explain more about the copy_off problem? I''m not seeing it. > > You can clearly see below that copy_off is input to > start_new_rx_buffer while copying frags.Yes, but that''s the right thing to do. copy_off should be set to the destination offset after copying the last byte of linear data, which means "skb_headlen(skb) % PAGE_SIZE" is correct.> So if the buggy "count" calculation below is fixed based on > offset_in_page value then copy_off value also will change > accordingly.This calculation is not incorrect. You should only need as many PAGE_SIZE buffers as you have linear data to fill.> count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > if (skb_shinfo(skb)->gso_size) > count++; > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]); > unsigned long bytes; > while (size > 0) { > BUG_ON(copy_off > MAX_BUFFER_OFFSET); > > if (start_new_rx_buffer(copy_off, size, 0)) { > count++; > copy_off = 0; > } > > > So a correct calculation should be somewhat like below because of > the optimization in start_new_rx_buffer for larger sizes.start_new_rx_buffer() should not be starting a new buffer after the first pass copying the linear data.> linear_len = skb_headlen(skb) > count = (linear_len <= PAGE_SIZE) > ? 1 > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, PAGE_SIZE)); > > copy_off = ((offset_in_page(skb->data)+linear_len) < 2*PAGE_SIZE) > ? linear_len % PAGE_SIZE; > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE;A change like this makes the code much more difficult to understand.> > > max_required_rx_slots also may require a fix to account the > > > additional slot that may be required in case mtu >= PAG_SIZE. For > > > worst case scenario atleast another +1. One thing that is still > > > puzzling here is, max_required_rx_slots seems to be assuming that > > > linear length in head will never be greater than mtu size. But that > > > doesn''t seem to be the case all the time. I wonder if it requires > > > some kind of fix there or special handling when count_skb_slots > > > exceeds max_required_rx_slots. > > > > We should only be using the number of pages required to copy the > > data. 
The fix shouldn''t be to anticipate wasting ring space by > > increasing the return value of max_required_rx_slots(). > > > > I do not think we are wasting any ring space. But just ensuring that > we have enough before proceeding ahead.For some SKBs with large linear buffers, we certainly are wasting space. Go back and read the explanation in http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html> > [...] > > > > > > Why increment count by the /estimated/ count instead of the actual > > > > number of slots used? We have the number of slots in the line just > > > > above, in sco->meta_slots_used. > > > > > > > > > > Count actually refers to ring slots consumed rather than meta_slots > > > used. Count can be different from meta_slots_used. > > > > Aah, indeed. This can end up being too pessimistic if you have lots of > > frags that require multiple copy operations. I still think that it > > would be better to calculate the actual number of ring slots consumed > > by netbk_gop_skb() to avoid other bugs like the one you originally > > fixed. > > > > counting done in count_skb_slots is what exactly it is. The fix done > above is to make it same so that no need to re calculate again.Today, the counting done in count_skb_slots() *does not* match the number of buffer slots consumed by netbk_gop_skb(). Matt
Palagummi, Siva
2012-Dec-13 23:12 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
> -----Original Message----- > From: Matt Wilson [mailto:msw@amazon.com] > Sent: Wednesday, December 12, 2012 3:05 AM > To: Palagummi, Siva > Cc: Ian Campbell; xen-devel@lists.xen.org > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring slots > properly when larger MTU sizes are used > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > -----Original Message----- > > > From: Matt Wilson [mailto:msw@amazon.com] > > > Sent: Thursday, December 06, 2012 11:05 AM > > > To: Palagummi, Siva > > > Cc: Ian Campbell; xen-devel@lists.xen.org > > > Subject: Re: [Xen-devel] [PATCH RFC V2] xen/netback: Count ring > slots > > > properly when larger MTU sizes are used > > > > > > On Wed, Dec 05, 2012 at 11:56:32AM +0000, Palagummi, Siva wrote: > > > > Matt, > > > [...] > > > > You are right. The above chunk which is already part of the > upstream > > > > is unfortunately incorrect for some cases. We also ran into > issues > > > > in our environment around a week back and found this problem. The > > > > count will be different based on head len because of the > > > > optimization that start_new_rx_buffer is trying to do for large > > > > buffers. A hole of size "offset_in_page" will be left in first > page > > > > during copy if the remaining buffer size is >=PAG_SIZE. This > > > > subsequently affects the copy_off as well. > > > > > > > > So xen_netbk_count_skb_slots actually needs a fix to calculate > the > > > > count correctly based on head len. And also a fix to calculate > the > > > > copy_off properly to which the data from fragments gets copied. > > > > > > Can you explain more about the copy_off problem? I''m not seeing it. > > > > You can clearly see below that copy_off is input to > > start_new_rx_buffer while copying frags. > > Yes, but that''s the right thing to do. copy_off should be set to the > destination offset after copying the last byte of linear data, which > means "skb_headlen(skb) % PAGE_SIZE" is correct. >No. It is not correct for two reasons. For example what if skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would be set to ZERO. And now if there exists some data in frags, ZERO will be passed in as copy_off value and start_new_rx_buffer will return FALSE. And second reason is the obvious case from the current code where "offset_in_page(skb->data)" size hole will be left in the first buffer after first pass in case remaining data that need to be copied is going to overflow the first buffer.> > So if the buggy "count" calculation below is fixed based on > > offset_in_page value then copy_off value also will change > > accordingly. > > This calculation is not incorrect. You should only need as many > PAGE_SIZE buffers as you have linear data to fill. >This calculation is incorrect and do not match actual slots used as it is now unless some new change is done either in nebk_gop_skb or in start_new_rx_buffer.> > count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE); > > > > copy_off = skb_headlen(skb) % PAGE_SIZE; > > > > if (skb_shinfo(skb)->gso_size) > > count++; > > > > for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) { > > unsigned long size = skb_frag_size(&skb_shinfo(skb)- > >frags[i]); > > unsigned long bytes; > > while (size > 0) { > > BUG_ON(copy_off > MAX_BUFFER_OFFSET); > > > > if (start_new_rx_buffer(copy_off, size, 0)) { > > count++; > > copy_off = 0; > > } > > > > > > So a correct calculation should be somewhat like below because of > > the optimization in start_new_rx_buffer for larger sizes. 
> > start_new_rx_buffer() should not be starting a new buffer after the > first pass copying the linear data. > > > linear_len = skb_headlen(skb) > > count = (linear_len <= PAGE_SIZE) > > ? 1 > > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, > PAGE_SIZE)); > > > > copy_off = ((offset_in_page(skb->data)+linear_len) < > 2*PAGE_SIZE) > > ? linear_len % PAGE_SIZE; > > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE; > > A change like this makes the code much more difficult to understand. >:-) It would have been easier had we written logic using a for loop similar to how the counting is done for data in frags. In fact I did do mistake in above calculations :-( . A proper logic probably should look somewhat like below. linear_len=skb_headlen(skb); count = (linear_len <= PAGE_SIZE) ? 1 :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len,PAGE_SIZE); copy_off = (linear_len <= PAGE_SIZE) ?linear_len :( offset_in_page(skb->data)+linear_len -1)%PAGE_SIZE+1;> > > > max_required_rx_slots also may require a fix to account the > > > > additional slot that may be required in case mtu >= PAG_SIZE. For > > > > worst case scenario atleast another +1. One thing that is still > > > > puzzling here is, max_required_rx_slots seems to be assuming that > > > > linear length in head will never be greater than mtu size. But > that > > > > doesn''t seem to be the case all the time. I wonder if it requires > > > > some kind of fix there or special handling when count_skb_slots > > > > exceeds max_required_rx_slots. > > > > > > We should only be using the number of pages required to copy the > > > data. The fix shouldn''t be to anticipate wasting ring space by > > > increasing the return value of max_required_rx_slots(). > > > > > > > I do not think we are wasting any ring space. But just ensuring that > > we have enough before proceeding ahead. > > For some SKBs with large linear buffers, we certainly are wasting > space. Go back and read the explanation in > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html >I think I probably did not put my point clearly to make it understandable. Xen_netbk_rx_ring_full uses max_required_rx_slots value. Xen_netbk_rx_ring_full is called to decide wither a vif is schedulable or not. So in case the mtu value is >=PAGE_SIZE, for a worst case scenario additional buffer would be required which is not taken care by current calculations. Ofcourse in your new fix if you do a code change not to leave a hole in first buffer then this correction may not be required. But I am not the right person to decide the implications of the fix you are proposing. The current start_new_rx_buffer seems to be trying to make the copies PAGE aligned and also reduce number of copy operations. For example let us say SKB_HEAD_LEN is for whatever reason 4*PAGE_SIZE and offset_in_page is 32. As per existing logic of start_new_rx_buffer and with the fix I am proposing for count and copy_off, we will calculate and occupy 5 ring buffers and will use 5 copy operations. If we fix it the way you are proposing, not to leave a hole in the first buffer by modifying start_new_rx_buffer then it will occupy 4 ring buffers but will require 8 copy operations as per existing logic in netbk_gop_skb while copying head!! Thanks Siva> > > [...] > > > > > > > > Why increment count by the /estimated/ count instead of the > actual > > > > > number of slots used? We have the number of slots in the line > just > > > > > above, in sco->meta_slots_used. 
> > > > > > > > > > > > > Count actually refers to ring slots consumed rather than > meta_slots > > > > used. Count can be different from meta_slots_used. > > > > > > Aah, indeed. This can end up being too pessimistic if you have lots > of > > > frags that require multiple copy operations. I still think that it > > > would be better to calculate the actual number of ring slots > consumed > > > by netbk_gop_skb() to avoid other bugs like the one you originally > > > fixed. > > > > > > > counting done in count_skb_slots is what exactly it is. The fix done > > above is to make it same so that no need to re calculate again. > > Today, the counting done in count_skb_slots() *does not* match the > number of buffer slots consumed by netbk_gop_skb(). > > Matt
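For reference, the 4*PAGE_SIZE example above plays out as follows (head length 16384, offset_in_page 32, PAGE_SIZE 4096). This is only an expansion of the figures already given in the thread, assuming the per-page chunking in netbk_gop_skb() stays as it is:

    Existing start_new_rx_buffer() (a 32-byte hole stays in slot 1):
            chunk 4064 -> slot 1                              (1 copy)
            chunk 4096 -> slot 2                              (1 copy)
            chunk 4096 -> slot 3                              (1 copy)
            chunk 4096 -> slot 4                              (1 copy)
            chunk   32 -> slot 5                              (1 copy)
            total: 5 slots, 5 copy operations

    "Fill the buffer" variant (no holes):
            chunk 4064 -> slot 1                              (1 copy)
            chunk 4096 -> 32 into slot 1, 4064 into slot 2    (2 copies)
            chunk 4096 -> 32 into slot 2, 4064 into slot 3    (2 copies)
            chunk 4096 -> 32 into slot 3, 4064 into slot 4    (2 copies)
            chunk   32 -> slot 4                              (1 copy)
            total: 4 slots, 8 copy operations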
Matt Wilson
2012-Dec-14 18:53 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:> > -----Original Message----- > > From: Matt Wilson [mailto:msw@amazon.com] > > Sent: Wednesday, December 12, 2012 3:05 AM > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > > > > You can clearly see below that copy_off is input to > > > start_new_rx_buffer while copying frags. > > > > Yes, but that''s the right thing to do. copy_off should be set to the > > destination offset after copying the last byte of linear data, which > > means "skb_headlen(skb) % PAGE_SIZE" is correct. > > > > No. It is not correct for two reasons. For example what if > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would > be set to ZERO. And now if there exists some data in frags, ZERO > will be passed in as copy_off value and start_new_rx_buffer will > return FALSE. And second reason is the obvious case from the current > code where "offset_in_page(skb->data)" size hole will be left in the > first buffer after first pass in case remaining data that need to be > copied is going to overflow the first buffer.Right, and I''m arguing that having the code leave a hole is less desirable than potentially increasing the number of copy operations. I''d like to hear from Ian and others if using the buffers efficiently is more important than reducing copy operations. Intuitively, I think it''s more important to use the ring efficiently.> > > So if the buggy "count" calculation below is fixed based on > > > offset_in_page value then copy_off value also will change > > > accordingly. > > > > This calculation is not incorrect. You should only need as many > > PAGE_SIZE buffers as you have linear data to fill. > > > > This calculation is incorrect and do not match actual slots used as > it is now unless some new change is done either in nebk_gop_skb or > in start_new_rx_buffer.Yes, and I''m proposing to change netbk_gop_skb() and start_new_rx_buffer().> > > linear_len = skb_headlen(skb) > > > count = (linear_len <= PAGE_SIZE) > > > ? 1 > > > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len, > > PAGE_SIZE)); > > > > > > copy_off = ((offset_in_page(skb->data)+linear_len) < > > 2*PAGE_SIZE) > > > ? linear_len % PAGE_SIZE; > > > : (offset_in_page(skb->data)+linear_len) % PAGE_SIZE; > > > > A change like this makes the code much more difficult to understand. > >> :-) It would have been easier had we written logic using a for loop > similar to how the counting is done for data in frags. In fact I did > do mistake in above calculations :-( . A proper logic probably > should look somewhat like below.Yes, we tested one version of a fix that modeled the algorithm used in count_skb_slots() more directly after the copy loop in netbk_gop_skb() to avoid such errors.> linear_len=skb_headlen(skb); > count = (linear_len <= PAGE_SIZE) > ? 1 > :DIV_ROUND_UP(offset_in_page(skb->data)+linear_len,PAGE_SIZE); > > copy_off = (linear_len <= PAGE_SIZE) > ?linear_len > :( offset_in_page(skb->data)+linear_len -1)%PAGE_SIZE+1;If we can agree that utilizing buffers efficiently is more important than reducing copy operations, we can avoid this complexity.> > For some SKBs with large linear buffers, we certainly are wasting > > space. Go back and read the explanation in > > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html > > I think I probably did not put my point clearly to make it > understandable. Xen_netbk_rx_ring_full uses max_required_rx_slots > value. 
Xen_netbk_rx_ring_full is called to decide wither a vif is > schedulable or not. So in case the mtu value is >=PAGE_SIZE, for a > worst case scenario additional buffer would be required which is not > taken care by current calculations.Correct, I agree with your description of the current problem.> Ofcourse in your new fix if you do a code change not to leave a hole > in first buffer then this correction may not be required. But I am > not the right person to decide the implications of the fix you are > proposing. The current start_new_rx_buffer seems to be trying to > make the copies PAGE aligned and also reduce number of copy > operations.Indeed, that is what the change that we''re testing now does. Changing start_new_rx_buffer() to avoid leaving a hole in the first buffer is performing quite well. Unfortunately I don''t have comparison numbers just now.> For example let us say SKB_HEAD_LEN is for whatever reason > 4*PAGE_SIZE and offset_in_page is 32. > > As per existing logic of start_new_rx_buffer and with the fix I am > proposing for count and copy_off, we will calculate and occupy 5 > ring buffers and will use 5 copy operations.I agree.> If we fix it the way you are proposing, not to leave a hole in the > first buffer by modifying start_new_rx_buffer then it will occupy 4 > ring buffers but will require 8 copy operations as per existing > logic in netbk_gop_skb while copying head!!I think that we should optimize for more commonly seen lengths, which I''d think would be 2-3 pages max. The delta in copy operations is smaller in these cases, where we would use 4 copy operations for 2 pages (as opposed to 3 with the current algorithm) and 6 copy operations for 3 pages (as opposed to 4 with the current algorithm). IanC, Konrad - do you have any opinions on the best approach here? Matt
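The copy-operation counts quoted above assume a head that starts at a non-zero offset within its first page, so an N-page head is copied as chunks of (PAGE_SIZE - offset), PAGE_SIZE, ..., offset. Under that assumption the trade-off works out as:

    2 pages of head data: 3 copies / 3 slots today  vs  4 copies / 2 slots when filling each buffer
    3 pages of head data: 4 copies / 4 slots today  vs  6 copies / 3 slots when filling each buffer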
Ian Campbell
2012-Dec-17 11:26 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:> On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote: > > > -----Original Message----- > > > From: Matt Wilson [mailto:msw@amazon.com] > > > Sent: Wednesday, December 12, 2012 3:05 AM > > > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote: > > > > > > > > You can clearly see below that copy_off is input to > > > > start_new_rx_buffer while copying frags. > > > > > > Yes, but that''s the right thing to do. copy_off should be set to the > > > destination offset after copying the last byte of linear data, which > > > means "skb_headlen(skb) % PAGE_SIZE" is correct. > > > > > > > No. It is not correct for two reasons. For example what if > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would > > be set to ZERO. And now if there exists some data in frags, ZERO > > will be passed in as copy_off value and start_new_rx_buffer will > > return FALSE. And second reason is the obvious case from the current > > code where "offset_in_page(skb->data)" size hole will be left in the > > first buffer after first pass in case remaining data that need to be > > copied is going to overflow the first buffer. > > Right, and I''m arguing that having the code leave a hole is less > desirable than potentially increasing the number of copy > operations. I''d like to hear from Ian and others if using the buffers > efficiently is more important than reducing copy operations. Intuitively, > I think it''s more important to use the ring efficiently.Do you mean the ring or the actual buffers? The current code tries to coalesce multiple small frags/heads because it is usually trivial but doesn''t try too hard with multiple larger frags, since they take up most of a page by themselves anyway. I suppose this does waste a bit of buffer space and therefore could take more ring slots, but it''s not clear to me how much this matters in practice (it might be tricky to measure this with any realistic workload). The cost of splitting a copy into two should be low though, the copies are already batched into a single hypercall and I''d expect things to be mostly dominated by the data copy itself rather than the setup of each individual op, which would argue for splitting a copy in two if that helps fill the buffers. The flip side is that once you get past the headers etc the paged frags likely tend to either bits and bobs (fine) or mostly whole pages. In the whole pages case trying to fill the buffers will result in every copy getting split. My gut tells me that the whole pages case probably dominates, but I''m not sure what the real world impact of splitting all the copies would be.> > > If we fix it the way you are proposing, not to leave a hole in the > > first buffer by modifying start_new_rx_buffer then it will occupy 4 > > ring buffers but will require 8 copy operations as per existing > > logic in netbk_gop_skb while copying head!! > > I think that we should optimize for more commonly seen lengths, which > I''d think would be 2-3 pages max. The delta in copy operations is > smaller in these cases, where we would use 4 copy operations for 2 > pages (as opposed to 3 with the current algorithm) and 6 copy > operations for 3 pages (as opposed to 4 with the current algorithm). > > IanC, Konrad - do you have any opinions on the best approach here? > > Matt
Matt Wilson
2012-Dec-17 20:09 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> > On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > > -----Original Message-----
> > > > From: Matt Wilson [mailto:msw@amazon.com]
> > > > Sent: Wednesday, December 12, 2012 3:05 AM
> > > >
> > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > > >
> > > > > You can clearly see below that copy_off is input to
> > > > > start_new_rx_buffer while copying frags.
> > > >
> > > > Yes, but that's the right thing to do. copy_off should be set to the
> > > > destination offset after copying the last byte of linear data, which
> > > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > >
> > >
> > > No. It is not correct for two reasons. For example what if
> > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would
> > > be set to ZERO. And now if there exists some data in frags, ZERO
> > > will be passed in as copy_off value and start_new_rx_buffer will
> > > return FALSE. And second reason is the obvious case from the current
> > > code where "offset_in_page(skb->data)" size hole will be left in the
> > > first buffer after first pass in case remaining data that need to be
> > > copied is going to overflow the first buffer.
> >
> > Right, and I'm arguing that having the code leave a hole is less
> > desirable than potentially increasing the number of copy
> > operations. I'd like to hear from Ian and others if using the buffers
> > efficiently is more important than reducing copy operations. Intuitively,
> > I think it's more important to use the ring efficiently.
>
> Do you mean the ring or the actual buffers?

Sorry, the actual buffers.

> The current code tries to coalesce multiple small frags/heads because it
> is usually trivial but doesn't try too hard with multiple larger frags,
> since they take up most of a page by themselves anyway. I suppose this
> does waste a bit of buffer space and therefore could take more ring
> slots, but it's not clear to me how much this matters in practice (it
> might be tricky to measure this with any realistic workload).

In the case where we're consistently handling large heads (like when
using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
the available buffers.

> The cost of splitting a copy into two should be low though, the copies
> are already batched into a single hypercall and I'd expect things to be
> mostly dominated by the data copy itself rather than the setup of each
> individual op, which would argue for splitting a copy in two if that
> helps fill the buffers.

That was my thought as well. We're testing a patch that does just this
now.

> The flip side is that once you get past the headers etc the paged frags
> likely tend to either bits and bobs (fine) or mostly whole pages. In the
> whole pages case trying to fill the buffers will result in every copy
> getting split. My gut tells me that the whole pages case probably
> dominates, but I'm not sure what the real world impact of splitting all
> the copies would be.

Right, I'm less concerned about the paged frags. It might make sense
to skip some space so that the copying can be page aligned. I suppose
it depends on how many different pages are in the list, and what the
total size is.

In practice I'd think it would be rare to see a paged SKB for ingress
traffic to domUs unless there is significant intra-host communication
(dom0->domU, domU->domU). When domU ingress traffic is originating
from an Ethernet device it shouldn't be paged. Paged SKBs would come
into play when a SKB is formed for transmit on an egress device that
is SG-capable. Or am I misunderstanding how paged SKBs are used these
days?

Matt
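For readers following the discussion without the source handy, the test
being argued about is start_new_rx_buffer(). Below is a simplified
paraphrase of it as it stands in this era's
drivers/net/xen-netback/netback.c -- an illustrative sketch rather than a
verbatim copy of the in-tree code:

/*
 * Paraphrased sketch of the buffer-break test under discussion.
 * MAX_BUFFER_OFFSET == PAGE_SIZE in current netback.  Treat this as an
 * illustration, not the exact in-tree function.
 */
static bool start_new_rx_buffer(int offset, unsigned long size, int head)
{
        /* The current destination buffer is completely full. */
        if (offset == MAX_BUFFER_OFFSET)
                return true;

        /*
         * The chunk does not fit in what is left of this buffer, but it
         * would fit in an empty one: skip the remaining space and start
         * a fresh buffer rather than splitting the copy.  Never done for
         * the head slot and never when the current buffer is empty.
         */
        if ((offset + size > MAX_BUFFER_OFFSET) &&
            (size <= MAX_BUFFER_OFFSET) && offset && !head)
                return true;

        return false;
}

The hole Siva describes comes from the second test: once the first chunk
of a large linear area has been copied, head is cleared, so the next
chunk jumps to a fresh buffer and the last offset_in_page(skb->data)
bytes of the first buffer are never used.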
Ian Campbell
2012-Dec-18 10:02 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> > On Fri, 2012-12-14 at 18:53 +0000, Matt Wilson wrote:
> > > On Thu, Dec 13, 2012 at 11:12:50PM +0000, Palagummi, Siva wrote:
> > > > > -----Original Message-----
> > > > > From: Matt Wilson [mailto:msw@amazon.com]
> > > > > Sent: Wednesday, December 12, 2012 3:05 AM
> > > > >
> > > > > On Tue, Dec 11, 2012 at 10:25:51AM +0000, Palagummi, Siva wrote:
> > > > > >
> > > > > > You can clearly see below that copy_off is input to
> > > > > > start_new_rx_buffer while copying frags.
> > > > >
> > > > > Yes, but that's the right thing to do. copy_off should be set to the
> > > > > destination offset after copying the last byte of linear data, which
> > > > > means "skb_headlen(skb) % PAGE_SIZE" is correct.
> > > > >
> > > >
> > > > No. It is not correct for two reasons. For example what if
> > > > skb_headlen(skb) is exactly a multiple of PAGE_SIZE. Copy_off would
> > > > be set to ZERO. And now if there exists some data in frags, ZERO
> > > > will be passed in as copy_off value and start_new_rx_buffer will
> > > > return FALSE. And second reason is the obvious case from the current
> > > > code where "offset_in_page(skb->data)" size hole will be left in the
> > > > first buffer after first pass in case remaining data that need to be
> > > > copied is going to overflow the first buffer.
> > >
> > > Right, and I'm arguing that having the code leave a hole is less
> > > desirable than potentially increasing the number of copy
> > > operations. I'd like to hear from Ian and others if using the buffers
> > > efficiently is more important than reducing copy operations. Intuitively,
> > > I think it's more important to use the ring efficiently.
> >
> > Do you mean the ring or the actual buffers?
>
> Sorry, the actual buffers.
>
> > The current code tries to coalesce multiple small frags/heads because it
> > is usually trivial but doesn't try too hard with multiple larger frags,
> > since they take up most of a page by themselves anyway. I suppose this
> > does waste a bit of buffer space and therefore could take more ring
> > slots, but it's not clear to me how much this matters in practice (it
> > might be tricky to measure this with any realistic workload).
>
> In the case where we're consistently handling large heads (like when
> using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> the available buffers.

Sorry if I missed this earlier in the thread, but how do we end up
wasting so much?

For an skb with 9000 bytes in the linear area, which must necessarily be
contiguous, do we not fill the first two page sized buffers completely?
The remaining 808 bytes must then have its own buffer. Hrm, I suppose
that's about 27% wasted over the three pages. If we are doing something
worse than that though then we have a bug somewhere (nb: older netbacks
would only fill the first 2048 bytes of each buffer, the wastage is
presumably phenomenal in that case ;-), MAX_BUFFER_OFFSET is now
== PAGE_SIZE though).

Unless I've misunderstood this thread and we are considering packing
data from multiple skbs into a single buffer? (i.e. the remaining
4096-808=3288 bytes in the third buffer above would contain data for the
next skb). Does the ring protocol even allow for that possibility? It
seems like a path to complexity to me.

> > The cost of splitting a copy into two should be low though, the copies
> > are already batched into a single hypercall and I'd expect things to be
> > mostly dominated by the data copy itself rather than the setup of each
> > individual op, which would argue for splitting a copy in two if that
> > helps fill the buffers.
>
> That was my thought as well. We're testing a patch that does just this
> now.
>
> > The flip side is that once you get past the headers etc the paged frags
> > likely tend to either bits and bobs (fine) or mostly whole pages. In the
> > whole pages case trying to fill the buffers will result in every copy
> > getting split. My gut tells me that the whole pages case probably
> > dominates, but I'm not sure what the real world impact of splitting all
> > the copies would be.
>
> Right, I'm less concerned about the paged frags. It might make sense
> to skip some space so that the copying can be page aligned. I suppose
> it depends on how many different pages are in the list, and what the
> total size is.
>
> In practice I'd think it would be rare to see a paged SKB for ingress
> traffic to domUs unless there is significant intra-host communication
> (dom0->domU, domU->domU). When domU ingress traffic is originating
> from an Ethernet device it shouldn't be paged. Paged SKBs would come
> into play when a SKB is formed for transmit on an egress device that
> is SG-capable. Or am I misunderstanding how paged SKBs are used these
> days?

I think it depends on the hardware and/or driver. IIRC some devices push
down frag zero into the device for RX DMA and then share it with the
linear area (I think this might have something to do with making LRO or
GRO easier/workable).

Also things such as GRO can commonly cause received skbs being passed up
the stack to contain several frags.

I'm not quite sure how this works but in the case of s/w GRO I wouldn't
be surprised if this resulted in a skb with lots of 1500 byte (i.e. wire
MTU) frags. I think we would end up at least coalescing those two per
buffer on transmit (3000/4096 = 73% filling the page).

Doing better would either need start_new_rx_buffer to always completely
fill each buffer or to take a much more global view of the frags (e.g.
taking the size of the next frag and how it fits into consideration
too).

Ian.
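As a back-of-envelope check of the figures above (an illustration, not
code from the thread), the tight-packing waste for a 9000-byte head and
the fill ratio for two wire-MTU frags coalesced into one page work out
as follows:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Back-of-envelope numbers for the discussion above (illustration only). */
int main(void)
{
        unsigned long headlen = 9000;                                /* Ian's example */
        unsigned long pages   = (headlen + PAGE_SIZE - 1) / PAGE_SIZE;         /* 3 */
        unsigned long waste   = pages * PAGE_SIZE - headlen;                /* 3288 */

        /* Packing the head tightly: 4096 + 4096 + 808 over three buffers. */
        printf("tight packing: %lu buffers, %.0f%% wasted\n",
               pages, 100.0 * waste / (pages * PAGE_SIZE));               /* ~27% */

        /* Two 1500-byte GRO frags coalesced per buffer, as mentioned above. */
        printf("two MTU frags per page: %.0f%% filled\n",
               100.0 * 3000 / PAGE_SIZE);                                 /* ~73% */
        return 0;
}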
Matt Wilson
2012-Dec-18 19:43 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, Dec 18, 2012 at 10:02:48AM +0000, Ian Campbell wrote:
> On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> > On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
[...]
> > > Do you mean the ring or the actual buffers?
> >
> > Sorry, the actual buffers.
> >
> > > The current code tries to coalesce multiple small frags/heads because it
> > > is usually trivial but doesn't try too hard with multiple larger frags,
> > > since they take up most of a page by themselves anyway. I suppose this
> > > does waste a bit of buffer space and therefore could take more ring
> > > slots, but it's not clear to me how much this matters in practice (it
> > > might be tricky to measure this with any realistic workload).
> >
> > In the case where we're consistently handling large heads (like when
> > using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> > the available buffers.
>
> Sorry if I missed this earlier in the thread, but how do we end up
> wasting so much?

I see SKBs with:

  skb_headlen(skb) == 8157
  offset_in_page(skb->data) == 64

when handling long streaming ingress flows from ixgbe with MTU (on the
NIC and both sides of the VIF) set to 9000. When all the SKBs making
up the flow have the above property, xen-netback uses three pages instead
of two. The first buffer gets 4032 bytes copied into it. The next
buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
copied into it. See this post in the archives for a more detailed
walk through netbk_gop_frag_copy():

http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html

> For an skb with 9000 bytes in the linear area, which must necessarily be
> contiguous, do we not fill the first two page sized buffers completely?
> The remaining 808 bytes must then have its own buffer. Hrm, I suppose
> that's about 27% wasted over the three pages. If we are doing something
> worse than that though then we have a bug somewhere (nb: older netbacks
> would only fill the first 2048 bytes of each buffer, the wastage is
> presumably phenomenal in that case ;-), MAX_BUFFER_OFFSET is now
> == PAGE_SIZE though).

Sorry, I should have said 8157 bytes for my example. :-)

> Unless I've misunderstood this thread and we are considering packing
> data from multiple skbs into a single buffer? (i.e. the remaining
> 4096-808=3288 bytes in the third buffer above would contain data for the
> next skb). Does the ring protocol even allow for that possibility? It
> seems like a path to complexity to me.

No, I'm not suggesting that we come up with an extension to pack the
next skb into any remaining space left over from the current one. I
agree that would make for a lot of complexity managing the ratio of
meta slots to buffers, etc.

> > > The cost of splitting a copy into two should be low though, the copies
> > > are already batched into a single hypercall and I'd expect things to be
> > > mostly dominated by the data copy itself rather than the setup of each
> > > individual op, which would argue for splitting a copy in two if that
> > > helps fill the buffers.
> >
> > That was my thought as well. We're testing a patch that does just this
> > now.
> >
> > > The flip side is that once you get past the headers etc the paged frags
> > > likely tend to either bits and bobs (fine) or mostly whole pages. In the
> > > whole pages case trying to fill the buffers will result in every copy
> > > getting split. My gut tells me that the whole pages case probably
> > > dominates, but I'm not sure what the real world impact of splitting all
> > > the copies would be.
> >
> > Right, I'm less concerned about the paged frags. It might make sense
> > to skip some space so that the copying can be page aligned. I suppose
> > it depends on how many different pages are in the list, and what the
> > total size is.
> >
> > In practice I'd think it would be rare to see a paged SKB for ingress
> > traffic to domUs unless there is significant intra-host communication
> > (dom0->domU, domU->domU). When domU ingress traffic is originating
> > from an Ethernet device it shouldn't be paged. Paged SKBs would come
> > into play when a SKB is formed for transmit on an egress device that
> > is SG-capable. Or am I misunderstanding how paged SKBs are used these
> > days?
>
> I think it depends on the hardware and/or driver. IIRC some devices push
> down frag zero into the device for RX DMA and then share it with the
> linear area (I think this might have something to do with making LRO or
> GRO easier/workable).
>
> Also things such as GRO can commonly cause received skbs being passed up
> the stack to contain several frags.
>
> I'm not quite sure how this works but in the case of s/w GRO I wouldn't
> be surprised if this resulted in a skb with lots of 1500 byte (i.e. wire
> MTU) frags. I think we would end up at least coalescing those two per
> buffer on transmit (3000/4096 = 73% filling the page).
>
> Doing better would either need start_new_rx_buffer to always completely
> fill each buffer or to take a much more global view of the frags (e.g.
> taking the size of the next frag and how it fits into consideration
> too).

What's the down side to making start_new_rx_buffer() always try to
fill each buffer?

Matt
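To make the two behaviours concrete, here is a small userspace model of
the copy loop (the break test is paraphrased from netback, not copied
verbatim; everything else is illustrative scaffolding). It walks an
8157-byte head that starts at page offset 64 through per-page chunking
and prints how many bytes land in each destination buffer under the
current rule and under an "always fill" rule:

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SIZE         4096UL
#define MAX_BUFFER_OFFSET PAGE_SIZE

/* Paraphrase of the current buffer-break rule (illustrative, not verbatim). */
static bool start_new_rx_buffer(unsigned long off, unsigned long size,
                                bool head, bool always_fill)
{
        if (off == MAX_BUFFER_OFFSET)
                return true;
        if (always_fill)
                return false;   /* only break when the buffer is truly full */
        return off + size > MAX_BUFFER_OFFSET &&
               size <= MAX_BUFFER_OFFSET && off && !head;
}

/* Model the head of an skb being fed to the copy loop one source page
 * at a time, as the linear area is handled today. */
static void simulate(unsigned long headlen, unsigned long data_off,
                     bool always_fill)
{
        unsigned long copy_off = 0, used[8] = { 0 };
        int buf = 0;
        bool head = true;

        while (headlen) {
                /* Chunk = what is left of the current source page. */
                unsigned long chunk = PAGE_SIZE - data_off;
                if (chunk > headlen)
                        chunk = headlen;
                data_off = 0;
                headlen -= chunk;

                while (chunk) {
                        unsigned long bytes = chunk;

                        if (start_new_rx_buffer(copy_off, chunk, head, always_fill)) {
                                buf++;
                                copy_off = 0;
                        }
                        if (copy_off + bytes > MAX_BUFFER_OFFSET)
                                bytes = MAX_BUFFER_OFFSET - copy_off;

                        used[buf] += bytes;
                        copy_off += bytes;
                        chunk -= bytes;
                        head = false;
                }
        }

        printf("%s:", always_fill ? "always fill " : "current rule");
        for (int i = 0; i <= buf; i++)
                printf(" buf%d=%lu", i, used[i]);
        printf("  (%d buffers)\n", buf + 1);
}

int main(void)
{
        simulate(8157, 64, false);      /* -> 4032 / 4096 / 29, three buffers */
        simulate(8157, 64, true);       /* -> 4096 / 4061, two buffers        */
        return 0;
}

Under these assumptions the model prints 4032/4096/29 (three buffers)
for the current rule and 4096/4061 (two buffers) when buffers are always
filled, matching the numbers quoted above.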
Ian Campbell
2012-Dec-20 10:05 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Tue, 2012-12-18 at 19:43 +0000, Matt Wilson wrote:
> On Tue, Dec 18, 2012 at 10:02:48AM +0000, Ian Campbell wrote:
> > On Mon, 2012-12-17 at 20:09 +0000, Matt Wilson wrote:
> > > On Mon, Dec 17, 2012 at 11:26:38AM +0000, Ian Campbell wrote:
> [...]
> > > > Do you mean the ring or the actual buffers?
> > >
> > > Sorry, the actual buffers.
> > >
> > > > The current code tries to coalesce multiple small frags/heads because it
> > > > is usually trivial but doesn't try too hard with multiple larger frags,
> > > > since they take up most of a page by themselves anyway. I suppose this
> > > > does waste a bit of buffer space and therefore could take more ring
> > > > slots, but it's not clear to me how much this matters in practice (it
> > > > might be tricky to measure this with any realistic workload).
> > >
> > > In the case where we're consistently handling large heads (like when
> > > using a MTU value of 9000 for streaming traffic), we're wasting 1/3 of
> > > the available buffers.
> >
> > Sorry if I missed this earlier in the thread, but how do we end up
> > wasting so much?
>
> I see SKBs with:
>
>   skb_headlen(skb) == 8157
>   offset_in_page(skb->data) == 64
>
> when handling long streaming ingress flows from ixgbe with MTU (on the
> NIC and both sides of the VIF) set to 9000. When all the SKBs making
> up the flow have the above property, xen-netback uses three pages instead
> of two. The first buffer gets 4032 bytes copied into it. The next
> buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
> copied into it. See this post in the archives for a more detailed
> walk through netbk_gop_frag_copy():
> http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html

Thanks. This certainly seems wrong for the head bit.

> What's the down side to making start_new_rx_buffer() always try to
> fill each buffer?

As we discussed earlier in the thread, it doubles the number of copy ops
per frag under some circumstances; my gut is that this isn't going to
hurt, but that's just my gut.

It seems obviously right that the linear part of the SKB should always
fill entire buffers though. Perhaps the answer is to differentiate
between the skb->data and the frags?

Ian.
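One possible shape of the skb->data/frags differentiation suggested
here, sketched for illustration only -- the extra "linear" argument does
not exist in netback, callers would have to pass it, and the actual
patch referred to in the next message may differ:

/*
 * Illustrative sketch only, not the patch from this thread.  "linear"
 * would be true while copying the head and false for frags.
 */
static bool start_new_rx_buffer(int offset, unsigned long size,
                                int head, bool linear)
{
        /* Buffer completely full: always move on. */
        if (offset == MAX_BUFFER_OFFSET)
                return true;

        /*
         * Linear data packs tightly: split the copy across buffers
         * instead of skipping the tail of the current one, so a large
         * head never leaves a hole.
         */
        if (linear)
                return false;

        /* Existing behaviour for frags: jump to a fresh buffer when the
         * frag would not fit here but could fill an empty buffer. */
        if ((offset + size > MAX_BUFFER_OFFSET) &&
            (size <= MAX_BUFFER_OFFSET) && offset && !head)
                return true;

        return false;
}

With a change along these lines the 8157-byte head above would occupy
two buffers (4096 + 4061) while frag copies keep the current
skip-to-a-fresh-buffer behaviour.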
Matt Wilson
2012-Dec-20 21:42 UTC
Re: [PATCH RFC V2] xen/netback: Count ring slots properly when larger MTU sizes are used
On Thu, Dec 20, 2012 at 10:05:29AM +0000, Ian Campbell wrote:
> On Tue, 2012-12-18 at 19:43 +0000, Matt Wilson wrote:
[...]
> > I see SKBs with:
> >
> >   skb_headlen(skb) == 8157
> >   offset_in_page(skb->data) == 64
> >
> > when handling long streaming ingress flows from ixgbe with MTU (on the
> > NIC and both sides of the VIF) set to 9000. When all the SKBs making
> > up the flow have the above property, xen-netback uses three pages instead
> > of two. The first buffer gets 4032 bytes copied into it. The next
> > buffer gets 4096 bytes copied into it. The final buffer gets 29 bytes
> > copied into it. See this post in the archives for a more detailed
> > walk through netbk_gop_frag_copy():
> > http://lists.xen.org/archives/html/xen-devel/2012-12/msg00274.html
>
> Thanks. This certainly seems wrong for the head bit.
>
> > What's the down side to making start_new_rx_buffer() always try to
> > fill each buffer?
>
> As we discussed earlier in the thread, it doubles the number of copy ops
> per frag under some circumstances; my gut is that this isn't going to
> hurt, but that's just my gut.
>
> It seems obviously right that the linear part of the SKB should always
> fill entire buffers though. Perhaps the answer is to differentiate
> between the skb->data and the frags?

We've written a patch that does exactly that. It's stable and performs
well in our testing so far. I'll need to forward port it to the latest
Linux tree, test it there, and post.

Matt