Paul Durrant
2013-Oct-04 16:26 UTC
[PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
xenvif_count_skb_slots() for skbs with a linear area spanning a page
boundary. The alignment of skb->data needs to be taken into account, not
just the head length. This patch fixes the issue by dry-running the code
from xenvif_gop_skb() (and adjusting the comment above the function to note
that).

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Xi Xiong <xixiong@amazon.com>
Cc: Matt Wilson <msw@amazon.com>
Cc: Annie Li <annie.li@oracle.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d0b0feb..6f680f4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -223,15 +223,28 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 /*
  * Figure out how many ring slots we're going to need to send @skb to
  * the guest. This function is essentially a dry run of
- * xenvif_gop_frag_copy.
+ * xenvif_gop_skb.
  */
 unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 {
+	unsigned char *data;
 	unsigned int count;
 	int i, copy_off;
 	struct skb_cb_overlay *sco;
 
-	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	count = 0;
+
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned int offset = offset_in_page(data);
+		unsigned int len = PAGE_SIZE - offset;
+
+		if (data + len > skb_tail_pointer(skb))
+			len = skb_tail_pointer(skb) - data;
+
+		count++;
+		data += len;
+	}
 
 	copy_off = skb_headlen(skb) % PAGE_SIZE;
 
-- 
1.7.10.4
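To illustrate the miscount described in the commit message, here is a minimal user-space sketch, not kernel code. It compares a count based only on the head length with a count that walks the linear area page by page, as the patch does. The names count_by_headlen(), count_by_page_walk() and SKETCH_PAGE_SIZE are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Named SKETCH_PAGE_SIZE so that it cannot
 * clash with a system PAGE_SIZE macro. */
#define SKETCH_PAGE_SIZE 4096u

/* Count based only on the head length (the DIV_ROUND_UP form). */
unsigned int count_by_headlen(size_t headlen)
{
	return (unsigned int)((headlen + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

/* Count that takes the alignment of the data pointer into account:
 * one unit per page that the range [data, data + headlen) touches. */
unsigned int count_by_page_walk(uintptr_t data, size_t headlen)
{
	uintptr_t tail = data + headlen;
	unsigned int count = 0;

	assert(tail >= data);	/* the sketch does not handle address wrap */

	while (data < tail) {
		size_t offset = data % SKETCH_PAGE_SIZE;
		size_t len = SKETCH_PAGE_SIZE - offset;

		/* Clamp the last chunk to the end of the linear area. */
		if (len > tail - data)
			len = tail - data;

		count++;
		data += len;
	}
	return count;
}
```

For a 2-byte linear area that starts at page offset 4095, count_by_headlen(2) gives 1 while count_by_page_walk(4095, 2) gives 2, which is the discrepancy the patch is concerned with.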
Matt Wilson
2013-Oct-07 00:06 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On Fri, Oct 04, 2013 at 05:26:23PM +0100, Paul Durrant wrote:
> Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> xenvif_count_skb_slots() for skbs with a linear area spanning a page
> boundary. The alignment of skb->data needs to be taken into account, not
> just the head length. This patch fixes the issue by dry-running the code
> from xenvif_gop_skb() (and adjusting the comment above the function to note
> that).
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Xi Xiong <xixiong@amazon.com>
> Cc: Matt Wilson <msw@amazon.com>
> Cc: Annie Li <annie.li@oracle.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>

Paul, can you reconcile this change with the one made by Simon in cs
8f985b4f7a5394c8f8725a5109451a541ddb9eea?

--msw

> ---
>  drivers/net/xen-netback/netback.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index d0b0feb..6f680f4 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -223,15 +223,28 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>  /*
>   * Figure out how many ring slots we're going to need to send @skb to
>   * the guest. This function is essentially a dry run of
> - * xenvif_gop_frag_copy.
> + * xenvif_gop_skb.
>   */
>  unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
>  {
> +	unsigned char *data;
>  	unsigned int count;
>  	int i, copy_off;
>  	struct skb_cb_overlay *sco;
>
> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +	count = 0;
> +
> +	data = skb->data;
> +	while (data < skb_tail_pointer(skb)) {
> +		unsigned int offset = offset_in_page(data);
> +		unsigned int len = PAGE_SIZE - offset;
> +
> +		if (data + len > skb_tail_pointer(skb))
> +			len = skb_tail_pointer(skb) - data;
> +
> +		count++;
> +		data += len;
> +	}
>
>  	copy_off = skb_headlen(skb) % PAGE_SIZE;
>
Matt Wilson
2013-Oct-07 00:07 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On Sun, Oct 06, 2013 at 05:06:52PM -0700, Matt Wilson wrote:
> On Fri, Oct 04, 2013 at 05:26:23PM +0100, Paul Durrant wrote:
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Xi Xiong <xixiong@amazon.com>
> > Cc: Matt Wilson <msw@amazon.com>
> > Cc: Annie Li <annie.li@oracle.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: Ian Campbell <Ian.Campbell@citrix.com>
>
> Paul, can you reconcile this change with the one made by Simon in cs
> 8f985b4f7a5394c8f8725a5109451a541ddb9eea?

Correction: e26b203ede31fffd52571a5ba607a26c79dc5c0d

> --msw
>
> > ---
> >  drivers/net/xen-netback/netback.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index d0b0feb..6f680f4 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -223,15 +223,28 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
> >  /*
> >   * Figure out how many ring slots we're going to need to send @skb to
> >   * the guest. This function is essentially a dry run of
> > - * xenvif_gop_frag_copy.
> > + * xenvif_gop_skb.
> >   */
> >  unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
> >  {
> > +	unsigned char *data;
> >  	unsigned int count;
> >  	int i, copy_off;
> >  	struct skb_cb_overlay *sco;
> >
> > -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> > +	count = 0;
> > +
> > +	data = skb->data;
> > +	while (data < skb_tail_pointer(skb)) {
> > +		unsigned int offset = offset_in_page(data);
> > +		unsigned int len = PAGE_SIZE - offset;
> > +
> > +		if (data + len > skb_tail_pointer(skb))
> > +			len = skb_tail_pointer(skb) - data;
> > +
> > +		count++;
> > +		data += len;
> > +	}
> >
> >  	copy_off = skb_headlen(skb) % PAGE_SIZE;
> >
David Vrabel
2013-Oct-07 09:50 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On 04/10/13 17:26, Paul Durrant wrote:
> Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> xenvif_count_skb_slots() for skbs with a linear area spanning a page
> boundary. The alignment of skb->data needs to be taken into account, not
> just the head length. This patch fixes the issue by dry-running the code
> from xenvif_gop_skb() (and adjusting the comment above the function to note
> that).

If 4f0581d2582 is causing the skb->data to be fully packed into a
minimal number of slots then the simple
DIV_ROUND_UP(skb_headlen(skb))
is correct.

I think this change will miscount the number of slots,
over-estimating the count, which I think will eventually cause netback to
think the ring has no space when it has some.

Is the problem here not the miscounting of slots but running out of
space in the grant table op array because we now use more copy ops?

I didn't think there was any real merit in the problematic commit (or at
least there was no evidence that it was better) so I would suggest just
reverting it instead of trying to fix it up.

If we do want to change how netback fills the ring then netback needs
some redesign (i.e., change it so it doesn't have to do this counting in
advance) to make it much less fragile to changes in this area.

David
Wei Liu
2013-Oct-07 10:01 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On Fri, Oct 04, 2013 at 05:26:23PM +0100, Paul Durrant wrote:
> Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> xenvif_count_skb_slots() for skbs with a linear area spanning a page
> boundary. The alignment of skb->data needs to be taken into account, not
> just the head length. This patch fixes the issue by dry-running the code
> from xenvif_gop_skb() (and adjusting the comment above the function to note
> that).
>

If I'm not mistaken the change in commit 4f0581d2 is correct because we
changed the way that the ring is packed. Now you seem to fall back to
the original scheme (or something in between, without reverting the
other changes in that commit).

Do you have instructions to reproduce the bug? Can you paste a
detailed oops message?

Wei.

> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Xi Xiong <xixiong@amazon.com>
> Cc: Matt Wilson <msw@amazon.com>
> Cc: Annie Li <annie.li@oracle.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <Ian.Campbell@citrix.com>
>
> ---
>  drivers/net/xen-netback/netback.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index d0b0feb..6f680f4 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -223,15 +223,28 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
>  /*
>   * Figure out how many ring slots we're going to need to send @skb to
>   * the guest. This function is essentially a dry run of
> - * xenvif_gop_frag_copy.
> + * xenvif_gop_skb.
>   */
>  unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
>  {
> +	unsigned char *data;
>  	unsigned int count;
>  	int i, copy_off;
>  	struct skb_cb_overlay *sco;
>
> -	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
> +	count = 0;
> +
> +	data = skb->data;
> +	while (data < skb_tail_pointer(skb)) {
> +		unsigned int offset = offset_in_page(data);
> +		unsigned int len = PAGE_SIZE - offset;
> +
> +		if (data + len > skb_tail_pointer(skb))
> +			len = skb_tail_pointer(skb) - data;
> +
> +		count++;
> +		data += len;
> +	}
>
>  	copy_off = skb_headlen(skb) % PAGE_SIZE;
>
> --
> 1.7.10.4
Paul Durrant
2013-Oct-07 10:12 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 07 October 2013 11:02
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Xi Xiong; Matt Wilson;
> Annie Li; Wei Liu; Ian Campbell
> Subject: Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
>
> On Fri, Oct 04, 2013 at 05:26:23PM +0100, Paul Durrant wrote:
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
> >
>
> If I'm not mistaken the change in commit 4f0581d2 is correct because we
> changed the way that the ring is packed. Now you seem to fall back to
> the original scheme (or something in between without reverting later
> other changes in that commit).
>

It's not possible to use a single grant copy to copy even a 2-byte linear
area that spans a page boundary, so you have to take into account the
alignment of skb->data. How the ring is packed is not relevant.

> Do you have instruction to reproduce the bug? Can you paste some
> detailed oops message?
>

I don't have the message to hand, but it's this BUG_ON that I hit:

BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op));

I.e. we blow the grant copy op array.

Paul
Paul Durrant
2013-Oct-07 10:17 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
> -----Original Message-----
> From: Matt Wilson [mailto:mswilson@gmail.com] On Behalf Of Matt Wilson
> Sent: 07 October 2013 01:08
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Xi Xiong; Matt Wilson;
> Annie Li; Wei Liu; Ian Campbell; Simon Graham
> Subject: Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
>
> On Sun, Oct 06, 2013 at 05:06:52PM -0700, Matt Wilson wrote:
> > On Fri, Oct 04, 2013 at 05:26:23PM +0100, Paul Durrant wrote:
> > > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > > boundary. The alignment of skb->data needs to be taken into account, not
> > > just the head length. This patch fixes the issue by dry-running the code
> > > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > > that).
> > >
> > > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > > Cc: Xi Xiong <xixiong@amazon.com>
> > > Cc: Matt Wilson <msw@amazon.com>
> > > Cc: Annie Li <annie.li@oracle.com>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > Cc: Ian Campbell <Ian.Campbell@citrix.com>
> >
> > Paul, can you reconcile this change with the one made by Simon in cs
> > 8f985b4f7a5394c8f8725a5109451a541ddb9eea?
>
> Correction: e26b203ede31fffd52571a5ba607a26c79dc5c0d
>

The comment is possibly correct with modified ring packing, but the problem
is that by reducing that count netback now tries to handle more skbs than it
has grant copy slots for.

Maybe it would be more appropriate to simply revert
4f0581d25827d5e864bcf07b05d73d0d12a20a5c. I see no problem before that patch
was applied.

Paul
Paul Durrant
2013-Oct-07 10:23 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
> -----Original Message-----
> From: David Vrabel
> Sent: 07 October 2013 10:50
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; Ian Campbell;
> Annie Li; Matt Wilson; Xi Xiong
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: fix
> xenvif_count_skb_slots()
>
> On 04/10/13 17:26, Paul Durrant wrote:
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
>
> If 4f0581d2582 is causing the skb->data to be fully packed into a
> minimal number of slots then the simple
> DIV_ROUND_UP(skb_headlen(skb))
> is correct.
>
> I think this change will miscount the number of slots,
> over-estimating the count, which I think will eventually cause netback to
> think the ring has no space when it has some.
>
> Is the problem here not the miscounting of slots but running out of
> space in the grant table op array because we now use more copy ops?
>

Essentially yes. Netback is built on the assumption of no more than two grant
copies per ring slot.

> I didn't think there was any real merit in the problematic commit (or at
> least there was no evidence that it was better) so I would suggest just
> reverting it instead of trying to fix it up.
>

I'd be happy with a reversion.

> If we do want to change how netback fills the ring then netback needs
> some redesign (i.e., change it so it doesn't have to do this counting in
> advance) to make it much less fragile to changes in this area.
>

Yes, that would be much better.

Paul
Paul Durrant
2013-Oct-07 10:37 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org]
> On Behalf Of Paul Durrant
> Sent: 07 October 2013 11:24
> To: David Vrabel
> Cc: Wei Liu; Ian Campbell; netdev@vger.kernel.org; xen-devel@lists.xen.org;
> Annie Li; Matt Wilson; Xi Xiong
> Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: fix
> xenvif_count_skb_slots()
>
> > -----Original Message-----
> > From: David Vrabel
> > Sent: 07 October 2013 10:50
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; Ian Campbell;
> > Annie Li; Matt Wilson; Xi Xiong
> > Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: fix
> > xenvif_count_skb_slots()
> >
> > On 04/10/13 17:26, Paul Durrant wrote:
> > > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > > boundary. The alignment of skb->data needs to be taken into account, not
> > > just the head length. This patch fixes the issue by dry-running the code
> > > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > > that).
> >
> > If 4f0581d2582 is causing the skb->data to be fully packed into a
> > minimal number of slots then the simple
> > DIV_ROUND_UP(skb_headlen(skb))
> > is correct.
> >
> > I think this change will miscount the number of slots,
> > over-estimating the count, which I think will eventually cause netback to
> > think the ring has no space when it has some.
> >
> > Is the problem here not the miscounting of slots but running out of
> > space in the grant table op array because we now use more copy ops?
> >
>
> Essentially yes. Netback is built on the assumption of no more than two grant
> copies per ring slot.
>

To be clear: I believe that, with the packing change, a third grant copy may
be used for the initial slot, and that is why we blow the array.

Paul

> > I didn't think there was any real merit in the problematic commit (or at
> > least there was no evidence that it was better) so I would suggest just
> > reverting it instead of trying to fix it up.
> >
>
> I'd be happy with a reversion.
>
> > If we do want to change how netback fills the ring then netback needs
> > some redesign (i.e., change it so it doesn't have to do this counting in
> > advance) to make it much less fragile to changes in this area.
> >
>
> Yes, that would be much better.
>
> Paul
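The "third grant copy for the initial slot" scenario can be sketched with a simplified user-space model. This is an illustration under stated assumptions, not the netback implementation: it assumes 4 KiB pages, 4 KiB slot buffers, a copy operation that may not cross a source page boundary, and a packing rule that fills each slot completely before opening the next. The real packing logic in netback is more involved. The types and the function pack_segments() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions: 4 KiB source pages and 4 KiB ring-slot buffers. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_SLOT_SIZE 4096u

/* Hypothetical types for this sketch; these are not netback structures. */
struct segment {
	uintptr_t addr;	/* start address of a head or frag */
	size_t len;	/* length in bytes */
};

struct pack_result {
	unsigned int slots;		/* ring slots consumed */
	unsigned int ops;		/* copy operations issued */
	unsigned int max_ops_per_slot;	/* worst case for a single slot */
};

/* Pack segments back to back into slots. Each copy operation is limited
 * by the end of the source page, the end of the segment and the space
 * left in the current slot. */
struct pack_result pack_segments(const struct segment *segs, size_t nsegs)
{
	struct pack_result r = { 0, 0, 0 };
	size_t slot_used = SKETCH_SLOT_SIZE;	/* "full", so the first byte opens slot 1 */
	unsigned int ops_in_slot = 0;
	size_t i;

	assert(segs != NULL || nsegs == 0);

	for (i = 0; i < nsegs; i++) {
		uintptr_t addr = segs[i].addr;
		size_t left = segs[i].len;

		while (left > 0) {
			size_t chunk;

			if (slot_used == SKETCH_SLOT_SIZE) {
				/* Current slot is full: open a new one. */
				r.slots++;
				slot_used = 0;
				ops_in_slot = 0;
			}

			/* Stay within the source page... */
			chunk = SKETCH_PAGE_SIZE - addr % SKETCH_PAGE_SIZE;
			/* ...within the segment... */
			if (chunk > left)
				chunk = left;
			/* ...and within the destination slot. */
			if (chunk > SKETCH_SLOT_SIZE - slot_used)
				chunk = SKETCH_SLOT_SIZE - slot_used;

			r.ops++;
			ops_in_slot++;
			if (ops_in_slot > r.max_ops_per_slot)
				r.max_ops_per_slot = ops_in_slot;

			addr += chunk;
			left -= chunk;
			slot_used += chunk;
		}
	}
	return r;
}
```

In this model, a 2-byte head at page offset 4095 followed by a page-aligned 4094-byte frag fills exactly one slot but needs three copy operations, which exceeds a budget of two operations per slot.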
Wei Liu
2013-Oct-07 10:53 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On Mon, Oct 07, 2013 at 11:37:31AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org]
> > On Behalf Of Paul Durrant
> > Sent: 07 October 2013 11:24
> > To: David Vrabel
> > Cc: Wei Liu; Ian Campbell; netdev@vger.kernel.org; xen-devel@lists.xen.org;
> > Annie Li; Matt Wilson; Xi Xiong
> > Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: fix
> > xenvif_count_skb_slots()
> >
> > > -----Original Message-----
> > > From: David Vrabel
> > > Sent: 07 October 2013 10:50
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; Ian Campbell;
> > > Annie Li; Matt Wilson; Xi Xiong
> > > Subject: Re: [Xen-devel] [PATCH net-next] xen-netback: fix
> > > xenvif_count_skb_slots()
> > >
> > > On 04/10/13 17:26, Paul Durrant wrote:
> > > > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > > > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > > > boundary. The alignment of skb->data needs to be taken into account, not
> > > > just the head length. This patch fixes the issue by dry-running the code
> > > > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > > > that).
> > >
> > > If 4f0581d2582 is causing the skb->data to be fully packed into a
> > > minimal number of slots then the simple
> > > DIV_ROUND_UP(skb_headlen(skb))
> > > is correct.
> > >
> > > I think this change will miscount the number of slots,
> > > over-estimating the count, which I think will eventually cause netback to
> > > think the ring has no space when it has some.
> > >
> > > Is the problem here not the miscounting of slots but running out of
> > > space in the grant table op array because we now use more copy ops?
> > >
> >
> > Essentially yes. Netback is built on the assumption of no more than two grant
> > copies per ring slot.
> >
>
> To be clear: I believe that, with the packing change, a third grant copy may
> be used for the initial slot, and that is why we blow the array.
>

OK, thanks for the explanation. I'm fine with reverting the problematic
changeset as that part of the code is quite fragile now. Looking at the git
history, we've tripped over this many times, sigh.

Wei.
David Miller
2013-Oct-07 19:36 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
From: Paul Durrant <paul.durrant@citrix.com>
Date: Fri, 4 Oct 2013 17:26:23 +0100

> Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> xenvif_count_skb_slots() for skbs with a linear area spanning a page
> boundary. The alignment of skb->data needs to be taken into account, not
> just the head length. This patch fixes the issue by dry-running the code
> from xenvif_gop_skb() (and adjusting the comment above the function to note
> that).
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>

There seems to be a lot of back and forth about what is the most
desirable way forward wrt. this commit and another similar one.

Please advise.
Ian Campbell
2013-Oct-07 20:03 UTC
Re: [PATCH net-next] xen-netback: fix xenvif_count_skb_slots()
On Mon, 2013-10-07 at 15:36 -0400, David Miller wrote:
> From: Paul Durrant <paul.durrant@citrix.com>
> Date: Fri, 4 Oct 2013 17:26:23 +0100
>
> > Commit 4f0581d25827d5e864bcf07b05d73d0d12a20a5c introduced an error into
> > xenvif_count_skb_slots() for skbs with a linear area spanning a page
> > boundary. The alignment of skb->data needs to be taken into account, not
> > just the head length. This patch fixes the issue by dry-running the code
> > from xenvif_gop_skb() (and adjusting the comment above the function to note
> > that).
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>
> There seems to be a lot of back and forth about what is the most
> desirable way forward wrt. this commit and another similar one.
>
> Please advise.

Let's revert 4f0581d25827d5e864bcf07b05d73d0d12a20a5c and see about
making this stuff less fragile in the future.

Thanks,
Ian.