thr3ads.net - Xen devel - compound skb frag pages appearing in start

Ian Campbell

2012-Oct-09 13:47 UTC

compound skb frag pages appearing in start_xmit

Hi Eric,

Sander has discovered an issue where xen-netback is given a compound
page as one of the skb frag pages to transmit. Currently netback can
only handle PAGE_SIZE'd frags and bugs out.

I suspect this is something to do with 69b08f62e174 "net: use bigger
pages in __netdev_alloc_frag", although perhaps not because it looks
like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
something.

Are all net drivers expected to be able to handle compound pages in the
frags? Obviously it is to their benefit to do so, so it is something
I'll want to look into for netback.

I expect the main factor here is bridging/forwarding, since the
receiving NIC and its driver appear to support compound pages but the
outgoing NIC (netback in this case) does not.

I guess my question is should I be rushing to fix netback ASAP or should
I rather be looking for a bug somewhere which caused a frag of this type
to get as far as netback's start_xmit in the first place?

Or am I just barking up the wrong tree to start with?

Thanks,
Ian.

Eric Dumazet

2012-Oct-09 13:54 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> Hi Eric,

Hi Ian

> Sander has discovered an issue where xen-netback is given a compound
> page as one of the skb frag pages to transmit. Currently netback can
> only handle PAGE_SIZE'd frags and bugs out.
> 
> I suspect this is something to do with 69b08f62e174 "net: use bigger
> pages in __netdev_alloc_frag", although perhaps not because it looks
> like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> something.

It's not the commit you want ;)

> Are all net drivers expected to be able to handle compound pages in the
> frags? Obviously it is to their benefit to do so, so it is something
> I'll want to look into for netback.

Not sure why a net driver would care about COMPOUND pages at all?

A fragment has a struct page * and a size.

A page can be order-0, order-1, order-2, order-3, ...

> I expect the main factor here is bridging/forwarding, since the
> receiving NIC and its driver appear to support compound pages but the
> outgoing NIC (netback in this case) does not.
> 
> I guess my question is should I be rushing to fix netback ASAP or should
> I rather be looking for a bug somewhere which caused a frag of this type
> to get as far as netback's start_xmit in the first place?
> 
> Or am I just barking up the wrong tree to start with?


The problem comes because of 

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=5640f7685831e088fe6c2e1f863a6805962f8e81

And yes, we must find a way to cope with this problem in your driver,
because you can also benefit from increase of performance once fixed ;)

And yes I can certainly help, as I am the author of this patch ;)

Thanks

Eric Dumazet

2012-Oct-09 14:01 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > Hi Eric,
> 
> Hi Ian
> 
> > Sander has discovered an issue where xen-netback is given a compound
> > page as one of the skb frag pages to transmit. Currently netback can
> > only handle PAGE_SIZE'd frags and bugs out.
> > 
> > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > pages in __netdev_alloc_frag", although perhaps not because it looks
> > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > something.
> 
> It's not the commit you want ;)

Hmm, I take it back. It can also give you the same problem:

We use this allocator for the rx path of drivers:

 __netdev_alloc_skb()

So it's now absolutely possible that one skb->head is backed by an order-3
page.

Is the problem coming from xen_netbk_count_skb_slots()?

Give me more information if you want me to help.

Ian Campbell

2012-Oct-09 14:17 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 14:54 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > Hi Eric,
> 
> Hi Ian
> 
> > Sander has discovered an issue where xen-netback is given a compound
> > page as one of the skb frag pages to transmit. Currently netback can
> > only handle PAGE_SIZE'd frags and bugs out.
> > 
> > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > pages in __netdev_alloc_frag", although perhaps not because it looks
> > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > something.
> 
> It's not the commit you want ;)
> 
> > Are all net drivers expected to be able to handle compound pages in the
> > frags? Obviously it is to their benefit to do so, so it is something
> > I'll want to look into for netback.
> 
> Not sure why a net driver would care about COMPOUND pages at all?
> 
> A fragment has a struct page * and a size.
> 
> A page can be order-0, order-1, order-2, order-3, ...

I keep falling into this trap that a struct page * can be order > 0.

The Xen PV interfaces deal in order-0 pages only. Also, things which are
contiguous in physical space may not be contiguous in DMA space (which
we call "machine memory" in Xen terminology).

The first is probably a specific quirk of Xen, but I thought there were
other architectures where physical and DMA space were not necessarily
contiguous and which would therefore need special handling (I guess
those platforms all have IOMMUs).

> > I expect the main factor here is bridging/forwarding, since the
> > receiving NIC and its driver appear to support compound pages but the
> > outgoing NIC (netback in this case) does not.
> > 
> > I guess my question is should I be rushing to fix netback ASAP or should
> > I rather be looking for a bug somewhere which caused a frag of this type
> > to get as far as netback's start_xmit in the first place?
> > 
> > Or am I just barking up the wrong tree to start with?
> 
> The problem comes because of
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=5640f7685831e088fe6c2e1f863a6805962f8e81
> 
> And yes, we must find a way to cope with this problem in your driver,
> because you can also benefit from increase of performance once fixed ;)
> 
> And yes I can certainly help, as I am the author of this patch ;)

I think I can mostly deal with this in the same way netback deals with
large skb heads, i.e. by busting the multipage page into individual
4096-byte page chunks.

Do the higher-order pages effectively reduce the number of frags which
are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
have 64K worth of frag data.

If we switch to order-3 pages everywhere then can the skb contain 512K
of data, or does the effective maximum number of frags in an skb reduce
to 2?

If it's the latter then I think fixing netback is simple; if it's the
former then I might need to think a bit harder.

Ian.
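
The arithmetic behind the two alternatives in the question above can be
sketched in plain C. This is a userspace illustration only, assuming 4 KiB
base pages and a MAX_SKB_FRAGS of 16 as in the example; page_bytes() and
max_frag_bytes() are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT    12                      /* assumption: 4 KiB base pages */
#define PAGE_SIZE     ((size_t)1 << PAGE_SHIFT)
#define MAX_SKB_FRAGS 16                      /* value used in the question */

/* Bytes covered by one page of the given order (order-0 is one base page). */
size_t page_bytes(unsigned int order)
{
	/* Keep the shift well inside the width of size_t. */
	assert(order < 16);
	return PAGE_SIZE << order;
}

/* Upper bound on paged data if every frag filled a whole page of this order. */
size_t max_frag_bytes(unsigned int order)
{
	return MAX_SKB_FRAGS * page_bytes(order);
}
```

With these definitions max_frag_bytes(0) is 65536 (the 64K case) and
max_frag_bytes(3) is 524288 (the 512K case), so the question is whether an
skb can actually reach the second figure or whether some other limit keeps
the frag count low.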

Ian Campbell

2012-Oct-09 14:23 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > Hi Eric,
> > 
> > Hi Ian
> > 
> > > Sander has discovered an issue where xen-netback is given a compound
> > > page as one of the skb frag pages to transmit. Currently netback can
> > > only handle PAGE_SIZE'd frags and bugs out.
> > > 
> > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > something.
> > 
> > It's not the commit you want ;)
> 
> Hmm, I take it back. It can also give you the same problem:
> 
> We use this allocator for the rx path of drivers:
> 
>  __netdev_alloc_skb()
> 
> So it's now absolutely possible that one skb->head is backed by an order-3
> page.
> 
> Is the problem coming from xen_netbk_count_skb_slots()?
> 
> Give me more information if you want me to help.

The interesting code is in netbk_gop_skb(), specifically the two calls
to netbk_gop_frag_copy.

netbk_gop_frag_copy can only copy order-0 pages to the peer since they
go over a shared ring transport which can only deal in order-0 pages.

For the SKB head there is a loop which handles order>0 heads, I suspect
we just need something similar for the frag case.

Although see my question in the other response about the maximum number
of frags we can have when order is > 0 since if using larger pages
causes us to end up with a much larger number of order-0 pages once
we've broken them up then we have a problem and I need to put my
thinking cap on a bit (perhaps substantially) tighter.

Konrad, it looks like netfront has a similar issue in
xennet_make_frags() since it doesn't shatter large order mappings
either.

Ian.
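
The "something similar for the frag case" can be pictured as a walk that
cuts one (offset, size) region of a higher-order page into pieces that each
stay inside a single order-0 frame. The following is a minimal userspace
sketch of that idea, not the netback code: it assumes 4 KiB frames, and
struct chunk and split_frag() are hypothetical names introduced only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                         /* assumption: 4 KiB frames */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* One piece of a frag that lies entirely within a single order-0 frame. */
struct chunk {
	size_t frame;   /* index of the frame within the compound page */
	size_t offset;  /* byte offset within that frame */
	size_t len;     /* bytes in this piece */
};

/*
 * Split size bytes starting at offset (relative to the start of the
 * compound page) into per-frame chunks. Returns the number of chunks
 * written to out; the caller must provide room for all of them.
 */
size_t split_frag(size_t offset, size_t size, struct chunk *out, size_t max)
{
	size_t n = 0;
	size_t frame = offset >> PAGE_SHIFT;      /* skip unused leading frames */

	offset &= PAGE_SIZE - 1;

	while (size > 0) {
		size_t len = PAGE_SIZE - offset;  /* room left in this frame */

		if (len > size)
			len = size;

		assert(n < max);
		out[n].frame = frame;
		out[n].offset = offset;
		out[n].len = len;
		n++;

		size -= len;
		frame++;                          /* later pieces start at offset 0 */
		offset = 0;
	}
	return n;
}
```

For example, 4098 bytes starting at byte 4095 of the page come out as three
chunks of 1, 4096 and 1 bytes in frames 0, 1 and 2.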

Eric Dumazet

2012-Oct-09 14:27 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> Do the higher-order pages effectively reduce the number of frags which
> are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> have 64K worth of frag data.
> 
> If we switch to order-3 pages everywhere then can the skb contain 512K
> of data, or does the effective maximum number of frags in an skb reduce
> to 2?

The effective number of frags reduces to 2 or 3

(We still limit GSO packets to ~63536 bytes)
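
One way to see where "2 or 3" comes from: if the packet length stays capped
near 64K and each frag can cover up to one order-3 page (32K with 4 KiB
base pages), the frag count is a ceiling division that depends on how far
into the first page the data starts. The sketch below is a simplified
userspace model that assumes the data is packed back to back;
frags_needed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                         /* assumption: 4 KiB base pages */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/*
 * Number of frags needed for len bytes when each frag covers at most one
 * page of the given order and the data starts first_offset bytes into the
 * first such page.
 */
size_t frags_needed(size_t len, unsigned int order, size_t first_offset)
{
	size_t chunk = PAGE_SIZE << order;

	assert(first_offset < chunk);
	if (len == 0)
		return 0;
	return (first_offset + len + chunk - 1) / chunk;
}
```

Under these assumptions a 65536-byte packet needs 2 order-3 frags when it
starts at the beginning of a page and 3 when it starts part-way through
one, compared with 16 aligned order-0 frags.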

Eric Dumazet

2012-Oct-09 14:33 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:23 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > > Hi Eric,
> > > 
> > > Hi Ian
> > > 
> > > > Sander has discovered an issue where xen-netback is given a compound
> > > > page as one of the skb frag pages to transmit. Currently netback can
> > > > only handle PAGE_SIZE'd frags and bugs out.
> > > > 
> > > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > > something.
> > > 
> > > It's not the commit you want ;)
> > 
> > Hmm, I take it back. It can also give you the same problem:
> > 
> > We use this allocator for the rx path of drivers:
> > 
> >  __netdev_alloc_skb()
> > 
> > So it's now absolutely possible that one skb->head is backed by an order-3
> > page.
> > 
> > Is the problem coming from xen_netbk_count_skb_slots()?
> > 
> > Give me more information if you want me to help.
> 
> The interesting code is in netbk_gop_skb(), specifically the two calls
> to netbk_gop_frag_copy.
> 
> netbk_gop_frag_copy can only copy order-0 pages to the peer since they
> go over a shared ring transport which can only deal in order-0 pages.
> 
> For the SKB head there is a loop which handles order>0 heads, I suspect
> we just need something similar for the frag case.
> 
> Although see my question in the other response about the maximum number
> of frags we can have when order is > 0 since if using larger pages
> causes us to end up with a much larger number of order-0 pages once
> we've broken them up then we have a problem and I need to put my
> thinking cap on a bit (perhaps substantially) tighter.
> 
> Konrad, it looks like netfront has a similar issue in
> xennet_make_frags() since it doesn't shatter large order mappings
> either.

Hmm...

In theory, if a skb has 16+1 frags backed by compound pages, you could
need ~48 order-0 frags.

(4098 bytes could need 1-4096-1 (3 frags))

In practice, it should be around ~17 order-0 frags as before.
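
The "1-4096-1" case can be checked with a closed-form count of how many
order-0 frames a frag touches. This is a userspace sketch assuming 4 KiB
frames; order0_frames() is a hypothetical helper name, not something from
the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                         /* assumption: 4 KiB frames */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Number of order-0 frames touched by size bytes starting at offset. */
size_t order0_frames(size_t offset, size_t size)
{
	size_t in_frame = offset & (PAGE_SIZE - 1);  /* offset within first frame */

	/* Guard against wrap-around in the sum below. */
	assert(size <= (size_t)-1 - 2 * PAGE_SIZE);

	if (size == 0)
		return 0;
	return (in_frame + size + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```

A 4098-byte frag starting on the last byte of a frame touches 3 frames,
while the same frag starting on a frame boundary touches only 2. The
theoretical worst case applies that factor of three to every frag, which is
why it is so much larger than the expected count for well-packed data.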

Ian Campbell

2012-Oct-09 14:40 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> 
> > Do the higher-order pages effectively reduce the number of frags which
> > are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > have 64K worth of frag data.
> > 
> > If we switch to order-3 pages everywhere then can the skb contain 512K
> > of data, or does the effective maximum number of frags in an skb reduce
> > to 2?
> 
> The effective number of frags reduces to 2 or 3
> 
> (We still limit GSO packets to ~63536 bytes)

Great! Then I think the fix is more/less trivial...

As an aside, when the skb head is < 4096 bytes is that necessarily a
compound page or might it just be a large kmalloc area?

Only really relevant since it impacts the possibility for code sharing
between the head and the frags sending.

Ian

Eric Dumazet

2012-Oct-09 14:51 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> > 
> > > Do the higher-order pages effectively reduce the number of frags which
> > > are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > > have 64K worth of frag data.
> > > 
> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> > > of data, or does the effective maximum number of frags in an skb reduce
> > > to 2?
> > 
> > The effective number of frags reduces to 2 or 3
> > 
> > (We still limit GSO packets to ~63536 bytes)
> 
> Great! Then I think the fix is more/less trivial...
> 
> As an aside, when the skb head is < 4096 bytes is that necessarily a
> compound page or might it just be a large kmalloc area?

skb->head can be either allocated by kmalloc() (standard alloc_skb()) or
a page frag (if allocated in the rx path).

Not sure it's related to headlen/size...

Ian Campbell

2012-Oct-09 14:54 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:33 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:23 +0100, Ian Campbell wrote:
> > On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> > > On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > > > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > > > Hi Eric,
> > > > 
> > > > Hi Ian
> > > > 
> > > > > Sander has discovered an issue where xen-netback is given a compound
> > > > > page as one of the skb frag pages to transmit. Currently netback can
> > > > > only handle PAGE_SIZE'd frags and bugs out.
> > > > > 
> > > > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > > > something.
> > > > 
> > > > It's not the commit you want ;)
> > > 
> > > Hmm, I take it back. It can also give you the same problem:
> > > 
> > > We use this allocator for the rx path of drivers:
> > > 
> > >  __netdev_alloc_skb()
> > > 
> > > So it's now absolutely possible that one skb->head is backed by an order-3
> > > page.
> > > 
> > > Is the problem coming from xen_netbk_count_skb_slots()?
> > > 
> > > Give me more information if you want me to help.
> > 
> > The interesting code is in netbk_gop_skb(), specifically the two calls
> > to netbk_gop_frag_copy.
> > 
> > netbk_gop_frag_copy can only copy order-0 pages to the peer since they
> > go over a shared ring transport which can only deal in order-0 pages.
> > 
> > For the SKB head there is a loop which handles order>0 heads, I suspect
> > we just need something similar for the frag case.
> > 
> > Although see my question in the other response about the maximum number
> > of frags we can have when order is > 0 since if using larger pages
> > causes us to end up with a much larger number of order-0 pages once
> > we've broken them up then we have a problem and I need to put my
> > thinking cap on a bit (perhaps substantially) tighter.
> > 
> > Konrad, it looks like netfront has a similar issue in
> > xennet_make_frags() since it doesn't shatter large order mappings
> > either.
> 
> Hmm...
> 
> In theory, if a skb has 16+1 frags backed by compound pages, you could
> need ~48 order-0 frags.
> 
> (4098 bytes could need 1-4096-1 (3 frags))
> 
> In practice, it should be around ~17 order-0 frags as before.

Right, thanks. I think I can cope with that without needing to change
the PV protocol in any way.

Ian.

Ian Campbell

2012-Oct-10 10:13 UTC

Re: compound skb frag pages appearing in start_xmit

On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> > 
> > > Do the higher-order pages effectively reduce the number of frags which
> > > are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > > have 64K worth of frag data.
> > > 
> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> > > of data, or does the effective maximum number of frags in an skb reduce
> > > to 2?
> > 
> > The effective number of frags reduces to 2 or 3
> > 
> > (We still limit GSO packets to ~63536 bytes)
> 
> Great! Then I think the fix is more/less trivial...

The following seems to work for me.

I haven't tackled netfront yet.

8<--------------------------------------------------------------

From 551e42e3dd203f2eb97cb082985013bb33b8f020 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Tue, 9 Oct 2012 15:51:20 +0100
Subject: [PATCH] xen: netback: handle compound page fragments on transmit.

An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
---
 drivers/net/xen-netback/netback.c |   40 ++++++++++++++++++++++++++++++++----
 1 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4ebfcf3..d747e30 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -335,21 +335,35 @@ unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
 		unsigned long bytes;
+
+		offset &= ~PAGE_MASK;
+
 		while (size > 0) {
+			BUG_ON(offset >= PAGE_SIZE);
 			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
 
-			if (start_new_rx_buffer(copy_off, size, 0)) {
+			bytes = PAGE_SIZE - offset;
+
+			if (bytes > size)
+				bytes = size;
+
+			if (start_new_rx_buffer(copy_off, bytes, 0)) {
 				count++;
 				copy_off = 0;
 			}
 
-			bytes = size;
 			if (copy_off + bytes > MAX_BUFFER_OFFSET)
 				bytes = MAX_BUFFER_OFFSET - copy_off;
 
 			copy_off += bytes;
+
+			offset += bytes;
 			size -= bytes;
+
+			if (offset == PAGE_SIZE)
+				offset = 0;
 		}
 	}
 	return count;
@@ -403,14 +417,24 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	unsigned long bytes;
 
 	/* Data must not cross a page boundary. */
-	BUG_ON(size + offset > PAGE_SIZE);
+	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
 	meta = npo->meta + npo->meta_prod - 1;
 
+	/* Skip unused frames from start of page */
+	page += offset >> PAGE_SHIFT;
+	offset &= ~PAGE_MASK;
+
 	while (size > 0) {
+		BUG_ON(offset >= PAGE_SIZE);
 		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 
-		if (start_new_rx_buffer(npo->copy_off, size, *head)) {
+		bytes = PAGE_SIZE - offset;
+
+		if (bytes > size)
+			bytes = size;
+
+		if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
 			/*
 			 * Netfront requires there to be some data in the head
 			 * buffer.
@@ -420,7 +444,6 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 			meta = get_next_rx_buffer(vif, npo);
 		}
 
-		bytes = size;
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
 			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
 
@@ -453,6 +476,13 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		offset += bytes;
 		size -= bytes;
 
+		/* Next frame */
+		if (offset == PAGE_SIZE) {
+			BUG_ON(!PageCompound(page));
+			page++;
+			offset = 0;
+		}
+
 		/* Leave a gap for the GSO descriptor. */
 		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
 			vif->rx.req_cons++;
-- 
1.7.2.5

Sander Eikelenboom

2012-Oct-10 12:24 UTC

Re: compound skb frag pages appearing in start_xmit

Wednesday, October 10, 2012, 12:13:04 PM, you wrote:

> On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
>> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
>> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
>> > 
>> > > Do the higher-order pages effectively reduce the number of frags which
>> > > are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
>> > > have 64K worth of frag data.
>> > > 
>> > > If we switch to order-3 pages everywhere then can the skb contain 512K
>> > > of data, or does the effective maximum number of frags in an skb reduce
>> > > to 2?
>> > 
>> > The effective number of frags reduces to 2 or 3
>> > 
>> > (We still limit GSO packets to ~63536 bytes)
>> 
>> Great! Then I think the fix is more/less trivial...

> The following seems to work for me.

But it doesn't seem to work for me ... dmesg attached.

I don't know if the "mcelog:4359 map pfn expected mapping type
write-back for [mem 0x0009f000-0x000a0fff], got uncached-minus"
message is related? It shows up right after the NICs get initialized.

netback still fails with:

[  191.777994] ------------[ cut here ]------------
[  191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!
[  191.790423] invalid opcode: 0000 [#1] PREEMPT SMP 
[  191.796462] Modules linked in:
[  191.802315] CPU 1 
[  191.802367] Pid: 1177, comm: netback/1 Tainted: G        W    3.6.0pre-rc1-20121010 #1 MSI MS-7640/890FXA-GD70 (MS-7640)
[  191.814043] RIP: e030:[<ffffffff8146de61>]  [<ffffffff8146de61>] netbk_gop_frag_copy+0x3f1/0x400
[  191.820171] RSP: e02b:ffff880037c6bb98  EFLAGS: 00010246
[  191.826271] RAX: 0000000000000244 RBX: ffffc90010827f98 RCX: ffff880031ed9880
[  191.832450] RDX: 00000000000000a8 RSI: ffff880037c6bd24 RDI: ffffea0000b03f80
[  191.838581] RBP: ffff880037c6bc28 R08: ffff8800319f8100 R09: 0000000000001000
[  191.844739] R10: 0000000000000000 R11: 0000000000000132 R12: 00000000000000a8
[  191.850785] R13: ffff880037c6bcd8 R14: 0000000000001000 R15: ffffc9001082cf70
[  191.856741] FS:  00007f9f3c944700(0000) GS:ffff88003f840000(0000) knlGS:0000000000000000
[  191.862841] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[  191.868901] CR2: 0000000001337ca0 CR3: 0000000032cec000 CR4: 0000000000000660
[  191.875053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  191.881175] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  191.887247] Process netback/1 (pid: 1177, threadinfo ffff880037c6a000, task ffff880039984140)
[  191.893325] Stack:
[  191.899328]  ffff880037c6bd24 00000000000000a8 ffff8800319f8100 ffff880031ed9880
[  191.905534]  ffffc90000000000 0000000000001000 0000000000000000 0000000000000000
[  191.911742]  ffff880000000000 ffffffff817459f3 ffffc90010823420 ffffea0000b03f80
[  191.917898] Call Trace:
[  191.923939]  [<ffffffff817459f3>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
[  191.930141]  [<ffffffff8146e1cb>] xen_netbk_rx_action+0x30b/0x830
[  191.936543]  [<ffffffff810ad22d>] ? trace_hardirqs_on+0xd/0x10
[  191.942942]  [<ffffffff8146f6da>] xen_netbk_kthread+0xba/0xa90
[  191.949147]  [<ffffffff81095b06>] ? try_to_wake_up+0x1b6/0x310
[  191.955250]  [<ffffffff81086b40>] ? wake_up_bit+0x40/0x40
[  191.961421]  [<ffffffff8146f620>] ? xen_netbk_tx_build_gops+0xa70/0xa70
[  191.967660]  [<ffffffff810864d6>] kthread+0xd6/0xe0
[  191.973834]  [<ffffffff81086400>] ? __init_kthread_worker+0x70/0x70
[  191.979953]  [<ffffffff8174677c>] ret_from_fork+0x7c/0x90
[  191.986107]  [<ffffffff81086400>] ? __init_kthread_worker+0x70/0x70
[  191.992174] Code: b8 b3 00 00 48 8d 8c f1 60 01 00 00 48 3b 14 01 0f 85 72 fc ff ff e9 7a fc ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
[  192.005230] RIP  [<ffffffff8146de61>] netbk_gop_frag_copy+0x3f1/0x400
[  192.011786]  RSP <ffff880037c6bb98>
[  192.018402] ---[ end trace c51ab5e2c2c918fc ]---


--

Sander

Ian Campbell

2012-Oct-10 12:29 UTC

Re: compound skb frag pages appearing in start_xmit

On Wed, 2012-10-10 at 13:24 +0100, Sander Eikelenboom wrote:
> Wednesday, October 10, 2012, 12:13:04 PM, you wrote:
> 
> > On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> >> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> >> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> >> > 
> >> > > Do the higher-order pages effectively reduce the number of frags which
> >> > > are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> >> > > have 64K worth of frag data.
> >> > > 
> >> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> >> > > of data, or does the effective maximum number of frags in an skb reduce
> >> > > to 2?
> >> > 
> >> > The effective number of frags reduces to 2 or 3
> >> > 
> >> > (We still limit GSO packets to ~63536 bytes)
> >> 
> >> Great! Then I think the fix is more/less trivial...
> 
> > The following seems to work for me.
> 
> But it doesn't seem to work for me ... dmesg attached.
> [  191.777994] ------------[ cut here ]------------
> [  191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!

Looks like that BUG_ON is a little aggressive. It'll trigger if the data
happens to end on a frame boundary. Hopefully this will fix it for you:

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d747e30..f2d6b78 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -477,7 +477,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		size -= bytes;
 
 		/* Next frame */
-		if (offset == PAGE_SIZE) {
+		if (offset == PAGE_SIZE && size) {
 			BUG_ON(!PageCompound(page));
 			page++;
 			offset = 0;
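
To illustrate the one-line change above, here is a minimal user-space model of the page-walking loop. It is a sketch under stated assumptions, not the netback code: `SKETCH_PAGE_SIZE` and `count_page_steps()` are hypothetical names, and the 4096-byte page size is assumed. The function counts how often the loop would step to the next page, which is the point where `BUG_ON(!PageCompound(page))` would be evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; stands in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Model of the copy loop: consume `size` bytes starting at `offset`
 * within the first page, and count how many times the loop advances
 * to the following page.
 *
 * check_size == 0 models the old condition (offset == PAGE_SIZE).
 * check_size != 0 models the patched one (offset == PAGE_SIZE && size).
 */
static unsigned int count_page_steps(size_t offset, size_t size, int check_size)
{
	unsigned int steps = 0;

	assert(offset < SKETCH_PAGE_SIZE);

	while (size > 0) {
		/* Copy at most up to the end of the current page. */
		size_t bytes = SKETCH_PAGE_SIZE - offset;

		if (bytes > size)
			bytes = size;

		offset += bytes;
		size -= bytes;

		/* Next frame: this is where the BUG_ON would be checked. */
		if (offset == SKETCH_PAGE_SIZE && (!check_size || size)) {
			steps++;
			offset = 0;
		}
	}
	return steps;
}
```

With the old condition, a frag that ends exactly on a page boundary, e.g. `count_page_steps(0, 4096, 0)`, still takes one step, so an ordinary order-0 page would reach the BUG_ON. With the patched condition the same call takes no step, because no data remains.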

Ian Campbell

2012-Oct-10 13:09 UTC

Re: compound skb frag pages appearing in start_xmit

On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
> I haven't tackled netfront yet.

I seem to be totally unable to reproduce the equivalent issue on the
netfront xmit side, even though it seems like the loop in
xennet_make_frags ought to be obviously susceptible to it.

Konrad, Sander, are either of you able to repro, e.g. with:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index b06ef81..8a3f770 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
 		BUG_ON((signed short)ref < 0);
 
+		BUG_ON(PageCompound(skb_frag_page(frag)));
+
 		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
 		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);

My repro for netback was just to netcat a wodge of data from dom0->domU
but going the other way doesn't seem to trigger.

Sander Eikelenboom

2012-Oct-10 13:31 UTC

Re: compound skb frag pages appearing in start_xmit

Wednesday, October 10, 2012, 2:29:09 PM, you wrote:
> On Wed, 2012-10-10 at 13:24 +0100, Sander Eikelenboom wrote:
>> Wednesday, October 10, 2012, 12:13:04 PM, you wrote:
>> 
>> > On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
>> >> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
>> >> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
>> >> > 
>> >> > > Does the higher order pages effectively reduce the number of frags which
>> >> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
>> >> > > have 64K worth of frag data.
>> >> > > 
>> >> > > If we switch to order-3 pages everywhere then can the skb contain 512K
>> >> > > of data, or does the effective maximum number of frags in an skb reduce
>> >> > > to 2?
>> >> > 
>> >> > effective number of frags reduce to 2 or 3
>> >> > 
>> >> > (We still limit GSO packets to ~63536 bytes)
>> >> 
>> >> Great! Then I think the fix is more/less trivial...
>> 
>> > The following seems to work for me.
>> 
>> But it doesn't seem to work for me ... dmesg attached.
>> [  191.777994] ------------[ cut here ]------------
>> [  191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!
> Looks like that BUG_ON is a little aggressive. It'll trigger if the data
> happens to end on a frame boundary. Hopefully this will fix it for you:

Yes, it does!
Thanks, I will recompile and test the netfront case as well.

--
Sander
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index d747e30..f2d6b78 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -477,7 +477,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>                 size -= bytes;
>  
>                 /* Next frame */
> -               if (offset == PAGE_SIZE) {
> +               if (offset == PAGE_SIZE && size) {
>                         BUG_ON(!PageCompound(page));
>                         page++;
>                         offset = 0;

Sander Eikelenboom

2012-Oct-10 14:49 UTC

Re: compound skb frag pages appearing in start_xmit

Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
> On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
>> I haven't tackled netfront yet.
> I seem to be totally unable to reproduce the equivalent issue on the
> netfront xmit side, even though it seems like the loop in
> xennet_make_frags ought to be obviously susceptible to it.
> Konrad, Sander, are either of you able to repro, e.g. with:

Hmrrrmm, I don't see any traces, only strange behaviour:

- I can connect to guests by ssh, but it's sluggish and sometimes stops working
- The guests seem to keep trying to connect to netback:

[  658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
[  658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
[  663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
[  669.674277] xen_bridge: port 2(vif40.0) entered disabled state
[  669.680290] device vif40.0 left promiscuous mode
[  669.685464] xen_bridge: port 2(vif40.0) entered disabled state
[  672.857222] device vif41.0 entered promiscuous mode
[  673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[  673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
[  673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
[  674.439725] xen_bridge: port 7(vif39.0) entered disabled state
[  674.445708] device vif39.0 left promiscuous mode
[  674.450955] xen_bridge: port 7(vif39.0) entered disabled state
[  677.726040] device vif42.0 entered promiscuous mode
[  678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[  678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
[  678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
[  688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
[  693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
[  700.786276] xen_bridge: port 7(vif42.0) entered disabled state
[  700.792484] device vif42.0 left promiscuous mode
[  700.802409] xen_bridge: port 7(vif42.0) entered disabled state
[  704.133606] device vif43.0 entered promiscuous mode
[  704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[  704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
[  704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
[  719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
[  726.302456] xen_bridge: port 7(vif43.0) entered disabled state
[  726.308898] device vif43.0 left promiscuous mode
[  726.314029] xen_bridge: port 7(vif43.0) entered disabled state

All the guests are already up, but this keeps on going and going and going ....


> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index b06ef81..8a3f770 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>                 ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>                 BUG_ON((signed short)ref < 0);
>  
> +               BUG_ON(PageCompound(skb_frag_page(frag)));
> +
>                 mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
>                 gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>                                                 mfn, GNTMAP_readonly);
> My repro for netback was just to netcat a wodge of data from dom0->domU
> but going the other way doesn't seem to trigger.

Ian Campbell

2012-Oct-11 08:02 UTC

Re: compound skb frag pages appearing in start_xmit

On Wed, 2012-10-10 at 15:49 +0100, Sander Eikelenboom wrote:
> Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
> 
> > On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
> >> I haven't tackled netfront yet.
> 
> > I seem to be totally unable to reproduce the equivalent issue on the
> > netfront xmit side, even though it seems like the loop in
> > xennet_make_frags ought to be obviously susceptible to it.
> 
> > Konrad, Sander, are either of you able to repro, e.g. with:
> 
> 
> Hmrrrmm i don't see any traces, only strange behaviour ..
> 
> - i can connect to guests by ssh, but it's sluggish, and sometimes stops working

I saw something like this (ssh sluggish) even with dom0 itself. I'm
trying to see if I can characterise it enough to reliably bisect it.

I already switched out xen-unstable for 4.2-testing but that didn't make
any difference.

> - The guest seem to keep trying to connect to netback:
> 
> [  658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
> [  658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
> [  663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
> [  669.674277] xen_bridge: port 2(vif40.0) entered disabled state
> [  669.680290] device vif40.0 left promiscuous mode
> [  669.685464] xen_bridge: port 2(vif40.0) entered disabled state
> [  672.857222] device vif41.0 entered promiscuous mode
> [  673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [  673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
> [  673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
> [  674.439725] xen_bridge: port 7(vif39.0) entered disabled state
> [  674.445708] device vif39.0 left promiscuous mode
> [  674.450955] xen_bridge: port 7(vif39.0) entered disabled state
> [  677.726040] device vif42.0 entered promiscuous mode
> [  678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [  678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
> [  678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
> [  688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
> [  693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
> [  700.786276] xen_bridge: port 7(vif42.0) entered disabled state
> [  700.792484] device vif42.0 left promiscuous mode
> [  700.802409] xen_bridge: port 7(vif42.0) entered disabled state
> [  704.133606] device vif43.0 entered promiscuous mode
> [  704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [  704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
> [  704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
> [  719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
> [  726.302456] xen_bridge: port 7(vif43.0) entered disabled state
> [  726.308898] device vif43.0 left promiscuous mode
> [  726.314029] xen_bridge: port 7(vif43.0) entered disabled state
> 
> All the guests are already up, but this keeps on going and going and going ....

The domain number seems to be climbing, are you sure something isn't
(crashing and) restarting?

> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index b06ef81..8a3f770 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> >                 ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> >                 BUG_ON((signed short)ref < 0);
> >  
> > +               BUG_ON(PageCompound(skb_frag_page(frag)));
> > +
> >                 mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> >                 gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> >                                                 mfn, GNTMAP_readonly);
> 
> > My repro for netback was just to netcat a wodge of data from dom0->domU
> > but going the other way doesn't seem to trigger.

Sander Eikelenboom

2012-Oct-11 10:00 UTC

Re: compound skb frag pages appearing in start_xmit

Thursday, October 11, 2012, 10:02:26 AM, you wrote:
> On Wed, 2012-10-10 at 15:49 +0100, Sander Eikelenboom wrote:
>> Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
>> 
>> > On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
>> >> I haven't tackled netfront yet.
>> 
>> > I seem to be totally unable to reproduce the equivalent issue on the
>> > netfront xmit side, even though it seems like the loop in
>> > xennet_make_frags ought to be obviously susceptible to it.
>> 
>> > Konrad, Sander, are either of you able to repro, e.g. with:
>> 
>> 
>> Hmrrrmm i don't see any traces, only strange behaviour ..
>> 
>> - i can connect to guests by ssh, but it's sluggish, and sometimes stops working
> I saw something like this (ssh sluggish) even with dom0 itself. I'm
> trying to see if I can characterise it enough to reliably bisect it.
> I already switched out xen-unstable for 4.2-testing but that didn't make
> any difference.

>> - The guest seem to keep trying to connect to netback:
>> 
>> [  658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
>> [  658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
>> [  663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
>> [  669.674277] xen_bridge: port 2(vif40.0) entered disabled state
>> [  669.680290] device vif40.0 left promiscuous mode
>> [  669.685464] xen_bridge: port 2(vif40.0) entered disabled state
>> [  672.857222] device vif41.0 entered promiscuous mode
>> [  673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  674.439725] xen_bridge: port 7(vif39.0) entered disabled state
>> [  674.445708] device vif39.0 left promiscuous mode
>> [  674.450955] xen_bridge: port 7(vif39.0) entered disabled state
>> [  677.726040] device vif42.0 entered promiscuous mode
>> [  678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
>> [  693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
>> [  700.786276] xen_bridge: port 7(vif42.0) entered disabled state
>> [  700.792484] device vif42.0 left promiscuous mode
>> [  700.802409] xen_bridge: port 7(vif42.0) entered disabled state
>> [  704.133606] device vif43.0 entered promiscuous mode
>> [  704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [  704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
>> [  726.302456] xen_bridge: port 7(vif43.0) entered disabled state
>> [  726.308898] device vif43.0 left promiscuous mode
>> [  726.314029] xen_bridge: port 7(vif43.0) entered disabled state
>> 
>> All the guests are already up, but this keeps on going and going and going ....
> The domain number seems to be climbing, are you sure something isn't
> (crashing and) restarting?

Probably due to the BUG_ON from the patch below; I changed it into a WARN_ON.
And I seem to hit it, but only in one of the guests at the moment, and it
triggers quite irregularly.
[   34.298549] ------------[ cut here ]------------
[   34.298567] WARNING: at drivers/net/xen-netfront.c:465 xennet_start_xmit+0x7fe/0x860()
[   34.298574] Modules linked in:
[   34.298597] Pid: 1580, comm: sshd Not tainted 3.6.0pre-rc1-20121011 #1
[   34.298603] Call Trace:
[   34.298611]  [<ffffffff810664ea>] warn_slowpath_common+0x7a/0xb0
[   34.298617]  [<ffffffff81066535>] warn_slowpath_null+0x15/0x20
[   34.298623]  [<ffffffff8146d89e>] xennet_start_xmit+0x7fe/0x860
[   34.298631]  [<ffffffff8161f349>] dev_hard_start_xmit+0x209/0x460
[   34.298637]  [<ffffffff8163b036>] sch_direct_xmit+0xf6/0x290
[   34.298643]  [<ffffffff8161f746>] dev_queue_xmit+0x1a6/0x5a0
[   34.298649]  [<ffffffff8161f5a0>] ? dev_hard_start_xmit+0x460/0x460
[   34.298656]  [<ffffffff810aa8e5>] ? trace_softirqs_off+0x85/0x1b0
[   34.298663]  [<ffffffff816b9536>] ip_finish_output+0x226/0x530
[   34.298668]  [<ffffffff816b93dd>] ? ip_finish_output+0xcd/0x530
[   34.298674]  [<ffffffff816b9899>] ip_output+0x59/0xe0
[   34.298680]  [<ffffffff816b83b8>] ip_local_out+0x28/0x90
[   34.298687]  [<ffffffff816b896f>] ip_queue_xmit+0x17f/0x4a0
[   34.298692]  [<ffffffff816b87f0>] ? ip_send_unicast_reply+0x340/0x340
[   34.298699]  [<ffffffff810a0ba7>] ? getnstimeofday+0x47/0xe0
[   34.298705]  [<ffffffff8160f4c9>] ? __skb_clone+0x29/0x120
[   34.298711]  [<ffffffff816cea20>] tcp_transmit_skb+0x400/0x8d0
[   34.298717]  [<ffffffff816d19fa>] tcp_write_xmit+0x21a/0xa50
[   34.298723]  [<ffffffff816d225b>] tcp_push_one+0x2b/0x40
[   34.298728]  [<ffffffff816c2dec>] tcp_sendmsg+0x8dc/0xe20
[   34.298735]  [<ffffffff816e8f19>] inet_sendmsg+0xa9/0x100
[   34.298740]  [<ffffffff816e8e70>] ? inet_autobind+0x70/0x70
[   34.298746]  [<ffffffff810b0f88>] ? lock_acquire+0xd8/0x100
[   34.298753]  [<ffffffff8160630d>] sock_aio_write+0x12d/0x140
[   34.298762]  [<ffffffff811435b2>] do_sync_write+0xa2/0xe0
[   34.298768]  [<ffffffff810ad22d>] ? trace_hardirqs_on+0xd/0x10
[   34.298774]  [<ffffffff811441d4>] vfs_write+0x174/0x190
[   34.298779]  [<ffffffff811442fa>] sys_write+0x5a/0xa0
[   34.298786]  [<ffffffff812b33de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   34.298792]  [<ffffffff817491cc>] cstar_dispatch+0x7/0x26
[   34.298797] ---[ end trace 2e28eec93b7a8b74 ]---


Complete dmesg from guest attached.


>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index b06ef81..8a3f770 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>> >                 ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>> >                 BUG_ON((signed short)ref < 0);
>> >  
>> > +               BUG_ON(PageCompound(skb_frag_page(frag)));
>> > +
>> >                 mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
>> >                 gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>> >                                                 mfn, GNTMAP_readonly);
>> 
>> > My repro for netback was just to netcat a wodge of data from dom0->domU
>> > but going the other way doesn't seem to trigger.



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Eric Dumazet

2012-Oct-11 10:05 UTC

Re: compound skb frag pages appearing in start_xmit

On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.

xennet_make_frags() is able to split the skb->head into multiple page-size
chunks.

It should do the same for fragments.
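
As a rough illustration of the splitting described above, the following user-space sketch breaks a frag, given as a byte offset and size within a possibly compound page, into page-size chunks. It is not the netfront code: `struct chunk`, `split_frag()` and `SKETCH_PAGE_SIZE` are hypothetical names, the 4096-byte page size is an assumption, and in the driver each chunk would presumably need its own grant reference.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size; stands in for the kernel's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* One page-size (or smaller) piece of a frag. Hypothetical type. */
struct chunk {
	size_t page_index;	/* which page of the compound page */
	size_t offset;		/* byte offset within that page */
	size_t len;		/* bytes in this chunk, at most one page */
};

/*
 * Split `size` bytes starting at byte `offset` of a compound page into
 * chunks that never cross a page boundary. Writes at most `max_chunks`
 * entries to `out` and returns the number written; it stops early if
 * `out` is full.
 */
static size_t split_frag(size_t offset, size_t size,
			 struct chunk *out, size_t max_chunks)
{
	size_t n = 0;
	/* The offset may already point past the first page. */
	size_t page_index = offset / SKETCH_PAGE_SIZE;

	assert(out != NULL);
	offset %= SKETCH_PAGE_SIZE;

	while (size > 0 && n < max_chunks) {
		/* Take at most what is left of the current page. */
		size_t len = SKETCH_PAGE_SIZE - offset;

		if (len > size)
			len = size;

		out[n].page_index = page_index;
		out[n].offset = offset;
		out[n].len = len;
		n++;

		size -= len;
		page_index++;	/* later chunks start at the top of a page */
		offset = 0;
	}
	return n;
}
```

For example, a frag starting at offset 3000 with 6000 bytes of data splits into three chunks of 1096, 4096 and 808 bytes, on pages 0, 1 and 2 respectively.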

Ian Campbell

2012-Oct-11 10:14 UTC

Re: compound skb frag pages appearing in start_xmit

On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
> 
> > Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
> > And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
> 
> xennet_make_frags() is able to split the skb->head in multiple page-size
> chunks.
> 
> It should do the same for fragments
Right, I just want to be able to reproduce the issue so I know I've fixed it
properly ;-)

Ian.

Sander Eikelenboom

2012-Oct-11 10:20 UTC

Re: compound skb frag pages appearing in start_xmit

Thursday, October 11, 2012, 12:14:54 PM, you wrote:
> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>> 
>> > Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>> > And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>> 
>> xennet_make_frags() is able to split the skb->head in multiple page-size
>> chunks.
>> 
>> It should do the same for fragments
> Right, I just want to be reproduce the issue so I can know I've fixed it
> properly ;-)

Trying to scp/sftp files from a guest seems to trigger it for me.
> Ian.
