Sander Eikelenboom
2012-Nov-19 15:43 UTC
Re: compound skb frag pages appearing in start_xmit
Thursday, November 15, 2012, 3:31:42 AM, you wrote:> On 2012-10-11 18:14, Ian Campbell wrote: >> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote: >>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote: >>> >>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON. >>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly. >>> xennet_make_frags() is able to split the skb->head in multiple page-size >>> chunks. >>> >>> It should do the same for fragments >> Right, I just want to be reproduce the issue so I can know I''ve fixed it >> properly ;-) > Hi Ian,> I can reproduce this BUG_ON when running netperf/netserver test between > two domus running on the same dom0. The domu and dom0 all use v3.7-rc1.> When I tried to rebase my persistent grant netfront/netback patch on > latest kernel, netperf/netserver test never succeeded. I did some test > to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and > v3.7-rc4 does not succeed in netperf/netserver test. So I keep my > persistent grant patch only based on v3.4-rc3 now.> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in > v3.7-rc1, and suggested me to test your debug patch in netfront. This > BUG_ON happens soon after running the netperf/netserver test case.> Thanks > AnnieIs there any progression with this bug (rc6 is out the door, so the release of 3.7-final seems to be eminent and this bug completely cripples any networking with guests) ? -- Sander>> >> Ian. >> >> >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel
On 19.11.2012 16:43, Sander Eikelenboom wrote:> > Thursday, November 15, 2012, 3:31:42 AM, you wrote: > >> On 2012-10-11 18:14, Ian Campbell wrote: >>> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote: >>>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote: >>>> >>>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON. >>>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly. >>>> xennet_make_frags() is able to split the skb->head in multiple page-size >>>> chunks. >>>> >>>> It should do the same for fragments >>> Right, I just want to be reproduce the issue so I can know I''ve fixed it >>> properly ;-) >> Hi Ian, > >> I can reproduce this BUG_ON when running netperf/netserver test between >> two domus running on the same dom0. The domu and dom0 all use v3.7-rc1. > >> When I tried to rebase my persistent grant netfront/netback patch on >> latest kernel, netperf/netserver test never succeeded. I did some test >> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and >> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my >> persistent grant patch only based on v3.4-rc3 now. > >> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in >> v3.7-rc1, and suggested me to test your debug patch in netfront. This >> BUG_ON happens soon after running the netperf/netserver test case. > >> Thanks >> Annie > > Is there any progression with this bug (rc6 is out the door, so the release of 3.7-final seems to be eminent and this bug completely cripples any networking with guests) ? >+1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2 (but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can trigger it by just ssh''ing into the domU (from another machine) and then run "find /". Output starts to stutter and then stops completely. When this happens a new connection still can be made and as long as only shorter output is generated the ssh connection is ok. From a dump taken it looks like user-space is waiting in some select call (without any warnon I rather won''t see the tx path). -Stefan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Tue, 2012-11-20 at 08:30 +0000, Stefan Bader wrote:> >> When I tried to rebase my persistent grant netfront/netback patch on > >> latest kernel, netperf/netserver test never succeeded. I did some test > >> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and > >> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my > >> persistent grant patch only based on v3.4-rc3 now. > > > >> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in > >> v3.7-rc1, and suggested me to test your debug patch in netfront. This > >> BUG_ON happens soon after running the netperf/netserver test case. > > > >> Thanks > >> Annie > > > > Is there any progression with this bug (rc6 is out the door, so the > release of 3.7-final seems to be eminent and this bug completely > cripples any networking with guests) ? > > > > +1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2 > (but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can > trigger it by just ssh''ing into the domU (from another machine) and then run > "find /". Output starts to stutter and then stops completely. When this happens > a new connection still can be made and as long as only shorter output is > generated the ssh connection is ok. From a dump taken it looks like user-space > is waiting in some select call (without any warnon I rather won''t see the tx path).Annie, are you still looking into this or shall I? Ian.
On Tue, 2012-11-20 at 09:21 +0000, Ian Campbell wrote:> On Tue, 2012-11-20 at 08:30 +0000, Stefan Bader wrote: > > >> When I tried to rebase my persistent grant netfront/netback patch on > > >> latest kernel, netperf/netserver test never succeeded. I did some test > > >> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and > > >> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my > > >> persistent grant patch only based on v3.4-rc3 now. > > > > > >> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in > > >> v3.7-rc1, and suggested me to test your debug patch in netfront. This > > >> BUG_ON happens soon after running the netperf/netserver test case. > > > > > >> Thanks > > >> Annie > > > > > > Is there any progression with this bug (rc6 is out the door, so the > > release of 3.7-final seems to be eminent and this bug completely > > cripples any networking with guests) ? > > > > > > > +1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2 > > (but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can > > trigger it by just ssh''ing into the domU (from another machine) and then run > > "find /". Output starts to stutter and then stops completely. When this happens > > a new connection still can be made and as long as only shorter output is > > generated the ssh connection is ok. From a dump taken it looks like user-space > > is waiting in some select call (without any warnon I rather won''t see the tx path). > > Annie, are you still looking into this or shall I?I''ll assume that silence == No. Will post a patch shortly. Ian.
Ian Campbell
2012-Nov-20 11:40 UTC
[PATCH] xen/netfront: handle compound page fragments on transmit
An SKB paged fragment can consist of a compound page with order > 0. However the netchannel protocol deals only in PAGE_SIZE frames. Handle this in xennet_make_frags by iterating over the frames which make up the page. This is the netfront equivalent to 6a8ed462f16b for netback. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: netdev@vger.kernel.org Cc: xen-devel@lists.xen.org Cc: Eric Dumazet <edumazet@google.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: ANNIE LI <annie.li@oracle.com> Cc: Sander Eikelenboom <linux@eikelenboom.it> Cc: Stefan Bader <stefan.bader@canonical.com> --- drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- 1 files changed, 44 insertions(+), 14 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index caa0110..a12b99a 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, /* Grant backend access to each skb fragment page. */ for (i = 0; i < frags; i++) { skb_frag_t *frag = skb_shinfo(skb)->frags + i; + struct page *page = skb_frag_page(frag); + unsigned long size = skb_frag_size(frag); + unsigned long offset = frag->page_offset; - tx->flags |= XEN_NETTXF_more_data; + /* Data must not cross a page boundary. */ + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page)); - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); - np->tx_skbs[id].skb = skb_get(skb); - tx = RING_GET_REQUEST(&np->tx, prod++); - tx->id = id; - ref = gnttab_claim_grant_reference(&np->gref_tx_head); - BUG_ON((signed short)ref < 0); + /* Skip unused frames from start of page */ + page += offset >> PAGE_SHIFT; + offset &= ~PAGE_MASK; - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, - mfn, GNTMAP_readonly); + while (size > 0) { + unsigned long bytes; - tx->gref = np->grant_tx_ref[id] = ref; - tx->offset = frag->page_offset; - tx->size = skb_frag_size(frag); - tx->flags = 0; + BUG_ON(offset >= PAGE_SIZE); + + bytes = PAGE_SIZE - offset; + if (bytes > size) + bytes = size; + + tx->flags |= XEN_NETTXF_more_data; + + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); + np->tx_skbs[id].skb = skb_get(skb); + tx = RING_GET_REQUEST(&np->tx, prod++); + tx->id = id; + ref = gnttab_claim_grant_reference(&np->gref_tx_head); + BUG_ON((signed short)ref < 0); + + mfn = pfn_to_mfn(page_to_pfn(page)); + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, + mfn, GNTMAP_readonly); + + tx->gref = np->grant_tx_ref[id] = ref; + tx->offset = offset; + tx->size = bytes; + tx->flags = 0; + + offset += bytes; + size -= bytes; + + /* Next frame */ + if (offset == PAGE_SIZE && size) { + BUG_ON(!PageCompound(page)); + page++; + offset = 0; + } + } } np->tx.req_prod_pvt = prod; -- 1.7.2.5
Jan Beulich
2012-Nov-20 12:28 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
>>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote: > An SKB paged fragment can consist of a compound page with order > 0. > However the netchannel protocol deals only in PAGE_SIZE frames. > > Handle this in xennet_make_frags by iterating over the frames which > make up the page. > > This is the netfront equivalent to 6a8ed462f16b for netback.Wouldn''t you need to be at least a little more conservative here with respect to resource use: I realize that get_id_from_freelist() return values were never checked, and failure of gnttab_claim_grant_reference() was always dealt with via BUG_ON(), but considering that netfront_tx_slot_available() doesn''t account for compound page fragments, I think this (lack of) error handling needs improvement in the course of the change here (regardless of - I think - someone having said that usually the sum of all pages referenced from an skb''s fragments would not exceed MAX_SKB_FRAGS - "usually" just isn''t enough imo). Jan> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Cc: netdev@vger.kernel.org > Cc: xen-devel@lists.xen.org > Cc: Eric Dumazet <edumazet@google.com> > Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> > Cc: ANNIE LI <annie.li@oracle.com> > Cc: Sander Eikelenboom <linux@eikelenboom.it> > Cc: Stefan Bader <stefan.bader@canonical.com> > --- > drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- > 1 files changed, 44 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index caa0110..a12b99a 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, > struct net_device *dev, > /* Grant backend access to each skb fragment page. */ > for (i = 0; i < frags; i++) { > skb_frag_t *frag = skb_shinfo(skb)->frags + i; > + struct page *page = skb_frag_page(frag); > + unsigned long size = skb_frag_size(frag); > + unsigned long offset = frag->page_offset; > > - tx->flags |= XEN_NETTXF_more_data; > + /* Data must not cross a page boundary. */ > + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page)); > > - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > - np->tx_skbs[id].skb = skb_get(skb); > - tx = RING_GET_REQUEST(&np->tx, prod++); > - tx->id = id; > - ref = gnttab_claim_grant_reference(&np->gref_tx_head); > - BUG_ON((signed short)ref < 0); > + /* Skip unused frames from start of page */ > + page += offset >> PAGE_SHIFT; > + offset &= ~PAGE_MASK; > > - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); > - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > - mfn, GNTMAP_readonly); > + while (size > 0) { > + unsigned long bytes; > > - tx->gref = np->grant_tx_ref[id] = ref; > - tx->offset = frag->page_offset; > - tx->size = skb_frag_size(frag); > - tx->flags = 0; > + BUG_ON(offset >= PAGE_SIZE); > + > + bytes = PAGE_SIZE - offset; > + if (bytes > size) > + bytes = size; > + > + tx->flags |= XEN_NETTXF_more_data; > + > + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > + np->tx_skbs[id].skb = skb_get(skb); > + tx = RING_GET_REQUEST(&np->tx, prod++); > + tx->id = id; > + ref = gnttab_claim_grant_reference(&np->gref_tx_head); > + BUG_ON((signed short)ref < 0); > + > + mfn = pfn_to_mfn(page_to_pfn(page)); > + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > + mfn, GNTMAP_readonly); > + > + tx->gref = np->grant_tx_ref[id] = ref; > + tx->offset = offset; > + tx->size = bytes; > + tx->flags = 0; > + > + offset += bytes; > + size -= bytes; > + > + /* Next frame */ > + if (offset == PAGE_SIZE && size) { > + BUG_ON(!PageCompound(page)); > + page++; > + offset = 0; > + } > + } > } > > np->tx.req_prod_pvt = prod; > -- > 1.7.2.5 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Stefan Bader
2012-Nov-20 13:30 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
Aside from Jans comments about error handling, I tried below patch and it seems to solve the problem with transfers out of the domU for me (though only shallow testing done, otoh 5 times is more than getting stuck the first time). -Stefan On 20.11.2012 12:40, Ian Campbell wrote:> An SKB paged fragment can consist of a compound page with order > 0. > However the netchannel protocol deals only in PAGE_SIZE frames. > > Handle this in xennet_make_frags by iterating over the frames which > make up the page. > > This is the netfront equivalent to 6a8ed462f16b for netback. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Cc: netdev@vger.kernel.org > Cc: xen-devel@lists.xen.org > Cc: Eric Dumazet <edumazet@google.com> > Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> > Cc: ANNIE LI <annie.li@oracle.com> > Cc: Sander Eikelenboom <linux@eikelenboom.it> > Cc: Stefan Bader <stefan.bader@canonical.com>Tested-by: Stefan Bader <stefan.bader@canonical.com>> --- > drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- > 1 files changed, 44 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index caa0110..a12b99a 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > /* Grant backend access to each skb fragment page. */ > for (i = 0; i < frags; i++) { > skb_frag_t *frag = skb_shinfo(skb)->frags + i; > + struct page *page = skb_frag_page(frag); > + unsigned long size = skb_frag_size(frag); > + unsigned long offset = frag->page_offset; > > - tx->flags |= XEN_NETTXF_more_data; > + /* Data must not cross a page boundary. */ > + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page)); > > - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > - np->tx_skbs[id].skb = skb_get(skb); > - tx = RING_GET_REQUEST(&np->tx, prod++); > - tx->id = id; > - ref = gnttab_claim_grant_reference(&np->gref_tx_head); > - BUG_ON((signed short)ref < 0); > + /* Skip unused frames from start of page */ > + page += offset >> PAGE_SHIFT; > + offset &= ~PAGE_MASK; > > - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); > - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > - mfn, GNTMAP_readonly); > + while (size > 0) { > + unsigned long bytes; > > - tx->gref = np->grant_tx_ref[id] = ref; > - tx->offset = frag->page_offset; > - tx->size = skb_frag_size(frag); > - tx->flags = 0; > + BUG_ON(offset >= PAGE_SIZE); > + > + bytes = PAGE_SIZE - offset; > + if (bytes > size) > + bytes = size; > + > + tx->flags |= XEN_NETTXF_more_data; > + > + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > + np->tx_skbs[id].skb = skb_get(skb); > + tx = RING_GET_REQUEST(&np->tx, prod++); > + tx->id = id; > + ref = gnttab_claim_grant_reference(&np->gref_tx_head); > + BUG_ON((signed short)ref < 0); > + > + mfn = pfn_to_mfn(page_to_pfn(page)); > + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > + mfn, GNTMAP_readonly); > + > + tx->gref = np->grant_tx_ref[id] = ref; > + tx->offset = offset; > + tx->size = bytes; > + tx->flags = 0; > + > + offset += bytes; > + size -= bytes; > + > + /* Next frame */ > + if (offset == PAGE_SIZE && size) { > + BUG_ON(!PageCompound(page)); > + page++; > + offset = 0; > + } > + } > } > > np->tx.req_prod_pvt = prod; >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Ian Campbell
2012-Nov-20 13:35 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote:> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote: > > An SKB paged fragment can consist of a compound page with order > 0. > > However the netchannel protocol deals only in PAGE_SIZE frames. > > > > Handle this in xennet_make_frags by iterating over the frames which > > make up the page. > > > > This is the netfront equivalent to 6a8ed462f16b for netback. > > Wouldn''t you need to be at least a little more conservative here > with respect to resource use: I realize that get_id_from_freelist() > return values were never checked, and failure of > gnttab_claim_grant_reference() was always dealt with via > BUG_ON(), but considering that netfront_tx_slot_available() > doesn''t account for compound page fragments, I think this (lack > of) error handling needs improvement in the course of the > change here (regardless of - I think - someone having said that > usually the sum of all pages referenced from an skb''s fragments > would not exceed MAX_SKB_FRAGS - "usually" just isn''t enough > imo).I think it is more than "usually", it is derived from the number of pages needed to contain 64K of data which is the maximum size of the data associated with an skb (AIUI). Unwinding from failure in xennet_make_frags looks pretty tricky, but how about this incremental patch: diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index a12b99a..06d0a84 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, np->tx.req_prod_pvt = prod; } +/* + * Count how many ring slots are required to send the frags of this + * skb. Each frag might be a compound page. + */ +static int xennet_count_skb_frag_pages(struct sk_buff *skb) +{ + int i, frags = skb_shinfo(skb)->nr_frags; + int pages = 0; + + for (i = 0; i < frags; i++) { + skb_frag_t *frag = skb_shinfo(skb)->frags + i; + unsigned long size = skb_frag_size(frag); + unsigned long offset = frag->page_offset; + + /* Skip unused frames from start of page */ + offset &= ~PAGE_MASK; + + while (size > 0) { + unsigned long bytes; + + BUG_ON(offset >= PAGE_SIZE); + + bytes = PAGE_SIZE - offset; + if (bytes > size) + bytes = size; + + offset += bytes; + size -= bytes; + + /* Next frame */ + if (offset == PAGE_SIZE && size) { + pages++; + offset = 0; + } + } + } + + return pages; +} + static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) { unsigned short id; @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) grant_ref_t ref; unsigned long mfn; int notify; - int frags = skb_shinfo(skb)->nr_frags; + int frags; unsigned int offset = offset_in_page(data); unsigned int len = skb_headlen(skb); unsigned long flags; - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); + frags = xennet_count_skb_frag_pages(skb) + + DIV_ROUND_UP(offset + len, PAGE_SIZE); if (unlikely(frags > MAX_SKB_FRAGS + 1)) { printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", frags);
Sander Eikelenboom
2012-Nov-20 13:45 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
Tuesday, November 20, 2012, 2:30:54 PM, you wrote:> Aside from Jans comments about error handling, I tried below patch and it seems > to solve the problem with transfers out of the domU for me (though only shallow > testing done, otoh 5 times is more than getting stuck the first time).> -StefanI''m running with this patch now, it seems to fix the problems for me as well. -- Sander> On 20.11.2012 12:40, Ian Campbell wrote: >> An SKB paged fragment can consist of a compound page with order > 0. >> However the netchannel protocol deals only in PAGE_SIZE frames. >> >> Handle this in xennet_make_frags by iterating over the frames which >> make up the page. >> >> This is the netfront equivalent to 6a8ed462f16b for netback. >> >> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> >> Cc: netdev@vger.kernel.org >> Cc: xen-devel@lists.xen.org >> Cc: Eric Dumazet <edumazet@google.com> >> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> >> Cc: ANNIE LI <annie.li@oracle.com> >> Cc: Sander Eikelenboom <linux@eikelenboom.it> >> Cc: Stefan Bader <stefan.bader@canonical.com> > Tested-by: Stefan Bader <stefan.bader@canonical.com> >> --- >> drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- >> 1 files changed, 44 insertions(+), 14 deletions(-) >> >> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c >> index caa0110..a12b99a 100644 >> --- a/drivers/net/xen-netfront.c >> +++ b/drivers/net/xen-netfront.c >> @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, >> /* Grant backend access to each skb fragment page. */ >> for (i = 0; i < frags; i++) { >> skb_frag_t *frag = skb_shinfo(skb)->frags + i; >> + struct page *page = skb_frag_page(frag); >> + unsigned long size = skb_frag_size(frag); >> + unsigned long offset = frag->page_offset; >> >> - tx->flags |= XEN_NETTXF_more_data; >> + /* Data must not cross a page boundary. */ >> + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page)); >> >> - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); >> - np->tx_skbs[id].skb = skb_get(skb); >> - tx = RING_GET_REQUEST(&np->tx, prod++); >> - tx->id = id; >> - ref = gnttab_claim_grant_reference(&np->gref_tx_head); >> - BUG_ON((signed short)ref < 0); >> + /* Skip unused frames from start of page */ >> + page += offset >> PAGE_SHIFT; >> + offset &= ~PAGE_MASK; >> >> - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); >> - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, >> - mfn, GNTMAP_readonly); >> + while (size > 0) { >> + unsigned long bytes; >> >> - tx->gref = np->grant_tx_ref[id] = ref; >> - tx->offset = frag->page_offset; >> - tx->size = skb_frag_size(frag); >> - tx->flags = 0; >> + BUG_ON(offset >= PAGE_SIZE); >> + >> + bytes = PAGE_SIZE - offset; >> + if (bytes > size) >> + bytes = size; >> + >> + tx->flags |= XEN_NETTXF_more_data; >> + >> + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); >> + np->tx_skbs[id].skb = skb_get(skb); >> + tx = RING_GET_REQUEST(&np->tx, prod++); >> + tx->id = id; >> + ref = gnttab_claim_grant_reference(&np->gref_tx_head); >> + BUG_ON((signed short)ref < 0); >> + >> + mfn = pfn_to_mfn(page_to_pfn(page)); >> + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, >> + mfn, GNTMAP_readonly); >> + >> + tx->gref = np->grant_tx_ref[id] = ref; >> + tx->offset = offset; >> + tx->size = bytes; >> + tx->flags = 0; >> + >> + offset += bytes; >> + size -= bytes; >> + >> + /* Next frame */ >> + if (offset == PAGE_SIZE && size) { >> + BUG_ON(!PageCompound(page)); >> + page++; >> + offset = 0; >> + } >> + } >> } >> >> np->tx.req_prod_pvt = prod; >>
Jan Beulich
2012-Nov-20 13:51 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
>>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote: >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote: >> > An SKB paged fragment can consist of a compound page with order > 0. >> > However the netchannel protocol deals only in PAGE_SIZE frames. >> > >> > Handle this in xennet_make_frags by iterating over the frames which >> > make up the page. >> > >> > This is the netfront equivalent to 6a8ed462f16b for netback. >> >> Wouldn''t you need to be at least a little more conservative here >> with respect to resource use: I realize that get_id_from_freelist() >> return values were never checked, and failure of >> gnttab_claim_grant_reference() was always dealt with via >> BUG_ON(), but considering that netfront_tx_slot_available() >> doesn''t account for compound page fragments, I think this (lack >> of) error handling needs improvement in the course of the >> change here (regardless of - I think - someone having said that >> usually the sum of all pages referenced from an skb''s fragments >> would not exceed MAX_SKB_FRAGS - "usually" just isn''t enough >> imo). > > I think it is more than "usually", it is derived from the number of > pages needed to contain 64K of data which is the maximum size of the > data associated with an skb (AIUI). > > Unwinding from failure in xennet_make_frags looks pretty tricky,Yes, I agree.> but how about this incremental patch:Looks good, but can probably be simplified quite a bit:> --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > np->tx.req_prod_pvt = prod; > } > > +/* > + * Count how many ring slots are required to send the frags of this > + * skb. Each frag might be a compound page. > + */ > +static int xennet_count_skb_frag_pages(struct sk_buff *skb) > +{ > + int i, frags = skb_shinfo(skb)->nr_frags; > + int pages = 0; > + > + for (i = 0; i < frags; i++) { > + skb_frag_t *frag = skb_shinfo(skb)->frags + i; > + unsigned long size = skb_frag_size(frag); > + unsigned long offset = frag->page_offset; > + > + /* Skip unused frames from start of page */ > + offset &= ~PAGE_MASK; > + > + while (size > 0) { > + unsigned long bytes; > + > + BUG_ON(offset >= PAGE_SIZE); > + > + bytes = PAGE_SIZE - offset; > + if (bytes > size) > + bytes = size; > + > + offset += bytes; > + size -= bytes; > + > + /* Next frame */ > + if (offset == PAGE_SIZE && size) { > + pages++; > + offset = 0; > + } > + }Isn''t the whole loop equivalent to pages = PFN_UP(offset + size); (at least as long as size is not zero)? Plus I think the increment of pages would need to be pulled out of the if() body.> + } > + > + return pages; > +} > + > static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) > { > unsigned short id; > @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) > grant_ref_t ref; > unsigned long mfn; > int notify; > - int frags = skb_shinfo(skb)->nr_frags; > + int frags; > unsigned int offset = offset_in_page(data); > unsigned int len = skb_headlen(skb); > unsigned long flags; > > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); > + frags = xennet_count_skb_frag_pages(skb) + > + DIV_ROUND_UP(offset + len, PAGE_SIZE); > if (unlikely(frags > MAX_SKB_FRAGS + 1)) {This condition would now need adjustment, though (because "frags" is no longer what its name says). Jan> printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", > frags);
Ian Campbell
2012-Nov-20 14:14 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote:> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote: > >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote: > >> > An SKB paged fragment can consist of a compound page with order > 0. > >> > However the netchannel protocol deals only in PAGE_SIZE frames. > >> > > >> > Handle this in xennet_make_frags by iterating over the frames which > >> > make up the page. > >> > > >> > This is the netfront equivalent to 6a8ed462f16b for netback. > >> > >> Wouldn''t you need to be at least a little more conservative here > >> with respect to resource use: I realize that get_id_from_freelist() > >> return values were never checked, and failure of > >> gnttab_claim_grant_reference() was always dealt with via > >> BUG_ON(), but considering that netfront_tx_slot_available() > >> doesn''t account for compound page fragments, I think this (lack > >> of) error handling needs improvement in the course of the > >> change here (regardless of - I think - someone having said that > >> usually the sum of all pages referenced from an skb''s fragments > >> would not exceed MAX_SKB_FRAGS - "usually" just isn''t enough > >> imo). > > > > I think it is more than "usually", it is derived from the number of > > pages needed to contain 64K of data which is the maximum size of the > > data associated with an skb (AIUI). > > > > Unwinding from failure in xennet_make_frags looks pretty tricky, > > Yes, I agree. > > > but how about this incremental patch: > > Looks good, but can probably be simplified quite a bit: > > > --- a/drivers/net/xen-netfront.c > > +++ b/drivers/net/xen-netfront.c > > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > > np->tx.req_prod_pvt = prod; > > } > > > > +/* > > + * Count how many ring slots are required to send the frags of this > > + * skb. Each frag might be a compound page. > > + */ > > +static int xennet_count_skb_frag_pages(struct sk_buff *skb) > > +{ > > + int i, frags = skb_shinfo(skb)->nr_frags; > > + int pages = 0; > > + > > + for (i = 0; i < frags; i++) { > > + skb_frag_t *frag = skb_shinfo(skb)->frags + i; > > + unsigned long size = skb_frag_size(frag); > > + unsigned long offset = frag->page_offset; > > + > > + /* Skip unused frames from start of page */ > > + offset &= ~PAGE_MASK; > > + > > + while (size > 0) { > > + unsigned long bytes; > > + > > + BUG_ON(offset >= PAGE_SIZE); > > + > > + bytes = PAGE_SIZE - offset; > > + if (bytes > size) > > + bytes = size; > > + > > + offset += bytes; > > + size -= bytes; > > + > > + /* Next frame */ > > + if (offset == PAGE_SIZE && size) { > > + pages++; > > + offset = 0; > > + } > > + } > > Isn''t the whole loop equivalent to > > pages = PFN_UP(offset + size); > > (at least as long as size is not zero)?Er, yes. Wood for the trees etc... I think using PFN_UP overcounts a bit since the data needed start in the first frame of a compound frame, but if you keep the /* Skip unused frames from start of page */ offset &= ~PAGE_MASK; I think that does the right thing> > @@ -517,12 +557,13 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) > > grant_ref_t ref; > > unsigned long mfn; > > int notify; > > - int frags = skb_shinfo(skb)->nr_frags; > > + int frags; > > unsigned int offset = offset_in_page(data); > > unsigned int len = skb_headlen(skb); > > unsigned long flags; > > > > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); > > + frags = xennet_count_skb_frag_pages(skb) + > > + DIV_ROUND_UP(offset + len, PAGE_SIZE); > > if (unlikely(frags > MAX_SKB_FRAGS + 1)) { > > This condition would now need adjustment, though (because > "frags" is no longer what its name says).I think it already wasn''t what the name says, since it included the skb head too. Perhaps "slots" would be a better name? diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index a12b99a..b744875 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -505,6 +505,29 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, np->tx.req_prod_pvt = prod; } +/* + * Count how many ring slots are required to send the frags of this + * skb. Each frag might be a compound page. + */ +static int xennet_count_skb_frag_slots(struct sk_buff *skb) +{ + int i, frags = skb_shinfo(skb)->nr_frags; + int pages = 0; + + for (i = 0; i < frags; i++) { + skb_frag_t *frag = skb_shinfo(skb)->frags + i; + unsigned long size = skb_frag_size(frag); + unsigned long offset = frag->page_offset; + + /* Skip unused frames from start of page */ + offset &= ~PAGE_MASK; + + pages += PFN_UP(offset + size); + } + + return pages; +} + static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) { unsigned short id; @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) grant_ref_t ref; unsigned long mfn; int notify; - int frags = skb_shinfo(skb)->nr_frags; + int slots; unsigned int offset = offset_in_page(data); unsigned int len = skb_headlen(skb); unsigned long flags; - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); - if (unlikely(frags > MAX_SKB_FRAGS + 1)) { - printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", - frags); + slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) + + xennet_count_skb_frag_slots(skb); + if (unlikely(slots > MAX_SKB_FRAGS + 1)) { + printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n", + slots); dump_stack(); goto drop; } @@ -533,7 +557,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev) spin_lock_irqsave(&np->tx_lock, flags); if (unlikely(!netif_carrier_ok(dev) || - (frags > 1 && !xennet_can_sg(dev)) || + (slots > 1 && !xennet_can_sg(dev)) || netif_needs_gso(skb, netif_skb_features(skb)))) { spin_unlock_irqrestore(&np->tx_lock, flags); goto drop;
Jan Beulich
2012-Nov-20 14:32 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
>>> On 20.11.12 at 15:14, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Tue, 2012-11-20 at 13:51 +0000, Jan Beulich wrote: >> >>> On 20.11.12 at 14:35, Ian Campbell <Ian.Campbell@citrix.com> wrote: >> > On Tue, 2012-11-20 at 12:28 +0000, Jan Beulich wrote: >> >> >>> On 20.11.12 at 12:40, Ian Campbell <ian.campbell@citrix.com> wrote: >> >> > An SKB paged fragment can consist of a compound page with order > 0. >> >> > However the netchannel protocol deals only in PAGE_SIZE frames. >> >> > >> >> > Handle this in xennet_make_frags by iterating over the frames which >> >> > make up the page. >> >> > >> >> > This is the netfront equivalent to 6a8ed462f16b for netback. >> >> >> >> Wouldn''t you need to be at least a little more conservative here >> >> with respect to resource use: I realize that get_id_from_freelist() >> >> return values were never checked, and failure of >> >> gnttab_claim_grant_reference() was always dealt with via >> >> BUG_ON(), but considering that netfront_tx_slot_available() >> >> doesn''t account for compound page fragments, I think this (lack >> >> of) error handling needs improvement in the course of the >> >> change here (regardless of - I think - someone having said that >> >> usually the sum of all pages referenced from an skb''s fragments >> >> would not exceed MAX_SKB_FRAGS - "usually" just isn''t enough >> >> imo). >> > >> > I think it is more than "usually", it is derived from the number of >> > pages needed to contain 64K of data which is the maximum size of the >> > data associated with an skb (AIUI). >> > >> > Unwinding from failure in xennet_make_frags looks pretty tricky, >> >> Yes, I agree. >> >> > but how about this incremental patch: >> >> Looks good, but can probably be simplified quite a bit: >> >> > --- a/drivers/net/xen-netfront.c >> > +++ b/drivers/net/xen-netfront.c >> > @@ -505,6 +505,46 @@ static void xennet_make_frags(struct sk_buff *skb, > struct net_device *dev, >> > np->tx.req_prod_pvt = prod; >> > } >> > >> > +/* >> > + * Count how many ring slots are required to send the frags of this >> > + * skb. Each frag might be a compound page. >> > + */ >> > +static int xennet_count_skb_frag_pages(struct sk_buff *skb) >> > +{ >> > + int i, frags = skb_shinfo(skb)->nr_frags; >> > + int pages = 0; >> > + >> > + for (i = 0; i < frags; i++) { >> > + skb_frag_t *frag = skb_shinfo(skb)->frags + i; >> > + unsigned long size = skb_frag_size(frag); >> > + unsigned long offset = frag->page_offset; >> > + >> > + /* Skip unused frames from start of page */ >> > + offset &= ~PAGE_MASK; >> > + >> > + while (size > 0) { >> > + unsigned long bytes; >> > + >> > + BUG_ON(offset >= PAGE_SIZE); >> > + >> > + bytes = PAGE_SIZE - offset; >> > + if (bytes > size) >> > + bytes = size; >> > + >> > + offset += bytes; >> > + size -= bytes; >> > + >> > + /* Next frame */ >> > + if (offset == PAGE_SIZE && size) { >> > + pages++; >> > + offset = 0; >> > + } >> > + } >> >> Isn''t the whole loop equivalent to >> >> pages = PFN_UP(offset + size); >> >> (at least as long as size is not zero)? > > Er, yes. Wood for the trees etc... > > I think using PFN_UP overcounts a bit since the data needed start in the > first frame of a compound frame, but if you keep the > /* Skip unused frames from start of page */ > offset &= ~PAGE_MASK; > > I think that does the right thingRight, that''s what I said (I only wanted the loop to be replaced, not what was prior to it).> @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, > struct net_device *dev) > grant_ref_t ref; > unsigned long mfn; > int notify; > - int frags = skb_shinfo(skb)->nr_frags; > + int slots; > unsigned int offset = offset_in_page(data); > unsigned int len = skb_headlen(skb); > unsigned long flags; > > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); > - if (unlikely(frags > MAX_SKB_FRAGS + 1)) { > - printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", > - frags); > + slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) + > + xennet_count_skb_frag_slots(skb); > + if (unlikely(slots > MAX_SKB_FRAGS + 1)) {But still - isn''t this wrong now (i.e. can''t it now validly exceed the boundary checked for)? Jan> + printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n", > + slots); > dump_stack(); > goto drop; > }
Eric Dumazet
2012-Nov-20 14:45 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 11:40 +0000, Ian Campbell wrote:> An SKB paged fragment can consist of a compound page with order > 0. > However the netchannel protocol deals only in PAGE_SIZE frames. > > Handle this in xennet_make_frags by iterating over the frames which > make up the page. > > This is the netfront equivalent to 6a8ed462f16b for netback. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > Cc: netdev@vger.kernel.org > Cc: xen-devel@lists.xen.org > Cc: Eric Dumazet <edumazet@google.com> > Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> > Cc: ANNIE LI <annie.li@oracle.com> > Cc: Sander Eikelenboom <linux@eikelenboom.it> > Cc: Stefan Bader <stefan.bader@canonical.com> > --- > drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- > 1 files changed, 44 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index caa0110..a12b99a 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > /* Grant backend access to each skb fragment page. */ > for (i = 0; i < frags; i++) { > skb_frag_t *frag = skb_shinfo(skb)->frags + i; > + struct page *page = skb_frag_page(frag); > + unsigned long size = skb_frag_size(frag); > + unsigned long offset = frag->page_offset; > > - tx->flags |= XEN_NETTXF_more_data; > + /* Data must not cross a page boundary. */ > + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page)); > > - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > - np->tx_skbs[id].skb = skb_get(skb); > - tx = RING_GET_REQUEST(&np->tx, prod++); > - tx->id = id; > - ref = gnttab_claim_grant_reference(&np->gref_tx_head); > - BUG_ON((signed short)ref < 0); > + /* Skip unused frames from start of page */''frame'' in the comment means an order-0 page ?> + page += offset >> PAGE_SHIFT; > + offset &= ~PAGE_MASK; > > - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); > - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > - mfn, GNTMAP_readonly); > + while (size > 0) { > + unsigned long bytes; > > - tx->gref = np->grant_tx_ref[id] = ref; > - tx->offset = frag->page_offset; > - tx->size = skb_frag_size(frag); > - tx->flags = 0; > + BUG_ON(offset >= PAGE_SIZE); > + > + bytes = PAGE_SIZE - offset; > + if (bytes > size) > + bytes = size; > + > + tx->flags |= XEN_NETTXF_more_data; > + > + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > + np->tx_skbs[id].skb = skb_get(skb);BTW this skb_get() means extra atomic operations for every 4096 bytes unit, and an extra atomic op (and test for final 0) at TX completion. This could be avoided, by setting np->tx_skbs[id].skb = skb only for the very last unit.> + tx = RING_GET_REQUEST(&np->tx, prod++); > + tx->id = id; > + ref = gnttab_claim_grant_reference(&np->gref_tx_head); > + BUG_ON((signed short)ref < 0); > + > + mfn = pfn_to_mfn(page_to_pfn(page)); > + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > + mfn, GNTMAP_readonly); > + > + tx->gref = np->grant_tx_ref[id] = ref; > + tx->offset = offset; > + tx->size = bytes; > + tx->flags = 0; > + > + offset += bytes; > + size -= bytes; > + > + /* Next frame */ > + if (offset == PAGE_SIZE && size) { > + BUG_ON(!PageCompound(page)); > + page++; > + offset = 0; > + } > + } > } > > np->tx.req_prod_pvt = prod;Acked-by: Eric Dumazet <edumazet@google.com> Thanks !
Ian Campbell
2012-Nov-20 15:05 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 14:45 +0000, Eric Dumazet wrote:> > + /* Skip unused frames from start of page */ > > ''frame'' in the comment means an order-0 page ?Yes. Confusing in the context of a network driver I know! I couldn''t think of a better term.> > + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > > + np->tx_skbs[id].skb = skb_get(skb); > > BTW this skb_get() means extra atomic operations for every 4096 bytes > unit, and an extra atomic op (and test for final 0) at TX completion. > This could be avoided, by setting np->tx_skbs[id].skb = skb only for the > very last unit.Thanks. Might be tricky because guests can ack the individual requests in any order but it''s something worth having a look at.> > np->tx.req_prod_pvt = prod; > > Acked-by: Eric Dumazet <edumazet@google.com> > > Thanks !Thanks for the review. Ian.> >
Ian Campbell
2012-Nov-20 15:06 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote:> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, > > struct net_device *dev) > > grant_ref_t ref; > > unsigned long mfn; > > int notify; > > - int frags = skb_shinfo(skb)->nr_frags; > > + int slots; > > unsigned int offset = offset_in_page(data); > > unsigned int len = skb_headlen(skb); > > unsigned long flags; > > > > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); > > - if (unlikely(frags > MAX_SKB_FRAGS + 1)) { > > - printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", > > - frags); > > + slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) + > > + xennet_count_skb_frag_slots(skb); > > + if (unlikely(slots > MAX_SKB_FRAGS + 1)) { > > But still - isn''t this wrong now (i.e. can''t it now validly exceed the > boundary checked for)?In practice no because of the property that the number of pages backing the frags is <= MAX_SKB_FRAGS even if you are using compound pages as the frags. Ian.
Eric Dumazet
2012-Nov-20 15:28 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote:> In practice no because of the property that the number of pages backing > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as > the frags.Yes, but you can make this test trigger with some hacks from userland (since the frag allocator is per task instead of per socket), so you should remove the dump_stack() ? Best way would be to count exact number of slots. This could be something like 48 slots for a single skb (if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000 bytes) MAX_SKB_FRAGS is really number of frags, while your driver needs a count of ''order-0'' ''frames''
Jan Beulich
2012-Nov-20 15:44 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
>>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote: >> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff *skb, >> > struct net_device *dev) >> > grant_ref_t ref; >> > unsigned long mfn; >> > int notify; >> > - int frags = skb_shinfo(skb)->nr_frags; >> > + int slots; >> > unsigned int offset = offset_in_page(data); >> > unsigned int len = skb_headlen(skb); >> > unsigned long flags; >> > >> > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); >> > - if (unlikely(frags > MAX_SKB_FRAGS + 1)) { >> > - printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n", >> > - frags); >> > + slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) + >> > + xennet_count_skb_frag_slots(skb); >> > + if (unlikely(slots > MAX_SKB_FRAGS + 1)) { >> >> But still - isn''t this wrong now (i.e. can''t it now validly exceed the >> boundary checked for)? > > In practice no because of the property that the number of pages backing > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as > the frags.So are you saying that there is something in the system preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER (or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I didn''t find any. I do notice that __netdev_alloc_frag() currently never gets called with a size larger than PAGE_SIZE, but considering that the function just recently got made capable of that, I''m sure respective users will show up rather sooner than later. Jan
Ian Campbell
2012-Nov-20 15:54 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 15:28 +0000, Eric Dumazet wrote:> On Tue, 2012-11-20 at 15:06 +0000, Ian Campbell wrote: > > > In practice no because of the property that the number of pages backing > > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as > > the frags. > > Yes, but you can make this test trigger with some hacks from userland > (since the frag allocator is per task instead of per socket), so you > should remove the dump_stack() ? > > Best way would be to count exact number of slots. > > This could be something like 48 slots for a single skb > > (if each frag is 4098 (1+4096+1)bytes, only the last one is around 4000 > bytes) > > MAX_SKB_FRAGS is really number of frags, while your driver needs a count > of ''order-0'' ''frames''The use of MAX_SKB_FRAGS is a bit misleading here, it''s really the max number of slots which the other end will be willing to receive as a single frame (in the Ethernet sense), as defined by the PV protocol. It happens to be the same as MAX_SKB_FRAGS (or it is at least MAX_SKB_FRAGS, I''m not too sure). I''ll nuke the dump_stack() though -- it''s not clear what sort of useful context it would contain anyway. Ian.
Ian Campbell
2012-Nov-20 16:04 UTC
netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
Limiting to just xen-devel folks. On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote:> The use of MAX_SKB_FRAGS is a bit misleading here, it''s really the max > number of slots which the other end will be willing to receive as a > single frame (in the Ethernet sense), as defined by the PV protocol. > It happens to be the same as MAX_SKB_FRAGS (or it is at least > MAX_SKB_FRAGS, I''m not too sure).This highlights a couple of issues, the main one is that implicitly including MAX_SKB_FRAGS in the PV net protocol is just madness. It can and has changed in the past. Someone really needs to (retroactively) figure out what a sensible default minimum which front and back must support is and use it in the front and backends instead of MAX_SKB_FRAGS. Probably something derived from the existing 64K limit (Linux has used 17 and 18 as MAX_SKB_FRAGS in the past). Then perhaps implement a feature-flag to allow front and backends to negotiate something bigger if they like. It might also be interesting for front and backends to coalesce multiple ring slots into compound pages. I don''t have time for either of those unfortunately Ian.
Eric Dumazet
2012-Nov-20 16:14 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
At least TCP skbs are limited to 65536 bytes in tcp_sendmsg() (around 45 1460-bytes MSS segments) Thats probably because IPv4 stack only copes with this limit. This probably could be relaxed for TCP friends, but this kind of skbs should not hit a driver. On Tue, Nov 20, 2012 at 7:44 AM, Jan Beulich <JBeulich@suse.com> wrote:> >>> On 20.11.12 at 16:06, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > On Tue, 2012-11-20 at 14:32 +0000, Jan Beulich wrote: > >> > @@ -517,15 +540,16 @@ static int xennet_start_xmit(struct sk_buff > *skb, > >> > struct net_device *dev) > >> > grant_ref_t ref; > >> > unsigned long mfn; > >> > int notify; > >> > - int frags = skb_shinfo(skb)->nr_frags; > >> > + int slots; > >> > unsigned int offset = offset_in_page(data); > >> > unsigned int len = skb_headlen(skb); > >> > unsigned long flags; > >> > > >> > - frags += DIV_ROUND_UP(offset + len, PAGE_SIZE); > >> > - if (unlikely(frags > MAX_SKB_FRAGS + 1)) { > >> > - printk(KERN_ALERT "xennet: skb rides the rocket: %d > frags\n", > >> > - frags); > >> > + slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) + > >> > + xennet_count_skb_frag_slots(skb); > >> > + if (unlikely(slots > MAX_SKB_FRAGS + 1)) { > >> > >> But still - isn''t this wrong now (i.e. can''t it now validly exceed the > >> boundary checked for)? > > > > In practice no because of the property that the number of pages backing > > the frags is <= MAX_SKB_FRAGS even if you are using compound pages as > > the frags. > > So are you saying that there is something in the system > preventing up to MAX_SKB_FRAGS * SKB_FRAG_PAGE_ORDER > (or NETDEV_FRAG_PAGE_MAX_ORDER) skb-s to be created? I > didn''t find any. I do notice that __netdev_alloc_frag() currently > never gets called with a size larger than PAGE_SIZE, but > considering that the function just recently got made capable of > that, I''m sure respective users will show up rather sooner than > later. > > Jan > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 2012-11-20 19:36, Ian Campbell wrote:> On Tue, 2012-11-20 at 09:21 +0000, Ian Campbell wrote: >> On Tue, 2012-11-20 at 08:30 +0000, Stefan Bader wrote: >>>>> When I tried to rebase my persistent grant netfront/netback patch on >>>>> latest kernel, netperf/netserver test never succeeded. I did some test >>>>> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and >>>>> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my >>>>> persistent grant patch only based on v3.4-rc3 now. >>>>> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in >>>>> v3.7-rc1, and suggested me to test your debug patch in netfront. This >>>>> BUG_ON happens soon after running the netperf/netserver test case. >>>>> Thanks >>>>> Annie >>>> Is there any progression with this bug (rc6 is out the door, so the >>> release of 3.7-final seems to be eminent and this bug completely >>> cripples any networking with guests) ? >>> +1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2 >>> (but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can >>> trigger it by just ssh''ing into the domU (from another machine) and then run >>> "find /". Output starts to stutter and then stops completely. When this happens >>> a new connection still can be made and as long as only shorter output is >>> generated the ssh connection is ok. From a dump taken it looks like user-space >>> is waiting in some select call (without any warnon I rather won''t see the tx path). >> Annie, are you still looking into this or shall I? > I''ll assume that silence == No. Will post a patch shortly.Sorry for the delay response, I did create a patch, but did not post it out in time. Thanks Annie> Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
ANNIE LI
2012-Nov-21 02:52 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On 2012-11-20 19:40, Ian Campbell wrote:> An SKB paged fragment can consist of a compound page with order> 0. > However the netchannel protocol deals only in PAGE_SIZE frames. > > Handle this in xennet_make_frags by iterating over the frames which > make up the page. > > This is the netfront equivalent to 6a8ed462f16b for netback. > > Signed-off-by: Ian Campbell<ian.campbell@citrix.com> > Cc: netdev@vger.kernel.org > Cc: xen-devel@lists.xen.org > Cc: Eric Dumazet<edumazet@google.com> > Cc: Konrad Rzeszutek Wilk<konrad@kernel.org> > Cc: ANNIE LI<annie.li@oracle.com> > Cc: Sander Eikelenboom<linux@eikelenboom.it> > Cc: Stefan Bader<stefan.bader@canonical.com> > --- > drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- > 1 files changed, 44 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > index caa0110..a12b99a 100644 > --- a/drivers/net/xen-netfront.c > +++ b/drivers/net/xen-netfront.c > @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > /* Grant backend access to each skb fragment page. */ > for (i = 0; i< frags; i++) { > skb_frag_t *frag = skb_shinfo(skb)->frags + i; > + struct page *page = skb_frag_page(frag); > + unsigned long size = skb_frag_size(frag); > + unsigned long offset = frag->page_offset;There are following definitions at the beginning of xennet_make_frags, unsigned int offset = offset_in_page(data); unsigned int len = skb_headlen(skb); Is it better to reuse those definitions, and not define new size and offset again in this for loop? And unsigned int is enough here, right?> > - tx->flags |= XEN_NETTXF_more_data; > + /* Data must not cross a page boundary. */ > + BUG_ON(size + offset> PAGE_SIZE<<compound_order(page)); > > - id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs); > - np->tx_skbs[id].skb = skb_get(skb); > - tx = RING_GET_REQUEST(&np->tx, prod++); > - tx->id = id; > - ref = gnttab_claim_grant_reference(&np->gref_tx_head); > - BUG_ON((signed short)ref< 0); > + /* Skip unused frames from start of page */ > + page += offset>> PAGE_SHIFT; > + offset&= ~PAGE_MASK; > > - mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag))); > - gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > - mfn, GNTMAP_readonly); > + while (size> 0) { > + unsigned long bytes; > > - tx->gref = np->grant_tx_ref[id] = ref; > - tx->offset = frag->page_offset; > - tx->size = skb_frag_size(frag); > - tx->flags = 0; > + BUG_ON(offset>= PAGE_SIZE); > + > + bytes = PAGE_SIZE - offset; > + if (bytes> size) > + bytes = size; > + > + tx->flags |= XEN_NETTXF_more_data; > + > + id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);Over 80 characters?> + np->tx_skbs[id].skb = skb_get(skb); > + tx = RING_GET_REQUEST(&np->tx, prod++); > + tx->id = id; > + ref = gnttab_claim_grant_reference(&np->gref_tx_head); > + BUG_ON((signed short)ref< 0); > + > + mfn = pfn_to_mfn(page_to_pfn(page)); > + gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id, > + mfn, GNTMAP_readonly);Over 80 characters? Thanks Annie> + > + tx->gref = np->grant_tx_ref[id] = ref; > + tx->offset = offset; > + tx->size = bytes; > + tx->flags = 0; > + > + offset += bytes; > + size -= bytes; > + > + /* Next frame */ > + if (offset == PAGE_SIZE&& size) { > + BUG_ON(!PageCompound(page)); > + page++; > + offset = 0; > + } > + } > } > > np->tx.req_prod_pvt = prod;
Ian Campbell
2012-Nov-21 11:09 UTC
Re: [PATCH] xen/netfront: handle compound page fragments on transmit
On Wed, 2012-11-21 at 02:52 +0000, ANNIE LI wrote:> > On 2012-11-20 19:40, Ian Campbell wrote: > > An SKB paged fragment can consist of a compound page with order> 0. > > However the netchannel protocol deals only in PAGE_SIZE frames. > > > > Handle this in xennet_make_frags by iterating over the frames which > > make up the page. > > > > This is the netfront equivalent to 6a8ed462f16b for netback. > > > > Signed-off-by: Ian Campbell<ian.campbell@citrix.com> > > Cc: netdev@vger.kernel.org > > Cc: xen-devel@lists.xen.org > > Cc: Eric Dumazet<edumazet@google.com> > > Cc: Konrad Rzeszutek Wilk<konrad@kernel.org> > > Cc: ANNIE LI<annie.li@oracle.com> > > Cc: Sander Eikelenboom<linux@eikelenboom.it> > > Cc: Stefan Bader<stefan.bader@canonical.com> > > --- > > drivers/net/xen-netfront.c | 58 +++++++++++++++++++++++++++++++++---------- > > 1 files changed, 44 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c > > index caa0110..a12b99a 100644 > > --- a/drivers/net/xen-netfront.c > > +++ b/drivers/net/xen-netfront.c > > @@ -452,24 +452,54 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev, > > /* Grant backend access to each skb fragment page. */ > > for (i = 0; i< frags; i++) { > > skb_frag_t *frag = skb_shinfo(skb)->frags + i; > > + struct page *page = skb_frag_page(frag); > > + unsigned long size = skb_frag_size(frag); > > + unsigned long offset = frag->page_offset; > > There are following definitions at the beginning of xennet_make_frags, > > unsigned int offset = offset_in_page(data); > unsigned int len = skb_headlen(skb);So they are, well spotted.> Is it better to reuse those definitions, and not define new size and > offset again in this for loop? And unsigned int is enough here, right?Yes to both.> [...] > Over 80 characters?[...]> Over 80 characters?Both fixed, thanks for your review. Ian.
ANNIE LI
2012-Nov-23 01:30 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
On 2012-11-21 0:04, Ian Campbell wrote:> Limiting to just xen-devel folks. > > On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote: > > This highlights a couple of issues, the main one is that implicitly > including MAX_SKB_FRAGS in the PV net protocol is just madness. It can > and has changed in the past. Someone really needs to (retroactively) > figure out what a sensible default minimum which front and back must > support is and use it in the front and backends instead of > MAX_SKB_FRAGS. Probably something derived from the existing 64K limit > (Linux has used 17 and 18 as MAX_SKB_FRAGS in the past). > > Then perhaps implement a feature-flag to allow front and backends to > negotiate something bigger if they like. > > It might also be interesting for front and backends to coalesce multiple > ring slots into compound pages.Yes, MAX_SKB_FRAGS is max frag number, not max slot number required. But I am not so clear about the whole implementation here. Does it mean netfront/netback uses something like MAX_SKB_FRAGS*SLOTS_PER_FRAG? and SLOTS_PER_FRAG can be adjusted based on different situation? If netfront and netback negotiate this value, then xenstore key feature is required, right? As mentioned in another compound page fragments patch, for non-compound page, this value could be 64K/PAGE_SIZE + 1, but if taking account into compound page, the worse case is around 48 slots(in 1+4096+1 case). It is hard to negotiate this between netfront and netback for different packets. Thanks Annie> I don''t have time for either of those unfortunately > > Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
James Harper
2012-Nov-23 02:10 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
> > On 2012-11-21 0:04, Ian Campbell wrote: > > Limiting to just xen-devel folks. > > > > On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote: > > > > This highlights a couple of issues, the main one is that implicitly > > including MAX_SKB_FRAGS in the PV net protocol is just madness. It can > > and has changed in the past. Someone really needs to (retroactively) > > figure out what a sensible default minimum which front and back must > > support is and use it in the front and backends instead of > > MAX_SKB_FRAGS. Probably something derived from the existing 64K limit > > (Linux has used 17 and 18 as MAX_SKB_FRAGS in the past). > > > > Then perhaps implement a feature-flag to allow front and backends to > > negotiate something bigger if they like. > > > > It might also be interesting for front and backends to coalesce > > multiple ring slots into compound pages. > > Yes, MAX_SKB_FRAGS is max frag number, not max slot number required. > But I am not so clear about the whole implementation here. Does it mean > netfront/netback uses something like MAX_SKB_FRAGS*SLOTS_PER_FRAG? > and SLOTS_PER_FRAG can be adjusted based on different situation? If > netfront and netback negotiate this value, then xenstore key feature is > required, right? > As mentioned in another compound page fragments patch, for non- > compound page, this value could be 64K/PAGE_SIZE + 1, but if taking account > into compound page, the worse case is around 48 slots(in 1+4096+1 case). It > is hard to negotiate this between netfront and netback for different packets. >FWIW, Windows appears to have no real limit in its SG list, and if its >18 then netback (or linux?) just drops such packets. For a 64kb "large" packet, the buffer arrangement would likely be eth hdr + ip hdr + tcp hdr + (64K - total header size)/PAGE_SIZE + 1 frags. It would be good if the documentation of the protocol defined this limitation (which may not even exist if dom0 was something other than Linux anyway). James
ANNIE LI
2012-Nov-23 06:49 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
On 2012-11-23 10:10, James Harper wrote:>> On 2012-11-21 0:04, Ian Campbell wrote: >>> Limiting to just xen-devel folks. >>> >>> On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote: >>> >>> This highlights a couple of issues, the main one is that implicitly >>> including MAX_SKB_FRAGS in the PV net protocol is just madness. It can >>> and has changed in the past. Someone really needs to (retroactively) >>> figure out what a sensible default minimum which front and back must >>> support is and use it in the front and backends instead of >>> MAX_SKB_FRAGS. Probably something derived from the existing 64K limit >>> (Linux has used 17 and 18 as MAX_SKB_FRAGS in the past). >>> >>> Then perhaps implement a feature-flag to allow front and backends to >>> negotiate something bigger if they like. >>> >>> It might also be interesting for front and backends to coalesce >>> multiple ring slots into compound pages. >> Yes, MAX_SKB_FRAGS is max frag number, not max slot number required. >> But I am not so clear about the whole implementation here. Does it mean >> netfront/netback uses something like MAX_SKB_FRAGS*SLOTS_PER_FRAG? >> and SLOTS_PER_FRAG can be adjusted based on different situation? If >> netfront and netback negotiate this value, then xenstore key feature is >> required, right? >> As mentioned in another compound page fragments patch, for non- >> compound page, this value could be 64K/PAGE_SIZE + 1, but if taking account >> into compound page, the worse case is around 48 slots(in 1+4096+1 case). It >> is hard to negotiate this between netfront and netback for different packets. >> > FWIW, Windows appears to have no real limit in its SG list, and if its>18 then netback (or linux?) just drops such packets.This should be netback dropping packets in function netbk_count_requests, if (unlikely(frags >= MAX_SKB_FRAGS)) { netdev_dbg(vif->dev, "Too many frags\n"); return -frags; }> For a 64kb "large" packet, the buffer arrangement would likely be eth hdr + ip hdr + tcp hdr + (64K - total header size)/PAGE_SIZE + 1 frags. It would be good if the documentation of the protocol defined this limitation (which may not even exist if dom0 was something other than Linux anyway).Agree. Thanks Annie> > James > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Ian Campbell
2012-Nov-23 09:18 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
On Fri, 2012-11-23 at 01:30 +0000, ANNIE LI wrote:> > On 2012-11-21 0:04, Ian Campbell wrote: > > Limiting to just xen-devel folks. > > > > On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote: > > > > This highlights a couple of issues, the main one is that implicitly > > including MAX_SKB_FRAGS in the PV net protocol is just madness. It can > > and has changed in the past. Someone really needs to (retroactively) > > figure out what a sensible default minimum which front and back must > > support is and use it in the front and backends instead of > > MAX_SKB_FRAGS. Probably something derived from the existing 64K limit > > (Linux has used 17 and 18 as MAX_SKB_FRAGS in the past). > > > > Then perhaps implement a feature-flag to allow front and backends to > > negotiate something bigger if they like. > > > > It might also be interesting for front and backends to coalesce multiple > > ring slots into compound pages. > > Yes, MAX_SKB_FRAGS is max frag number, not max slot number required. > But I am not so clear about the whole implementation here. Does it mean > netfront/netback uses something like MAX_SKB_FRAGS*SLOTS_PER_FRAG?I suppose that''s one way of looking at it. SLOTS_PER_FRAG is 1 in the current Linux implementation. But really MAX_SKB_FRAGS (and the concept of frags generally, although I suspect most OSes have something equivalent in their SG mechanism) is a Linux-ism which has leaked into the protocol. A non-Linux front or backend might support more or less and have to jump through hoops to remain compatible with the Linux versions. So actually maybe it''s better to think in terms of MAX_SLOTS_PER_FRAME with no reference to frags at all. Currently MAX_SLOTS_PER_FRAME =MAX_SKB_FRAGS for the Linux implementation of front/back. This has leaked into the protocol, since Linux is the most commonly used backend, and so we''ve placed additional requirements onto non-Linux front and backends.> and > SLOTS_PER_FRAG can be adjusted based on different situation? If netfront > and netback negotiate this value, then xenstore key feature is required, > right?Yes.> As mentioned in another compound page fragments patch, for non-compound > page, this value could be 64K/PAGE_SIZE + 1, but if taking account into > compound page, the worse case is around 48 slots(in 1+4096+1 case).That''s for order 5 compound pages,> It is hard to negotiate this between netfront and netback for different > packets.You wouldn''t negotiate for each packet, you would negotiate at start of day. You''d also need some sort of fallback for the case where you end up negotiating something smaller than the maximum your upper layer network stack might give you. I suppose you''d have to do segmentation somewhere along the line. (We have this problem now, and the Linux implementation just ignores it and drops the frames.) Ian.> > Thanks > Annie > > I don''t have time for either of those unfortunately > > > > Ian. > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > http://lists.xen.org/xen-devel
Paul Durrant
2012-Nov-23 09:22 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
> -----Original Message----- > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel- > bounces@lists.xen.org] On Behalf Of James Harper > Sent: 23 November 2012 02:11 > To: ANNIE LI; Ian Campbell > Cc: KonradRzeszutekWilk; xen-devel > Subject: Re: [Xen-devel] netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] > xen/netfront: handle compound page fragments on transmit) > > > > > On 2012-11-21 0:04, Ian Campbell wrote: > > > Limiting to just xen-devel folks. > > > > > > On Tue, 2012-11-20 at 15:54 +0000, Ian Campbell wrote: > > > > > > This highlights a couple of issues, the main one is that implicitly > > > including MAX_SKB_FRAGS in the PV net protocol is just madness. It > > > can and has changed in the past. Someone really needs to > > > (retroactively) figure out what a sensible default minimum which > > > front and back must support is and use it in the front and backends > > > instead of MAX_SKB_FRAGS. Probably something derived from the > > > existing 64K limit (Linux has used 17 and 18 as MAX_SKB_FRAGS in the > past). > > > > > > Then perhaps implement a feature-flag to allow front and backends to > > > negotiate something bigger if they like. > > > > > > It might also be interesting for front and backends to coalesce > > > multiple ring slots into compound pages. > > > > Yes, MAX_SKB_FRAGS is max frag number, not max slot number required. > > But I am not so clear about the whole implementation here. Does it > > mean netfront/netback uses something like > MAX_SKB_FRAGS*SLOTS_PER_FRAG? > > and SLOTS_PER_FRAG can be adjusted based on different situation? If > > netfront and netback negotiate this value, then xenstore key feature > > is required, right? > > As mentioned in another compound page fragments patch, for non- > > compound page, this value could be 64K/PAGE_SIZE + 1, but if taking > > account into compound page, the worse case is around 48 slots(in > > 1+4096+1 case). It is hard to negotiate this between netfront and netback > for different packets. > > > > FWIW, Windows appears to have no real limit in its SG list, and if its >18 then > netback (or linux?) just drops such packets. For a 64kb "large" packet, the > buffer arrangement would likely be eth hdr + ip hdr + tcp hdr + (64K - total > header size)/PAGE_SIZE + 1 frags. It would be good if the documentation of > the protocol defined this limitation (which may not even exist if dom0 was > something other than Linux anyway). >Given that it''s a backend limitation and not a protocol limitation then it''s probably not for documentation to define. Instead it may be more useful if the protocol itself allowed the backend to notify the frontend of its fragmentation constraints (and probably vice versa). Paul
James Harper
2012-Nov-23 09:26 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
> > Given that it''s a backend limitation and not a protocol limitation then it''s > probably not for documentation to define. Instead it may be more useful if > the protocol itself allowed the backend to notify the frontend of its > fragmentation constraints (and probably vice versa). >That would be good. It would have been nice if the feature-sg defined the maximum number of fragments but I guess that can''t be changed now without breaking backwards compatibility. James
Ian Campbell
2012-Nov-23 09:34 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
On Fri, 2012-11-23 at 09:26 +0000, James Harper wrote:> > > > Given that it''s a backend limitation and not a protocol limitation then it''s > > probably not for documentation to define. Instead it may be more useful if > > the protocol itself allowed the backend to notify the frontend of its > > fragmentation constraints (and probably vice versa). > > > > That would be good. It would have been nice if the feature-sg defined > the maximum number of fragments but I guess that can''t be changed now > without breaking backwards compatibility.We could add such a node, with the default in its absence being whatever to status quo is right now (16..18 slots per frame, whatever it actually is). Ian.
annie li
2012-Nov-23 15:08 UTC
Re: netchannel vs MAX_SKB_FRAGS (Was: Re: [PATCH] xen/netfront: handle compound page fragments on transmit)
On 2012-11-23 17:18, Ian Campbell wrote:> On Fri, 2012-11-23 at 01:30 +0000, ANNIE LI wrote: > >> It is hard to negotiate this between netfront and netback for different >> packets. >> > You wouldn''t negotiate for each packet, you would negotiate at start of > day. > > You''d also need some sort of fallback for the case where you end up > negotiating something smaller than the maximum your upper layer network > stack might give you. I suppose you''d have to do segmentation somewhere > along the line. (We have this problem now, and the Linux implementation > just ignores it and drops the frames.)What I am thinking is, this negotiation would be implemented during xenbus communication. * netback provides its default value in xenstore * netfront read out this value and compared this value with itself, then write back the larger one netback = netfront, use the same value netback > netfront, use netfront value and netfront does not need more slots netback < netfront, use netfront value and netfront does not need to do segmentation. But if netfront''s total frag numbers exceed the max frag number of dom0(like linux''s MAX_SKB_FRAGS), segmentation is needed for this kind of packets? * netback read out the negotiated value from xenstore It seems changing current netback to per-VIF is necessary(persistent grant benefit from it too), and every different vif in netback maintains its own value. Thanks Annie