Jan Beulich
2013-Jul-16 09:46 UTC
[PATCH v2] xen-netfront: pull on receive skb may need to happen earlier
Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
linear area is big enough on RX") xennet_fill_frags() may end up
filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
the fragment count subsequently via __pskb_pull_tail(). That's a
result of xennet_get_responses() allowing a maximum of one more slot
to be consumed (and intermediately transformed into a fragment) if the
head slot has a size less than or equal to RX_COPY_THRESHOLD.

Hence we need to adjust xennet_fill_frags() to pull earlier if we
reached the maximum fragment count - due to the described behavior of
xennet_get_responses() this guarantees that at least the first
fragment will get completely consumed, and hence the fragment count
reduced.

In order to not needlessly call __pskb_pull_tail() twice, make the
original call conditional upon the pull target not having been reached
yet, and defer the newly added one as much as possible (an alternative
would have been to always call the function right before the call to
xennet_fill_frags(), but that would imply more frequent cases of
needing to call it twice).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@vger.kernel.org (3.6 onwards)
---
v2: Use skb_add_rx_frag() to keep all accounting fields up to date as
    we go (skb->len needing intermediate updating was pointed out by
    Wei Liu and David Miller, shinfo->nr_frags needing updating before
    calling __pskb_pull_tail() was spotted by Dion Kant).
---
 drivers/net/xen-netfront.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

--- 3.11-rc1/drivers/net/xen-netfront.c
+++ 3.11-rc1-xen-netfront-pull-earlier/drivers/net/xen-netfront.c
@@ -286,8 +286,7 @@ no_skb:
 			break;
 		}
 
-		__skb_fill_page_desc(skb, 0, page, 0, 0);
-		skb_shinfo(skb)->nr_frags = 1;
+		skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
 		__skb_queue_tail(&np->rx_batch, skb);
 	}
 
@@ -831,7 +830,6 @@ static RING_IDX xennet_fill_frags(struct
 				  struct sk_buff_head *list)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
-	int nr_frags = shinfo->nr_frags;
 	RING_IDX cons = np->rx.rsp_cons;
 	struct sk_buff *nskb;
 
@@ -840,19 +838,21 @@ static RING_IDX xennet_fill_frags(struct
 			RING_GET_RESPONSE(&np->rx, ++cons);
 		skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
 
-		__skb_fill_page_desc(skb, nr_frags,
-				     skb_frag_page(nfrag),
-				     rx->offset, rx->status);
+		if (shinfo->nr_frags == MAX_SKB_FRAGS) {
+			unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
 
-		skb->data_len += rx->status;
+			BUG_ON(pull_to <= skb_headlen(skb));
+			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
+		}
+		BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
+
+		skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag),
+				rx->offset, rx->status, PAGE_SIZE);
 
 		skb_shinfo(nskb)->nr_frags = 0;
 		kfree_skb(nskb);
-
-		nr_frags++;
 	}
 
-	shinfo->nr_frags = nr_frags;
 	return cons;
 }
 
@@ -933,7 +933,8 @@ static int handle_incoming_queue(struct
 	while ((skb = __skb_dequeue(rxq)) != NULL) {
 		int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
 
-		__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
+		if (pull_to > skb_headlen(skb))
+			__pskb_pull_tail(skb, pull_to - skb_headlen(skb));
 
 		/* Ethernet work: Delayed to here as it peeks the header. */
 		skb->protocol = eth_type_trans(skb, dev);
@@ -1018,17 +1019,10 @@ err:
 
 	skb_shinfo(skb)->frags[0].page_offset = rx->offset;
 	skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
-	skb->data_len = rx->status;
+	skb->len += skb->data_len = rx->status;
 
 	i = xennet_fill_frags(np, skb, &tmpq);
 
-	/*
-	 * Truesize is the actual allocation size, even if the
-	 * allocation is only partially used.
-	 */
-	skb->truesize += PAGE_SIZE * skb_shinfo(skb)->nr_frags;
-	skb->len += skb->data_len;
-
 	if (rx->flags & XEN_NETRXF_csum_blank)
 		skb->ip_summed = CHECKSUM_PARTIAL;
 	else if (rx->flags & XEN_NETRXF_data_validated)
Wei Liu
2013-Jul-16 10:25 UTC
Re: [PATCH v2] xen-netfront: pull on receive skb may need to happen earlier
On Tue, Jul 16, 2013 at 10:46:01AM +0100, Jan Beulich wrote:
> Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
> linear area is big enough on RX") xennet_fill_frags() may end up
> filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
> the fragment count subsequently via __pskb_pull_tail(). That's a
> result of xennet_get_responses() allowing a maximum of one more slot
> to be consumed (and intermediately transformed into a fragment) if the
> head slot has a size less than or equal to RX_COPY_THRESHOLD.
>
> Hence we need to adjust xennet_fill_frags() to pull earlier if we
> reached the maximum fragment count - due to the described behavior of
> xennet_get_responses() this guarantees that at least the first
> fragment will get completely consumed, and hence the fragment count
> reduced.
>
> In order to not needlessly call __pskb_pull_tail() twice, make the
> original call conditional upon the pull target not having been reached
> yet, and defer the newly added one as much as possible (an alternative
> would have been to always call the function right before the call to
> xennet_fill_frags(), but that would imply more frequent cases of
> needing to call it twice).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: stable@vger.kernel.org (3.6 onwards)
> ---
> v2: Use skb_add_rx_frag() to keep all accounting fields up to date as
>     we go (skb->len needing intermediate updating was pointed out by
>     Wei Liu and David Miller, shinfo->nr_frags needing updating before
>     calling __pskb_pull_tail() was spotted by Dion Kant).

Jan and Dion, is this a confirmed fix for the SuSE kernel?

I compiled and tested it; at least it didn't break things for me. The
tests I ran were 1) scp large_file to domU; 2) iperf from Dom0 to DomU;
3) netperf from Dom0 to DomU.

On the basis that this patch 1) fixes the bug for the SuSE kernel (to
be confirmed with Jan) and 2) doesn't break upstream (tested by me):

Acked-by: Wei Liu <wei.liu2@citrix.com>

> [...]
Ian Campbell
2013-Jul-16 10:33 UTC
Re: [PATCH v2] xen-netfront: pull on receive skb may need to happen earlier
On Tue, 2013-07-16 at 10:46 +0100, Jan Beulich wrote:
> [...]
> @@ -1018,17 +1019,10 @@ err:
>
>  	skb_shinfo(skb)->frags[0].page_offset = rx->offset;
>  	skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
> -	skb->data_len = rx->status;
> +	skb->len += skb->data_len = rx->status;

Off the top of my head I have no confidence in my understanding of what
this is actually going to end up assigning to ->len or ->data_len. For
my benefit, can we do the dumb version please?

Also ref CodingStyle: don't put multiple assignments on a single line
either. Kernel coding style is super simple. Avoid tricky expressions.

The rest of it looked OK to me.

Ian.

> [...]
Dion Kant
2013-Jul-16 10:33 UTC
Re: [PATCH v2] xen-netfront: pull on receive skb may need to happen earlier
On 07/16/2013 12:25 PM, Wei Liu wrote:
> On Tue, Jul 16, 2013 at 10:46:01AM +0100, Jan Beulich wrote:
>> [...]
>
> Jan and Dion, is this a confirmed fix for SuSE kernel?

Dear Wei,

I did thorough testing on the system which revealed the original
problem, and I can confirm that it does fix the issue.

Dion

> I compiled and tested it, at least it didn't break things for me. The
> tests I ran were 1) scp large_file to domU; 2) iperf from Dom0 to
> DomU; 3) netperf from Dom0 to DomU.
> [...]
Jan Beulich
2013-Jul-16 11:26 UTC
Re: [PATCH v2] xen-netfront: pull on receive skb may need to happen earlier
>>> On 16.07.13 at 12:25, Wei Liu <wei.liu2@citrix.com> wrote:
> On Tue, Jul 16, 2013 at 10:46:01AM +0100, Jan Beulich wrote:
>> [...]
>
> Jan and Dion, is this a confirmed fix for SuSE kernel?
>
> I compiled and tested it, at least it didn't break things for me. The
> tests I ran were 1) scp large_file to domU; 2) iperf from Dom0 to
> DomU; 3) netperf from Dom0 to DomU.

Yes, the equivalent of this applied to our driver fixes the issue for
Dion.

Jan