David Vrabel
2013-Sep-03 17:29 UTC
[PATCH] xen-netback: count number required slots for an skb more carefully
From: David Vrabel <david.vrabel@citrix.com>

When a VM is providing an iSCSI target and the LUN is used by the
backend domain, the generated skbs for direct I/O writes to the disk
have large, multi-page skb->data but no frags.

With some lengths and starting offsets, xen_netbk_count_skb_slots()
would be one short because the simple calculation of
DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the
decisions made by start_new_rx_buffer(), which does not guarantee
responses are fully packed.

For example, an skb with length < 2 pages but which spans 3 pages would
be counted as requiring 2 slots but would actually use 3 slots.

skb->data:

    |    1111|222222222222|3333    |

Fully packed, this would need 2 slots:

    |111122222222|22223333    |

But because the 2nd page wholly fits into a slot it is not split across
slots and goes into a slot of its own:

    |1111        |222222222222|3333    |

Miscounting the number of slots means netback may push more responses
than the number of available requests.  This will cause the frontend
to get very confused and report "Too many frags/slots".  The frontend
never recovers and will eventually BUG.

Fix this by counting the number of required slots more carefully.  In
xen_netbk_count_skb_slots(), more closely follow the algorithm used by
xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots(), which
is the dry-run equivalent of netbk_gop_frag_copy().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
[Resend to actually Cc netdev, sorry.]
---
 drivers/net/xen-netback/netback.c |   94 +++++++++++++++++++++++++------------
 1 files changed, 64 insertions(+), 30 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 64828de..f149ce5 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -361,6 +361,49 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 	return false;
 }
 
+struct xen_netbk_count_slot_state {
+	unsigned long copy_off;
+	bool head;
+};
+
+unsigned int xen_netbk_count_frag_slots(struct xenvif *vif,
+					unsigned long offset, unsigned long size,
+					struct xen_netbk_count_slot_state *state)
+{
+	unsigned count = 0;
+
+	offset &= ~PAGE_MASK;
+
+	while (size > 0) {
+		unsigned long bytes;
+
+		bytes = PAGE_SIZE - offset;
+
+		if (bytes > size)
+			bytes = size;
+
+		if (start_new_rx_buffer(state->copy_off, bytes, state->head)) {
+			count++;
+			state->copy_off = 0;
+		}
+
+		if (state->copy_off + bytes > MAX_BUFFER_OFFSET)
+			bytes = MAX_BUFFER_OFFSET - state->copy_off;
+
+		state->copy_off += bytes;
+
+		offset += bytes;
+		size -= bytes;
+
+		if (offset == PAGE_SIZE)
+			offset = 0;
+
+		state->head = false;
+	}
+
+	return count;
+}
+
 /*
  * Figure out how many ring slots we're going to need to send @skb to
  * the guest. This function is essentially a dry run of
@@ -368,48 +411,39 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
  */
 unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 {
+	struct xen_netbk_count_slot_state state;
 	unsigned int count;
-	int i, copy_off;
+	unsigned char *data;
+	unsigned i;
 
-	count = DIV_ROUND_UP(skb_headlen(skb), PAGE_SIZE);
+	state.head = true;
+	state.copy_off = 0;
 
-	copy_off = skb_headlen(skb) % PAGE_SIZE;
+	/* Slot for the first (partial) page of data. */
+	count = 1;
 
+	/* Need a slot for the GSO prefix for GSO extra data? */
 	if (skb_shinfo(skb)->gso_size)
 		count++;
 
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
-		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
-		unsigned long bytes;
-
-		offset &= ~PAGE_MASK;
-
-		while (size > 0) {
-			BUG_ON(offset >= PAGE_SIZE);
-			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
-
-			bytes = PAGE_SIZE - offset;
-
-			if (bytes > size)
-				bytes = size;
+	data = skb->data;
+	while (data < skb_tail_pointer(skb)) {
+		unsigned long offset = offset_in_page(data);
+		unsigned long size = PAGE_SIZE - offset;
 
-			if (start_new_rx_buffer(copy_off, bytes, 0)) {
-				count++;
-				copy_off = 0;
-			}
+		if (data + size > skb_tail_pointer(skb))
+			size = skb_tail_pointer(skb) - data;
 
-			if (copy_off + bytes > MAX_BUFFER_OFFSET)
-				bytes = MAX_BUFFER_OFFSET - copy_off;
+		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
 
-			copy_off += bytes;
+		data += size;
+	}
 
-			offset += bytes;
-			size -= bytes;
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
 
-			if (offset == PAGE_SIZE)
-				offset = 0;
-		}
+		count += xen_netbk_count_frag_slots(vif, offset, size, &state);
 	}
 
 	return count;
 }
-- 
1.7.2.5
Wei Liu
2013-Sep-03 21:53 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> When a VM is providing an iSCSI target and the LUN is used by the
> backend domain, the generated skbs for direct I/O writes to the disk
> have large, multi-page skb->data but no frags.
[...]
> Fix this by counting the number of required slots more carefully.  In
> xen_netbk_count_skb_slots(), more closely follow the algorithm used by
> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which
> is the dry-run equivalent of netbk_gop_frag_copy().

Phew! So this is a backend miscounting bug. I thought it was a frontend
bug, so it didn't ring a bell when we had our face-to-face discussion,
sorry. :-(

This bug was discussed back in July among Annie, Matt, Ian and me. We
finally agreed to take Matt's solution. Matt agreed to post the final
version within a week but he has obviously been too busy to do so. I
was away, so I didn't follow closely. Eventually it fell through the
cracks. :-(

Matt, do you fancy sending the final version? IIRC the commit message
needs to be re-written. I personally still prefer Matt's solution as it
a) makes efficient use of the ring, b) uses ring pointers to calculate
slots, which is most accurate, and c) removes the dependence on
MAX_SKB_FRAGS in the guest RX path.

Anyway, we should get this fixed ASAP.

Thanks
Wei.

REF:
<1373409659-22383-1-git-send-email-msw@amazon.com>
<1373350520-19985-1-git-send-email-annie.li@oracle.com>
annie li
2013-Sep-04 02:25 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On 2013-9-4 5:53, Wei Liu wrote:
> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> When a VM is providing an iSCSI target and the LUN is used by the
>> backend domain, the generated skbs for direct I/O writes to the disk
>> have large, multi-page skb->data but no frags.
[...]
> Phew! So this is backend miscounting bug. I thought it was a frontend
> bug so it didn't ring a bell when we had our face-to-face discussion,
> sorry. :-(
>
> This bug was discussed back in July among Annie, Matt, Ian and I. We
> finally agreed to take Matt's solution. Matt agreed to post final
> version within a week but obviously he's too busy to do so. I was away
> so I didn't follow closely. Eventually it fell through the crack. :-(

The fix can be implemented in two ways. One is to fix
xen_netbk_count_skb_slots() so that it returns the correct slot count;
my patch
(http://lists.xen.org/archives/html/xen-devel/2013-07/msg00785.html)
and David's take this approach. The other is to fix
netbk_gop_frag_copy(), which is what Matt's patch does
(http://lists.xen.org/archives/html/xen-devel/2013-07/msg00760.html).

> Matt, do you fancy sending the final version? IIRC the commit message
> needs to be re-written. I personally still prefer Matt's solution as
> it a) make efficient use of the ring, b) uses ring pointers to
> calculate slots which is most accurate, c) removes the dependence on
> MAX_SKB_FRAGS in guest RX path.
>
> Anyway, we should get this fixed ASAP.

Totally agree. This issue is easy to reproduce with a large MTU. It is
better to upstream the fix soon so that others don't hit it and waste
time fixing it.

Thanks
Annie
David Vrabel
2013-Sep-04 11:48 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On 03/09/13 22:53, Wei Liu wrote:
> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> When a VM is providing an iSCSI target and the LUN is used by the
>> backend domain, the generated skbs for direct I/O writes to the disk
>> have large, multi-page skb->data but no frags.
[...]
> Phew! So this is backend miscounting bug. I thought it was a frontend
> bug so it didn't ring a bell when we had our face-to-face discussion,
> sorry. :-(
>
> This bug was discussed back in July among Annie, Matt, Ian and I. We
> finally agreed to take Matt's solution. Matt agreed to post final
> version within a week but obviously he's too busy to do so. I was away
> so I didn't follow closely. Eventually it fell through the crack. :-(

I think I prefer fixing the counting for backporting to stable kernels.

Xi's approach of packing the ring differently is a change in
frontend-visible behaviour and seems more risky, e.g., there is a
possible performance impact, so I would like to see some performance
analysis of that approach.

David
Ian Campbell
2013-Sep-04 12:41 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, 2013-09-04 at 12:48 +0100, David Vrabel wrote:
> On 03/09/13 22:53, Wei Liu wrote:
> > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> > > From: David Vrabel <david.vrabel@citrix.com>
> > >
> > > When a VM is providing an iSCSI target and the LUN is used by the
> > > backend domain, the generated skbs for direct I/O writes to the
> > > disk have large, multi-page skb->data but no frags.
[...]
> > This bug was discussed back in July among Annie, Matt, Ian and I.
> > We finally agreed to take Matt's solution. Matt agreed to post final
> > version within a week but obviously he's too busy to do so. I was
> > away so I didn't follow closely. Eventually it fell through the
> > crack. :-(
>
> I think I prefer fixing the counting for backporting to stable
> kernels.

That's a good argument. I think we should take this patch, or something
very like it, now and then rebase the more complex thing on top.

> Xi's approach of packing the ring differently is a change in frontend
> visible behaviour and seems more risky. e.g., possible performance
> impact so I would like to see some performance analysis of that
> approach.

Yes.
Wei Liu
2013-Sep-04 13:14 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, Sep 04, 2013 at 12:48:15PM +0100, David Vrabel wrote:
> On 03/09/13 22:53, Wei Liu wrote:
> > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> > > From: David Vrabel <david.vrabel@citrix.com>
> > >
> > > When a VM is providing an iSCSI target and the LUN is used by the
> > > backend domain, the generated skbs for direct I/O writes to the
> > > disk have large, multi-page skb->data but no frags.
[...]
> I think I prefer fixing the counting for backporting to stable
> kernels.

The original patch has a coding style change. Sans that contextual
change it's not a very long patch.

> Xi's approach of packing the ring differently is a change in frontend
> visible behaviour and seems more risky. e.g., possible performance
> impact so I would like to see some performance analysis of that
> approach.

With Xi's approach it is more efficient for the backend to process. As
we now use one less grant copy operation, we copy the same amount of
data with fewer grant ops.

From the frontend's PoV I think the impact is minimal. The frontend is
involved in assembling the packets: it only takes what's in the ring
and chains the slots together. The only operation that involves copying
is __pskb_pull_tail, which happens a) in the rare case when there are
more frags than the frontend's MAX_SKB_FRAGS, and b) when pull_to >
skb_headlen. With Xi's change the rare case a) will be even rarer than
before, as we use fewer slots. Case b) happens exactly as it did before
Xi's change, because the pull is guarded by "if (pull_to >
skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.

So overall I don't see an obvious downside.

Wei.
Wei Liu
2013-Sep-04 13:35 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, Sep 04, 2013 at 01:41:25PM +0100, Ian Campbell wrote:
> On Wed, 2013-09-04 at 12:48 +0100, David Vrabel wrote:
> > On 03/09/13 22:53, Wei Liu wrote:
> > > On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote:
> > > > From: David Vrabel <david.vrabel@citrix.com>
[...]
> > I think I prefer fixing the counting for backporting to stable
> > kernels.
>
> That's a good argument. I think we should take this patch, or
> something very like it, now and then rebase the more complex thing on
> top.

It's not as complex as it first appears. David's and Xi's approaches
are different routes to the same destination. David's approach makes
the counting fit what netbk_gop_frag_copy actually does, while Xi's
approach is the other way around -- it makes netbk_gop_frag_copy fit
what the counting returns. As long as they agree with each other we're
fine.

> > Xi's approach of packing the ring differently is a change in
> > frontend visible behaviour and seems more risky. e.g., possible
> > performance impact so I would like to see some performance analysis
> > of that approach.
>
> Yes.

The performance impact is more concerning. I have a short analysis in
my reply to David's email.

Wei.
David Vrabel
2013-Sep-04 14:02 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On 04/09/13 14:14, Wei Liu wrote:> On Wed, Sep 04, 2013 at 12:48:15PM +0100, David Vrabel wrote: >> On 03/09/13 22:53, Wei Liu wrote: >>> On Tue, Sep 03, 2013 at 06:29:50PM +0100, David Vrabel wrote: >>>> From: David Vrabel <david.vrabel@citrix.com> >>>> >>>> When a VM is providing an iSCSI target and the LUN is used by the >>>> backend domain, the generated skbs for direct I/O writes to the disk >>>> have large, multi-page skb->data but no frags. >>>> >>>> With some lengths and starting offsets, xen_netbk_count_skb_slots() >>>> would be one short because the simple calculation of >>>> DIV_ROUND_UP(skb_headlen(), PAGE_SIZE) was not accounting for the >>>> decisions made by start_new_rx_buffer() which does not guarantee >>>> responses are fully packed. >>>> >>>> For example, a skb with length < 2 pages but which spans 3 pages would >>>> be counted as requiring 2 slots but would actually use 3 slots. >>>> >>>> skb->data: >>>> >>>> | 1111|222222222222|3333 | >>>> >>>> Fully packed, this would need 2 slots: >>>> >>>> |111122222222|22223333 | >>>> >>>> But because the 2nd page wholy fits into a slot it is not split across >>>> slots and goes into a slot of its own: >>>> >>>> |1111 |222222222222|3333 | >>>> >>>> Miscounting the number of slots means netback may push more responses >>>> than the number of available requests. This will cause the frontend >>>> to get very confused and report "Too many frags/slots". The frontend >>>> never recovers and will eventually BUG. >>>> >>>> Fix this by counting the number of required slots more carefully. In >>>> xen_netbk_count_skb_slots(), more closely follow the algorithm used by >>>> xen_netbk_gop_skb() by introducing xen_netbk_count_frag_slots() which >>>> is the dry-run equivalent of netbk_gop_frag_copy(). >>>> >>> >>> Phew! So this is backend miscounting bug. I thought it was a frontend >>> bug so it didn''t ring a bell when we had our face-to-face discussion, >>> sorry. 
:-(

>>> This bug was discussed back in July among Annie, Matt, Ian and I.  We
>>> finally agreed to take Matt's solution.  Matt agreed to post the final
>>> version within a week but obviously he's too busy to do so.  I was away
>>> so I didn't follow closely.  Eventually it fell through the cracks. :-(
>>
>> I think I prefer fixing the counting for backporting to stable kernels.
>
> The original patch has coding style changes.  Sans that contextual change
> it's not a very long patch.

The size of the patch isn't the main concern for backport-ability.  It's
the frontend-visible changes and thus any (unexpected) impacts on
frontends -- this is especially important as only a small fraction of
frontends in use will be tested with these changes.

>> Xi's approach of packing the ring differently is a change in frontend-
>> visible behaviour and seems more risky, e.g., possible performance
>> impact, so I would like to see some performance analysis of that approach.
>
> With Xi's approach it is more efficient for the backend to process.  As we
> now use one less grant copy operation, which means we copy the same
> amount of data with fewer grant ops.

I think it uses more grant ops because the copies of the linear
portion are in chunks that do not cross source page boundaries.

i.e., in netbk_gop_skb():

        data = skb->data;
        while (data < skb_tail_pointer(skb)) {
                unsigned int offset = offset_in_page(data);
                unsigned int len = PAGE_SIZE - offset;
        [...]

It wasn't clear from the patch that this had been considered and that
any extra space needed in the grant op array was made available.

> From the frontend's PoV I think the impact is minimal.  The frontend is
> involved in assembling the packets.  It only takes what's in the ring and
> chains them together.  The only operation that involves copying so far is
> __pskb_pull_tail, which happens a) in the rare case when there are more
> frags than the frontend's MAX_SKB_FRAGS, and b) when pull_to >
> skb_headlen.  With Xi's change the rare case a) will be even rarer than
> before as we use fewer slots.  b) happens the same as it happened before
> Xi's change, because the pull is guarded by "if (pull_to >
> skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.
>
> So overall I don't see an obvious downside.

The obvious downside is that it doesn't exist (in a form that can be
applied now), it hasn't been tested, and I think there may well be a
subtle bug that would need a careful review or testing to confirm/deny.

You are free to work on this as a future improvement but I really don't
see why this critical bug fix needs to be delayed any further.

David
Wei Liu
2013-Sep-04 15:44 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
[...]
>>>
>>> I think I prefer fixing the counting for backporting to stable kernels.
>>
>> The original patch has coding style changes.  Sans that contextual change
>> it's not a very long patch.
>
> The size of the patch isn't the main concern for backport-ability.  It's
> the frontend-visible changes and thus any (unexpected) impacts on
> frontends -- this is especially important as only a small fraction of
> frontends in use will be tested with these changes.
>
>>> Xi's approach of packing the ring differently is a change in frontend-
>>> visible behaviour and seems more risky, e.g., possible performance
>>> impact, so I would like to see some performance analysis of that approach.
>>
>> With Xi's approach it is more efficient for the backend to process.  As we
>> now use one less grant copy operation, which means we copy the same
>> amount of data with fewer grant ops.
>
> I think it uses more grant ops because the copies of the linear
> portion are in chunks that do not cross source page boundaries.
>
> i.e., in netbk_gop_skb():
>
>         data = skb->data;
>         while (data < skb_tail_pointer(skb)) {
>                 unsigned int offset = offset_in_page(data);
>                 unsigned int len = PAGE_SIZE - offset;
>         [...]
>
> It wasn't clear from the patch that this had been considered and that
> any extra space needed in the grant op array was made available.

If I'm not mistaken the grant op array is already enormous.  See the
comment in struct xen_netbk for grant_copy_op.  The case that a buffer
straddles two slots was taken into consideration long ago -- that's
why you don't see any comment or code change WRT that.

>> From the frontend's PoV I think the impact is minimal.  The frontend is
>> involved in assembling the packets.  It only takes what's in the ring and
>> chains them together.  The only operation that involves copying so far is
>> __pskb_pull_tail, which happens a) in the rare case when there are more
>> frags than the frontend's MAX_SKB_FRAGS, and b) when pull_to >
>> skb_headlen.  With Xi's change the rare case a) will be even rarer than
>> before as we use fewer slots.  b) happens the same as it happened before
>> Xi's change, because the pull is guarded by "if (pull_to >
>> skb_headlen(skb))" and Xi's change doesn't affect skb_headlen.
>>
>> So overall I don't see an obvious downside.
>
> The obvious downside is that it doesn't exist (in a form that can be
> applied now), it hasn't been tested, and I think there may well be a
> subtle bug that would need a careful review or testing to confirm/deny.

It does exist and applies cleanly on top of the net tree.  I haven't
posted it yet because we haven't reached consensus on which path to
take. :-)

The only reason the last version didn't get upstreamed is that the
commit message wasn't clear enough.  From the technical PoV it's quite
sound and I believe Amazon has been using it for a long time -- the
oldest reference dates back to Aug 2012 IIRC.  It's just never been
properly upstreamed.

> You are free to work on this as a future improvement but I really don't
> see why this critical bug fix needs to be delayed any further.

True.  I don't mean to hold off a critical fix.  Just want to make sure
that every option is presented and considered.

Wei.

> David
David Miller
2013-Sep-04 18:48 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
Can you guys just post the patch you want applied once this thread moves closer to a consensus? Thanks.
Ian Campbell
2013-Sep-05 07:35 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, 2013-09-04 at 14:48 -0400, David Miller wrote:
> Can you guys just post the patch you want applied once this thread
> moves closer to a consensus?  Thanks.

Of course.

Ian.
David Vrabel
2013-Sep-05 10:12 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On 04/09/13 16:44, Wei Liu wrote:
> On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
> [...]
>>>>
>>>> I think I prefer fixing the counting for backporting to stable kernels.
>>>
>>> The original patch has coding style changes.  Sans that contextual change
>>> it's not a very long patch.
>>
>> The size of the patch isn't the main concern for backport-ability.  It's
>> the frontend-visible changes and thus any (unexpected) impacts on
>> frontends -- this is especially important as only a small fraction of
>> frontends in use will be tested with these changes.
>>
>>>> Xi's approach of packing the ring differently is a change in frontend-
>>>> visible behaviour and seems more risky, e.g., possible performance
>>>> impact, so I would like to see some performance analysis of that approach.
>>>
>>> With Xi's approach it is more efficient for the backend to process.  As we
>>> now use one less grant copy operation, which means we copy the same
>>> amount of data with fewer grant ops.
>>
>> I think it uses more grant ops because the copies of the linear
>> portion are in chunks that do not cross source page boundaries.
>>
>> i.e., in netbk_gop_skb():
>>
>>         data = skb->data;
>>         while (data < skb_tail_pointer(skb)) {
>>                 unsigned int offset = offset_in_page(data);
>>                 unsigned int len = PAGE_SIZE - offset;
>>         [...]
>>
>> It wasn't clear from the patch that this had been considered and that
>> any extra space needed in the grant op array was made available.
>
> If I'm not mistaken the grant op array is already enormous.  See the
> comment in struct xen_netbk for grant_copy_op.  The case that a buffer
> straddles two slots was taken into consideration long ago -- that's
> why you don't see any comment or code change WRT that.

I'm not convinced that even that is enough for the current
implementation in the (very) worst case.

Consider a skb with 8 frags, all 512 bytes in length.  The linear data
will be placed into 1 slot, and the frags will be packed into 1 slot, so
9 grant ops and 2 slots.

I definitely think we do not want to potentially regress any further in
this area.

David
Wei Liu
2013-Sep-05 10:27 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Thu, Sep 05, 2013 at 11:12:11AM +0100, David Vrabel wrote:
> On 04/09/13 16:44, Wei Liu wrote:
>> On Wed, Sep 04, 2013 at 03:02:01PM +0100, David Vrabel wrote:
>> [...]
>>>>> I think I prefer fixing the counting for backporting to stable kernels.
>>>>
>>>> The original patch has coding style changes.  Sans that contextual change
>>>> it's not a very long patch.
>>>
>>> The size of the patch isn't the main concern for backport-ability.  It's
>>> the frontend-visible changes and thus any (unexpected) impacts on
>>> frontends -- this is especially important as only a small fraction of
>>> frontends in use will be tested with these changes.
>>>
>>>>> Xi's approach of packing the ring differently is a change in frontend-
>>>>> visible behaviour and seems more risky, e.g., possible performance
>>>>> impact, so I would like to see some performance analysis of that approach.
>>>>
>>>> With Xi's approach it is more efficient for the backend to process.  As we
>>>> now use one less grant copy operation, which means we copy the same
>>>> amount of data with fewer grant ops.
>>>
>>> I think it uses more grant ops because the copies of the linear
>>> portion are in chunks that do not cross source page boundaries.
>>>
>>> i.e., in netbk_gop_skb():
>>>
>>>         data = skb->data;
>>>         while (data < skb_tail_pointer(skb)) {
>>>                 unsigned int offset = offset_in_page(data);
>>>                 unsigned int len = PAGE_SIZE - offset;
>>>         [...]
>>>
>>> It wasn't clear from the patch that this had been considered and that
>>> any extra space needed in the grant op array was made available.
>>
>> If I'm not mistaken the grant op array is already enormous.  See the
>> comment in struct xen_netbk for grant_copy_op.  The case that a buffer
>> straddles two slots was taken into consideration long ago -- that's
>> why you don't see any comment or code change WRT that.
>
> I'm not convinced that even that is enough for the current
> implementation in the (very) worst case.
>
> Consider a skb with 8 frags, all 512 bytes in length.  The linear data
> will be placed into 1 slot, and the frags will be packed into 1 slot, so
> 9 grant ops and 2 slots.

That is still fewer than the 8 * 2 = 16 slots in the grant copy array,
right?  Basically each paged buffer is allowed 2 slots in the grant copy
array.

> I definitely think we do not want to potentially regress any further in
> this area.

Sure.

> David
Wei Liu
2013-Sep-09 09:20 UTC
Re: [PATCH] xen-netback: count number required slots for an skb more carefully
On Wed, Sep 04, 2013 at 01:41:25PM +0100, Ian Campbell wrote:
[...]
>>> This bug was discussed back in July among Annie, Matt, Ian and I.  We
>>> finally agreed to take Matt's solution.  Matt agreed to post the final
>>> version within a week but obviously he's too busy to do so.  I was away
>>> so I didn't follow closely.  Eventually it fell through the cracks. :-(
>>
>> I think I prefer fixing the counting for backporting to stable kernels.
>
> That's a good argument.  I think we should take this patch, or something
> very like it, now and then rebase the more complex thing on top.

I take this as a sort-of-ack by the maintainer.  David, please go ahead
with your fix.  I will post the other approach as an improvement later on.

Wei.

>> Xi's approach of packing the ring differently is a change in frontend-
>> visible behaviour and seems more risky, e.g., possible performance
>> impact, so I would like to see some performance analysis of that approach.
>
> Yes.