Ian Campbell
2012-Nov-20 16:00 UTC
[PATCH V2] xen/netfront: handle compound page fragments on transmit
An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in xennet_make_frags by iterating over the frames which
make up the page.

This is the netfront equivalent to 6a8ed462f16b for netback.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Cc: xen-devel@lists.xen.org
Cc: Eric Dumazet <edumazet@google.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: ANNIE LI <annie.li@oracle.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Stefan Bader <stefan.bader@canonical.com>
---
v2: check we have enough room in the ring and that the other end can
    cope with the number of slots in a single frame
---
 drivers/net/xen-netfront.c |   95 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..d9b16f4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -452,29 +452,82 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 	/* Grant backend access to each skb fragment page. */
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		struct page *page = skb_frag_page(frag);
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
 
-		tx->flags |= XEN_NETTXF_more_data;
+		/* Data must not cross a page boundary. */
+		BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
 
-		id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
-		np->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&np->tx, prod++);
-		tx->id = id;
-		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
-		BUG_ON((signed short)ref < 0);
+		/* Skip unused frames from start of page */
+		page += offset >> PAGE_SHIFT;
+		offset &= ~PAGE_MASK;
 
-		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
-		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
-						mfn, GNTMAP_readonly);
+		while (size > 0) {
+			unsigned long bytes;
 
-		tx->gref = np->grant_tx_ref[id] = ref;
-		tx->offset = frag->page_offset;
-		tx->size = skb_frag_size(frag);
-		tx->flags = 0;
+			BUG_ON(offset >= PAGE_SIZE);
+
+			bytes = PAGE_SIZE - offset;
+			if (bytes > size)
+				bytes = size;
+
+			tx->flags |= XEN_NETTXF_more_data;
+
+			id = get_id_from_freelist(&np->tx_skb_freelist, np->tx_skbs);
+			np->tx_skbs[id].skb = skb_get(skb);
+			tx = RING_GET_REQUEST(&np->tx, prod++);
+			tx->id = id;
+			ref = gnttab_claim_grant_reference(&np->gref_tx_head);
+			BUG_ON((signed short)ref < 0);
+
+			mfn = pfn_to_mfn(page_to_pfn(page));
+			gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
+							mfn, GNTMAP_readonly);
+
+			tx->gref = np->grant_tx_ref[id] = ref;
+			tx->offset = offset;
+			tx->size = bytes;
+			tx->flags = 0;
+
+			offset += bytes;
+			size -= bytes;
+
+			/* Next frame */
+			if (offset == PAGE_SIZE && size) {
+				BUG_ON(!PageCompound(page));
+				page++;
+				offset = 0;
+			}
+		}
 	}
 
 	np->tx.req_prod_pvt = prod;
 }
 
+/*
+ * Count how many ring slots are required to send the frags of this
+ * skb. Each frag might be a compound page.
+ */
+static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+{
+	int i, frags = skb_shinfo(skb)->nr_frags;
+	int pages = 0;
+
+	for (i = 0; i < frags; i++) {
+		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
+		unsigned long size = skb_frag_size(frag);
+		unsigned long offset = frag->page_offset;
+
+		/* Skip unused frames from start of page */
+		offset &= ~PAGE_MASK;
+
+		pages += PFN_UP(offset + size);
+	}
+
+	return pages;
+}
+
 static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	unsigned short id;
@@ -487,23 +540,23 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	grant_ref_t ref;
 	unsigned long mfn;
 	int notify;
-	int frags = skb_shinfo(skb)->nr_frags;
+	int slots;
 	unsigned int offset = offset_in_page(data);
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
-	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
-	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
-		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
-		       frags);
-		dump_stack();
+	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+		xennet_count_skb_frag_slots(skb);
+	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
+		       slots);
 		goto drop;
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
 
 	if (unlikely(!netif_carrier_ok(dev) ||
-		     (frags > 1 && !xennet_can_sg(dev)) ||
+		     (slots > 1 && !xennet_can_sg(dev)) ||
 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
 		spin_unlock_irqrestore(&np->tx_lock, flags);
 		goto drop;
-- 
1.7.2.5
Eric Dumazet
2012-Nov-20 16:27 UTC
Re: [PATCH V2] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.
>
> Handle this in xennet_make_frags by iterating over the frames which
> make up the page.
>
> This is the netfront equivalent to 6a8ed462f16b for netback.

...

> -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> -		       frags);
> -		dump_stack();
> +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +		xennet_count_skb_frag_slots(skb);
> +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> +		       slots);

I think this is wrong.

You should change netfront_tx_slot_available() to stop the queue before
this can happen.

Yes, you don't hit this on your tests, but a driver should not drop a
good packet.

> 		goto drop;
> 	}
>
> 	spin_lock_irqsave(&np->tx_lock, flags);
>
> 	if (unlikely(!netif_carrier_ok(dev) ||
> -		     (frags > 1 && !xennet_can_sg(dev)) ||
> +		     (slots > 1 && !xennet_can_sg(dev)) ||
> 		     netif_needs_gso(skb, netif_skb_features(skb)))) {
> 		spin_unlock_irqrestore(&np->tx_lock, flags);
> 		goto drop;

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index caa0110..cb1e605 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -215,10 +215,13 @@ static void rx_refill_timeout(unsigned long data)
 	napi_schedule(&np->napi);
 }
 
+/* Considering a 64Kb packet of 16 frags, each frag can be mapped
+ * to 3 order-0 parts on pathological cases
+ */
 static int netfront_tx_slot_available(struct netfront_info *np)
 {
 	return (np->tx.req_prod_pvt - np->tx.rsp_cons) <
-		(TX_MAX_TARGET - MAX_SKB_FRAGS - 2);
+		(TX_MAX_TARGET - 3*MAX_SKB_FRAGS - 2);
 }
 
 static void xennet_maybe_wake_tx(struct net_device *dev)
Ian Campbell
2012-Nov-21 12:08 UTC
Re: [PATCH V2] xen/netfront: handle compound page fragments on transmit
On Tue, 2012-11-20 at 16:27 +0000, Eric Dumazet wrote:
> On Tue, 2012-11-20 at 16:00 +0000, Ian Campbell wrote:
> > An SKB paged fragment can consist of a compound page with order > 0.
> > However the netchannel protocol deals only in PAGE_SIZE frames.
> >
> > Handle this in xennet_make_frags by iterating over the frames which
> > make up the page.
> >
> > This is the netfront equivalent to 6a8ed462f16b for netback.
> ...
> > -	frags += DIV_ROUND_UP(offset + len, PAGE_SIZE);
> > -	if (unlikely(frags > MAX_SKB_FRAGS + 1)) {
> > -		printk(KERN_ALERT "xennet: skb rides the rocket: %d frags\n",
> > -		       frags);
> > -		dump_stack();
> > +	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +		xennet_count_skb_frag_slots(skb);
> > +	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> > +		printk(KERN_ALERT "xennet: skb rides the rocket: %d slots\n",
> > +		       slots);
>
> I think this is wrong.
>
> You should change netfront_tx_slot_available() to stop the queue before
> this can happen.
>
> Yes, you don't hit this on your tests, but a driver should not drop a
> good packet.

The max-frag related limitation comes from the "wire" protocol used
between front and back. As it stands either the frontend or the backend
is more than likely going to drop the sort of pathological skbs you are
worried about.

I agree that this absolutely needs to be fixed in the protocol (and I've
posted a call to arms on this topic on xen-devel) but I'd like to do it
in a coordinated manner as part of a protocol extension (where the front
and backend negotiate the maximum number of order-0 pages per Ethernet
frame they are willing to handle) rather than as a side effect of this
patch.

So right now I don't want to introduce frontends which default to
sending increased numbers of pages in to the wild, since that makes
things more complex when we come to extend the protocol.

Perhaps in the short term doing an skb_linearize when we hit this case
would help; that would turn the pathological skb into a much more normal
one. It'll be expensive but it should be rare. That assumes you can
linearize such a large skb, which depends on the ability to allocate
large order pages, which isn't a given. Herm, maybe that doesn't work
then.

AFAIK we don't have an existing skb_foo operation which copies an skb,
including (or only) the frags, with the side effect of aligning and
coalescing them. Do we?

Ian.
Eric Dumazet
2012-Nov-21 15:13 UTC
Re: [PATCH V2] xen/netfront: handle compound page fragments on transmit
On Wed, 2012-11-21 at 12:08 +0000, Ian Campbell wrote:
> The max-frag related limitation comes from the "wire" protocol used
> between front and back. As it stands either the frontend or the backend
> is more than likely going to drop the sort of pathological skbs you are
> worried about.
>
> I agree that this absolutely needs to be fixed in the protocol (and I've
> posted a call to arms on this topic on xen-devel) but I'd like to do it
> in a coordinated manner as part of a protocol extension (where the front
> and backend negotiate the maximum number of order-0 pages per Ethernet
> frame they are willing to handle) rather than as a side effect of this
> patch.
>
> So right now I don't want to introduce frontends which default to
> sending increased numbers of pages in to the wild, since that makes
> things more complex when we come to extend the protocol.
>
> Perhaps in the short term doing an skb_linearize when we hit this case
> would help; that would turn the pathological skb into a much more normal
> one. It'll be expensive but it should be rare. That assumes you can
> linearize such a large skb, which depends on the ability to allocate
> large order pages, which isn't a given. Herm, maybe that doesn't work
> then.

First of all, thanks a lot for all this detailed information. It now
makes sense!

> AFAIK we don't have an existing skb_foo operation which copies an skb,
> including (or only) the frags, with the side effect of aligning and
> coalescing them. Do we?

No, we only have the full linearize helper, and the skb_try_coalesce()
helper.

The TCP stack uses an internal function to collapse several skbs into
skbs using a single page; I guess we could generalize this and make it
available to other users.

Thanks!