Jason Wang
2013-Oct-31  11:47 UTC
[PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
Sometimes we need to coalesce rx frags to avoid using a frag list. One example is
the virtio-net driver, which tries to use small frags for both MTU-sized packets
and GSO packets. This patch introduces skb_coalesce_rx_frag() to do this.
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michael Dalton <mwdalton at google.com>
Cc: Eric Dumazet <edumazet at google.com>
Acked-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
Changes from V1:
- remove the useless off parameter.
---
 include/linux/skbuff.h |  3 +++
 net/core/skbuff.c      | 13 +++++++++++++
 2 files changed, 16 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2c15497..fffaeaf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		     int size, unsigned int truesize);
 
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+			  unsigned int truesize);
+
 #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
 #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
 #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0ab32fa..87670e1 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 }
 EXPORT_SYMBOL(skb_add_rx_frag);
 
+void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
+			  unsigned int truesize)
+{
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+	skb_frag_size_add(frag, size);
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+	skb_frag_unref(skb, i);
+}
+EXPORT_SYMBOL(skb_coalesce_rx_frag);
+
 static void skb_drop_list(struct sk_buff **listp)
 {
 	kfree_skb_list(*listp);
-- 
1.8.1.2
Jason Wang
2013-Oct-31  11:47 UTC
[PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx
Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable
rx buffers to page frag allocators) tries to increase the payload/truesize ratio
for MTU-sized traffic. But it introduces extra overhead for received GSO packets
because of the frag list. This commit reduces that overhead by coalescing rx
frags whenever possible during rx. Test results show about 15% improvement when
receiving full-size GSO packets (and even better than
commit 2613af0ed18a11d5c566a81f9a6510b73180660a).
Before this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    10.00    20303.87
After this commit:
./netperf -H 192.168.100.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 87380  16384  16384    10.00    23841.26
Cc: Rusty Russell <rusty at rustcorp.com.au>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Michael Dalton <mwdalton at google.com>
Cc: Eric Dumazet <edumazet at google.com>
Acked-by: Michael S. Tsirkin <mst at redhat.com>
Signed-off-by: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 113ee93..5dc0de0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -305,7 +305,7 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 	struct sk_buff *curr_skb = head_skb;
 	char *buf;
 	struct page *page;
-	int num_buf, len;
+	int num_buf, len, offset;
 
 	num_buf = hdr->mhdr.num_buffers;
 	while (--num_buf) {
@@ -342,9 +342,15 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb)
 			head_skb->truesize += MAX_PACKET_LEN;
 		}
 		page = virt_to_head_page(buf);
-		skb_add_rx_frag(curr_skb, num_skb_frags, page,
-				buf - (char *)page_address(page), len,
-				MAX_PACKET_LEN);
+		offset = buf - (char *)page_address(page);
+		if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) {
+			skb_coalesce_rx_frag(curr_skb, num_skb_frags - 1,
+					     len, MAX_PACKET_LEN);
+		} else {
+			skb_add_rx_frag(curr_skb, num_skb_frags, page,
+					offset, len,
+					MAX_PACKET_LEN);
+		}
 		--rq->num;
 	}
 	return 0;
-- 
1.8.1.2
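
To make the coalescing condition in the hunk above concrete, here is a small userspace model of the receive loop. It is only a sketch under stated assumptions: every model_* type and function is a hypothetical stand-in (for struct sk_buff, skb_can_coalesce(), skb_add_rx_frag() and skb_coalesce_rx_frag() respectively), not kernel code, and page reference counting is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_FRAGS 17	/* arbitrary capacity chosen for this model */

/* Hypothetical stand-in for struct page; only its address matters here. */
struct model_page { int id; };

struct model_frag {
	const struct model_page *page;
	unsigned int offset;
	unsigned int size;
};

/* Hypothetical stand-in for the parts of struct sk_buff the patch touches. */
struct model_skb {
	unsigned int len;
	unsigned int data_len;
	unsigned int truesize;
	int nr_frags;
	struct model_frag frags[MODEL_MAX_FRAGS];
};

/* A buffer can be merged only if it starts exactly where frag i - 1 ends,
 * on the same page. With no previous frag (i == 0) it never can. */
static bool model_can_coalesce(const struct model_skb *skb, int i,
			       const struct model_page *page, unsigned int off)
{
	if (i > 0) {
		const struct model_frag *prev = &skb->frags[i - 1];

		return page == prev->page && off == prev->offset + prev->size;
	}
	return false;
}

/* Append a new frag at index i and account for its bytes. */
static void model_add_rx_frag(struct model_skb *skb, int i,
			      const struct model_page *page, unsigned int off,
			      unsigned int size, unsigned int truesize)
{
	assert(i < MODEL_MAX_FRAGS);
	skb->frags[i].page = page;
	skb->frags[i].offset = off;
	skb->frags[i].size = size;
	skb->nr_frags = i + 1;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
}

/* Grow existing frag i instead of adding a new one. */
static void model_coalesce_rx_frag(struct model_skb *skb, int i,
				   unsigned int size, unsigned int truesize)
{
	skb->frags[i].size += size;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
}

/* Mirrors the if/else added to receive_mergeable(); returns true when the
 * buffer was merged into the previous frag. */
static bool model_receive_buf(struct model_skb *skb,
			      const struct model_page *page, unsigned int off,
			      unsigned int len, unsigned int truesize)
{
	int n = skb->nr_frags;

	if (model_can_coalesce(skb, n, page, off)) {
		model_coalesce_rx_frag(skb, n - 1, len, truesize);
		return true;
	}
	model_add_rx_frag(skb, n, page, off, len, truesize);
	return false;
}
```

In this model, two 1500-byte buffers carved back to back from the same page end up as a single 3000-byte frag, while a buffer at a non-adjacent offset or on a different page still gets its own frag slot.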
Eric Dumazet
2013-Oct-31  13:57 UTC
[PATCH net-next V2 2/2] virtio-net: coalesce rx frags when possible during rx
On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
> Commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable
> rx buffers to page frag allocators) try to increase the payload/truesize for
> MTU-sized traffic. But this will introduce the extra overhead for GSO packets
> received because of the frag list. This commit tries to reduce this issue by
> coalesce the possible rx frags when possible during rx. Test result shows the
> about 15% improvement on full size GSO packet receiving (and even better than
> commit 2613af0ed18a11d5c566a81f9a6510b73180660a).
>
> Before this commit:
> ./netperf -H 192.168.100.4
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    20303.87
>
> After this commit:
> ./netperf -H 192.168.100.4
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4 () port 0 AF_INET : demo
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
>
>  87380  16384  16384    10.00    23841.26
>
> Cc: Rusty Russell <rusty at rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Michael Dalton <mwdalton at google.com>
> Cc: Eric Dumazet <edumazet at google.com>
> Acked-by: Michael S. Tsirkin <mst at redhat.com>
> Signed-off-by: Jason Wang <jasowang at redhat.com>
> ---

Excellent!

We now have 2 or 3 frags per skb, like the tcp stack manages to do on the
output path.

Michael Dalton is also working on an autotuning patch, using an EWMA, so
that the size of individual sg blocks can vary from 1500 to 4096. This
might show even better throughput, we'll see.

Acked-by: Eric Dumazet <edumazet at google.com>
Eric Dumazet
2013-Oct-31  14:26 UTC
[PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
> Sometimes we need to coalesce the rx frags to avoid frag list. One example is
> virtio-net driver which tries to use small frags for both MTU sized packet and
> GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.
>
> Cc: Rusty Russell <rusty at rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Michael Dalton <mwdalton at google.com>
> Cc: Eric Dumazet <edumazet at google.com>
> Acked-by: Michael S. Tsirkin <mst at redhat.com>
> Signed-off-by: Jason Wang <jasowang at redhat.com>
> ---
> Changes from V1:
> - remove the useless off parameter.
> ---
>  include/linux/skbuff.h |  3 +++
>  net/core/skbuff.c      | 13 +++++++++++++
>  2 files changed, 16 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2c15497..fffaeaf 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
>  void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>  		     int size, unsigned int truesize);
>
> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
> +			  unsigned int truesize);
> +
>  #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
>  #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
>  #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0ab32fa..87670e1 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>  }
>  EXPORT_SYMBOL(skb_add_rx_frag);
>
> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
> +			  unsigned int truesize)
> +{
> +	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> +
> +	skb_frag_size_add(frag, size);
> +	skb->len += size;
> +	skb->data_len += size;
> +	skb->truesize += truesize;

> +	skb_frag_unref(skb, i);

This unref is not logical, or should at least be

	__skb_frag_unref(frag);

But I do think this is best done in the caller.

In virtio_net this would be a:

	put_page(page);

In the tcp stack we do almost the same, but we take the reference on the
page if we could not coalesce with the prior frag, instead of doing a get
and put in the other case.

	if (can_coalesce) {
		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
	} else {
		get_page(page);
		skb_fill_page_desc(skb, i, page, offset, copy);
	}

> +}
> +EXPORT_SYMBOL(skb_coalesce_rx_frag);
> +
>  static void skb_drop_list(struct sk_buff **listp)
>  {
>  	kfree_skb_list(*listp);
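
The two reference-handling styles described in this reply can be compared in a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_page, model_get_page(), model_put_page(), model_rx_attach() and model_tx_attach() are hypothetical names, the refcount is a plain int, and the caller is assumed to pass a frag array with room for one more entry. The rx variant assumes each incoming buffer already holds one reference on its page, as the suggested put_page() in virtio_net implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page with a bare reference count. */
struct model_page { int refcount; };

struct model_frag {
	struct model_page *page;
	unsigned int off;
	unsigned int size;
};

static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* True when [off, ...) directly follows the last frag on the same page. */
static bool model_adjacent(const struct model_frag *frags, int nr,
			   const struct model_page *page, unsigned int off)
{
	return nr > 0 && frags[nr - 1].page == page &&
	       frags[nr - 1].off + frags[nr - 1].size == off;
}

/* rx style (assumed virtio_net case): the buffer arrives holding its own
 * page reference. A new frag inherits it; a coalesced buffer must drop it,
 * because the existing frag already pins the page. */
static bool model_rx_attach(struct model_frag *frags, int *nr,
			    struct model_page *page, unsigned int off,
			    unsigned int size)
{
	if (model_adjacent(frags, *nr, page, off)) {
		frags[*nr - 1].size += size;
		model_put_page(page);	/* done by the caller, not the helper */
		return true;
	}
	frags[*nr] = (struct model_frag){ page, off, size };
	(*nr)++;
	return false;
}

/* tx style (the tcp snippet): no reference is handed over, so one is taken
 * only when a new frag is actually created. */
static bool model_tx_attach(struct model_frag *frags, int *nr,
			    struct model_page *page, unsigned int off,
			    unsigned int size)
{
	if (model_adjacent(frags, *nr, page, off)) {
		frags[*nr - 1].size += size;
		return true;
	}
	model_get_page(page);
	frags[*nr] = (struct model_frag){ page, off, size };
	(*nr)++;
	return false;
}
```

Either way, the model ends with exactly one page reference per frag. The difference is only who adjusts the count and when, which is why the put_page() fits more naturally in the driver than inside a generic skb helper.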
Jason Wang
2013-Nov-01  05:25 UTC
[PATCH net-next V2 1/2] net: introduce skb_coalesce_rx_frag()
On 10/31/2013 10:26 PM, Eric Dumazet wrote:
> On Thu, 2013-10-31 at 19:47 +0800, Jason Wang wrote:
>> Sometimes we need to coalesce the rx frags to avoid frag list. One example is
>> virtio-net driver which tries to use small frags for both MTU sized packet and
>> GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.
>>
>> Cc: Rusty Russell <rusty at rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Cc: Michael Dalton <mwdalton at google.com>
>> Cc: Eric Dumazet <edumazet at google.com>
>> Acked-by: Michael S. Tsirkin <mst at redhat.com>
>> Signed-off-by: Jason Wang <jasowang at redhat.com>
>> ---
>> Changes from V1:
>> - remove the useless off parameter.
>> ---
>>  include/linux/skbuff.h |  3 +++
>>  net/core/skbuff.c      | 13 +++++++++++++
>>  2 files changed, 16 insertions(+)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 2c15497..fffaeaf 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -1372,6 +1372,9 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i,
>>  void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>>  		     int size, unsigned int truesize);
>>
>> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
>> +			  unsigned int truesize);
>> +
>>  #define SKB_PAGE_ASSERT(skb) 	BUG_ON(skb_shinfo(skb)->nr_frags)
>>  #define SKB_FRAG_ASSERT(skb) 	BUG_ON(skb_has_frag_list(skb))
>>  #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 0ab32fa..87670e1 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -476,6 +476,19 @@ void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
>>  }
>>  EXPORT_SYMBOL(skb_add_rx_frag);
>>
>> +void skb_coalesce_rx_frag(struct sk_buff *skb, int i, int size,
>> +			  unsigned int truesize)
>> +{
>> +	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>> +
>> +	skb_frag_size_add(frag, size);
>> +	skb->len += size;
>> +	skb->data_len += size;
>> +	skb->truesize += truesize;
>
>> +	skb_frag_unref(skb, i);
> This unref is not logical, or should at least be
>
> __skb_frag_unref(frag);
>
> But I do think this is best done in the caller.
>
> In virtio_net this would be a :
>
> put_page(page);
>
> In tcp stack we do almost the same, but we take the reference on the
> page if we could not coalesce with prio frag, instead of doing a get and
> put in the other case.
>
> if (can_coalesce) {
> 	skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
> } else {
> 	get_page(page);
> 	skb_fill_page_desc(skb, i, page, offset, copy);
> }
>

OK, got it. Will do a put_page() in V3.

Thanks

>> +}
>> +EXPORT_SYMBOL(skb_coalesce_rx_frag);
>> +
>>  static void skb_drop_list(struct sk_buff **listp)
>>  {
>>  	kfree_skb_list(*listp);
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html