From: Eric Dumazet <edumazet at google.com>

Straightforward patch to add GRO processing to virtio_net.

napi_complete_done() usage allows more aggressive aggregation,
opted-in by setting /sys/class/net/xxx/gro_flush_timeout

Tested:

Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
Rick Jones reported following results.

One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
encapsulation being used through OVS so no GRO coming-up the host
stack. The compute nodes themselves were running a 3.14-based kernel.

Single-stream netperf, CPU utilizations and thus service demands are
based on intra-guest reported CPU.

Throughput Mbit/s, bigger is better
                     Min    Median  Average  Max
4.2.0-rc3+           1364   1686    1678     1938
4.2.0-rc3+flush1k    1824   2269    2275     2647

Send Service Demand, smaller is better
                     Min    Median  Average  Max
4.2.0-rc3+           0.236  0.558   0.524    0.802
4.2.0-rc3+flush1k    0.176  0.503   0.471    0.738

Receive Service Demand, smaller is better.
                     Min    Median  Average  Max
4.2.0-rc3+           1.906  2.188   2.191    2.531
4.2.0-rc3+flush1k    0.448  0.529   0.533    0.692

Signed-off-by: Eric Dumazet <edumazet at google.com>
Tested-by: Rick Jones <rick.jones2 at hp.com>
Cc: "Michael S. Tsirkin" <mst at redhat.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7fbca37a1adf..66f08f622dc6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -518,7 +518,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 
 	skb_mark_napi_id(skb, &rq->napi);
 
-	netif_receive_skb(skb);
+	napi_gro_receive(&rq->napi, skb);
 	return;
 
 frame_err:
@@ -756,7 +756,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	/* Out of packets? */
 	if (received < budget) {
 		r = virtqueue_enable_cb_prepare(rq->vq);
-		napi_complete(napi);
+		napi_complete_done(napi, received);
 		if (unlikely(virtqueue_poll(rq->vq, r)) &&
 		    napi_schedule_prep(napi)) {
 			virtqueue_disable_cb(rq->vq);
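The opt-in described in the patch is a plain write to the gro_flush_timeout attribute of the interface. A minimal sketch of doing that from C follows. The helper name set_gro_flush_timeout() is hypothetical and not part of the patch, and the caller supplies the full attribute path because the posting only gives the interface as "xxx". Writing to sysfs normally requires root.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): write a timeout in
 * nanoseconds to a gro_flush_timeout attribute file.
 * attr_path is the full path of the attribute, for example
 * /sys/class/net/<iface>/gro_flush_timeout.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_gro_flush_timeout(const char *attr_path, unsigned long nsec)
{
	FILE *f;
	int err = 0;

	assert(attr_path != NULL);

	f = fopen(attr_path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%lu\n", nsec) < 0)
		err = -EIO;

	/* The buffered write is flushed here, so a rejected value shows up now. */
	if (fclose(f) != 0 && err == 0)
		err = -errno;

	return err;
}
```

Calling set_gro_flush_timeout(path, 1000) with the attribute path of the virtio_net interface would correspond to the "flush1k" configuration in the tables above; passing 0 writes the value zero back.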
From: Eric Dumazet <eric.dumazet at gmail.com>
Date: Fri, 31 Jul 2015 18:25:17 +0200

> From: Eric Dumazet <edumazet at google.com>
>
> Straightforward patch to add GRO processing to virtio_net.
> ...
> Signed-off-by: Eric Dumazet <edumazet at google.com>
> Tested-by: Rick Jones <rick.jones2 at hp.com>
> Cc: "Michael S. Tsirkin" <mst at redhat.com>

Michael, please review :-)
On Fri, Jul 31, 2015 at 04:57:32PM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet at gmail.com>
> Date: Fri, 31 Jul 2015 18:25:17 +0200
>
> > From: Eric Dumazet <edumazet at google.com>
> >
> > Straightforward patch to add GRO processing to virtio_net.
> > ...
> > Signed-off-by: Eric Dumazet <edumazet at google.com>
> > Tested-by: Rick Jones <rick.jones2 at hp.com>
> > Cc: "Michael S. Tsirkin" <mst at redhat.com>
>
> Michael, please review :-)

Will do shortly :)
On Fri, Jul 31, 2015 at 06:25:17PM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet at google.com>
>
> Straightforward patch to add GRO processing to virtio_net.
>
> napi_complete_done() usage allows more aggressive aggregation,
> opted-in by setting /sys/class/net/xxx/gro_flush_timeout
>
> Tested:
>
> Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
> Rick Jones reported following results.
>
> One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
> and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
> The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
> encapsulation being used through OVS so no GRO coming-up the host
> stack. The compute nodes themselves were running a 3.14-based kernel.
>
> Single-stream netperf, CPU utilizations and thus service demands are
> based on intra-guest reported CPU.
>
> Throughput Mbit/s, bigger is better
>                      Min    Median  Average  Max
> 4.2.0-rc3+           1364   1686    1678     1938
> 4.2.0-rc3+flush1k    1824   2269    2275     2647
>
> Send Service Demand, smaller is better
>                      Min    Median  Average  Max
> 4.2.0-rc3+           0.236  0.558   0.524    0.802
> 4.2.0-rc3+flush1k    0.176  0.503   0.471    0.738
>
> Receive Service Demand, smaller is better.
>                      Min    Median  Average  Max
> 4.2.0-rc3+           1.906  2.188   2.191    2.531
> 4.2.0-rc3+flush1k    0.448  0.529   0.533    0.692
>
>
> Signed-off-by: Eric Dumazet <edumazet at google.com>
> Tested-by: Rick Jones <rick.jones2 at hp.com>
> Cc: "Michael S. Tsirkin" <mst at redhat.com>

Ideally this needs to also be tested on non-vxlan configs with gro in
host, to make sure this doesn't cause regressions.

But I don't see why it should: GRO overhead is pretty small if packets
don't need to be combined.

Acked-by: Michael S. Tsirkin <mst at redhat.com>

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 7fbca37a1adf..66f08f622dc6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -518,7 +518,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>
>  	skb_mark_napi_id(skb, &rq->napi);
>
> -	netif_receive_skb(skb);
> +	napi_gro_receive(&rq->napi, skb);
>  	return;
>
>  frame_err:
> @@ -756,7 +756,7 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>  	/* Out of packets? */
>  	if (received < budget) {
>  		r = virtqueue_enable_cb_prepare(rq->vq);
> -		napi_complete(napi);
> +		napi_complete_done(napi, received);
>  		if (unlikely(virtqueue_poll(rq->vq, r)) &&
>  		    napi_schedule_prep(napi)) {
>  			virtqueue_disable_cb(rq->vq);
>
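To illustrate the remark that there is little to do when packets do not need to be combined, here is a toy userspace model of receive coalescing. It is only a sketch under stated assumptions, not the kernel's GRO code: struct toy_seg and toy_gro_batch() are hypothetical names, a segment is merged only with the entry immediately before it, and size limits, header checks and timeouts are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a received TCP segment; not the kernel's struct sk_buff. */
struct toy_seg {
	uint32_t flow;  /* hypothetical flow identifier */
	uint32_t seq;   /* sequence number of the first payload byte */
	uint32_t len;   /* payload length in bytes */
};

/*
 * Coalesce one poll batch in place. A segment is merged into the previous
 * output entry when it belongs to the same flow and starts exactly where
 * that entry ends. Anything else is passed through unchanged.
 * Returns the number of entries that would be handed to the stack.
 */
size_t toy_gro_batch(struct toy_seg *segs, size_t n)
{
	size_t out = 0;

	assert(segs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (out > 0 &&
		    segs[out - 1].flow == segs[i].flow &&
		    segs[out - 1].seq + segs[out - 1].len == segs[i].seq) {
			segs[out - 1].len += segs[i].len;  /* merge */
			continue;
		}
		segs[out++] = segs[i];  /* nothing to combine: one compare, one copy */
	}
	return out;
}
```

In this model a batch of in-order segments from one flow collapses to a single entry, while a batch in which no two neighbours belong together comes out with the same count it went in with, at the cost of one comparison per segment.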
On 08/03/2015 06:37 AM, Michael S. Tsirkin wrote:
> Ideally this needs to also be tested on non-vxlan configs with gro in
> host, to make sure this doesn't cause regressions.

Measured with the same instances on the same hardware and software,
taking a path through the stack (public rather than private IPs, with
Distributed Virtual Router (DVR) enabled) which gives them GRO:

Throughput
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           6713   8351    8232     9102
4.2.0-rc3+flush1k_hostGRO    6539   8267    8206     8982

As singletons, Mins and Maxes probably have rather high variability,
I'd focus on the Median and Average and those are within 1%.

Send Service Demand
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           0.332  0.496   0.490    0.651
4.2.0-rc3+flush1k_hostGRO    0.328  0.493   0.488    0.678

Receive Service Demand
                             Min    Median  Average  Max
4.2.0-rc3+_hostGRO           0.386  0.469   0.485    0.677
4.2.0-rc3+flush1k_hostGRO    0.369  0.466   0.477    0.665

happy benchmarking,

rick
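The "within 1%" comparison can be reproduced with a one-line relative-change calculation. The helper below is a sketch; pct_change() is a hypothetical name chosen for illustration and does not come from netperf or the thread.

```c
#include <assert.h>

/*
 * Relative change from baseline to candidate, in percent.
 * Negative means the candidate is lower than the baseline.
 */
double pct_change(double baseline, double candidate)
{
	assert(baseline != 0.0);
	return (candidate - baseline) / baseline * 100.0;
}
```

With the host-GRO throughput figures above, pct_change(8351, 8267) is about -1.0 for the median and pct_change(8232, 8206) is about -0.3 for the average. For comparison, the VxLAN figures from the original posting give about +34.6 for median throughput (1686 to 2269) and about -75.8 for median receive service demand (2.188 to 0.529).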
From: Eric Dumazet <eric.dumazet at gmail.com>
Date: Fri, 31 Jul 2015 18:25:17 +0200

> From: Eric Dumazet <edumazet at google.com>
>
> Straightforward patch to add GRO processing to virtio_net.
>
> napi_complete_done() usage allows more aggressive aggregation,
> opted-in by setting /sys/class/net/xxx/gro_flush_timeout
>
> Tested:
>
> Setting /sys/class/net/xxx/gro_flush_timeout to 1000 nsec,
> Rick Jones reported following results.
>
> One VM of each on a pair of OpenStack compute nodes with E5-2650Lv3 CPUs
> and Intel 82599ES-based NICs. So, two "before" and two "after" VMs.
> The OpenStack compute nodes were running OpenStack Kilo, with VxLAN
> encapsulation being used through OVS so no GRO coming-up the host
> stack. The compute nodes themselves were running a 3.14-based kernel.
>
> Single-stream netperf, CPU utilizations and thus service demands are
> based on intra-guest reported CPU.
> ...
> Signed-off-by: Eric Dumazet <edumazet at google.com>
> Tested-by: Rick Jones <rick.jones2 at hp.com>

Applied.