search for: tcp_rr

Displaying 20 results from an estimated 209 matches for "tcp_rr".

2017 Apr 21
3
[PATCH net-next v2 2/5] virtio-net: transmit napi
...>> The cycle cost is significant without affinity regardless of whether the >> optimization is used. > > > Yes, I noticed this in the past too. > >> Though this is not limited to napi-tx, it is more >> pronounced in that mode than without napi. >> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}: >> >> upstream: >> >> 1,1,1: 28985 Mbps, 278 Gcyc >> 1,0,2: 30067 Mbps, 402 Gcyc >> >> napi tx: >> >> 1,1,1: 34492 Mbps, 269 Gcyc >> 1,0,2: 36527 Mbps, 537 Gcyc (!) >> 1,0,1: 3626...
2017 Apr 20
2
[PATCH net-next v2 2/5] virtio-net: transmit napi
...of napi tx. And enabling the optimization is always a win over keeping it off, even without irq affinity. The cycle cost is significant without affinity regardless of whether the optimization is used. Though this is not limited to napi-tx, it is more pronounced in that mode than without napi. 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}: upstream: 1,1,1: 28985 Mbps, 278 Gcyc 1,0,2: 30067 Mbps, 402 Gcyc napi tx: 1,1,1: 34492 Mbps, 269 Gcyc 1,0,2: 36527 Mbps, 537 Gcyc (!) 1,0,1: 36269 Mbps, 394 Gcyc 1,0,0: 34674 Mbps, 402 Gcyc This is a particularly strong example. It is also...
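
For reference, an affinity sweep like the {process, rx_irq, tx_irq} configurations quoted above is typically driven by pinning the netperf process with taskset and steering the rx/tx interrupts through /proc/irq/<N>/smp_affinity. The commands below are only an illustrative sketch of one such configuration ({1, 0, 2}); the IRQ-number placeholders and the 60-second run length are assumptions, not the exact commands used in this thread.

  # on the peer
  netserver
  # on the test machine: rx irq -> CPU 0, tx irq -> CPU 2 (hex CPU masks), netperf process -> CPU 1
  echo 1 > /proc/irq/<rx_irq>/smp_affinity    # mask 0x1 = CPU 0
  echo 4 > /proc/irq/<tx_irq>/smp_affinity    # mask 0x4 = CPU 2
  taskset -c 1 netperf -H <peer_ip> -t TCP_RR -l 60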
2014 Dec 02
2
[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts
...l > >>> packet transmission were noticed. This is expected since TSQ and > >>>other > >>> optimization for small packet transmission work after tx interrupt. > >>>But > >>> will use more cpu for large packets. > >>> - For TCP_RR, regression (10% on transaction rate and cpu > >>>utilization) were > >>> found. Tx interrupt won't help but cause overhead in this case. > >>>Using > >>> more aggressive coalescing parameters may help to reduce the > >>>regression...
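
The "coalescing parameters" mentioned here are interrupt coalescing settings. On drivers that expose them through ethtool the knobs look roughly like the lines below; whether virtio-net offered these exact controls at the time of this thread is not established here, the device name and values are arbitrary examples.

  ethtool -c eth0                             # show the current coalescing settings
  ethtool -C eth0 tx-usecs 64 tx-frames 32    # delay the tx interrupt up to 64us or 32 packets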
2014 Dec 02
2
[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts
...is is expected since TSQ and > >> >>>other > >> >>> optimization for small packet transmission work after tx > >>interrupt. > >> >>>But > >> >>> will use more cpu for large packets. > >> >>> - For TCP_RR, regression (10% on transaction rate and cpu > >> >>>utilization) were > >> >>> found. Tx interrupt won't help but cause overhead in this case. > >> >>>Using > >> >>> more aggressive coalescing parameters may help to re...
2017 Apr 24
2
[PATCH net-next v2 2/5] virtio-net: transmit napi
...>> >> optimization is used. >> > >> > >> > Yes, I noticed this in the past too. >> > >> >> Though this is not limited to napi-tx, it is more >> >> pronounced in that mode than without napi. >> >> >> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}: >> >> >> >> upstream: >> >> >> >> 1,1,1: 28985 Mbps, 278 Gcyc >> >> 1,0,2: 30067 Mbps, 402 Gcyc >> >> >> >> napi tx: >> >> >> >> 1,1,1: 3...
2014 Dec 02
0
[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts
...> > >> >>>other > > >> >>> optimization for small packet transmission work after tx > > >>interrupt. > > >> >>>But > > >> >>> will use more cpu for large packets. > > >> >>> - For TCP_RR, regression (10% on transaction rate and cpu > > >> >>>utilization) were > > >> >>> found. Tx interrupt won't help but cause overhead in this case. > > >> >>>Using > > >> >>> more aggressive coalescing param...
2013 Sep 02
2
[PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
...s submitted from guest. Guest can easily exceed the limitation by > continuing to send packets. > > So this patch moves the check into the main loop. Tests show about 5%-10% > improvement on per cpu throughput for guest tx. But a 5% drop on per cpu > transaction rate for a single session TCP_RR. Any explanation for the drop? Single-session TCP_RR is unlikely to exceed VHOST_MAX_PEND, correct? > > Signed-off-by: Jason Wang <jasowang at redhat.com> > --- > drivers/vhost/net.c | 15 ++++----------- > 1 files changed, 4 insertions(+), 11 deletions(-) > > diff...
2018 Jun 30
1
[PATCH net-next v3 4/4] net: vhost: add rx busy polling in tx path
...dwidth, use netperf to test throughput and mean > latency. When running the tests, the vhost-net kthread of > that VM is always at 100% CPU. The commands are shown below. > > iperf3 -s -D > iperf3 -c IP -i 1 -P 1 -t 20 -M 1400 > > or > netserver > netperf -H IP -t TCP_RR -l 20 -- -O "THROUGHPUT,MEAN_LATENCY" > > host -> guest: > iperf3: > * With the patch: 27.0 Gbits/sec > * Without the patch: 14.4 Gbits/sec > > netperf (TCP_RR): > * With the patch: 48039.56 trans/s, 20.64us mean latency > * Without the patch: 460...
2011 Jun 09
0
No subject
...ed by something like mpstat). Sometimes throughput doesn't increase (e.g. guest-host) but CPU utilization does decrease. So it's interesting. Another issue is that we are trying to improve the latency of a busy queue here. However, STREAM/MAERTS tests ignore the latency (more or less) while TCP_RR by default runs a single packet per queue. Without arguing about whether these are practically interesting workloads, these results are thus unlikely to be significantly affected by the optimization in question. What we are interested in, thus, is either TCP_RR with a -b flag (configure with --en...
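
A burst-mode run of the kind suggested here would look roughly like the following sketch; the burst size of 16 is an arbitrary assumption, and the test-specific -b option is only present when netperf has been built with --enable-burst.

  ./configure --enable-burst && make              # build netperf with burst support
  netperf -H <peer_ip> -t TCP_RR -l 60 -- -b 16   # keep several (here 16) transactions in flight instead of one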
2014 Dec 02
0
[PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts
...e noticed. This is expected since TSQ >> and >> >>>other >> >>> optimization for small packet transmission work after tx >> interrupt. >> >>>But >> >>> will use more cpu for large packets. >> >>> - For TCP_RR, regression (10% on transaction rate and cpu >> >>>utilization) were >> >>> found. Tx interrupt won't help but cause overhead in this >> case. >> >>>Using >> >>> more aggressive coalescing parameters may help to reduce t...
2017 Sep 01
2
[PATCH net] vhost_net: correctly check tx avail during rx busy polling
...not guest has filled more available buffer since last avail idx synchronization which was just done by vhost_vq_avail_empty() before. What we really want is checking pending buffers in the avail ring. Fix this by calling vhost_vq_avail_empty() instead. This issue could be noticed by doing netperf TCP_RR benchmark as client from guest (but not host). With this fix, TCP_RR from guest to localhost restores from 1375.91 trans per sec to 55235.28 trans per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz). Fixes: 030881372460 ("vhost_net: basic polling support") Signed-off-by: Jaso...