search for: tcp_stream

Displaying results from an estimated 166 matches for "tcp_stream".

2011 Jun 19
2
RFT: virtio_net: limit xmit polling
OK, different people seem to test different trees. In the hope of getting everyone on the same page, I created several variants of this patch so they can be compared. Whoever's interested, please check out the following and tell me how these compare: kernel: git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git virtio-net-limit-xmit-polling/base - this is the net-next baseline to test
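For anyone who wants to reproduce the comparison, a minimal sketch of fetching the tree named above; only the git URL and the branch name come from the posting, the clone directory and build steps are assumptions:

    # fetch the vhost tree and check out the net-next baseline branch (assumed workflow)
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
    cd vhost
    git checkout virtio-net-limit-xmit-polling/base
    # build and boot the kernel with your usual config; the other variant branches
    # mentioned in the thread can be checked out and tested the same way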
2011 Oct 27
0
No subject
...port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

122880 122880 1        1       10.00    12072.64
229376 229376

# netperf -H 192.168.33.4,ipv4 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.33.4 (192.168.33.4) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87...
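The quoted output looks like a request/response run followed by the TCP_STREAM run whose command line survives in the snippet; a hedged reconstruction of the two invocations (the first command is an assumption based on the column headers, only the second appears verbatim above):

    # request/response test against the same peer (assumed; the command line is cut off)
    netperf -H 192.168.33.4,ipv4 -t TCP_RR
    # bulk throughput test, as quoted in the snippet
    netperf -H 192.168.33.4,ipv4 -t TCP_STREAM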
2011 Jun 09
0
No subject
...
> Instances   Base        V0          V1          V2
> 1           18,758.48   19,112.50   18,597.07   19,252.04
> 25          80,500.50   78,801.78   80,590.68   78,782.07
> 50          80,594.20   77,985.44   80,431.72   77,246.90
> 100         82,023.23   81,325.96   81,303.32   81,727.54
>
> Here's the local guest-to-guest summary for 1 VM pair doing TCP_STREAM with
> 256, 1K, 4K and 16K message size in Mbps:
>
> 256:
> Instances   Base        V0          V1          V2
> 1           961.78      1,115.92    794.02      740.37
> 4           2,498.33    2,541.82    2,441.60    2,308.26
>
> 1K:
> 1           3,476.61    3,522.02    2,170.86    1,395.57
> 4           6,344.30    7,056.57    7,275....
2019 Mar 11
2
[RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
...ser() friends since they had too much >> overhead, like checks, spec barriers or even hardware feature >> toggling. This is done through setting up kernel addresses through vmap() and >> registering an MMU notifier for invalidation. >> >> Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see >> obvious improvement. > How is this going to work for CPUs with virtually tagged caches? Is there anything specific you're worried about? I can run a test, but do you know of any archs that use virtually tagged caches? Thanks
2019 Mar 11
4
[RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()
...> overhead, like checks, spec barriers or even hardware feature >> > > toggling. This is done through setting up kernel addresses through vmap() and >> > > registering an MMU notifier for invalidation. >> > > >> > > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see >> > > obvious improvement. >> > How is this going to work for CPUs with virtually tagged caches? >> >> >> Is there anything specific you're worried about? > > If caches have virtual tags then kernel and userspace view of memory > might not be au...
2019 May 30
1
[PATCH net-next 0/6] vhost: accelerate metadata access
...too much > > overhead, like checks, spec barriers or even hardware feature > > toggling like SMAP. This is done through setting up kernel addresses through > > direct mapping and co-operating with VM management via MMU notifiers. > > > > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see > > obvious improvement. > > I'm still waiting for some review from mst. > > If I don't see any review soon I will just wipe these changes from > patchwork, as it serves no purpose to just let them rot there. > > Thank you. I thought we agreed I...
2015 Oct 22
1
[PATCH net-next RFC 2/2] vhost_net: basic polling support
...+3% > > Is there a measurable increase in CPU utilization > with busyloop_timeout = 0? And since a netperf TCP_RR test is involved, be careful about what netperf reports for CPU util if that increase isn't in the context of the guest OS. For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS, aggregate _RR and even aggregate _RR/packets per second for many VMs on the same system would be in order. happy benchmarking, rick jones
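For reference, a hedged sketch of the complementary runs Rick suggests; only the test names (TCP_STREAM, TCP_MAERTS, TCP_RR) come from the mail, while the peer address, run length and instance count are placeholders:

    # bulk throughput in both directions toward a placeholder peer
    netperf -H $PEER -t TCP_STREAM -l 30
    netperf -H $PEER -t TCP_MAERTS -l 30
    # aggregate request/response: several concurrent instances whose transaction rates are summed
    for i in $(seq 1 8); do netperf -P 0 -H $PEER -t TCP_RR -l 30 & done; wait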
2013 Nov 14
2
[PATCH] virtio-net: mergeable buffer size should include virtio-net header
...ffic due to TCP window / SKB truesize effects. This commit changes the mergeable buffer size to include the virtio-net header. The buffer size is cacheline-aligned because skb_page_frag_refill will not automatically align the requested size. Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs between two QEMU VMs on a single physical machine. Each VM has two VCPUs and vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup cpuset, using cgroups to ensure that other processes in the system will not be scheduled on the benchmark CPUs. Transmit offloads and mergeable rec...
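A hedged sketch of the methodology described above (averaging five 30-second TCP_STREAM runs between the two VMs); the peer variable, banner suppression and awk averaging are assumptions, not part of the patch:

    # run five 30 s TCP_STREAM tests and average the throughput column (assumed reproduction)
    for i in 1 2 3 4 5; do
        netperf -P 0 -H $PEER_VM -t TCP_STREAM -l 30
    done | awk 'NF >= 5 { sum += $5; n++ } END { printf "mean: %.2f 10^6bits/sec over %d runs\n", sum/n, n }'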
2018 Apr 09
0
[PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
...64/ 8/ 0%/ -1%                  64/ 8/ -2%/ +1%
> 256/ 1/ +7%/ +1%                 256/ 1/ -7%/ 0%
> 256/ 4/ +1%/ +1%                 256/ 4/ -3%/ -4%
> 256/ 8/ +2%/ +2%                 256/ 8/ +1%/ +1%
>
> vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
> size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
> 64/ 1/ 0%/ -3%                    64/ 1/ 0%/ 0%
> 64/ 4/ +3%/ -1%                   64/ 4/ -2%/ +4%
> 64/ 8/ +9%/ -4%                   64/ 8...
2017 Sep 30
2
[PATCH net-next] vhost_net: do not stall on zerocopy depletion
....com >> Signed-off-by: Willem de Bruijn <willemb at google.com> > > I'd like to see the effect on the non rate limited case though. > If guest is quick won't we have lots of copies then? Yes, but not significantly more than without this patch. I ran 1, 10 and 100 flow tcp_stream throughput tests from a sender in the guest to a receiver in the host. To answer the other benchmark question first, I did not see anything noteworthy when increasing vq->num from 256 to 1024. With 1 and 10 flows without this patch all packets use zerocopy. With the patch, less than 1% eschews...
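The 1-, 10- and 100-flow runs mentioned above can be approximated with concurrent netperf instances; a hedged sketch, with the host address, flow count and run length as placeholders:

    # launch N concurrent TCP_STREAM flows from the guest toward a netserver on the host
    N=10    # 1, 10 or 100 as in the tests described above
    for i in $(seq 1 $N); do
        netperf -P 0 -H $HOST_IP -t TCP_STREAM -l 30 &
    done
    wait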
2017 Oct 06
1
[PATCH net-next v2] vhost_net: do not stall on zerocopy depletion
...Before the delay, both flows process around 80K pps. With the delay, before this patch, both process around 400. After this patch, the large flow is still rate limited, while the small flow reverts to its original rate. See also the discussion in the first link, below. Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to send at 100% zerocopy. The limit in vhost_exceeds_maxpend must be carefully chosen. With vq->num >> 1, the flows remain correlated. This value happens to correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller fractions and ensure correctness also for much...