Displaying 20 results from an estimated 36 matches for "udp_rr".
2013 Aug 23
1
[PATCH 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
...it can, especially when the guest does support event index. When
> the guest enables tx interrupts, this can save us some unnecessary signals to
> the guest. I will do some tests.
Have done some tests. I can see a 2% - 3% increase in both aggregate
transaction rate and per-cpu transaction rate in the TCP_RR and UDP_RR tests.
I'm using ixgbe. W/o this patch, I can see more than 100 calls of
vhost_add_used_signal() in one vhost_zerocopy_signaled_used(). This is
because ixgbe (and other modern ethernet drivers) tend to free old tx
skbs in a loop during the tx interrupt, and vhost tends to batch the adding
used and s...
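The batching effect described in this excerpt can be illustrated abstractly. The following is a hypothetical Python sketch, not the kernel code: the function names in the comments refer to the real vhost helpers, but the data structures here are invented for illustration. With N completed tx buffers, per-buffer signaling costs N guest notifications, while adding all entries first and signaling once costs one.

```python
# Toy model of the signaling pattern discussed above (NOT kernel code):
# completing N tx buffers one by one triggers N guest notifications, while
# batching the "add used" step and signaling once triggers a single one.

def signal_per_buffer(completed_buffers):
    """Mimics calling vhost_add_used_and_signal() once per buffer."""
    used_ring = []
    signals = 0
    for buf in completed_buffers:
        used_ring.append(buf)   # add one used entry...
        signals += 1            # ...and notify the guest every time
    return used_ring, signals

def signal_batched(completed_buffers):
    """Mimics vhost_add_used_and_signal_n(): add all entries, signal once."""
    used_ring = list(completed_buffers)
    signals = 1 if used_ring else 0
    return used_ring, signals

# e.g. 100 skbs freed in one tx-interrupt loop, as in the excerpt above
bufs = list(range(100))
_, per_buf_signals = signal_per_buffer(bufs)
_, batched_signals = signal_batched(bufs)
print(per_buf_signals, batched_signals)  # 100 vs 1 notifications, same work
```

Both variants leave the same entries in the used ring; only the notification count differs, which is exactly why the patch helps when drivers free many tx skbs per interrupt.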
2011 Oct 27
0
No subject
...) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    11160.63
16384  87380
# netperf -H 192.168.33.4,ipv4 -t UDP_RR
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec...
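For readers unfamiliar with what the output above measures: a UDP_RR test ping-pongs a 1-byte request and 1-byte response and reports completed transactions per second. The following is a toy Python sketch of that measurement loop over loopback, not netperf itself; the function name, duration, and buffer sizes are invented for illustration.

```python
import socket
import threading
import time

def udp_rr_transactions(duration=0.2, host="127.0.0.1"):
    """Count 1-byte request / 1-byte response round trips finished in `duration` s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind((host, 0))                    # kernel picks a free port
    port = server.getsockname()[1]
    stop = threading.Event()

    def echo():
        server.settimeout(0.05)
        while not stop.is_set():
            try:
                data, addr = server.recvfrom(16)
                server.sendto(data, addr)     # bounce the request straight back
            except socket.timeout:
                continue

    t = threading.Thread(target=echo)
    t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)
    transactions = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        client.sendto(b"x", (host, port))     # 1-byte request
        try:
            client.recvfrom(16)               # wait for the 1-byte response
            transactions += 1                 # one completed transaction
        except socket.timeout:
            break                             # loopback loss: stop counting
    stop.set()
    t.join()
    server.close()
    client.close()
    return transactions

print(udp_rr_transactions())
```

Dividing the returned count by the duration gives the "Trans. Rate per sec" column in the netperf output above; the real tool additionally controls socket sizes, request/response sizes, and burst depth.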
2018 Apr 09
0
[PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
...64/ 8/ 0%/ 0% 64/ 8/ -1%/ -2%
> 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2%
> 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2%
> 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1%
>
> vq size=256 UDP_RR vq size=512 UDP_RR
> size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize%
> 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2%
> 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2%
> 1/ 8/ -1%/ -1% 1/ 8...
2018 Apr 09
0
[PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
...%/ -2%
> > > 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2%
> > > 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2%
> > > 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1%
> > >
> > > vq size=256 UDP_RR vq size=512 UDP_RR
> > > size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize%
> > > 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2%
> > > 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2%
> > > 1/...
2020 Jul 01
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...ed affinity properly (manually assigning CPU on host/guest and
> setting IRQs on guest), making them perform equally with and without
> the patch again. Maybe the batching makes the scheduler perform
> better.
>
> > > > > > - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
> > > > > > transactions/sec to 5830
>
> * Regarding UDP_RR, TCP_STREAM, and TCP_RR, proper CPU pinning makes
> them perform similarly again, only a very small performance drop
> observed. It could be just noise.
> ** All of them perform better than...
2020 Jul 01
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...patch again. Maybe the batching makes the scheduler perform
> better.
Note that for UDP_STREAM, the result is pretty tricky to analyze. E.g.
setting a sndbuf for TAP may help the performance (reduce the drops).
>
>>>>>> - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
>>>>>> transactions/sec to 5830
> * Regarding UDP_RR, TCP_STREAM, and TCP_RR, proper CPU pinning makes
> them perform similarly again, only a very small performance drop
> observed. It could be just noise.
> ** All of them perform better than vanilla if...
2020 Jul 01
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...Thanks!
Actually, it's better to skip the UDP_STREAM test since:
- My understanding is that very few applications use raw UDP streams
- It's hard to analyze (usually you need to count the drop ratio, etc.)
>
>>>>>>>> - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
>>>>>>>> transactions/sec to 5830
>>> * Regarding UDP_RR, TCP_STREAM, and TCP_RR, proper CPU pinning makes
>>> them perform similarly again, only a very small performance drop
>>> observed. It could be just noise.
>>> ** Al...
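The "count the drop ratio" point in the excerpt above can be made concrete: raw UDP_STREAM throughput is only meaningful alongside how many packets were actually received. A trivial sketch with hypothetical counters (the function name and the sample numbers are invented for illustration):

```python
def udp_stream_summary(sent_pkts, recv_pkts, bytes_per_pkt, secs):
    """Goodput only counts what the receiver actually got; the drop ratio
    is what makes raw UDP_STREAM throughput numbers hard to compare."""
    drop_ratio = (sent_pkts - recv_pkts) / sent_pkts
    goodput_mbps = recv_pkts * bytes_per_pkt * 8 / secs / 1e6
    return drop_ratio, goodput_mbps

# hypothetical run: 1M packets sent, 900k received, 1472-byte payload, 10 s
ratio, mbps = udp_stream_summary(1_000_000, 900_000, 1472, 10.0)
print(round(ratio, 2), round(mbps, 1))  # 0.1 1059.8
```

Two runs with identical sender-side throughput can hide very different drop ratios, which is why the excerpt argues the test is hard to analyze without this extra accounting.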
2018 Jan 17
1
[PATCH v2 net-next] virtio_net: Add ethtool stats
...- Guest has 2 vcpus and 2 queues
- Guest runs netserver
- Host runs 100-flow super_netperf
                       Before      After      Diff
UDP_STREAM 18byte       86.22      87.00    +0.90%
UDP_STREAM 1472byte   4055.27    4042.18    -0.32%
TCP_STREAM           16956.32   16890.63    -0.39%
UDP_RR              178667.11  185862.70    +4.03%
TCP_RR              128473.04  124985.81    -2.71%
Signed-off-by: Toshiaki Makita <makita.toshiaki at lab.ntt.co.jp>
---
v2:
- Removed redundant counters which can be obtained from dev_get_stats.
- Made queue counter structure different for tx an...
2023 Jan 14
2
[PATCH net-next 1/2] virtio_net: Fix short frame length check
...l, a UDPv4 frame can easily do it since the Ethernet header is 14B, the IP
header is 20B, and UDP is only 8B, so that only comes to 42B if I recall
correctly. Similarly, I think a TCPv4 frame can be as small as 54B if
you disable all the option headers.
A quick and dirty test would be to run something like a netperf UDP_RR
test. I know in the case of the network stack we see that the transmits
that go out are less than 60B until they are padded on xmit, usually
by the device. My concern is wanting to make sure all those paths are
covered before we assume that all the packets will be padded.
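The header arithmetic in this excerpt is easy to sanity-check. A minimal sketch (the constant names are invented; the sizes are the standard optionless IPv4/TCP/UDP header lengths and the 60B minimum Ethernet frame length excluding the FCS):

```python
# Minimum on-wire sizes before padding, matching the arithmetic above.
ETH_HDR, IP_HDR, UDP_HDR, TCP_HDR = 14, 20, 8, 20  # bytes, no options
ETH_MIN_FRAME = 60  # minimum Ethernet frame length, excluding the 4B FCS

udp_frame = ETH_HDR + IP_HDR + UDP_HDR  # empty UDPv4 datagram
tcp_frame = ETH_HDR + IP_HDR + TCP_HDR  # TCPv4 segment, no options, no data
print(udp_frame, tcp_frame)             # 42 54 -- both below the 60B minimum
print(ETH_MIN_FRAME - udp_frame)        # 18 bytes of padding someone must add
```

Both minimal frames fall short of the 60B minimum, which is exactly why the excerpt worries about which layer pads them before they hit the wire.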
2020 Jul 09
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...ce:
> >
> > - My understanding is that very few applications use raw UDP streams
> > - It's hard to analyze (usually you need to count the drop ratio, etc.)
> >
> >
> > >
> > >>>>>>>> - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
> > >>>>>>>> transactions/sec to 5830
> > >>> * Regarding UDP_RR, TCP_STREAM, and TCP_RR, proper CPU pinning makes
> > >>> them perform similarly again, only a very small performance drop
> > >>> observed. It...
2020 Jun 22
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...* If I forward packets between two vhost-net interfaces in the guest
> using a linux bridge in the host:
> - netperf UDP_STREAM shows a performance increase of 1.8x, almost
> doubling performance. This gets lower as the frame size increases.
> - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
> transactions/sec to 5830
> - TCP_STREAM goes from ~10.7 Gbps to ~7 Gbps
Which direction did you mean here? Guest TX or RX?
> - TCP_RR from 6223.64 transactions/sec to 5739.44
A perf diff might help. I think we can start from the RR result, which
should be easier....
2020 Jun 22
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...n the guest
> using a linux bridge in the host:
And here I guess you mean virtio-net in the guest kernel?
> - netperf UDP_STREAM shows a performance increase of 1.8x, almost
> doubling performance. This gets lower as the frame size increases.
> - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
> transactions/sec to 5830
OK so it seems plausible that we still have a bug where an interrupt
is delayed. That is the main difference between pmd and virtio.
Let's try disabling event index, and see what happens - that's
the trickiest part of interrupts.
> - TC...
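The "event index" mechanism mentioned above is virtio's notification-suppression scheme: each side publishes the ring index at which it wants to be woken, and the other side notifies only if its progress crossed that index. The check below is a Python transcription of the virtio specification's `vring_need_event()` with explicit 16-bit wraparound; the example indices are invented.

```python
def vring_need_event(event_idx, new_idx, old_idx):
    """virtio's event-index check (cf. vring_need_event() in the virtio spec):
    notify only if event_idx was passed while the producer advanced from
    old_idx to new_idx, computed with 16-bit unsigned wraparound."""
    return ((new_idx - event_idx - 1) & 0xFFFF) < ((new_idx - old_idx) & 0xFFFF)

# producer advanced from index 10 to 15; consumer asked to be woken at 12
print(vring_need_event(12, 15, 10))  # True  -> send the notification
# consumer asked for index 20, which has not been reached yet
print(vring_need_event(20, 15, 10))  # False -> suppress the interrupt
```

A bug that delays the interrupt past the requested index is consistent with the RR regression quoted above, which is why the suggestion is to disable event index and retest.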
2009 Aug 17
2
[PATCHv3 0/4] qemu-kvm: vhost net support
This adds support for the vhost-net virtio kernel backend.
This is an RFC, but it works without issues for me.
Still needs to be split up, tested and benchmarked properly,
but posting it here in case people want to test drive
the kernel bits I posted.
Changes since v2:
- minor fixes
- added patch to build on RHEL5.3
Changes since v1:
- rebased on top of 9dc275d9d660fe1cd64d36102d600885f9fdb88a
Michael
2011 Dec 20
0
domU has a better networking than dom0
I am using a 2.6.35 domU guest on a 2.6.32 dom0 and a 4.0.1 Xen
hypervisor. Using netperf with UDP_RR I get 1819 and 622 transactions
per second for domU and dom0 respectively, but in the UDP_STREAM test domU
shows better throughput for all packet sizes (on average around
3Mbps better, on a 10/100 network with around 96Mbps average native
throughput). Is this normal, or am I missing something?
2018 Apr 03
0
[PATCH] vhost-net: add limitation of sent packets for tx polling
...1/ 4/ +1%/ 0%
> 1/ 8/ +1%/ -2%
> 64/ 1/ -6%/ 0%
> 64/ 4/ 0%/ +2%
> 64/ 8/ 0%/ 0%
> 256/ 1/ -3%/ -4%
> 256/ 4/ +3%/ +4%
> 256/ 8/ +2%/ 0%
>
> UDP_RR
>
> size/sessions/+thu%/+normalize%
> 1/ 1/ -5%/ +1%
> 1/ 4/ +4%/ +1%
> 1/ 8/ -1%/ -1%
> 64/ 1/ -2%/ -3%
> 64/ 4/ -5%/ -1%
> 64/ 8/ 0%/ -1%
> 256/ 1/ +7%/ +...
2013 Aug 16
2
[PATCH 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
> Switch to use vhost_add_used_and_signal_n() to avoid multiple calls to
> vhost_add_used_and_signal(). With the patch we will call it at most 2 times
> (considering done_idx wrap around) compared to N times w/o this patch.
>
> Signed-off-by: Jason Wang <jasowang at redhat.com>
So? Does this help performance then?
>
2020 Jun 22
0
[PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
...nce you said you are
using L2 bridging. I guess it's unimportant.
> >
> > > - netperf UDP_STREAM shows a performance increase of 1.8x, almost
> > > doubling performance. This gets lower as the frame size increases.
> > > - the rest of the tests go noticeably worse: UDP_RR goes from ~6347
> > > transactions/sec to 5830
> >
> > OK so it seems plausible that we still have a bug where an interrupt
> > is delayed. That is the main difference between pmd and virtio.
> > Let's try disabling event index, and see what happens - that's...