Stefan Hajnoczi
2015-Apr-24 09:47 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Fri, Apr 24, 2015 at 9:12 AM, Luke Gorrie <luke at snabb.co> wrote:
> - How fast would the new design likely be?

This proposal eliminates two things in the path:

1. Compared to vhost_net, it bypasses the host tun driver and network
stack, replacing it with direct vhost_net <-> vhost_net data transfer.
At this level it's compared to vhost-user, but it's not programmable in
userspace!

2. Data copies are eliminated because the Shared Buffers BAR gives both
VMs access to the packets.

My concern is the overhead of the vhost_net component copying
descriptors between NICs.

In a 100% shared memory model, each VM only has a receive queue that the
other VM places packets into. There are no tx queues. The notification
mechanism is an eventfd that acts as an ioeventfd for VM1 and an irqfd
for VM2. In other words, when VM1 kicks the queue, VM2 receives an
interrupt (of course, polling the receive queue is also possible).

It would be interesting to compare the two approaches.

> Our recent experience in Snabb Switch land is that networking on x86 is
> now more of an HPC problem than a systems programming problem. The SIMD
> bandwidth per core keeps increasing, and this erodes the value of
> traditional (and complex) systems programming optimizations. I will be
> interested to compare notes with others on this, already on Haswell but
> more so when we have AVX512.
>
> Incidentally, we also did a pile of work last year on zero-copy NIC->VM
> transfers and discovered a lot of interesting problems and edge cases
> where the Virtio-net spec and/or drivers are hard to match up with
> common NICs. Happy to explain a bit about our experience if that would
> be valuable.

That sounds interesting, can you describe the setup?

Stefan
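P.S. To make the 100% shared memory model above a bit more concrete, here
is a rough sketch in C of the per-VM receive queue and the eventfd
notification path. The structure layout, sizes, and names are illustrative
assumptions only, not taken from the proposal:

    /* Rough sketch of the shared memory model described above.
     * Layout and names are assumptions for illustration only. */
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /* Each VM exposes only a receive queue; the peer VM fills it. */
    struct shared_rx_queue {
        uint32_t head;            /* written by the producer (peer VM) */
        uint32_t tail;            /* written by the consumer (owner VM) */
        uint64_t desc_addr[256];  /* offsets into the Shared Buffers BAR */
        uint32_t desc_len[256];
    };

    /* One eventfd per queue, registered with KVM as an ioeventfd for the
     * sending VM (the "kick") and as an irqfd for the receiving VM. */
    static int create_notify_fd(void)
    {
        return eventfd(0, EFD_NONBLOCK);
    }

    /* Producer side: after placing a packet into the peer's receive
     * queue, signal the eventfd; the peer sees it as an interrupt. */
    static void kick_peer(int notify_fd)
    {
        uint64_t one = 1;
        (void)write(notify_fd, &one, sizeof(one));
    }

Polling the receive queue instead of taking the interrupt would use the
same layout, just without the irqfd side.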
Stefan Hajnoczi
2015-Apr-24 09:50 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Fri, Apr 24, 2015 at 10:47 AM, Stefan Hajnoczi <stefanha at gmail.com> wrote:
> At this level it's compared to vhost-user, but it's not programmable
> in userspace!

s/compared/comparable/
Luke Gorrie
2015-Apr-24 12:17 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 24 April 2015 at 11:47, Stefan Hajnoczi <stefanha at gmail.com> wrote:
> My concern is the overhead of the vhost_net component copying
> descriptors between NICs.

I see. So you would not have to reserve CPU resources for vswitches.
Instead you would give all cores to the VMs and they would pay for their
own networking. This would be especially appealing in the extreme case
where all networking is "Layer 1" connectivity between local virtual
machines.

This would make VM<->VM links different to VM<->network links. I suppose
that when you created VMs you would need to be conscious of whether or
not you are placing them on the same host or NUMA node so that you can
predict what network performance will be available.

For what it is worth, I think this would make life more difficult for
network operators hosting DPDK-style network applications ("NFV").
Virtio-net would become a more complex abstraction, the orchestration
systems would need to take this into account, and there would be more
opportunity for interoperability problems between virtual machines.

The simpler alternative that I prefer is to provide network operators
with a Virtio-net abstraction that behaves and performs in exactly the
same way for all kinds of network traffic -- whether or not the VMs are
on the same machine and NUMA node. That would be more in line with
SR-IOV behavior, which seems to me like the other horse in this race.

Perhaps my world view here is too narrow, though, and other technologies
like ivshmem are more relevant than I give them credit for?
Luke Gorrie
2015-Apr-24 12:34 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 24 April 2015 at 11:47, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>> Incidentally, we also did a pile of work last year on zero-copy NIC->VM
>> transfers and discovered a lot of interesting problems and edge cases
>> where the Virtio-net spec and/or drivers are hard to match up with
>> common NICs. Happy to explain a bit about our experience if that would
>> be valuable.
>
> That sounds interesting, can you describe the setup?

Sure. We implemented a zero-copy receive path that maps guest buffers
received from the avail ring directly onto hardware receive buffers on a
dedicated hardware receive queue for that VM (VMDq). This means that when
the NIC receives a packet it stores it directly into the guest's memory,
but the vswitch has the opportunity to do as much or as little processing
as it wants before making the packet available with a used ring
descriptor.

This scheme seems quite elegant to me. (I am sure it is not original --
this is what the VMDq hardware feature is for, after all.) The devil is
in the details, though. I suspect it would work well given two extensions
to Virtio-net:

1. The 'used' ring allowing an offset where the payload starts.

2. The guest always supplying buffers with space for >= 2048 bytes of
payload.

Without these it is tricky to satisfy the requirements of real NICs such
as the Intel 10G ones. There are conflicting requirements. For example:

- The NIC requires buffer sizes to be uniform and a multiple of 1024
bytes. The guest supplies variable-size buffers, often of ~1500 bytes.
These need to be either rounded down to 1024 bytes (causing excessive
segmentation) or rounded up to 2048 bytes (requiring jumbo frames to be
globally disabled on the port to avoid potential overruns).

- Virtio-net with MRG_RXBUF expects the packet payload to be at a
different offset for the first descriptor in a chain (offset 14 after
the vnet header) vs. following descriptors in the chain (offset 0). The
NIC always stores packets at the same offset, so the vswitch needs to
pick one and then correct with memmove() when needed.

- If the vswitch wants to shorten the packet payload, e.g. to remove
encapsulation, then this requires a memmove() because there is no way to
communicate an offset on the used ring.

- The NIC has a limit on how many receive descriptors it can chain
together. If the guest is supplying small buffers then this limit may be
too low for jumbo frames to be received.

... and at a certain point we decided we were better off switching our
focus away from clever-but-fragile NIC hacks and towards
clever-and-robust SIMD hacks, and that is the path we have been on for a
few months now.
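To make the fixups concrete, here is a rough sketch in C of the
buffer-size rounding and the first-descriptor offset correction; the
helper names and hard-coded sizes are illustrative assumptions, not our
actual code:

    /* Sketch of the fixups discussed above; sizes are assumed values. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* The NIC wants uniform buffer sizes in 1024-byte steps, so a
     * ~1500-byte guest buffer is either rounded down (more segmentation)
     * or rounded up to 2048 (only safe with jumbo frames disabled on the
     * port). */
    static size_t hw_buf_size(size_t guest_len, int round_up)
    {
        return round_up ? 2048 : (guest_len / 1024) * 1024;
    }

    /* The NIC writes every packet at one fixed offset, but the first
     * descriptor in a MRG_RXBUF chain expects the payload at a different
     * offset than the rest of the chain. Shift it when they disagree. */
    static void fix_first_desc(uint8_t *buf, size_t pkt_len,
                               size_t nic_off, size_t guest_off)
    {
        if (nic_off != guest_off)
            memmove(buf + guest_off, buf + nic_off, pkt_len);
    }

With the two Virtio-net extensions listed above (a payload offset in the
used ring and guaranteed >= 2048-byte buffers), both of these workarounds
would go away.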
Luke Gorrie
2015-Apr-24 13:10 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 24 April 2015 at 14:17, Luke Gorrie <luke at snabb.co> wrote:
> For what it is worth, I think

Erm, sorry about ranting with my pre-existing ideas without having
examined the proposed specification in detail. I have a long backlog of
things that I have been meaning to discuss with the Virtio-net community
but have not previously had the time to.

Humbly!
-Luke
Stefan Hajnoczi
2015-Apr-24 13:22 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Fri, Apr 24, 2015 at 1:17 PM, Luke Gorrie <luke at snabb.co> wrote:
> On 24 April 2015 at 11:47, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>> My concern is the overhead of the vhost_net component copying
>> descriptors between NICs.
>
> I see. So you would not have to reserve CPU resources for vswitches.
> Instead you would give all cores to the VMs and they would pay for their
> own networking. This would be especially appealing in the extreme case
> where all networking is "Layer 1" connectivity between local virtual
> machines.
>
> This would make VM<->VM links different to VM<->network links. I suppose
> that when you created VMs you would need to be conscious of whether or
> not you are placing them on the same host or NUMA node so that you can
> predict what network performance will be available.

The motivation for making VM-to-VM fast is that while software switches
on the host are efficient today (thanks to vhost-user), there is no
efficient solution if the software switch is a VM.

Have you had requests to run SnabbSwitch in a VM instead of on the host?
For example, if someone wants to deploy it in a cloud environment they
will not be allowed to run arbitrary software on the host.

Stefan