Stefan Hajnoczi
2015-Apr-27 10:17 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke at snabb.co> wrote:
> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>>
>> The motivation for making VM-to-VM fast is that while software
>> switches on the host are efficient today (thanks to vhost-user), there
>> is no efficient solution if the software switch is a VM.
>
> I see. This sounds like a noble goal indeed. I would love to run the
> software switch as just another VM in the long term. It would make it much
> easier for the various software switches to coexist in the world.
>
> The main technical risk I see in this proposal is that eliminating the
> memory copies might not have the desired effect. I might be tempted to keep
> the copies but prevent the kernel from having to inspect the vrings (more
> like vhost-user). But that is just a hunch and I suppose the first step
> would be a prototype to check the performance anyway.
>
> For what it is worth here is my view of networking performance on x86 in the
> Haswell+ era:
> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow

Thanks.

I've been thinking about how to eliminate the VM <-> host <-> VM
switching and instead achieve just VM <-> VM.

The holy grail of VM-to-VM networking is an exitless I/O path. In
other words, packets can be transferred between VMs without any
vmexits (this requires a polling driver).

Here is how it works. QEMU gets "-device vhost-user" so that a VM can
act as the vhost-user server:

VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)

VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device
and plays the host role instead of the normal virtio-net guest driver
role.

The ugly thing about this is that VM2 needs to map all of VM1's guest
RAM so it can access the vrings and packet data. The solution to this
is something like the Shared Buffers BAR but this time it contains not
just the packet data but also the vring, let's call it the Shared
Virtqueues BAR.

The Shared Virtqueues BAR eliminates the need for vhost-net on the
host because VM1 and VM2 communicate directly using virtqueue notify
or polling vring memory. Virtqueue notify works by connecting an
eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
an ioeventfd that is irqfd for VM1 to signal completions.

Stefan
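The polling variant of the Shared Virtqueues BAR can be illustrated with a toy model. This is only a sketch under stated assumptions: the memory layout below is invented for illustration and is not the virtio vring format, a `bytearray` stands in for the BAR that both VMs would map, and the names `driver_send` and `device_poll` are hypothetical. The point it shows is that the ring indexes and the packet data live in the same shared region, so the VM2 side can pick up packets just by polling memory.

```python
import struct

# Hypothetical layout of a toy "Shared Virtqueues BAR" (NOT the virtio vring format):
#   offset 0: u16 avail_idx  (written only by the VM1 driver)
#   offset 2: u16 used_idx   (written only by the VM2 device)
#   offset 4: RING_SIZE slots, each a u16 length followed by SLOT_SIZE payload bytes
RING_SIZE = 4
SLOT_SIZE = 64
HDR = 4
SLOT_STRIDE = 2 + SLOT_SIZE
BAR_SIZE = HDR + RING_SIZE * SLOT_STRIDE


def _idx(bar, off):
    """Read a little-endian u16 from the shared region."""
    return struct.unpack_from("<H", bar, off)[0]


def driver_send(bar, packet):
    """VM1 side: place a packet in the next free slot and publish it."""
    avail, used = _idx(bar, 0), _idx(bar, 2)
    if ((avail - used) & 0xFFFF) >= RING_SIZE or len(packet) > SLOT_SIZE:
        return False  # ring full or packet too large for a slot
    slot = HDR + (avail % RING_SIZE) * SLOT_STRIDE
    struct.pack_into("<H", bar, slot, len(packet))
    bar[slot + 2:slot + 2 + len(packet)] = packet
    # Real shared memory would need a write barrier before publishing the index.
    struct.pack_into("<H", bar, 0, (avail + 1) & 0xFFFF)
    return True


def device_poll(bar):
    """VM2 side: poll the ring and return all newly published packets."""
    packets = []
    avail, used = _idx(bar, 0), _idx(bar, 2)
    while used != avail:
        slot = HDR + (used % RING_SIZE) * SLOT_STRIDE
        length = _idx(bar, slot)
        packets.append(bytes(bar[slot + 2:slot + 2 + length]))
        used = (used + 1) & 0xFFFF
    struct.pack_into("<H", bar, 2, used)  # tell the driver the slots are free again
    return packets
```

For example, after `bar = bytearray(BAR_SIZE)`, two calls to `driver_send(bar, ...)` followed by one `device_poll(bar)` hand both packets to the device side without any notification at all; that is the case where no vmexit would be needed.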
Michael S. Tsirkin
2015-Apr-27 10:36 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 11:17:44AM +0100, Stefan Hajnoczi wrote:
> [...]
> The ugly thing about this is that VM2 needs to map all of VM1's guest
> RAM so it can access the vrings and packet data. The solution to this
> is something like the Shared Buffers BAR but this time it contains not
> just the packet data but also the vring, let's call it the Shared
> Virtqueues BAR.
>
> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> host because VM1 and VM2 communicate directly using virtqueue notify
> or polling vring memory. Virtqueue notify works by connecting an
> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
> an ioeventfd that is irqfd for VM1 to signal completions.
>
> Stefan

So this definitely works, it's just another virtio transport.
Though this might mean guests need to copy data out to/from this BAR.

--
MST
Jan Kiszka
2015-Apr-27 12:35 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
> [...]
> The ugly thing about this is that VM2 needs to map all of VM1's guest
> RAM so it can access the vrings and packet data. The solution to this
> is something like the Shared Buffers BAR but this time it contains not
> just the packet data but also the vring, let's call it the Shared
> Virtqueues BAR.
>
> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> host because VM1 and VM2 communicate directly using virtqueue notify
> or polling vring memory. Virtqueue notify works by connecting an
> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
> an ioeventfd that is irqfd for VM1 to signal completions.

We had such a discussion before:
http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658

Would be great to get this ball rolling again.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Jan Kiszka
2015-Apr-27 12:55 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 2015-04-27 at 14:35, Jan Kiszka wrote:
> On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
>> [...]
>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
>> host because VM1 and VM2 communicate directly using virtqueue notify
>> or polling vring memory. Virtqueue notify works by connecting an
>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
>> an ioeventfd that is irqfd for VM1 to signal completions.
>
> We had such a discussion before:
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
>
> Would be great to get this ball rolling again.
>
> Jan

But one challenge would remain even then (unless both sides only poll):
exit-free inter-VM signaling, no? But that's a hardware issue first of all.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
Stefan Hajnoczi
2015-Apr-27 12:57 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 1:35 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
> On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
>> [...]
>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
>> host because VM1 and VM2 communicate directly using virtqueue notify
>> or polling vring memory. Virtqueue notify works by connecting an
>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
>> an ioeventfd that is irqfd for VM1 to signal completions.
>
> We had such a discussion before:
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
>
> Would be great to get this ball rolling again.

Thanks for the interesting link.

Now that vhost-user exists, a QEMU -device vhost-user feature is a
logical step. It would allow any virtio device to be emulated by
another VM, not just virtio-net. It seems like a nice model for
storage and networking appliance VMs.

I don't have time to write the patches in the near future but can
participate in code review and discussion.

Stefan
Michael S. Tsirkin
2015-Apr-27 13:17 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 02:35:19PM +0200, Jan Kiszka wrote:
> On 2015-04-27 at 12:17, Stefan Hajnoczi wrote:
> > [...]
> > The Shared Virtqueues BAR eliminates the need for vhost-net on the
> > host because VM1 and VM2 communicate directly using virtqueue notify
> > or polling vring memory. Virtqueue notify works by connecting an
> > eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
> > an ioeventfd that is irqfd for VM1 to signal completions.
>
> We had such a discussion before:
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
>
> Would be great to get this ball rolling again.
>
> Jan

I think fundamentally, reducing the stress on the host scheduler can
give a bigger gain than zero copy.

But if I was to implement this, I wouldn't start with the funky virtio
BAR thing. Start by enabling DPDK vhost-port within guest as-is.

To this end, we can try implementing virtio-vhost:

Assume we want to bridge VMX and VMY using bridge in VMB.

- expose all of VMX and VMY memory as device BARs,
  or as some other region within VMB memory
- add interface to send vhost-user messages to VMB (and ack them);
  the messages include tables that translate from VMX/VMY physical
  to VMB physical.

The simplest guest driver then just copies from VMX TX ring to VMY RX
ring, and vice versa.

This will let you test performance somewhat easily.

When used as a linux netdev, we probably will have to do extra data
copies, at least initially.

The point is that you get full interoperability with existing virtio,
and test performance without rewriting everything first.

One nice property is that KVM can log accesses for us. By detecting
VMB accesses to memory of VMX and forwarding them to QEMU running VMX,
we can make migration work out of box.

This might also mean vringh code is reusable to make a linux driver
for this device - IIRC dirty logging was the biggest hurdle to make
vringh work well for vhost.
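The translation tables and the copy-based bridge described above can be sketched as follows. This is a rough model, not the vhost-user message format: `Region`, `TranslationTable` and `forward` are hypothetical names, a single `bytearray` stands in for VMB's physical memory with VMX and VMY memory exposed at different offsets, and ring handling is left out so that only the address translation and the one copy per packet are shown.

```python
import bisect
from typing import NamedTuple


class Region(NamedTuple):
    guest_base: int  # start of the region in the peer guest's physical address space
    size: int        # length of the region in bytes
    local_base: int  # where VMB sees that region in its own physical address space


class TranslationTable:
    """Maps one peer guest's physical addresses to VMB physical addresses."""

    def __init__(self, regions):
        self.regions = sorted(regions)  # NamedTuples sort by guest_base first
        self.bases = [r.guest_base for r in self.regions]

    def translate(self, gpa, length=1):
        # Find the last region whose base is <= gpa.
        i = bisect.bisect_right(self.bases, gpa) - 1
        if i < 0:
            raise ValueError(f"address {gpa:#x} is below every mapped region")
        region = self.regions[i]
        offset = gpa - region.guest_base
        if offset + length > region.size:
            raise ValueError(f"range {gpa:#x}+{length} is not fully mapped")
        return region.local_base + offset


def forward(vmb_mem, tx_table, tx_gpa, rx_table, rx_gpa, length):
    """Copy one buffer from a VMX TX address to a VMY RX address via VMB memory."""
    src = tx_table.translate(tx_gpa, length)
    dst = rx_table.translate(rx_gpa, length)
    vmb_mem[dst:dst + length] = vmb_mem[src:src + length]
```

With VMX memory exposed at `0x1000` and VMY memory at `0x2000` inside VMB, `forward(vmb_mem, vmx_table, 0x10, vmy_table, 0x20, n)` moves `n` bytes from VMX guest address `0x10` to VMY guest address `0x20`. Rejecting ranges that are not fully inside a mapped region matters here because the addresses come out of rings written by another guest.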