Stefan Hajnoczi
2015-Apr-27 13:01 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
> Am 2015-04-27 um 14:35 schrieb Jan Kiszka:
>> Am 2015-04-27 um 12:17 schrieb Stefan Hajnoczi:
>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke at snabb.co> wrote:
>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>>>>>
>>>>> The motivation for making VM-to-VM fast is that while software
>>>>> switches on the host are efficient today (thanks to vhost-user), there
>>>>> is no efficient solution if the software switch is a VM.
>>>>
>>>>
>>>> I see. This sounds like a noble goal indeed. I would love to run the
>>>> software switch as just another VM in the long term. It would make it much
>>>> easier for the various software switches to coexist in the world.
>>>>
>>>> The main technical risk I see in this proposal is that eliminating the
>>>> memory copies might not have the desired effect. I might be tempted to keep
>>>> the copies but prevent the kernel from having to inspect the vrings (more
>>>> like vhost-user). But that is just a hunch and I suppose the first step
>>>> would be a prototype to check the performance anyway.
>>>>
>>>> For what it is worth here is my view of networking performance on x86 in the
>>>> Haswell+ era:
>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
>>>
>>> Thanks.
>>>
>>> I've been thinking about how to eliminate the VM <-> host <-> VM
>>> switching and instead achieve just VM <-> VM.
>>>
>>> The holy grail of VM-to-VM networking is an exitless I/O path. In
>>> other words, packets can be transferred between VMs without any
>>> vmexits (this requires a polling driver).
>>>
>>> Here is how it works. QEMU gets "-device vhost-user" so that a VM can
>>> act as the vhost-user server:
>>>
>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
>>>
>>> VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device
>>> and plays the host role instead of the normal virtio-net guest driver
>>> role.
>>>
>>> The ugly thing about this is that VM2 needs to map all of VM1's guest
>>> RAM so it can access the vrings and packet data. The solution to this
>>> is something like the Shared Buffers BAR but this time it contains not
>>> just the packet data but also the vring, let's call it the Shared
>>> Virtqueues BAR.
>>>
>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
>>> host because VM1 and VM2 communicate directly using virtqueue notify
>>> or polling vring memory. Virtqueue notify works by connecting an
>>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
>>> an ioeventfd that is irqfd for VM1 to signal completions.
>>
>> We had such a discussion before:
>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
>>
>> Would be great to get this ball rolling again.
>>
>> Jan
>>
>
> But one challenge would remain even then (unless both sides only poll):
> exit-free inter-VM signaling, no? But that's a hardware issue first of all.

To start with ioeventfd<->irqfd can be used. It incurs a light-weight
exit in VM1 and interrupt injection in VM2.

For networking the cost is mitigated by NAPI drivers which switch
between interrupts and polling. During notification-heavy periods the
guests would use polling anyway.

A hardware solution would be some kind of inter-guest interrupt
injection. I don't know VMX well enough to know whether that is
possible on Intel CPUs.

Stefan
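For concreteness, here is a rough sketch of the ioeventfd<->irqfd wiring
described above, written directly against the KVM ioctl interface. The
vm1_fd/vm2_fd file descriptors, notify_addr (an MMIO notify address) and
the gsi number are hypothetical placeholders for whatever QEMU would
actually plumb through; this illustrates the mechanism, not the eventual
QEMU code.

#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Sketch: one eventfd registered as an ioeventfd in VM1 and as an irqfd
 * in VM2.  A write by VM1's driver to notify_addr becomes a light-weight
 * exit that signals the eventfd, and KVM injects the interrupt on VM2's
 * gsi without a round trip through userspace.
 *
 * vm1_fd/vm2_fd, notify_addr and gsi are hypothetical placeholders.
 */
static int wire_notify(int vm1_fd, int vm2_fd, uint64_t notify_addr,
                       uint32_t gsi)
{
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -1;

    /* VM1 side: trap the 16-bit queue-notify write to the eventfd. */
    struct kvm_ioeventfd ioev = {
        .addr = notify_addr,
        .len  = 2,
        .fd   = efd,
    };
    if (ioctl(vm1_fd, KVM_IOEVENTFD, &ioev) < 0)
        return -1;

    /* VM2 side: the same eventfd injects an interrupt on gsi. */
    struct kvm_irqfd irqfd = {
        .fd  = (uint32_t)efd,
        .gsi = gsi,
    };
    if (ioctl(vm2_fd, KVM_IRQFD, &irqfd) < 0)
        return -1;

    return efd;
}

A second eventfd wired the other way around (ioeventfd in VM2, irqfd in
VM1) would cover the completion path mentioned above.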
Muli Ben-Yehuda
2015-Apr-27 13:08 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 02:01:05PM +0100, Stefan Hajnoczi wrote:
> A hardware solution would be some kind of inter-guest interrupt
> injection. I don't know VMX well enough to know whether that is
> possible on Intel CPUs.

It is: http://www.mulix.org/pubs/eli/eli.pdf. (And there's hardware
coming down the pipe that will make (some) of the nasty tricks we used
unnecessary).

Cheers, Muli
Jan Kiszka
2015-Apr-27 14:30 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
Am 2015-04-27 um 15:01 schrieb Stefan Hajnoczi:
> On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
>> Am 2015-04-27 um 14:35 schrieb Jan Kiszka:
>>> Am 2015-04-27 um 12:17 schrieb Stefan Hajnoczi:
>>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke at snabb.co> wrote:
>>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha at gmail.com> wrote:
>>>>>>
>>>>>> The motivation for making VM-to-VM fast is that while software
>>>>>> switches on the host are efficient today (thanks to vhost-user), there
>>>>>> is no efficient solution if the software switch is a VM.
>>>>>
>>>>>
>>>>> I see. This sounds like a noble goal indeed. I would love to run the
>>>>> software switch as just another VM in the long term. It would make it much
>>>>> easier for the various software switches to coexist in the world.
>>>>>
>>>>> The main technical risk I see in this proposal is that eliminating the
>>>>> memory copies might not have the desired effect. I might be tempted to keep
>>>>> the copies but prevent the kernel from having to inspect the vrings (more
>>>>> like vhost-user). But that is just a hunch and I suppose the first step
>>>>> would be a prototype to check the performance anyway.
>>>>>
>>>>> For what it is worth here is my view of networking performance on x86 in the
>>>>> Haswell+ era:
>>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
>>>>
>>>> Thanks.
>>>>
>>>> I've been thinking about how to eliminate the VM <-> host <-> VM
>>>> switching and instead achieve just VM <-> VM.
>>>>
>>>> The holy grail of VM-to-VM networking is an exitless I/O path. In
>>>> other words, packets can be transferred between VMs without any
>>>> vmexits (this requires a polling driver).
>>>>
>>>> Here is how it works. QEMU gets "-device vhost-user" so that a VM can
>>>> act as the vhost-user server:
>>>>
>>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
>>>>
>>>> VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device
>>>> and plays the host role instead of the normal virtio-net guest driver
>>>> role.
>>>>
>>>> The ugly thing about this is that VM2 needs to map all of VM1's guest
>>>> RAM so it can access the vrings and packet data. The solution to this
>>>> is something like the Shared Buffers BAR but this time it contains not
>>>> just the packet data but also the vring, let's call it the Shared
>>>> Virtqueues BAR.
>>>>
>>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
>>>> host because VM1 and VM2 communicate directly using virtqueue notify
>>>> or polling vring memory. Virtqueue notify works by connecting an
>>>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
>>>> an ioeventfd that is irqfd for VM1 to signal completions.
>>>
>>> We had such a discussion before:
>>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
>>>
>>> Would be great to get this ball rolling again.
>>>
>>> Jan
>>>
>>
>> But one challenge would remain even then (unless both sides only poll):
>> exit-free inter-VM signaling, no? But that's a hardware issue first of all.
>
> To start with ioeventfd<->irqfd can be used. It incurs a light-weight
> exit in VM1 and interrupt injection in VM2.
>
> For networking the cost is mitigated by NAPI drivers which switch
> between interrupts and polling. During notification-heavy periods the
> guests would use polling anyway.
>
> A hardware solution would be some kind of inter-guest interrupt
> injection. I don't know VMX well enough to know whether that is
> possible on Intel CPUs.

Today, we have posted interrupts to avoid the vm-exit on the target CPU,
but there is nothing yet (to my best knowledge) to avoid the exit on the
sender side (unless we ignore security). That's the same problem with
intra-guest IPIs, BTW.

For throughput and given NAPI patterns, that's probably not an issue as
you noted. It may be for latency, though, when almost every cycle counts.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
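To make the sender-side cost concrete: with legacy virtio-pci the "kick"
is just an I/O port write to the queue-notify register, and it is exactly
this access that still traps (a light-weight exit feeding the ioeventfd),
even when the receiving side's interrupt can be delivered as a posted
interrupt. A minimal sketch, assuming an x86 userspace driver with I/O
port access (ioperm/iopl) and a hypothetical io_base for the device's
legacy I/O BAR:

#include <stdint.h>
#include <sys/io.h>                  /* outw(); x86 only, needs ioperm()/iopl() */

#define VIRTIO_PCI_QUEUE_NOTIFY  16  /* legacy virtio-pci notify register offset */

/* The guest-side "kick": write the queue index to the notify register.
 * This port write is the access that cannot currently be made exit-free
 * on the sender side. */
static void virtqueue_kick_legacy(uint16_t io_base, uint16_t queue_index)
{
    outw(queue_index, io_base + VIRTIO_PCI_QUEUE_NOTIFY);
}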
Luke Gorrie
2015-Apr-27 14:36 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On 27 April 2015 at 16:30, Jan Kiszka <jan.kiszka at siemens.com> wrote:
> Today, we have posted interrupts to avoid the vm-exit on the target CPU,
> but there is nothing yet (to my best knowledge) to avoid the exit on the
> sender side (unless we ignore security). That's the same problem with
> intra-guest IPIs, BTW.
>
> For throughput and given NAPI patterns, that's probably not an issue as
> you noted. It may be for latency, though, when almost every cycle counts.

Poll-mode networking applications (DPDK, Snabb Switch, etc) are typically
busy-looping to poll the vring. They may have a very short usleep() between
checks to save power but they don't wait on their eventfd. So for those
particular applications latency is on the order of tens of microseconds
even without guest exits.
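A minimal sketch of that poll-mode pattern, using the split-virtqueue
available ring layout from the virtio spec; the handler callback, queue
size and the 10-microsecond pause are made-up placeholders, and memory
barriers are omitted for brevity:

#include <stdint.h>
#include <unistd.h>

/* Split-virtqueue available ring, as laid out in the virtio spec. */
struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];
};

typedef void (*handle_buffer_fn)(uint16_t desc_head);

/* Busy-loop on the avail index rather than blocking on an eventfd;
 * sleep briefly between empty polls to save power. */
static void poll_vring(volatile struct vring_avail *avail, uint16_t qsize,
                       uint16_t *last_idx, handle_buffer_fn handle)
{
    for (;;) {
        while (avail->idx != *last_idx) {
            /* Consume the newly published descriptor chain head.
             * (A real implementation needs read barriers here.) */
            handle(avail->ring[*last_idx % qsize]);
            (*last_idx)++;
        }
        usleep(10);   /* very short pause between checks */
    }
}

With this structure the added latency is bounded by the poll interval
rather than by interrupt delivery, which is where the "tens of
microseconds even without guest exits" figure comes from.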
Michael S. Tsirkin
2015-Apr-27 14:40 UTC
[virtio-dev] Zerocopy VM-to-VM networking using virtio-net
On Mon, Apr 27, 2015 at 04:30:35PM +0200, Jan Kiszka wrote:
> Am 2015-04-27 um 15:01 schrieb Stefan Hajnoczi:
> > On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka at siemens.com> wrote:
> >> Am 2015-04-27 um 14:35 schrieb Jan Kiszka:
> >>> Am 2015-04-27 um 12:17 schrieb Stefan Hajnoczi:
> >>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke at snabb.co> wrote:
> >>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha at gmail.com> wrote:
> >>>>>>
> >>>>>> The motivation for making VM-to-VM fast is that while software
> >>>>>> switches on the host are efficient today (thanks to vhost-user), there
> >>>>>> is no efficient solution if the software switch is a VM.
> >>>>>
> >>>>>
> >>>>> I see. This sounds like a noble goal indeed. I would love to run the
> >>>>> software switch as just another VM in the long term. It would make it much
> >>>>> easier for the various software switches to coexist in the world.
> >>>>>
> >>>>> The main technical risk I see in this proposal is that eliminating the
> >>>>> memory copies might not have the desired effect. I might be tempted to keep
> >>>>> the copies but prevent the kernel from having to inspect the vrings (more
> >>>>> like vhost-user). But that is just a hunch and I suppose the first step
> >>>>> would be a prototype to check the performance anyway.
> >>>>>
> >>>>> For what it is worth here is my view of networking performance on x86 in the
> >>>>> Haswell+ era:
> >>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
> >>>>
> >>>> Thanks.
> >>>>
> >>>> I've been thinking about how to eliminate the VM <-> host <-> VM
> >>>> switching and instead achieve just VM <-> VM.
> >>>>
> >>>> The holy grail of VM-to-VM networking is an exitless I/O path. In
> >>>> other words, packets can be transferred between VMs without any
> >>>> vmexits (this requires a polling driver).
> >>>>
> >>>> Here is how it works. QEMU gets "-device vhost-user" so that a VM can
> >>>> act as the vhost-user server:
> >>>>
> >>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
> >>>>
> >>>> VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device
> >>>> and plays the host role instead of the normal virtio-net guest driver
> >>>> role.
> >>>>
> >>>> The ugly thing about this is that VM2 needs to map all of VM1's guest
> >>>> RAM so it can access the vrings and packet data. The solution to this
> >>>> is something like the Shared Buffers BAR but this time it contains not
> >>>> just the packet data but also the vring, let's call it the Shared
> >>>> Virtqueues BAR.
> >>>>
> >>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> >>>> host because VM1 and VM2 communicate directly using virtqueue notify
> >>>> or polling vring memory. Virtqueue notify works by connecting an
> >>>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
> >>>> an ioeventfd that is irqfd for VM1 to signal completions.
> >>>
> >>> We had such a discussion before:
> >>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
> >>>
> >>> Would be great to get this ball rolling again.
> >>>
> >>> Jan
> >>>
> >>
> >> But one challenge would remain even then (unless both sides only poll):
> >> exit-free inter-VM signaling, no? But that's a hardware issue first of all.
> >
> > To start with ioeventfd<->irqfd can be used. It incurs a light-weight
> > exit in VM1 and interrupt injection in VM2.
> >
> > For networking the cost is mitigated by NAPI drivers which switch
> > between interrupts and polling. During notification-heavy periods the
> > guests would use polling anyway.
> >
> > A hardware solution would be some kind of inter-guest interrupt
> > injection. I don't know VMX well enough to know whether that is
> > possible on Intel CPUs.
>
> Today, we have posted interrupts to avoid the vm-exit on the target CPU,
> but there is nothing yet (to my best knowledge) to avoid the exit on the
> sender side (unless we ignore security). That's the same problem with
> intra-guest IPIs, BTW.
>
> For throughput and given NAPI patterns, that's probably not an issue as
> you noted. It may be for latency, though, when almost every cycle counts.
>
> Jan

If you are counting cycles you likely can't afford the interrupt
latency under linux, so you have to poll memory.

> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux