thr3ads.net - Linux Virtualization - [virtio-dev] Zerocopy VM-to-VM networking using virtio-net [Apr 2015]

Stefan Hajnoczi

2015-Apr-22 17:01 UTC

Zerocopy VM-to-VM networking using virtio-net

[It may be necessary to remove virtio-dev at lists.oasis-open.org from CC
if you are a non-TC member.]

Hi,
Some modern networking applications bypass the kernel network stack so
that rx/tx rings and DMA buffers can be directly mapped. This is
typical in DPDK applications where virtio-net currently is one of
several NIC choices.

Existing virtio-net implementations are not optimized for VM-to-VM
DPDK-style networking. The following outline describes a zero-copy
virtio-net solution for VM-to-VM networking.

Thanks to Paolo Bonzini for the Shared Buffers BAR idea.

Use case
--------
Two VMs on the same host need to communicate in the most efficient
manner possible (e.g. the sole purpose of the VMs is to do network I/O).

Applications running inside the VMs implement virtio-net in userspace so
they have full control over rx/tx rings and data buffer placement.

Performance requirements are higher priority than security or isolation.
If this bothers you, stick to classic virtio-net.

virtio-net VM-to-VM extensions
------------------------------
A few extensions to virtio-net are necessary to support zero-copy
VM-to-VM communication. The extensions are covered informally
throughout the text; this is not a VIRTIO specification change proposal.

The VM-to-VM capable virtio-net PCI adapter has an additional MMIO BAR
called the Shared Buffers BAR. The Shared Buffers BAR is a shared
memory region on the host so that the virtio-net devices in VM1 and VM2
both access the same region of memory.

The vring is still allocated in guest RAM as usual but data buffers must
be located in the Shared Buffers BAR in order to take advantage of
zero-copy.

When VM1 places a packet into the tx queue and the buffers are located
in the Shared Buffers BAR, the host finds VM2's rx queue descriptor
with the same buffer address and completes it without copying any data
buffers.
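
The address check implied here can be sketched as a toy Python model. This is not vhost_net code. `BarRegion`, `is_zero_copy`, and the example base address and size are hypothetical, and requiring every segment of a descriptor chain to lie inside the BAR is an assumption; the text above does not say how mixed chains would be handled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BarRegion:
    """Guest-physical range covered by the (hypothetical) Shared Buffers BAR."""
    base: int
    size: int

    def contains(self, addr: int, length: int) -> bool:
        # The whole buffer must lie inside the BAR, not just its first byte.
        return length > 0 and self.base <= addr and addr + length <= self.base + self.size


def is_zero_copy(bar: BarRegion, segments: List[Tuple[int, int]]) -> bool:
    """True if every (addr, len) segment of a descriptor chain is in the BAR."""
    return bool(segments) and all(bar.contains(a, n) for a, n in segments)


bar = BarRegion(base=0xFE000000, size=2 * 4096)      # assumed: two 4 KiB buffers
in_bar = is_zero_copy(bar, [(0xFE001000, 1500)])     # buffer inside the BAR
in_ram = is_zero_copy(bar, [(0x10000000, 1500)])     # ordinary guest RAM
straddles = is_zero_copy(bar, [(0xFE001000, 8192)])  # runs past the end of the BAR
```

Only the first case would qualify for the zero-copy path in this model; the other two would fall back to classic virtio-net behavior.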

Shared buffer allocation
------------------------
A simple scheme for two cooperating VMs to manage the Shared Buffers BAR
is as follows:

  VM1         VM2
       +---+
   rx->| 1 |<-tx
       +---+
   tx->| 2 |<-rx
       +---+
   Shared Buffers

This is a trivial example where the Shared Buffers BAR has only two
packet buffers.

VM1 starts by putting buffer 1 in its rx queue. VM2 starts by putting
buffer 2 in its rx queue. The VMs know which buffers to choose based on
a new uint8_t virtio_net_config.shared_buffers_offset field (0 for VM1
and 1 for VM2).
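
As a rough illustration of how a guest might use this field, here is a minimal Python sketch. The function name `initial_rx_buffers` is hypothetical, and the rule for more than two buffers (peers take alternating buffers) is an assumption; the text only defines the two-buffer case.

```python
def initial_rx_buffers(shared_buffers_offset: int, num_buffers: int) -> list:
    """Buffer numbers (1-based, as in the diagram) a VM posts to its rx queue."""
    if shared_buffers_offset not in (0, 1):
        raise ValueError("only two peers are modelled")
    # Assumed generalisation: peers take alternating buffers, so offset 0
    # gets buffers 1, 3, 5, ... and offset 1 gets buffers 2, 4, 6, ...
    return [n for n in range(1, num_buffers + 1)
            if (n - 1) % 2 == shared_buffers_offset]


vm1_rx = initial_rx_buffers(0, 2)   # VM1 reads offset 0 from its config space
vm2_rx = initial_rx_buffers(1, 2)   # VM2 reads offset 1
larger = initial_rx_buffers(0, 6)   # same rule applied to six buffers
```

With two buffers this reproduces the diagram: VM1 posts buffer 1 and VM2 posts buffer 2.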

VM1 can transmit to VM2 by filling buffer 2 and placing it on its tx
queue. VM2 can transmit by filling buffer 1 and placing it on its tx
queue.

As soon as a buffer is placed on a tx queue, the VM passes ownership of
the buffer to the other VM. In other words, the buffer must not be
touched even after virtio-net tx completion because it now belongs to
the other VM.
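
The ownership rule can be modelled in a few lines of Python. This is only a bookkeeping sketch: `SharedBuffers` and `OwnershipError` are hypothetical names, and "owner" here means the VM that is currently allowed to fill and transmit the buffer.

```python
class OwnershipError(Exception):
    """Raised when a VM tries to transmit a buffer it does not own."""


class SharedBuffers:
    PEER = {"VM1": "VM2", "VM2": "VM1"}

    def __init__(self):
        # Initial state of the two-buffer example: VM1 may fill buffer 2,
        # VM2 may fill buffer 1.
        self.owner = {1: "VM2", 2: "VM1"}

    def transmit(self, vm: str, buf: int) -> None:
        if self.owner[buf] != vm:
            raise OwnershipError(f"{vm} does not own buffer {buf}")
        # Placing the buffer on the tx queue hands it to the peer.
        self.owner[buf] = self.PEER[vm]

    def held_by(self, vm: str) -> list:
        return sorted(b for b, o in self.owner.items() if o == vm)


bufs = SharedBuffers()
bufs.transmit("VM1", 2)          # VM1 sends buffer 2 to VM2
try:
    bufs.transmit("VM1", 2)      # touching it again is a protocol violation
    reuse_rejected = False
except OwnershipError:
    reuse_rejected = True
```

After the single transmit, VM2 holds both buffers and VM1 holds none.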

This scheme of bouncing ownership back-and-forth between the two VMs
only works if both VMs transmit an equal number of buffers over time.
In reality the traffic pattern may be unbalanced so VM1 is always
transmitting and VM2 is always receiving. This problem can be overcome
if the VMs cooperate and return buffers if they accumulate too many.

For example, after VM1 transmits buffer 2 it has run out of tx buffers:

  VM1         VM2
       +---+
   rx->| 1 |<-tx
       +---+
    X->| 2 |<-rx
       +---+

VM2 notices that it now holds all buffers. It can donate a buffer back
to VM1 by putting it on the tx queue with the new virtio_net_hdr.flags
VIRTIO_NET_HDR_F_GIFT_BUFFER flag. This flag indicates that this is not
a packet but rather an empty gifted buffer. VM1 checks the flags field
to detect that it has been gifted buffers.
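
A minimal Python sketch of the gifting logic follows. The bit value chosen for VIRTIO_NET_HDR_F_GIFT_BUFFER is hypothetical, since no value is assigned above, and the policy in `should_gift` (gift once a VM holds every buffer) is just one possible reading of "accumulate too many". `handle_rx` is likewise an illustrative name.

```python
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1        # existing virtio-net header flag
VIRTIO_NET_HDR_F_GIFT_BUFFER = 0x80    # hypothetical bit value for the new flag


def should_gift(held: int, total: int) -> bool:
    """Assumed policy: give a buffer back once this VM holds all of them."""
    return total > 0 and held == total


def handle_rx(flags: int, buf: int, tx_pool: list, packets: list) -> None:
    """Sort a completed rx buffer into 'empty gift' or 'packet to process'."""
    if flags & VIRTIO_NET_HDR_F_GIFT_BUFFER:
        tx_pool.append(buf)      # empty buffer, immediately usable for tx
    else:
        packets.append(buf)      # real packet, must be processed first


# VM2 holds both buffers, so it gifts buffer 1 (the one VM1 has on its rx queue).
vm2_gifts = should_gift(held=2, total=2)
vm1_tx_pool, vm1_packets = [], []
handle_rx(VIRTIO_NET_HDR_F_GIFT_BUFFER, 1, vm1_tx_pool, vm1_packets)
```

Testing the flag with a mask rather than equality leaves room for other header flags to be set alongside it.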

Also note that zero-copy networking is not mutually exclusive with
classic virtio-net. If the descriptor has buffer addresses outside the
Shared Buffers BAR, then classic non-zero-copy virtio-net behavior
occurs.

Host-side implementation
------------------------
The host facilitates zero-copy VM-to-VM communication by taking
descriptors off tx queues and filling in rx descriptors of the paired
VM. In the Linux vhost_net implementation this could work as follows:

1. VM1 places buffer 2 on the tx queue and kicks the host. Ownership of
   the buffer no longer belongs to VM1.
2. vhost_net pops the buffer from VM1's tx queue and verifies that the
   buffer address is within the Shared Buffers BAR.
3. vhost_net finds the VM2 rx queue descriptor whose buffer address
   matches, completes that descriptor, and kicks VM2.
4. VM2 pops buffer 2 from the rx queue. It can now reuse this buffer
   for transmitting to VM1.
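
The four steps above can be walked through with a small Python simulation. It models queues as plain Python containers and is not vhost_net code; `Vm`, `host_forward`, the BAR layout, and the "no-rx-match" outcome are all assumptions made for illustration (the steps do not say what happens when no matching rx descriptor is posted).

```python
from collections import deque

BAR_BASE, BAR_SIZE = 0x1000, 0x2000           # assumed layout: two 4 KiB buffers
BUF1, BUF2 = BAR_BASE, BAR_BASE + 0x1000


class Vm:
    def __init__(self, name):
        self.name = name
        self.tx = deque()        # buffer addresses placed on the tx queue
        self.rx_posted = []      # buffer addresses posted on the rx queue
        self.rx_used = deque()   # rx descriptors completed by the host
        self.interrupts = 0      # kicks received from the host


def host_forward(src: Vm, dst: Vm) -> str:
    """Handle one tx descriptor from src (steps 2 and 3)."""
    addr = src.tx.popleft()                              # step 2: pop from tx
    if not (BAR_BASE <= addr < BAR_BASE + BAR_SIZE):
        return "classic"                                 # outside the BAR
    if addr not in dst.rx_posted:
        return "no-rx-match"                             # unspecified case
    dst.rx_posted.remove(addr)                           # step 3: match,
    dst.rx_used.append(addr)                             # complete,
    dst.interrupts += 1                                  # and kick the peer
    return "zero-copy"


vm1, vm2 = Vm("VM1"), Vm("VM2")
vm1.rx_posted.append(BUF1)
vm2.rx_posted.append(BUF2)

vm1.tx.append(BUF2)                  # step 1: VM1 queues buffer 2 and kicks
result = host_forward(vm1, vm2)      # steps 2 and 3 on the host
received = vm2.rx_used.popleft()     # step 4: VM2 pops buffer 2
```

No packet data is touched by `host_forward`; only the descriptor address moves between the two queues.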

The vhost_net.ko kernel module needs a new ioctl for pairing vhost_net
instances. This ioctl is used to establish the VM-to-VM connection
between VM1's virtio-net and VM2's virtio-net.

Discussion
----------
The result is that applications in separate VMs can communicate in true
zero-copy fashion.

I think this approach could be fruitful in bringing virtio-net to
VM-to-VM networking use cases. Unless virtio-net is extended for this
use case, I'm afraid DPDK and OpenDataPlane communities might steer
clear of VIRTIO.

This is an idea I want to share but I'm not working on a prototype.
Feel free to flesh it out further and try it!

Open issues:
* Multiple VMs?
* Multiqueue?
* Choice of shared buffer allocation algorithm?
* etc

Stefan

Cornelia Huck

2015-Apr-22 17:46 UTC

Zerocopy VM-to-VM networking using virtio-net

On Wed, 22 Apr 2015 18:01:38 +0100
Stefan Hajnoczi <stefanha at redhat.com> wrote:
> [It may be necessary to remove virtio-dev at lists.oasis-open.org from CC
> if you are a non-TC member.]
> 
> Hi,
> Some modern networking applications bypass the kernel network stack so
> that rx/tx rings and DMA buffers can be directly mapped.  This is
> typical in DPDK applications where virtio-net currently is one of
> several NIC choices.
> 
> Existing virtio-net implementations are not optimized for VM-to-VM
> DPDK-style networking.  The following outline describes a zero-copy
> virtio-net solution for VM-to-VM networking.
> 
> Thanks to Paolo Bonzini for the Shared Buffers BAR idea.
> 
> Use case
> --------
> Two VMs on the same host need to communicate in the most efficient
> manner possible (e.g. the sole purpose of the VMs is to do network I/O).
> 
> Applications running inside the VMs implement virtio-net in userspace so
> they have full control over rx/tx rings and data buffer placement.
Wouldn't that also benefit applications that use a kernel
implementation? You still need to get the data to/from kernel space,
but you'd get the benefit of being able to get the data to the peer
immediately.
> 
> Performance requirements are higher priority than security or isolation.
> If this bothers you, stick to classic virtio-net.
> 
> virtio-net VM-to-VM extensions
> ------------------------------
> A few extensions to virtio-net are necessary to support zero-copy
> VM-to-VM communication.  The extensions are covered informally
> throughout the text, this is not a VIRTIO specification change proposal.
> 
> The VM-to-VM capable virtio-net PCI adapter has an additional MMIO BAR
> called the Shared Buffers BAR.  The Shared Buffers BAR is a shared
> memory region on the host so that the virtio-net devices in VM1 and VM2
> both access the same region of memory.
> 
> The vring is still allocated in guest RAM as usual but data buffers must
> be located in the Shared Buffers BAR in order to take advantage of
> zero-copy.
> 
> When VM1 places a packet into the tx queue and the buffers are located
> in the Shared Buffers BAR, the host finds the VM2's rx queue descriptor
> with the same buffer address and completes it without copying any data
> buffers.
The shared buffers BAR looks PCI-specific, but what about other
mechanisms to provide a shared space between two VMs with some kind of
lightweight notifications? This should make it possible to implement a
similar mode of operation for other transports if it is factored out
correctly. (The actual implementation of this shared space is probably
the difficult part :)
> 
> Shared buffer allocation
> ------------------------
> A simple scheme for two cooperating VMs to manage the Shared Buffers BAR
> is as follows:
> 
>   VM1         VM2
>        +---+
>    rx->| 1 |<-tx
>        +---+
>    tx->| 2 |<-rx
>        +---+
>    Shared Buffers
> 
> This is a trivial example where the Shared Buffers BAR has only two
> packet buffers.
> 
> VM1 starts by putting buffer 1 in its rx queue.  VM2 starts by putting
> buffer 2 in its rx queue.  The VMs know which buffers to choose based on
> a new uint8_t virtio_net_config.shared_buffers_offset field (0 for VM1
> and 1 for VM2).
> 
> VM1 can transmit to VM2 by filling buffer 2 and placing it on its tx
> queue.  VM2 can transmit by filling buffer 1 and placing it on its tx
> queue.
> 
> As soon as a buffer is placed on a tx queue, the VM passes ownership of
> the buffer to the other VM.  In other words, the buffer must not be
> touched even after virtio-net tx completion because it now belongs to
> the other VM.
> 
> This scheme of bouncing ownership back-and-forth between the two VMs
> only works if both VMs transmit an equal number of buffers over time.
> In reality the traffic pattern may be unbalanced so VM1 is always
> transmitting and VM2 is always receiving.  This problem can be overcome
> if the VMs cooperate and return buffers if they accumulate too many.
> 
> For example, after VM1 transmits buffer 2 it has run out of tx buffers:
> 
>   VM1         VM2
>        +---+
>    rx->| 1 |<-tx
>        +---+
>     X->| 2 |<-rx
>        +---+
> 
> VM2 notices that it now holds all buffers.  It can donate a buffer back
> to VM1 by putting it on the tx queue with the new virtio_net_hdr.flags
> VIRTIO_NET_HDR_F_GIFT_BUFFER flag.  This flag indicates that this is not
> a packet but rather an empty gifted buffer.  VM1 checks the flags field
> to detect that it has been gifted buffers.
> 
> Also note that zero-copy networking is not mutually exclusive with
> classic virtio-net.  If the descriptor has buffer addresses outside the
> Shared Buffers BAR, then classic non-zero-copy virtio-net behavior
> occurs.
Is simply writing the values in the header enough to trigger the other
side? You don't need some kind of notification? (I'm obviously coming
from a non-PCI view, and for my kind-of-nebulous idea I'd need a
lightweight interrupt so that the other side knows it should check the
header.)
> 
> Host-side implementation
> ------------------------
> The host facilitates zero-copy VM-to-VM communication by taking
> descriptors off tx queues and filling in rx descriptors of the paired
> VM.  In the Linux vhost_net implementation this could work as follows:
> 
> 1. VM1 places buffer 2 on the tx queue and kicks the host.  Ownership of
>    the buffer no longer belongs to VM1.
> 2. vhost_net pops the buffer from VM1's tx queue and verifies that the
>    buffer address is within the Shared Buffers BAR.
> 3. vhost_net finds the VM2 rx queue descriptor whose buffer address
>    matches, completes that descriptor, and kicks VM2.
> 4. VM2 pops buffer 2 from the rx queue.  It can now reuse this buffer
>    for transmitting to VM1.
> 
> The vhost_net.ko kernel module needs a new ioctl for pairing vhost_net
> instances.  This ioctl is used to establish the VM-to-VM connection
> between VM1's virtio-net and VM2's virtio-net.
> 
> Discussion
> ----------
> The result is that applications in separate VMs can communicate in true
> zero-copy fashion.
> 
> I think this approach could be fruitful in bringing virtio-net to
> VM-to-VM networking use cases.  Unless virtio-net is extended for this
> use case, I'm afraid DPDK and OpenDataPlane communities might steer
> clear of VIRTIO.
> 
> This is an idea I want to share but I'm not working on a prototype.
> Feel free to flesh it out further and try it!
Definitely interesting. It seems you get much of the needed
infrastructure by simply leveraging what PCI gives you anyway? If we
want something like this in other environments (say, via ccw on s390),
we'd have to come up with a mechanism that can give us the same (which
is probably the hard part).
> 
> Open issues:
>  * Multiple VMs?
>  * Multiqueue?
>  * Choice of shared buffer allocation algorithm?
>  * etc
> 
> Stefan

Stefan Hajnoczi

2015-Apr-22 18:00 UTC

Zerocopy VM-to-VM networking using virtio-net

On Wed, Apr 22, 2015 at 6:46 PM, Cornelia Huck <cornelia.huck at de.ibm.com> wrote:
> On Wed, 22 Apr 2015 18:01:38 +0100
> Stefan Hajnoczi <stefanha at redhat.com> wrote:
>
>> [It may be necessary to remove virtio-dev at lists.oasis-open.org from CC
>> if you are a non-TC member.]
>>
>> Hi,
>> Some modern networking applications bypass the kernel network stack so
>> that rx/tx rings and DMA buffers can be directly mapped.  This is
>> typical in DPDK applications where virtio-net currently is one of
>> several NIC choices.
>>
>> Existing virtio-net implementations are not optimized for VM-to-VM
>> DPDK-style networking.  The following outline describes a zero-copy
>> virtio-net solution for VM-to-VM networking.
>>
>> Thanks to Paolo Bonzini for the Shared Buffers BAR idea.
>>
>> Use case
>> --------
>> Two VMs on the same host need to communicate in the most efficient
>> manner possible (e.g. the sole purpose of the VMs is to do network I/O).
>>
>> Applications running inside the VMs implement virtio-net in userspace so
>> they have full control over rx/tx rings and data buffer placement.
>
> Wouldn't that also benefit applications that use a kernel
> implementation? You still need to get the data to/from kernel space,
> but you'd get the benefit of being able to get the data to the peer
> immediately.
If the applications are using the sockets API then there is a memory
copy involved.  But you are right that it bypasses tap/bridge on the
host side, so it can still be an advantage.
>>
>> Performance requirements are higher priority than security or isolation.
>> If this bothers you, stick to classic virtio-net.
>>
>> virtio-net VM-to-VM extensions
>> ------------------------------
>> A few extensions to virtio-net are necessary to support zero-copy
>> VM-to-VM communication.  The extensions are covered informally
>> throughout the text, this is not a VIRTIO specification change proposal.
>>
>> The VM-to-VM capable virtio-net PCI adapter has an additional MMIO BAR
>> called the Shared Buffers BAR.  The Shared Buffers BAR is a shared
>> memory region on the host so that the virtio-net devices in VM1 and VM2
>> both access the same region of memory.
>>
>> The vring is still allocated in guest RAM as usual but data buffers must
>> be located in the Shared Buffers BAR in order to take advantage of
>> zero-copy.
>>
>> When VM1 places a packet into the tx queue and the buffers are located
>> in the Shared Buffers BAR, the host finds the VM2's rx queue descriptor
>> with the same buffer address and completes it without copying any data
>> buffers.
>
> The shared buffers BAR looks PCI-specific, but what about other
> mechanisms to provide a shared space between two VMs with some kind of
> lightweight notifications? This should make it possible to implement a
> similar mode of operation for other transports if it is factored out
> correctly. (The actual implementation of this shared space is probably
> the difficult part :)
It depends on the primitives available.  For example, in a virtual DMA
page-flipping environment the hypervisor could change page ownership
between the two VMs.  This does not require shared memory.  But
there's a cost to virtual memory bookkeeping so it might only be a win
for big packets.

Does s390 have a mechanism for giving VMs permanently shared or
temporary access to memory pages?
>>
>> Shared buffer allocation
>> ------------------------
>> A simple scheme for two cooperating VMs to manage the Shared Buffers BAR
>> is as follows:
>>
>>   VM1         VM2
>>        +---+
>>    rx->| 1 |<-tx
>>        +---+
>>    tx->| 2 |<-rx
>>        +---+
>>    Shared Buffers
>>
>> This is a trivial example where the Shared Buffers BAR has only two
>> packet buffers.
>>
>> VM1 starts by putting buffer 1 in its rx queue.  VM2 starts by putting
>> buffer 2 in its rx queue.  The VMs know which buffers to choose based on
>> a new uint8_t virtio_net_config.shared_buffers_offset field (0 for VM1
>> and 1 for VM2).
>>
>> VM1 can transmit to VM2 by filling buffer 2 and placing it on its tx
>> queue.  VM2 can transmit by filling buffer 1 and placing it on its tx
>> queue.
>>
>> As soon as a buffer is placed on a tx queue, the VM passes ownership of
>> the buffer to the other VM.  In other words, the buffer must not be
>> touched even after virtio-net tx completion because it now belongs to
>> the other VM.
>>
>> This scheme of bouncing ownership back-and-forth between the two VMs
>> only works if both VMs transmit an equal number of buffers over time.
>> In reality the traffic pattern may be unbalanced so VM1 is always
>> transmitting and VM2 is always receiving.  This problem can be overcome
>> if the VMs cooperate and return buffers if they accumulate too many.
>>
>> For example, after VM1 transmits buffer 2 it has run out of tx buffers:
>>
>>   VM1         VM2
>>        +---+
>>    rx->| 1 |<-tx
>>        +---+
>>     X->| 2 |<-rx
>>        +---+
>>
>> VM2 notices that it now holds all buffers.  It can donate a buffer back
>> to VM1 by putting it on the tx queue with the new virtio_net_hdr.flags
>> VIRTIO_NET_HDR_F_GIFT_BUFFER flag.  This flag indicates that this is not
>> a packet but rather an empty gifted buffer.  VM1 checks the flags field
>> to detect that it has been gifted buffers.
>>
>> Also note that zero-copy networking is not mutually exclusive with
>> classic virtio-net.  If the descriptor has buffer addresses outside the
>> Shared Buffers BAR, then classic non-zero-copy virtio-net behavior
>> occurs.
>
> Is simply writing the values in the header enough to trigger the other
> side? You don't need some kind of notification? (I'm obviously coming
> from a non-PCI view, and for my kind-of-nebulous idea I'd need a
> lightweight interrupt so that the other side knows it should check the
> header.)
Virtqueue kick is still used for notification.  In fact, the virtqueue
operation is basically the same, except that data buffers are now
located in the Shared Buffers BAR instead.
>> Discussion
>> ----------
>> The result is that applications in separate VMs can communicate in true
>> zero-copy fashion.
>>
>> I think this approach could be fruitful in bringing virtio-net to
>> VM-to-VM networking use cases.  Unless virtio-net is extended for this
>> use case, I'm afraid DPDK and OpenDataPlane communities might steer
>> clear of VIRTIO.
>>
>> This is an idea I want to share but I'm not working on a prototype.
>> Feel free to flesh it out further and try it!
>
> Definitely interesting. It seems you get much of the needed
> infrastructure by simply leveraging what PCI gives you anyway? If we
> want something like this in other environments (say, via ccw on s390), we'd
> have to come up with a mechanism that can give us the same (which is
> probably the hard part).
It may not be a win in all environments.  It depends on the primitives
available for memory access.

With PCI devices and a Linux host we can use a shared memory region.
If shared memory is not available then maybe there is no performance
win to be had.

Stefan

Luke Gorrie

2015-Apr-24 08:12 UTC

[virtio-dev] Zerocopy VM-to-VM networking using virtio-net

Hi Stefan,

Great topic. I am also extremely interested in helping Virtio-net become
the standard for the networking industry (the universe of DPDK, etc).

On 22 April 2015 at 19:01, Stefan Hajnoczi <stefanha at redhat.com> wrote:
> [It may be necessary to remove virtio-dev at lists.oasis-open.org from CC
> if you are a non-TC member.]
>
[Done.]

> I think this approach could be fruitful in bringing virtio-net to
> VM-to-VM networking use cases.  Unless virtio-net is extended for this
> use case, I'm afraid DPDK and OpenDataPlane communities might steer
> clear of VIRTIO.
>
Questions:

- How fast is needed?

- How fast is the vhost-user support that shipped in DPDK 2.0?

- How fast would the new design likely be?

Our recent experience in Snabb Switch land is that networking on x86 is now
more of an HPC problem than a system programming problem. The SIMD bandwidth
per core keeps increasing, which erodes the value of traditional (and
complex) system programming optimizations. I will be interested to compare
notes with others on this, already on Haswell but more so when we have
AVX512.

Incidentally, we also did a pile of work last year on zero-copy NIC->VM
transfers and discovered a lot of interesting problems and edge cases where
Virtio-net spec and/or drivers are hard to match up with common NICs. Happy
to explain a bit about our experience if that would be valuable.

Cheers,
-Luke

Paolo Bonzini

2015-Apr-24 08:20 UTC

[virtio-dev] Zerocopy VM-to-VM networking using virtio-net

On 24/04/2015 10:12, Luke Gorrie wrote:
>     I think this approach could be fruitful in bringing virtio-net to
>     VM-to-VM networking use cases.  Unless virtio-net is extended for this
>     use case, I'm afraid DPDK and OpenDataPlane communities might steer
>     clear of VIRTIO.
> 
> 
> Questions:
> 
> - How fast is needed?
> 
> - How fast is the vhost-user support that shipped in DPDK 2.0?
vhost-user is fast.  The problem is not the speed, it's the desire for a
more peer-to-peer operation.

virtio by design has very distinct roles for driver and device, so for
VM2VM communication the virtio design requires two devices in the guest
and two drivers, comprising a "switch", in the host.

The switch could be using vhost-user indeed, but my understanding is
that in some cases this switch component is undesirable.  However, my
understanding does not include _why_ it is undesirable.  This is where
we need to gather more information from the DPDK folks.

Paolo

Stefan Hajnoczi

2015-Apr-24 09:47 UTC

[virtio-dev] Zerocopy VM-to-VM networking using virtio-net

On Fri, Apr 24, 2015 at 9:12 AM, Luke Gorrie <luke at snabb.co> wrote:
> - How fast would the new design likely be?
This proposal eliminates two things in the path:

1. Compared to vhost_net, it bypasses the host tun driver and network
stack, replacing it with direct vhost_net <-> vhost_net data transfer.
At this level it's comparable to vhost-user, but it's not programmable
in userspace!

2. Data copies are eliminated because the Shared Buffers BAR gives
both VMs access to the packets.

My concern is the overhead of the vhost_net component copying
descriptors between NICs.  In a 100% shared memory model, each VM only
has a receive queue that the other VM places packets into.  There are
no tx queues.  The notification mechanism is an event fd that is
ioeventfd for VM1 and irqfd for VM2.  In other words, when VM1 kicks
the queue, VM2 receives an interrupt (of course polling the receive
queue is also possible).

It would be interesting to compare the two approaches.
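
For comparison, the 100% shared memory model can be sketched as a toy Python class. `RxRing` and its methods are hypothetical names, the callback stands in for the eventfd that would be wired up as ioeventfd on one side and irqfd on the other, and a real ring would hold descriptors in shared memory rather than Python objects.

```python
from collections import deque


class RxRing:
    """Receive-only queue owned by one VM and filled directly by its peer."""

    def __init__(self, capacity: int, on_kick):
        self.capacity = capacity
        self.slots = deque()
        self.on_kick = on_kick       # stand-in for the eventfd notification

    def push(self, packet: bytes) -> bool:
        """Called by the sending VM; there is no tx queue and no host copy."""
        if len(self.slots) >= self.capacity:
            return False             # ring full, sender must retry later
        self.slots.append(packet)
        return True

    def kick(self) -> None:
        """Sender's kick, delivered to the receiver as an interrupt."""
        self.on_kick()

    def poll(self):
        """Receiver side; also usable without interrupts, by polling."""
        return self.slots.popleft() if self.slots else None


vm2_interrupts = []
vm2_rx = RxRing(capacity=2, on_kick=lambda: vm2_interrupts.append("irq"))

accepted = vm2_rx.push(b"ping")      # VM1 writes straight into VM2's ring
vm2_rx.kick()                        # VM1 kicks, VM2 gets an interrupt
packet = vm2_rx.poll()               # VM2 consumes the packet
```

In this model nothing in the host touches descriptors on the data path, which is the overhead the paragraph above is concerned about.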
> Our recent experience in Snabb Switch land is that networking on x86 is now
> more of an HPC problem than a system programming problem. The SIMD bandwidth
> per core keeps increasing, which erodes the value of traditional (and
> complex) system programming optimizations. I will be interested to compare
> notes with others on this, already on Haswell but more so when we have
> AVX512.
>
> Incidentally, we also did a pile of work last year on zero-copy NIC->VM
> transfers and discovered a lot of interesting problems and edge cases where
> Virtio-net spec and/or drivers are hard to match up with common NICs. Happy
> to explain a bit about our experience if that would be valuable.
That sounds interesting, can you describe the setup?

Stefan
