Hi, I'd like to get some feedback on a proposal to enhance virtio-net to ease configuration of a VM and to enable live migration of passthrough network SR-IOV devices.

Today we have SR-IOV network devices (VFs) that can be passed into a VM in order to enable high performance networking directly within the VM. The problem I am trying to address is that this configuration is generally difficult to live-migrate. There is documentation [1] indicating that some OS/hypervisor vendors will support live migration of a system with a directly assigned networking device. The problem I see with these implementations is that the network configuration requirements that are passed on to the owner of the VM are quite complicated. You have to set up bonding, you have to configure it to enslave two interfaces, those interfaces (one is virtio-net, the other is an SR-IOV device/driver like ixgbevf) must support MAC address changes requested in the VM, and on and on...

So, on to the proposal:
Modify the virtio-net driver to be a single VM network device that enslaves an SR-IOV network device (inside the VM) with the same MAC address. This would cause the virtio-net driver to appear and work like a simplified bonding/team driver. The live migration problem would be solved just like today's bonding solution, but the VM user's networking config would be greatly simplified.

At its simplest, it would appear something like this in the VM:

  =============
  = vnet0     =
  = (virtio-  =----+
  =  net)     =    |
  =           =    |    ===========
  =           =    +----= ixgbevf =
  =============         ===========

(forgive the ASCII art)

The fast path traffic would prefer the ixgbevf or other SR-IOV device path, and fall back to virtio's transmit/receive when migrating.

Compared to today's options this proposal would:
1) make virtio-net more sticky, and allow fast path traffic at SR-IOV speeds
2) simplify end user configuration in the VM (most if not all of the setup to enable migration would be done in the hypervisor)
3) allow live migration via a simple link down and maybe a PCI hot-unplug of the SR-IOV device, with failover to the virtio-net driver core
4) allow vendor-agnostic hardware acceleration, and live migration between vendors, if the VM OS has driver support for all the required SR-IOV devices

Runtime operation proposed:
- <in either order> virtio-net driver loads, SR-IOV driver loads
- virtio-net finds other NICs that match its MAC address, both by examining existing interfaces and by setting up a new device notifier
- virtio-net enslaves the first NIC with the same MAC address
- virtio-net brings up the slave and makes it the "preferred" path
- virtio-net follows the behavior of an active-backup bond/team
- virtio-net acts as the interface to the VM
- live migration initiates
- link goes down on SR-IOV, or the SR-IOV device is removed
- failover to virtio-net as primary path
- migration continues to new host
- new host is started with virtio-net as primary
- if no SR-IOV, virtio-net stays primary
- hypervisor can hot-add an SR-IOV NIC with the same MAC address as virtio
- virtio-net notices the new NIC and starts over at the enslave step above

Future ideas (brainstorming):
- Optimize fast east-west traffic by having special rules to direct east-west traffic through the virtio-net path

Thanks for reading!
Jesse

[1] https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts
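To make the MAC-matching/enslave steps above concrete, here is a minimal sketch of how a netdev notifier inside virtio-net might adopt a same-MAC VF, assuming a bonding-like enslave helper. All names here (virtnet_adopt_slave, virtnet_failover_init, the virtnet_dev global) are hypothetical; this illustrates the idea and is not the proposed patch.

/*
 * Hypothetical sketch: watch for a newly registered NIC whose MAC
 * matches the virtio-net device and adopt it as the preferred path.
 */
#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/notifier.h>

static struct net_device *virtnet_dev;	/* the virtio-net "master" (assumed set at probe) */

static int virtnet_adopt_slave(struct net_device *vf)
{
	/* Bonding-like enslave logic would go here; mark the VF as fast path. */
	netdev_info(virtnet_dev, "adopting %s as the preferred path\n", vf->name);
	return 0;
}

static int virtnet_netdev_event(struct notifier_block *nb,
				unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);

	if (!virtnet_dev || dev == virtnet_dev)
		return NOTIFY_DONE;

	switch (event) {
	case NETDEV_REGISTER:
		/* A new NIC with our MAC: treat it as the SR-IOV slave. */
		if (ether_addr_equal(dev->dev_addr, virtnet_dev->dev_addr))
			virtnet_adopt_slave(dev);
		break;
	case NETDEV_UNREGISTER:
		/* VF hot-unplugged (e.g. before migration): fail back to virtio. */
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block virtnet_notifier = {
	.notifier_call = virtnet_netdev_event,
};

static int __init virtnet_failover_init(void)
{
	return register_netdevice_notifier(&virtnet_notifier);
}

Interfaces that already exist before the notifier registers would still need a one-time scan (e.g. with for_each_netdev() under the rtnl lock), matching the "examining existing interfaces" step in the list above.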
Michael S. Tsirkin
2017-Nov-29 13:39 UTC
[RFC] virtio-net: help live migrate SR-IOV devices
On Tue, Nov 28, 2017 at 11:27:22AM -0800, Jesse Brandeburg wrote:
> Hi, I'd like to get some feedback on a proposal to enhance virtio-net
> to ease configuration of a VM and that would enable live migration of
> passthrough network SR-IOV devices.

You should also CC virtio-dev at lists.oasis-open.org (subscriber-only, sorry about that).

-- 
MST
On 2017-11-29 03:27, Jesse Brandeburg wrote:
> [Jesse's full proposal quoted; trimmed here, see the original message above]

Cc netdev.
Interesting, and this method is actually used by netvsc now:

commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
Author: stephen hemminger <stephen at networkplumber.org>
Date:   Tue Aug 1 19:58:53 2017 -0700

    netvsc: transparent VF management

    This patch implements transparent fail over from synthetic NIC to
    SR-IOV virtual function NIC in Hyper-V environment. It is a better
    alternative to using bonding as is done now. Instead, the receive and
    transmit fail over is done internally inside the driver.

    Using bonding driver has lots of issues because it depends on the
    script being run early enough in the boot process and with sufficient
    information to make the association. This patch moves all that
    functionality into the kernel.

    Signed-off-by: Stephen Hemminger <sthemmin at microsoft.com>
    Signed-off-by: David S. Miller <davem at davemloft.net>

If my understanding is correct there's no need for any extension of the virtio spec. If this is true, maybe you can start to prepare the patch?

Thanks
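As a rough illustration of the "fail over internally inside the driver" idea from that commit message (this is not the netvsc code; the struct and helper names below are invented), the paravirt master's transmit path simply prefers an attached VF whose carrier is up, and otherwise uses its own queue:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical private state: the currently adopted VF, if any. */
struct pv_master_priv {
	struct net_device __rcu *vf_netdev;
};

/* Assumed to exist elsewhere: transmit over the paravirt queue itself. */
static netdev_tx_t pv_master_xmit_paravirt(struct sk_buff *skb,
					    struct net_device *dev);

static netdev_tx_t pv_master_start_xmit(struct sk_buff *skb,
					struct net_device *dev)
{
	struct pv_master_priv *priv = netdev_priv(dev);
	struct net_device *vf = rcu_dereference_bh(priv->vf_netdev);

	if (vf && netif_running(vf) && netif_carrier_ok(vf)) {
		/* Fast path: hand the skb to the SR-IOV VF. */
		skb->dev = vf;
		dev_queue_xmit(skb);	/* consumes the skb in all cases */
		return NETDEV_TX_OK;
	}

	/* Fallback path: paravirt transmit, e.g. while the VF is gone during migration. */
	return pv_master_xmit_paravirt(skb, dev);
}

The receive side is the mirror image: packets arriving on the VF are handed up through the master netdev, so the guest always sees one stable interface.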
On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> [Jesse's proposal and the netvsc commit message quoted; trimmed]
>
> If my understanding is correct there's no need for any extension of
> the virtio spec. If this is true, maybe you can start to prepare the
> patch?

IMHO this is as close to policy in the kernel as one can get. User land has all the information it needs to instantiate that bond/team automatically. In fact I'm trying to discuss this with NetworkManager folks and Red Hat right now:

https://mail.gnome.org/archives/networkmanager-list/2017-November/msg00038.html

Can we flip the argument and ask why the kernel is supposed to be responsible for this? It's not like we run DHCP out of the kernel on new interfaces...
On 30 November 2017 at 05:29, Jason Wang <jasowang at redhat.com> wrote:
> [Jesse's proposal and the netvsc commit message quoted; trimmed]
>
> If my understanding is correct there's no need for any extension of
> the virtio spec. If this is true, maybe you can start to prepare the
> patch?

I do not see why we should couple the solution with any specific para-virt technology, neither with netvsc nor with virtio. One may wish to implement the routing between the VMs and the HV without any PV device at all, e.g. using VF representors and PCIe loopback, as done with ASAP2-direct. That method is actually much more efficient in CPU utilization (at the expense of PCIe BW utilization).

So let's first specify the problems that need to be resolved in order to support live migration with SR-IOV, rather than rely on already-done work (netvsc) without understanding if/why it was right in the first place.

To my understanding the problems are the following:

1) DMA: with SR-IOV, devices write directly into the guest's memory, which yields dirty guest pages that are not marked as dirty for the host CPU MMU, thus preventing the migration pre-copy phase from starting while the guest is running on the source machine.
2) Guest network interface persistency: VF detachment causes a VF driver PCI remove, which causes the VF netdev to disappear. If that VF netdev is the guest's primary interface (has an IP), sockets using it will break.

Re problem #1:
So far in this mail thread it was taken for granted that the way to resolve it is to have a PV device as backup for the pre-copy phase. In addition to tying the solution to para-virt being in place (which, as already said, seems a wrong enforcement to me), it does not really solve the problem, rather it partially works around it. It just mitigates the problem from long service downtime to long service degradation time.

To really tackle the problem at its root we need to just mark the guest DMA-written pages as dirty. Alexander Duyck already initiated patches to address it ~two years ago (https://groups.google.com/forum/#!topic/linux.kernel/aIQOsh2oJEk) but unfortunately they were abandoned. The simplest way I can think of to resolve it is to have the guest VF driver just read-modify-write some word of each DMA page before passing it to the stack via netif_rx(). To limit the performance impact of this operation we can signal the VM to start doing it only upon pre-copy phase start.

Re problem #2:
Indeed the best way to address it seems to be to enslave the VF driver netdev under a persistent anchor netdev. And it is indeed desirable to allow (but not enforce) the PV netdev and VF netdev to work in conjunction. And it is indeed desirable that this enslavement logic work out of the box. But in the case of PV+VF some configurable policies must be in place (and they had better be generic rather than differ per PV technology). For example, based on which characteristics should the PV+VF coupling be done? netvsc uses the MAC address, but that might not always be the desire. Another example: when to use PV and when to use VF? One may want to use PV only if the VF is gone/down, while others may want to use PV also when the VF is up, e.g. for multicasting.

I think the right way to address it is to have a new dedicated module for this purpose. Have it automatically enslave PV and VF netdevs according to a user-configured policy, and enslave the VF even if there is no PV device at all. This way we get:
1) Optimal migration performance.
2) A PV-agnostic (the VM may be migrated even from one PV technology to another) and HW-device-agnostic solution. A dedicated generic module will also enforce a lowest common denominator of guest netdev features, preventing migration dependency on source/guest machine capabilities.
3) An out-of-the-box solution, yet with generic methods for policy setting.

Thanks
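A minimal sketch of the read-modify-write idea for problem #1 above, under the assumptions stated in the mail (the migration_precopy_active flag and the function names are hypothetical, and how the hypervisor would signal pre-copy start is left open):

#include <linux/compiler.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical flag, set when the hypervisor signals pre-copy start. */
static bool migration_precopy_active;

static void vf_mark_rx_page_dirty(struct sk_buff *skb)
{
	if (!migration_precopy_active)
		return;

	/*
	 * Read-modify-write one word of the receive buffer.  The CPU
	 * store marks the backing guest page dirty in the hypervisor's
	 * dirty-page tracking, which the device's DMA write did not.
	 */
	if (skb_headlen(skb) >= sizeof(unsigned long)) {
		unsigned long *w = (unsigned long *)skb->data;

		WRITE_ONCE(*w, READ_ONCE(*w));
	}
}

/* Called from the VF driver's receive path just before handing the skb up. */
static int vf_rx_deliver(struct sk_buff *skb)
{
	vf_mark_rx_page_dirty(skb);
	return netif_rx(skb);
}

Note that this only touches pages the driver actually passes up the stack; descriptor rings and buffers written by the device but not yet processed would still need separate handling, which is part of why a generic dirty-tracking mechanism (as in the abandoned patches referenced above) is attractive.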
Michael S. Tsirkin
2017-Nov-30 14:14 UTC
[RFC] virtio-net: help live migrate SR-IOV devices
On Thu, Nov 30, 2017 at 11:29:56AM +0800, Jason Wang wrote:
> If my understanding is correct there's no need for any extension of
> the virtio spec.

There appears to be a concern that some existing configurations might use the same MAC for an unrelated reason. Not sure what that could be, but for sure, we could add a feature flag. That needs to be approved by the virtio TC, but it's just a single line in the spec, no big deal, we can help here.

> If this is true, maybe you can start to prepare the patch?

Yes, please do. We can add a safeguard of a feature bit on top.

-- 
MST
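For illustration only, the "single line in the spec" plus guest-side safeguard could look roughly like the following. The feature name and bit number are placeholders invented here; nothing has been assigned or approved by the virtio TC:

#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

/* Placeholder: a real bit would be allocated in the virtio spec. */
#define VIRTIO_NET_F_SLAVE_BY_MAC	60

static bool virtnet_want_auto_enslave(struct virtio_device *vdev)
{
	/* Only auto-enslave a same-MAC VF if the feature was negotiated. */
	return virtio_has_feature(vdev, VIRTIO_NET_F_SLAVE_BY_MAC);
}

With such a guard, existing setups that happen to use the same MAC on two NICs for unrelated reasons would see no behaviour change unless the hypervisor explicitly offers the new feature.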