On Mon, Dec 4, 2017 at 1:51 AM, achiad shochat <achiad.mellanox at gmail.com> wrote:
> On 3 December 2017 at 19:35, Stephen Hemminger
> <stephen at networkplumber.org> wrote:
>> On Sun, 3 Dec 2017 11:14:37 +0200
>> achiad shochat <achiad.mellanox at gmail.com> wrote:
>>
>>> On 3 December 2017 at 07:05, Michael S. Tsirkin <mst at redhat.com> wrote:
>>> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:
>>> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:
>>> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:
>>> >> > > Re. problem #2:
>>> >> > > Indeed the best way to address it seems to be to enslave the VF driver
>>> >> > > netdev under a persistent anchor netdev.
>>> >> > > And it's indeed desired to allow (but not enforce) the PV netdev and the
>>> >> > > VF netdev to work in conjunction.
>>> >> > > And it's indeed desired that this enslavement logic work out of the box.
>>> >> > > But in the PV+VF case some configurable policies must be in place (and
>>> >> > > they'd better be generic rather than differ per PV technology).
>>> >> > > For example - based on which characteristics should the PV+VF coupling
>>> >> > > be done? netvsc uses the MAC address, but that might not always be the
>>> >> > > desire.
>>> >> >
>>> >> > It's a policy but not guest userspace policy.
>>> >> >
>>> >> > The hypervisor certainly knows.
>>> >> >
>>> >> > Are you concerned that someone might want to create two devices with the
>>> >> > same MAC for an unrelated reason? If so, the hypervisor could easily set a
>>> >> > flag in the virtio device to say "this is a backup, use the MAC to find
>>> >> > another device".
>>> >>
>>> >> This is something I was going to suggest: a flag or other configuration on
>>> >> the virtio device to help control how this new feature is used. I can
>>> >> imagine this might be useful to control from either the hypervisor side or
>>> >> the VM side.
>>> >>
>>> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
>>> >> for VM choice, or (3) force it on for the VM. In case (2), the VM might be
>>> >> able to choose whether it wants to make use of the feature, or stick with
>>> >> the bonding solution.
>>> >>
>>> >> Either way, the kernel is making a feature available, and the user (VM or
>>> >> hypervisor) is able to control it by selecting the feature based on the
>>> >> policy desired.
>>> >>
>>> >> sln
>>> >
>>> > I'm not sure what feature is being made available here.
>>> >
>>> > I saw this as a flag that says "this device shares a backend with another
>>> > network device which can be found using the MAC, and that backend should
>>> > be preferred". The kernel then forces a configuration which uses that
>>> > other backend - as long as it exists.
>>> >
>>> > However, please Cc the virtio-dev mailing list if we are doing this, since
>>> > this is a spec extension.
>>> >
>>> > --
>>> > MST
>>>
>>> Can someone please explain why we assume a virtio device is there at all?
>>> I specified a case where there isn't any.

Migrating without any virtual device is going to be extremely
challenging, especially in any kind of virtualization setup where the
hosts are not homogeneous. By providing a virtio interface you can
guarantee that at least one network interface is available on any given
host, and then fail over to that as the least common denominator for
any migration.

>>> I second Jacob - having a netdev of one device driver enslave a netdev
>>> of another device driver is an awkward, asymmetric model.
>>> Regardless of whether they share the same backend device.
>>> Only I am not sure the Linux bond is the right choice.
>>> e.g. one may well want to use the virtio device also when the
>>> pass-through device is available, e.g. for multicasts, east-west
>>> traffic, etc.
>>> I'm not sure the Linux bond fits that functionality.
>>> And, as I hear in this thread, it is hard to make it work out of the box.
>>> So I think the right thing would be to write a new dedicated module
>>> for this purpose.

This part I can sort of agree with. What if we were to look at
providing a way to somehow advertise that the two devices were meant to
be bonded for virtualization purposes? For now let's call it a
"virt-bond". Basically we could look at providing a means for the
virtio and VF drivers to advertise that they want this sort of bond.
Then it would just be a matter of providing some sort of side channel
to indicate where you want things like multicast/broadcast/east-west
traffic to go.

>>> Re policy -
>>> Indeed the HV can request a policy from the guest, but that's not a
>>> claim for the virtio device enslaving the pass-through device.
>>> Any policy can be queried by the upper enslaving device.
>>>
>>> Bottom line - I do not see a single reason to have the virtio netdev
>>> (nor netvsc or any other PV netdev) enslave another netdev by itself.
>>> If we'd done it right with netvsc from the beginning we wouldn't need
>>> this discussion at all...
>>
>> There are several issues with transparent migration.
>> The first is that the SR-IOV device needs to be shut off early in the
>> migration process.
>
> That's not a given fact.
> It's due to the DMA and it should be solved anyway.
> Please read my first reply in this thread.

For now it is a fact. We would need to do a drastic rewrite of the DMA
API in the guest/host/QEMU/IOMMU in order to avoid it. So as a first
step I would say we should look at using this bonding-type solution.
Being able to defer the VF eviction could be a next step, as it would
allow for much better performance, but we still have too many cases
where the VF might not be there after a migration.

>> Next, the SR-IOV device in the migrated guest environment may be different.
>> It might not exist at all, it might be at a different PCI address, or it
>> could even be a different vendor/speed/model.
>> Keeping a virtual network device around allows the connectivity to persist
>> during the process.
>
> Right, but that virtual device must not relate to any para-virt
> specific technology (not netvsc, nor virtio).
> Again, it seems you did not read my first reply.

I would agree with the need to make this agnostic. Maybe we could look
at the current netvsc solution and find a way to make it generic so it
could be applied to any combination of paravirtual interface and PF.
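[Editor's note: the MAC-based auto-enslavement that netvsc does today, and
that the thread debates generalizing, amounts to little more than a netdev
notifier that claims a newly registered device whose permanent MAC matches.
A minimal sketch, assuming a paravirtual driver with a persistent netdev;
pv_dev and pv_enslave_vf() are illustrative names, not from any in-tree
driver:

	#include <linux/netdevice.h>
	#include <linux/etherdevice.h>
	#include <linux/notifier.h>

	static struct net_device *pv_dev;	/* persistent paravirtual netdev */

	static void pv_enslave_vf(struct net_device *vf);	/* hypothetical helper */

	static int pv_netdev_event(struct notifier_block *nb,
				   unsigned long event, void *ptr)
	{
		struct net_device *dev = netdev_notifier_info_to_dev(ptr);

		if (event != NETDEV_REGISTER || dev == pv_dev)
			return NOTIFY_DONE;

		/* Pair purely on the permanent MAC, as netvsc does today. */
		if (ether_addr_equal(dev->perm_addr, pv_dev->perm_addr))
			pv_enslave_vf(dev);

		return NOTIFY_OK;
	}

	static struct notifier_block pv_notifier = {
		.notifier_call = pv_netdev_event,
	};

	/* register_netdevice_notifier(&pv_notifier) at driver init time */

The debate above is essentially about where this notifier should live: in
each PV driver separately, or in one dedicated module.]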
Michael S. Tsirkin
2017-Dec-05 19:20 UTC
[RFC] virtio-net: help live migrate SR-IOV devices
On Tue, Dec 05, 2017 at 11:59:17AM +0200, achiad shochat wrote:
> Then we'll have a single solution for both netvsc and virtio (and any
> other PV device).
> And we could handle the VF DMA dirtying issue agnostically.

For the record, I won't block patches adding this to virtio on the
basis that they must be generic. It's not a lot of code; the
implementation can come first, prettify later.

But we do need to have a discussion about how devices are paired.
I am not sure using just the MAC works. E.g. some passthrough devices
don't give the host the ability to set the MAC. Are these worth
worrying about?

--
MST
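[Editor's note: the hypervisor opt-in discussed earlier in the thread would
presumably surface as a virtio feature bit that the guest checks before
doing any auto-pairing. A minimal sketch of that gate; the
VIRTIO_NET_F_BACKUP name and bit number 62 are assumptions pending the
virtio-dev discussion MST asks for:

	#include <linux/virtio.h>
	#include <linux/virtio_config.h>

	/* Hypothetical feature bit; the real name and number belong to the spec. */
	#define VIRTIO_NET_F_BACKUP	62

	static bool virtnet_backup_requested(struct virtio_device *vdev)
	{
		/* Only auto-pair with a passthrough device when the host opted in. */
		return virtio_has_feature(vdev, VIRTIO_NET_F_BACKUP);
	}

Gating on a negotiated feature keeps the policy in the hypervisor's hands
while leaving the guest free to decline it, which is the split Shannon
describes above.]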
On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote:
> >>>> I second Jacob - having a netdev of one device driver enslave a netdev
> >>>> of another device driver is an awkward, asymmetric model.
> >>>> Regardless of whether they share the same backend device.
> >>>> Only I am not sure the Linux bond is the right choice.
> >>>> e.g. one may well want to use the virtio device also when the
> >>>> pass-through device is available, e.g. for multicasts, east-west
> >>>> traffic, etc.
> >>>> I'm not sure the Linux bond fits that functionality.
> >>>> And, as I hear in this thread, it is hard to make it work out of the box.
> >>>> So I think the right thing would be to write a new dedicated module
> >>>> for this purpose.
> >
> > This part I can sort of agree with. What if we were to look at
> > providing a way to somehow advertise that the two devices were meant
> > to be bonded for virtualization purposes? For now let's call it a
> > "virt-bond". Basically we could look at providing a means for the
> > virtio and VF drivers to advertise that they want this sort of bond.
> > Then it would just be a matter of providing some sort of side channel
> > to indicate where you want things like multicast/broadcast/east-west
> > traffic to go.
>
> I like this approach.

+1 on a separate driver; just enslaving devices to virtio may break
existing setups. If people are bonding from user space today and then
update their kernel, it may surprise them how things get auto-mangled.

Is what Alex is suggesting a separate PV device that says "I would like
to be a bond of those two interfaces"? That would make the HV intent
explicit and the kernel's decisions more understandable.
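[Editor's note: the three-netdev model Jakub describes would put the
failover choice in the master's transmit path: prefer the VF slave while it
exists and has link, and fall back to the always-present PV slave
otherwise. A rough sketch under those assumptions; struct virt_bond and its
fields are invented for illustration:

	#include <linux/netdevice.h>
	#include <linux/skbuff.h>

	struct virt_bond {
		struct net_device __rcu *vf_slave;	/* may vanish across migration */
		struct net_device __rcu *pv_slave;	/* always present */
	};

	static netdev_tx_t virt_bond_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct virt_bond *vb = netdev_priv(dev);
		struct net_device *slave;

		rcu_read_lock();
		/* Prefer the passthrough slave while it exists and has link. */
		slave = rcu_dereference(vb->vf_slave);
		if (!slave || !netif_running(slave) || !netif_carrier_ok(slave))
			slave = rcu_dereference(vb->pv_slave);

		if (likely(slave)) {
			skb->dev = slave;
			dev_queue_xmit(skb);	/* requeue on the chosen slave */
		} else {
			dev_kfree_skb_any(skb);
			dev->stats.tx_dropped++;
		}
		rcu_read_unlock();

		return NETDEV_TX_OK;
	}

Routing multicast or east-west traffic to the PV slave would then just be
extra branches at the same decision point, which is the flexibility the
thread argues a plain Linux bond does not offer.]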