thr3ads.net - Linux Virtualization - [summary] virtio network device failover writeup [Mar 2019]

Liran Alon

2019-Mar-21 13:04 UTC

[summary] virtio network device failover writeup

> On 21 Mar 2019, at 14:57, Michael S. Tsirkin <mst at redhat.com> wrote:
> 
> On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
>> 
>> 
>>> On 21 Mar 2019, at 14:37, Michael S. Tsirkin <mst at redhat.com> wrote:
>>> 
>>> On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
>>>>>>>> 2) It brings a non-intuitive customer experience. For example, a
>>>>>>>> customer may attempt to analyse a connectivity issue by checking the
>>>>>>>> connectivity on a net-failover slave (e.g. the VF) and see no
>>>>>>>> connectivity, when in fact checking the connectivity on the
>>>>>>>> net-failover master netdev shows correct connectivity.
>>>>>>>>
>>>>>>>> The set of changes I envision to fix our issues are:
>>>>>>>> 1) Hide net-failover slaves in a different netns created and managed
>>>>>>>> by the kernel. The user can still enter it and manage the netdevs
>>>>>>>> there if they wish to do so explicitly (e.g. to configure the
>>>>>>>> net-failover VF slave in some special way).
>>>>>>>> 2) Match the virtio-net and the VF based on a PV attribute instead of
>>>>>>>> MAC (similar to what is done in NetVSC). E.g. provide a virtio-net
>>>>>>>> interface to get the PCI slot where the matching VF will be
>>>>>>>> hot-plugged by the hypervisor.
>>>>>>>> 3) Have an explicit virtio-net control message to command the
>>>>>>>> hypervisor to switch the data-path from virtio-net to VF and
>>>>>>>> vice-versa, instead of relying on intercepting the PCI master
>>>>>>>> enable-bit as an indicator of when the VF is about to be set up
>>>>>>>> (similar to what is done in NetVSC).
>>>>>>>>
>>>>>>>> Is there any clear issue we see regarding the above suggestion?
>>>>>>>>
>>>>>>>> -Liran
>>>>>>> 
>>>>>>> The issue would be this: how do we avoid conflicting with namespaces
>>>>>>> created by users?
>>>>>>
>>>>>> This is kinda controversial, but maybe separate netns names into 2
>>>>>> groups: hidden and normal.
>>>>>> To reference a hidden netns, you need to do it explicitly.
>>>>>> Hidden and normal netns names can collide as they will be maintained
>>>>>> in different namespaces (Yes I'm overloading the term namespace
>>>>>> here...).
>>>>>
>>>>> Maybe it's an unnamed namespace. Hidden until userspace gives it a
>>>>> name?
>>>> 
>>>> This is also a good idea that will solve the issue. Yes.
>>>> 
>>>>> 
>>>>>> Does this seem reasonable?
>>>>>> 
>>>>>> -Liran
>>>>> 
>>>>> Reasonable I'd say yes, easy to implement probably no. But maybe I
>>>>> missed a trick or two.
>>>>
>>>> BTW, from a practical point of view, I think that until we figure out
>>>> how to implement this, it would be better to create the netns with a
>>>> kernel auto-generated name (e.g. "kernel_net_failover_slaves").
>>>> That would break only userspace workloads that, by a very rare chance,
>>>> have a netns whose name collides with it, which is better than the
>>>> breakage we have today for the various userspace components.
>>>> 
>>>> -Liran
>>> 
>>> It seems quite easy to supply that as a module parameter. Do we need two
>>> namespaces though? Won't some userspace still be confused by the two
>>> slaves sharing the MAC address?
>>
>> That's one reasonable option.
>> Another one is that we will indeed change the mechanism by which we
>> determine that a VF should be bonded with a virtio-net device,
>> i.e. expose a new virtio-net property that specifies the PCI slot of the
>> VF to be bonded with.
>>
>> The second seems cleaner but I don't have a strong opinion on this.
>> Both seem reasonable to me, and your suggestion is faster to implement
>> from the current state of things.
>> 
>> -Liran
> 
> OK. Now what happens if master is moved to another namespace? Do we need
> to move the slaves too?

No. Why would we move the slaves? The whole point is to let most customers
ignore the net-failover slaves and keep them "hidden" in their dedicated
netns.
We won't prevent a customer from explicitly moving the net-failover slaves out
of this netns, but we will not move them out of there automatically.

>
> Also siwei's patch is then kind of extraneous right?
> Attempts to rename a slave will now fail as it's in a namespace?

I'm not sure actually. Isn't udev/systemd netns-aware?
I would expect it to be able to provide names also to netdevs in a netns other
than the default netns.
If that's the case, Si-Wei's patch to allow renaming a net-failover slave when
it is already open is still required, as the race condition still exists.

-Liran
> 
>>> 
>>> -- 
>>> MST

Michael S. Tsirkin

2019-Mar-21 13:12 UTC

[summary] virtio network device failover writeup

On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
> 
> > On 21 Mar 2019, at 14:57, Michael S. Tsirkin <mst at redhat.com> wrote:
> > 
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?
> 
> No. Why would we move the slaves?

The reason we have a 3-device model at all is so users can fine-tune the
slaves. I don't see why this applies to the root namespace but not to
a container. If it has access to the failover, it should have access
to the slaves.

> The whole point is to let most customers ignore the net-failover slaves and
> keep them "hidden" in their dedicated netns.

So that makes the common case easy. That is good. My worry is it might
make some uncommon cases impossible.

> We won't prevent a customer from explicitly moving the net-failover slaves
> out of this netns, but we will not move them out of there automatically.
> 
> > 
> > Also siwei's patch is then kind of extraneous right?
> > Attempts to rename a slave will now fail as it's in a namespace?
> 
> I'm not sure actually. Isn't udev/systemd netns-aware?
> I would expect it to be able to provide names also to netdevs in a netns
> other than the default netns.

I think most people move devices after they are renamed.

> If that's the case, Si-Wei's patch to allow renaming a net-failover slave
> when it is already open is still required, as the race condition still exists.
> 
> -Liran
> 
> > 
> >>> 
> >>> -- 
> >>> MST

Liran Alon

2019-Mar-21 13:25 UTC

[summary] virtio network device failover writeup

> On 21 Mar 2019, at 15:12, Michael S. Tsirkin <mst at redhat.com> wrote:
> 
> On Thu, Mar 21, 2019 at 03:04:37PM +0200, Liran Alon wrote:
>> 
>> 
>>> On 21 Mar 2019, at 14:57, Michael S. Tsirkin <mst at redhat.com> wrote:
>>> 
>>> OK. Now what happens if master is moved to another namespace? Do we need
>>> to move the slaves too?
>> 
>> No. Why would we move the slaves?
> 
> 
> The reason we have a 3-device model at all is so users can fine-tune the
> slaves.

I agree.

> I don't see why this applies to the root namespace but not to
> a container. If it has access to the failover, it should have access
> to the slaves.

Oh, now I see your point. I hadn't thought about container usage.
My thinking was that a customer can always just enter the "hidden" netns and
configure whatever they want there.

Do you have a suggestion for how to handle this?

One option is that every "visible" netns on the system has a "hidden" unnamed
netns where its net-failover slaves reside.
If a customer wishes to enter that netns and manage the net-failover slaves
explicitly, they will need an updated iproute2 that knows how to enter that
hidden netns. Most customers will never need to enter that netns, so it is OK
if they don't have this updated iproute2.
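
The option above (one hidden, unnamed netns paired with each visible netns) can be illustrated with a toy Python model. This is only a sketch of the bookkeeping, not kernel or iproute2 code: the `Netns` and `Host` classes, the `include_hidden` switch, and the device names are all hypothetical and exist purely to show that default tooling would see only the master while opt-in tooling could still reach the slaves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Netns:
    """Toy stand-in for a network namespace (not a kernel structure)."""
    name: Optional[str]  # None models an unnamed, hidden netns
    devices: List[str] = field(default_factory=list)


class Host:
    """Hypothetical registry pairing each visible netns with a hidden one."""

    def __init__(self) -> None:
        self.visible: Dict[str, Netns] = {}
        self.hidden: Dict[str, Netns] = {}  # keyed by the visible netns name

    def add_netns(self, name: str) -> None:
        self.visible[name] = Netns(name)
        # The companion is unnamed, so it cannot collide with user names.
        self.hidden[name] = Netns(None)

    def register_failover(self, ns: str, master: str, slaves: List[str]) -> None:
        # The master stays visible; the slaves go to the hidden companion.
        self.visible[ns].devices.append(master)
        self.hidden[ns].devices.extend(slaves)

    def list_devices(self, ns: str, include_hidden: bool = False) -> List[str]:
        # Default tooling sees only the master; updated tooling opts in.
        devs = list(self.visible[ns].devices)
        if include_hidden:
            devs += self.hidden[ns].devices
        return devs


host = Host()
host.add_netns("container1")
host.register_failover("container1", "failover0", ["virtio0", "vf0"])
default_view = host.list_devices("container1")
full_view = host.list_devices("container1", include_hidden=True)
```

Because the hidden netns is keyed by its visible partner, a container that owns the master also has a well-defined place to look for the slaves, which is the access concern raised earlier in the thread.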
> 
>> The whole point is to let most customers ignore the net-failover slaves
>> and keep them "hidden" in their dedicated netns.
> 
> So that makes the common case easy. That is good. My worry is it might
> make some uncommon cases impossible.
> 
>> We won't prevent a customer from explicitly moving the net-failover
>> slaves out of this netns, but we will not move them out of there
>> automatically.
>> 
>>> 
>>> Also siwei's patch is then kind of extraneous right?
>>> Attempts to rename a slave will now fail as it's in a namespace?
>> 
>> I'm not sure actually. Isn't udev/systemd netns-aware?
>> I would expect it to be able to provide names also to netdevs in a netns
>> other than the default netns.
> 
> I think most people move devices after they are renamed.

So?
Si-Wei's patch handles the issue that results from the fact that the
net-failover master will be opened before the rename of the net-failover
slaves occurs.
This should happen (to my understanding) regardless of network namespaces.

-Liran
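
The ordering problem described above can be sketched as a small Python model. It assumes, as the thread implies, that a rename of an already-open netdev is refused; the `Netdev` class, the `live_rename_ok` flag, and the device names are hypothetical illustrations and do not correspond to real kernel identifiers or to the actual patch.

```python
import errno


class Netdev:
    """Toy netdev: only tracks a name and whether it is open (up)."""

    def __init__(self, name: str, live_rename_ok: bool = False) -> None:
        self.name = name
        self.is_up = False
        # Hypothetical flag standing in for the relaxation the patch proposes.
        self.live_rename_ok = live_rename_ok

    def open(self) -> None:
        self.is_up = True

    def rename(self, new_name: str) -> None:
        # Assumed rule: an open device refuses a rename unless exempted.
        if self.is_up and not self.live_rename_ok:
            raise OSError(errno.EBUSY, "device is open, rename refused")
        self.name = new_name


def hotplug_then_rename(slave: Netdev, new_name: str) -> str:
    # Step 1: the failover master is already open, so the slave is opened
    # as soon as it registers, before userspace has had a chance to act.
    slave.open()
    # Step 2: userspace (e.g. udev) now tries to apply its naming policy.
    try:
        slave.rename(new_name)
    except OSError as err:
        return f"rename failed: errno {err.errno}"
    return f"renamed to {slave.name}"


result_without_patch = hotplug_then_rename(Netdev("eth1"), "ens4")
result_with_patch = hotplug_then_rename(Netdev("eth1", live_rename_ok=True), "ens4")
```

Nothing in this model depends on which netns the slave lives in, which matches the claim that the race exists regardless of network namespaces.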
> 
>> If that's the case, Si-Wei's patch to allow renaming a net-failover slave
>> when it is already open is still required, as the race condition still
>> exists.
>> 
>> -Liran
>> 
>>> 
>>>>> 
>>>>> -- 
>>>>> MST

Stephen Hemminger

2019-Mar-21 15:45 UTC

[summary] virtio network device failover writeup

On Thu, 21 Mar 2019 15:04:37 +0200
Liran Alon <liran.alon at oracle.com> wrote:
> > 
> > OK. Now what happens if master is moved to another namespace? Do we need
> > to move the slaves too?
> 
> No. Why would we move the slaves? The whole point is to let most customers
> ignore the net-failover slaves and keep them "hidden" in their dedicated
> netns.
> We won't prevent a customer from explicitly moving the net-failover slaves
> out of this netns, but we will not move them out of there automatically.

The 2-device netvsc already handles the case where the master changes namespace.

Michael S. Tsirkin

2019-Mar-21 15:50 UTC

[summary] virtio network device failover writeup

On Thu, Mar 21, 2019 at 08:45:17AM -0700, Stephen Hemminger wrote:
> On Thu, 21 Mar 2019 15:04:37 +0200
> Liran Alon <liran.alon at oracle.com> wrote:
> 
> > > 
> > > OK. Now what happens if master is moved to another namespace? Do we
> > > need to move the slaves too?
> > 
> > No. Why would we move the slaves? The whole point is to let most
> > customers ignore the net-failover slaves and keep them "hidden" in their
> > dedicated netns.
> > We won't prevent a customer from explicitly moving the net-failover
> > slaves out of this netns, but we will not move them out of there
> > automatically.
> 
> 
> The 2-device netvsc already handles the case where the master changes namespace.

Is it by moving the slave with it?

-- 
MST
