> On 21 Mar 2019, at 10:58, Michael S. Tsirkin <mst at redhat.com>
wrote:
>
> On Thu, Mar 21, 2019 at 12:19:22AM +0200, Liran Alon wrote:
>>
>>
>>> On 21 Mar 2019, at 0:10, Michael S. Tsirkin <mst at
redhat.com> wrote:
>>>
>>> On Wed, Mar 20, 2019 at 11:43:41PM +0200, Liran Alon wrote:
>>>>
>>>>
>>>>> On 20 Mar 2019, at 16:09, Michael S. Tsirkin <mst at
redhat.com> wrote:
>>>>>
>>>>> On Wed, Mar 20, 2019 at 02:23:36PM +0200, Liran Alon wrote:
>>>>>>
>>>>>>
>>>>>>> On 20 Mar 2019, at 12:25, Michael S. Tsirkin
<mst at redhat.com> wrote:
>>>>>>>
>>>>>>> On Wed, Mar 20, 2019 at 01:25:58AM +0200, Liran
Alon wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 19 Mar 2019, at 23:19, Michael S.
Tsirkin <mst at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>> On Tue, Mar 19, 2019 at 08:46:47AM -0700,
Stephen Hemminger wrote:
>>>>>>>>>> On Tue, 19 Mar 2019 14:38:06 +0200
>>>>>>>>>> Liran Alon <liran.alon at
oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> b.3) cloud-init: If configured to
perform network-configuration, it attempts to configure all available netdevs.
It should avoid however doing so on net-failover slaves.
>>>>>>>>>>> (Microsoft has handled this by
adding a mechanism in cloud-init to blacklist a netdev from being configured in
case it is owned by a specific PCI driver. Specifically, they blacklist Mellanox
VF driver. However, this technique doesn?t work for the net-failover mechanism
because both the net-failover netdev and the virtio-net netdev are owned by the
virtio-net PCI driver).
>>>>>>>>>>
>>>>>>>>>> Cloud-init should really just ignore
all devices that have a master device.
>>>>>>>>>> That would have been more general, and
safer for other use cases.
>>>>>>>>>
>>>>>>>>> Given lots of userspace doesn't do
this, I wonder whether it would be
>>>>>>>>> safer to just somehow pretend to userspace
that the slave links are
>>>>>>>>> down? And add a special attribute for the
actual link state.
>>>>>>>>
>>>>>>>> I think this may be problematic as it would
also break legit use case
>>>>>>>> of userspace attempt to set various config on
VF slave.
>>>>>>>> In general, lying to userspace usually leads to
problems.
>>>>>>>
>>>>>>> I hear you on this. So how about instead of lying,
>>>>>>> we basically just fail some accesses to slaves
>>>>>>> unless a flag is set e.g. in ethtool.
>>>>>>>
>>>>>>> Some userspace will need to change to set it but in
a minor way.
>>>>>>> Arguably/hopefully failure to set config would
generally be a safer
>>>>>>> failure.
>>>>>>
>>>>>> Once userspace will set this new flag by ethtool, all
operations done by other userspace components will still work.
>>>>>
>>>>> Sorry about being unclear, the idea would be to require the
flag on each ethtool operation.
>>>>
>>>> Oh. I have indeed misunderstood your previous email then. :)
>>>> Thanks for clarifying.
>>>>
>>>>>
>>>>>> E.g. Running dhclient without parameters, after this
flag was set, will still attempt to perform DHCP on it and will now succeed.
>>>>>
>>>>> I think sending/receiving should probably just fail
unconditionally.
>>>>
>>>> You mean that you wish that somehow kernel will prevent Tx on
net-failover slave netdev
>>>> unless skb is marked with some flag to indicate it has been
sent via the net-failover master?
>>>
>>> We can maybe avoid binding a protocol socket to the device?
>>
>> That is indeed another possibility that would work to avoid the DHCP
issues.
>> And will still allow checking connectivity. So it is better.
>> However, I still think it provides an non-intuitive customer
experience.
>> In addition, I also want to take into account that most customers are
expected a 1:1 mapping between a vNIC and a netdev.
>> i.e. A cloud instance should show 1-netdev if it has one vNIC attached
to it defined.
>> Customers usually don?t care how they get accelerated networking. They
just care they do.
>>
>>>
>>>> This indeed resolves the group of userspace issues around
performing DHCP on net-failover slaves directly (By dracut/initramfs, dhclient
and etc.).
>>>>
>>>> However, I see a couple of down-sides to it:
>>>> 1) It doesn?t resolve all userspace issues listed in this email
thread. For example, cloud-init will still attempt to perform network config on
net-failover slaves.
>>>> It also doesn?t help with regard to Ubuntu?s netplan issue that
creates udev rules that match only by MAC.
>>>
>>>
>>> How about we fail to retrieve mac from the slave?
>>
>> That would work but I think it is cleaner to just not bind PV and VF
based on having the same MAC.
>
> There's a reference to that under "Non-MAC based pairing".
>
> I'll look into making it more explicit.
Yes I know. I was referring to what you described in that section.
>
>>>
>>>> 2) It brings non-intuitive customer experience. For example, a
customer may attempt to analyse connectivity issue by checking the connectivity
>>>> on a net-failover slave (e.g. the VF) but will see no
connectivity when in-fact checking the connectivity on the net-failover master
netdev shows correct connectivity.
>>>>
>>>> The set of changes I vision to fix our issues are:
>>>> 1) Hide net-failover slaves in a different netns created and
managed by the kernel. But that user can enter to it and manage the netdevs
there if wishes to do so explicitly.
>>>> (E.g. Configure the net-failover VF slave in some special way).
>>>> 2) Match the virtio-net and the VF based on a PV attribute
instead of MAC. (Similar to as done in NetVSC). E.g. Provide a virtio-net
interface to get PCI slot where the matching VF will be hot-plugged by
hypervisor.
>>>> 3) Have an explicit virtio-net control message to command
hypervisor to switch data-path from virtio-net to VF and vice-versa. Instead of
relying on intercepting the PCI master enable-bit
>>>> as an indicator on when VF is about to be set up. (Similar to
as done in NetVSC).
>>>>
>>>> Is there any clear issue we see regarding the above suggestion?
>>>>
>>>> -Liran
>>>
>>> The issue would be this: how do we avoid conflicting with
namespaces
>>> created by users?
>>
>> This is kinda controversial, but maybe separate netns names into 2
groups: hidden and normal.
>> To reference a hidden netns, you need to do it explicitly.
>> Hidden and normal netns names can collide as they will be maintained in
different namespaces (Yes I?m overloading the term namespace here?).
>
> Maybe it's an unnamed namespace. Hidden until userspace gives it a
name?
This is also a good idea that will solve the issue. Yes.
>
>> Does this seems reasonable?
>>
>> -Liran
>
> Reasonable I'd say yes, easy to implement probably no. But maybe I
> missed a trick or two.
BTW, from a practical point of view, I think that even until we figure out a
solution on how to implement this,
it was better to create an kernel auto-generated name (e.g.
?kernel_net_failover_slaves")
that will break only userspace workloads that by a very rare-chance have a netns
that collides with this then
the breakage we have today for the various userspace components.
-Liran
>
>>>
>>>>>
>>>>>> Therefore, this proposal just effectively delays when
the net-failover slave can be operated on by userspace.
>>>>>> But what we actually want is to never allow a
net-failover slave to be operated by userspace unless it is explicitly stated
>>>>>> by userspace that it wishes to perform a set of actions
on the net-failover slave.
>>>>>>
>>>>>> Something that was achieved if, for example, the
net-failover slaves were in a different netns than default netns.
>>>>>> This also aligns with expected customer experience that
most customers just want to see a 1:1 mapping between a vNIC and a visible
netdev.
>>>>>> But of course maybe there are other ideas that can
achieve similar behaviour.
>>>>>>
>>>>>> -Liran
>>>>>>
>>>>>>>
>>>>>>> Which things to fail? Probably sending/receiving
packets? Getting MAC?
>>>>>>> More?
>>>>>>>
>>>>>>>> If we reach
>>>>>>>> to a scenario where we try to avoid userspace
issues generically and
>>>>>>>> not on a userspace component basis, I believe
the right path should be
>>>>>>>> to hide the net-failover slaves such that
explicit action is required
>>>>>>>> to actually manipulate them (As described in
blog-post). E.g.
>>>>>>>> Automatically move net-failover slaves by
kernel to a different netns.
>>>>>>>>
>>>>>>>> -Liran
>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> MST