> On 21 Mar 2019, at 14:37, Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Thu, Mar 21, 2019 at 12:07:57PM +0200, Liran Alon wrote:
>>>>>> 2) It brings a non-intuitive customer experience. For example, a customer may attempt to analyse a connectivity issue by checking the connectivity
>>>>>> on a net-failover slave (e.g. the VF), but will see no connectivity, when in fact checking the connectivity on the net-failover master netdev shows correct connectivity.
>>>>>>
>>>>>> The set of changes I envision to fix our issues is:
>>>>>> 1) Hide net-failover slaves in a different netns created and managed by the kernel, but one that a user can enter and manage the netdevs in, if they wish to do so explicitly
>>>>>> (e.g. configure the net-failover VF slave in some special way).
>>>>>> 2) Match the virtio-net and the VF based on a PV attribute instead of the MAC (similar to what is done in NetVSC). E.g. provide a virtio-net interface to get the PCI slot where the matching VF will be hot-plugged by the hypervisor.
>>>>>> 3) Have an explicit virtio-net control message to command the hypervisor to switch the data-path from virtio-net to the VF and vice versa, instead of relying on intercepting the PCI master enable-bit
>>>>>> as an indicator of when the VF is about to be set up (similar to what is done in NetVSC).
>>>>>>
>>>>>> Is there any clear issue we see regarding the above suggestion?
>>>>>>
>>>>>> -Liran
>>>>>
>>>>> The issue would be this: how do we avoid conflicting with namespaces
>>>>> created by users?
>>>>
>>>> This is kinda controversial, but maybe separate netns names into 2 groups: hidden and normal.
>>>> To reference a hidden netns, you need to do it explicitly.
>>>> Hidden and normal netns names can collide, as they will be maintained in different namespaces (yes, I'm overloading the term "namespace" here...).
>>>
>>> Maybe it's an unnamed namespace. Hidden until userspace gives it a name?
>>
>> This is also a good idea that will solve the issue. Yes.
>>
>>>
>>>> Does this seem reasonable?
>>>>
>>>> -Liran
>>>
>>> Reasonable I'd say yes, easy to implement probably no. But maybe I
>>> missed a trick or two.
>>
>> BTW, from a practical point of view, I think that even until we figure out how to implement this,
>> it would be better to create a kernel auto-generated name (e.g. "kernel_net_failover_slaves")
>> that will break only userspace workloads that by a very rare chance have a netns that collides with this, rather than
>> the breakage we have today for the various userspace components.
>>
>> -Liran
>
> It seems quite easy to supply that as a module parameter. Do we need two
> namespaces though? Won't some userspace still be confused by the two
> slaves sharing the MAC address?

That's one reasonable option.
Another one is that we indeed change the mechanism by which we determine that a VF should be bonded with a virtio-net device,
i.e. expose a new virtio-net property that specifies the PCI slot of the VF to be bonded with.

The second seems cleaner, but I don't have a strong opinion on this. Both seem reasonable to me, and your suggestion is faster to implement from the current state of things.

-Liran
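[For concreteness, a minimal sketch of the module-parameter idea mentioned above. This is hypothetical (net_failover has no such parameter today), and the parameter name and default value are assumptions:]

/* Hypothetical sketch: let the hidden-netns name default to a
 * kernel-chosen value, but be overridable at module load time, to
 * sidestep collisions with user-created netns names. */
#include <linux/module.h>
#include <linux/moduleparam.h>

static char *slave_netns_name = "kernel_net_failover_slaves";
module_param(slave_netns_name, charp, 0444);
MODULE_PARM_DESC(slave_netns_name,
		 "Name of the kernel-managed netns that holds net-failover slaves");

[A guest admin who did hit a collision could then load the module with, e.g., "modprobe net_failover slave_netns_name=my_hidden_slaves".]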
On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> > On 21 Mar 2019, at 14:37, Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > [...]
> >
> > It seems quite easy to supply that as a module parameter. Do we need two
> > namespaces though? Won't some userspace still be confused by the two
> > slaves sharing the MAC address?
>
> That's one reasonable option.
> Another one is that we indeed change the mechanism by which we determine that a VF should be bonded with a virtio-net device,
> i.e. expose a new virtio-net property that specifies the PCI slot of the VF to be bonded with.
>
> The second seems cleaner, but I don't have a strong opinion on this. Both seem reasonable to me, and your suggestion is faster to implement from the current state of things.
>
> -Liran

OK. Now what happens if the master is moved to another namespace? Do we need
to move the slaves too?

Also, Si-Wei's patch is then kind of extraneous, right?
Attempts to rename a slave will now fail, as it's in a namespace...
> On 21 Mar 2019, at 14:57, Michael S. Tsirkin <mst at redhat.com> wrote:
>
> [...]
>
> OK. Now what happens if the master is moved to another namespace? Do we need
> to move the slaves too?

No. Why would we move the slaves?
The whole point is to have most customers ignore the net-failover slaves, which stay "hidden" in their dedicated netns.
We won't prevent a customer from explicitly moving the net-failover slaves out of this netns, but we will not move them out of there automatically.

> Also, Si-Wei's patch is then kind of extraneous, right?
> Attempts to rename a slave will now fail, as it's in a namespace...

I'm not sure, actually. Isn't udev/systemd netns-aware?
I would expect it to be able to provide names also to netdevs in a netns other than the default one.
If that's the case, Si-Wei's patch, which makes it possible to rename a net-failover slave while it is already open, is still required, as the race condition still exists.

-Liran
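[For reference, a simplified sketch of the rename constraint under discussion; this is paraphrased, not verbatim kernel code, and the opt-out flag name is taken from Si-Wei's proposal as understood here:]

#include <linux/netdevice.h>

/* Sketch: dev_change_name() refuses to rename a running device, so once
 * a failover slave has been opened, udev's rename attempt fails.
 * Si-Wei's patch proposes letting failover slaves opt out of this check. */
static int rename_allowed(const struct net_device *dev)
{
	if ((dev->flags & IFF_UP) &&
	    !(dev->priv_flags & IFF_LIVE_RENAME_OK))	/* proposed opt-out */
		return -EBUSY;
	return 0;
}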
On Thu, 21 Mar 2019 08:57:03 -0400
"Michael S. Tsirkin" <mst at redhat.com> wrote:

> On Thu, Mar 21, 2019 at 02:47:50PM +0200, Liran Alon wrote:
> > [...]
>
> OK. Now what happens if the master is moved to another namespace?
> Do we need
> to move the slaves too?
>
> Also, Si-Wei's patch is then kind of extraneous, right?
> Attempts to rename a slave will now fail, as it's in a namespace...

I did try moving the slave device into a namespace at one point. The problem is
that it introduces all sorts of locking problems in the code, because you can't do
it directly in the context of the callback that fires when a new slave device is
discovered. Since you can't safely change a device's namespace in the notifier, it
requires a work queue. Then you add more complexity and error cases, because the
slave is exposed for a short period, and you have to unwind all the state races...

Good idea, but hard to implement.
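[A rough sketch of the pattern described above, assuming a hypothetical kernel-owned "failover_net" netns and a placeholder slave-matching check. The point is that the netns move has to be bounced to a work item, and the window before the work runs is exactly where the slave is briefly exposed:]

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct net *failover_net;	/* hypothetical hidden netns */

struct slave_move_work {
	struct work_struct work;
	struct net_device *dev;		/* reference held via dev_hold() */
};

static void slave_move_fn(struct work_struct *work)
{
	struct slave_move_work *w =
		container_of(work, struct slave_move_work, work);

	/* By now the slave has been visible in its original netns for a
	 * short window; userspace may already have seen or renamed it. */
	rtnl_lock();
	dev_change_net_namespace(w->dev, failover_net, "eth%d");
	rtnl_unlock();

	dev_put(w->dev);
	kfree(w);
}

static int slave_notifier(struct notifier_block *nb,
			  unsigned long event, void *ptr)
{
	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
	struct slave_move_work *w;

	/* is_failover_slave() stands in for the real matching logic */
	if (event != NETDEV_REGISTER /* || !is_failover_slave(dev) */)
		return NOTIFY_DONE;

	w = kmalloc(sizeof(*w), GFP_ATOMIC);	/* conservative for notifier context */
	if (!w)
		return NOTIFY_DONE;

	dev_hold(dev);
	w->dev = dev;
	INIT_WORK(&w->work, slave_move_fn);
	schedule_work(&w->work);	/* can't change the netns from here */
	return NOTIFY_OK;
}

[Even this sketch leaves an error case open: a racing NETDEV_UNREGISTER before the work runs is not handled, which illustrates the "more complexity and error cases" point above.]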