Alexander Duyck
2018-Feb-22 00:17 UTC
[RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu <loseweigh at gmail.com> wrote:
> I haven't checked emails for days and did not realize the new revision
> had already come out. And thank you for the effort; this revision
> really looks to be a step forward towards our use case and is close to
> what we wanted to do. A few questions in line.
>
> On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck
> <alexander.duyck at gmail.com> wrote:
>> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski <kubakici at wp.pl> wrote:
>>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
>>>> Patch 2 is in response to the community request for a 3 netdev
>>>> solution. However, it creates some issues we'll get into in a moment.
>>>> It extends virtio_net to use an alternate datapath when available and
>>>> registered. When the BACKUP feature is enabled, the virtio_net driver
>>>> creates an additional 'bypass' netdev that acts as a master device and
>>>> controls 2 slave devices. The original virtio_net netdev is registered
>>>> as the 'backup' netdev and a passthru/VF device with the same MAC gets
>>>> registered as the 'active' netdev. Both 'bypass' and 'backup' netdevs
>>>> are associated with the same 'pci' device. The user accesses the
>>>> network interface via the 'bypass' netdev. The 'bypass' netdev chooses
>>>> the 'active' netdev as the default for transmits when it is available
>>>> with link up and running.
>>>
>>> Thank you for doing this.
>>>
>>>> We noticed a couple of issues with this approach during testing.
>>>> - As both 'bypass' and 'backup' netdevs are associated with the same
>>>> virtio pci device, udev tries to rename both of them with the same
>>>> name and the 2nd rename will fail. This would be OK as long as the
>>>> first netdev to be renamed is the 'bypass' netdev, but the order in
>>>> which udev gets to rename the 2 netdevs is not reliable.
>>>
>>> Out of curiosity - why do you link the master netdev to the virtio
>>> struct device?
>>
>> The basic idea of all this is that we wanted this to work with an
>> existing VM image that was using virtio. As such we were trying to
>> make it so that the bypass interface takes the place of the original
>> virtio and get udev to rename the bypass to what the original
>> virtio_net was.
>
> Could it be made possible to take over the config from the VF instead
> of virtio on an existing VM image, and get udev to rename the bypass
> netdev to what the original VF was? I don't mean tightly binding the
> bypass master to only virtio or the VF, but I think we should provide
> both options to support different upgrade paths. Possibly we could
> tweak the device tree layout to reuse the same PCI slot for the master
> bypass netdev, such that udev would not get confused when renaming the
> device. The VF needs to use a different function slot afterwards.
> Perhaps we might need a special multiseat-like QEMU device for that
> purpose?
>
> In our case we'll upgrade the config from VF to virtio-bypass directly.

So if I am understanding what you are saying, you are wanting to flip
the backup interface from the virtio to a VF. The problem is that this
becomes a bit of a vendor lock-in solution, since it would rely on a
specific VF driver. I would agree with Jiri that we don't want to go
down that path. We don't want every VF out there firing up its own
separate bond. Ideally you want the hypervisor to be able to manage
all of this, which is why it makes sense to have virtio manage this
and why this is associated with the virtio_net interface.

The other bits get into more complexity than we are ready to handle
for now. I think I might have talked about something similar that I
was referring to as a "virtio-bond", where you would have a PCI/PCIe
tree topology that makes this easier to sort out, and the "virtio-bond"
would be used to handle coordination/configuration of a much more
complex interface.

>>> FWIW two solutions that immediately come to mind are to export "backup"
>>> as phys_port_name of the backup virtio link and/or assign a name to the
>>> master like you are doing already. I think team uses team%d and bond
>>> uses bond%d; soft naming of master devices seems quite natural in this
>>> case.
>>
>> I figured I had overlooked something like that. Thanks for pointing
>> this out. Okay, so I think the phys_port_name approach might resolve
>> the original issue. If I am reading things correctly, what we end up
>> with is the master showing up as "ens1" for example and the backup
>> showing up as "ens1nbackup". Am I understanding that right?
>>
>> The problem with the team/bond%d approach is that it creates a new
>> netdevice and so it would require guest configuration changes.
>>
>>> IMHO phys_port_name == "backup" if the BACKUP bit is set on the slave
>>> virtio link is quite neat.
>>
>> I agree. For non-"backup" virtio_net devices would it be okay for us
>> to just return -EOPNOTSUPP? I assume it would be, and that way the
>> legacy behavior could be maintained although the function still exists.
>>
>>>> - When the 'active' netdev is unplugged OR not present on a destination
>>>> system after live migration, the user will see 2 virtio_net netdevs.
>>>
>>> That's necessary and expected; all configuration applies to the master,
>>> so the master must exist.
>>
>> With the naming issue resolved this is the only item left outstanding.
>> This becomes a matter of form vs function.
>>
>> The main complaint about the "3 netdev" solution is that it is a bit
>> confusing to have the 2 netdevs present if the VF isn't there. The
>> idea is that having the extra "master" netdev there if there isn't
>> really a bond is a bit ugly.
>
> Is it uglier in terms of user experience rather than functionality? I
> don't want it dynamically changed between 2-netdev and 3-netdev
> depending on the presence of the VF. That gets back to my original
> question and suggestion earlier: why not just hide the lower netdevs
> from udev renaming and such? What important observability benefits
> would users get from exposing the lower netdevs?
>
> Thanks,
> -Siwei

The only real advantage to a 2 netdev solution is that it looks like
the netvsc solution; however, it doesn't behave like it, since there
are some features like XDP that may not function correctly if they are
left enabled in the virtio_net interface.

As far as functionality, the advantage of not hiding the lower devices
is that they are free to be managed. The problem with pushing all of
the configuration into the upper device is that you are limited to the
intersection of the features of the lower devices. This can be
limiting for some setups, as some VFs support things like more queues
or better interrupt moderation options than others, so trying to make
everything work with one config would be ugly.

- Alex
Siwei Liu
2018-Feb-22 01:59 UTC
[RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck
<alexander.duyck at gmail.com> wrote:
> On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu <loseweigh at gmail.com> wrote:
>> I haven't checked emails for days and did not realize the new revision
>> had already come out. And thank you for the effort; this revision
>> really looks to be a step forward towards our use case and is close to
>> what we wanted to do. A few questions in line.
>>
>> On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck
>> <alexander.duyck at gmail.com> wrote:
>>> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski <kubakici at wp.pl> wrote:
>>>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
>>>>> Patch 2 is in response to the community request for a 3 netdev
>>>>> solution. However, it creates some issues we'll get into in a moment.
>>>>> It extends virtio_net to use an alternate datapath when available and
>>>>> registered. When the BACKUP feature is enabled, the virtio_net driver
>>>>> creates an additional 'bypass' netdev that acts as a master device and
>>>>> controls 2 slave devices. The original virtio_net netdev is registered
>>>>> as the 'backup' netdev and a passthru/VF device with the same MAC gets
>>>>> registered as the 'active' netdev. Both 'bypass' and 'backup' netdevs
>>>>> are associated with the same 'pci' device. The user accesses the
>>>>> network interface via the 'bypass' netdev. The 'bypass' netdev chooses
>>>>> the 'active' netdev as the default for transmits when it is available
>>>>> with link up and running.
>>>>
>>>> Thank you for doing this.
>>>>
>>>>> We noticed a couple of issues with this approach during testing.
>>>>> - As both 'bypass' and 'backup' netdevs are associated with the same
>>>>> virtio pci device, udev tries to rename both of them with the same
>>>>> name and the 2nd rename will fail. This would be OK as long as the
>>>>> first netdev to be renamed is the 'bypass' netdev, but the order in
>>>>> which udev gets to rename the 2 netdevs is not reliable.
>>>>
>>>> Out of curiosity - why do you link the master netdev to the virtio
>>>> struct device?
>>>
>>> The basic idea of all this is that we wanted this to work with an
>>> existing VM image that was using virtio. As such we were trying to
>>> make it so that the bypass interface takes the place of the original
>>> virtio and get udev to rename the bypass to what the original
>>> virtio_net was.
>>
>> Could it be made possible to take over the config from the VF instead
>> of virtio on an existing VM image, and get udev to rename the bypass
>> netdev to what the original VF was? I don't mean tightly binding the
>> bypass master to only virtio or the VF, but I think we should provide
>> both options to support different upgrade paths. Possibly we could
>> tweak the device tree layout to reuse the same PCI slot for the master
>> bypass netdev, such that udev would not get confused when renaming the
>> device. The VF needs to use a different function slot afterwards.
>> Perhaps we might need a special multiseat-like QEMU device for that
>> purpose?
>>
>> In our case we'll upgrade the config from VF to virtio-bypass directly.
>
> So if I am understanding what you are saying, you are wanting to flip
> the backup interface from the virtio to a VF. The problem is that this
> becomes a bit of a vendor lock-in solution, since it would rely on a
> specific VF driver. I would agree with Jiri that we don't want to go
> down that path. We don't want every VF out there firing up its own
> separate bond. Ideally you want the hypervisor to be able to manage
> all of this, which is why it makes sense to have virtio manage this
> and why this is associated with the virtio_net interface.

No, that's not what I was talking about, of course. I thought you
mentioned the upgrade scenario this patch would like to address is to
use the bypass interface "to take the place of the original virtio,
and get udev to rename the bypass to what the original virtio_net
was". That is one of the possible upgrade paths for sure. However, the
upgrade path I was seeking is to use the bypass interface to take the
place of the original VF interface while retaining the name and
network configs, which generally can be done simply with a kernel
upgrade. It would become limiting, as this patch makes the bypass
interface share the same virtio pci device with the virtio backup. Can
this bypass interface be made general, to take the place of any pci
device other than virtio-net? This would be more helpful, as cloud
users who have an existing setup on a VF interface wouldn't have to
recreate it on virtio-net and the VF separately again.

> The other bits get into more complexity than we are ready to handle
> for now. I think I might have talked about something similar that I
> was referring to as a "virtio-bond" where you would have a PCI/PCIe
> tree topology that makes this easier to sort out, and the "virtio-bond"
> would be used to handle coordination/configuration of a much more
> complex interface.

That was one way to solve this problem, but I'd like to see simpler
ways to sort it out.

>>>> FWIW two solutions that immediately come to mind are to export "backup"
>>>> as phys_port_name of the backup virtio link and/or assign a name to the
>>>> master like you are doing already. I think team uses team%d and bond
>>>> uses bond%d; soft naming of master devices seems quite natural in this
>>>> case.
>>>
>>> I figured I had overlooked something like that. Thanks for pointing
>>> this out. Okay, so I think the phys_port_name approach might resolve
>>> the original issue. If I am reading things correctly, what we end up
>>> with is the master showing up as "ens1" for example and the backup
>>> showing up as "ens1nbackup". Am I understanding that right?
>>>
>>> The problem with the team/bond%d approach is that it creates a new
>>> netdevice and so it would require guest configuration changes.
>>>
>>>> IMHO phys_port_name == "backup" if the BACKUP bit is set on the slave
>>>> virtio link is quite neat.
>>>
>>> I agree. For non-"backup" virtio_net devices would it be okay for us
>>> to just return -EOPNOTSUPP? I assume it would be, and that way the
>>> legacy behavior could be maintained although the function still exists.
>>>
>>>>> - When the 'active' netdev is unplugged OR not present on a destination
>>>>> system after live migration, the user will see 2 virtio_net netdevs.
>>>>
>>>> That's necessary and expected; all configuration applies to the master,
>>>> so the master must exist.
>>>
>>> With the naming issue resolved this is the only item left outstanding.
>>> This becomes a matter of form vs function.
>>>
>>> The main complaint about the "3 netdev" solution is that it is a bit
>>> confusing to have the 2 netdevs present if the VF isn't there. The
>>> idea is that having the extra "master" netdev there if there isn't
>>> really a bond is a bit ugly.
>>
>> Is it uglier in terms of user experience rather than functionality? I
>> don't want it dynamically changed between 2-netdev and 3-netdev
>> depending on the presence of the VF. That gets back to my original
>> question and suggestion earlier: why not just hide the lower netdevs
>> from udev renaming and such? What important observability benefits
>> would users get from exposing the lower netdevs?
>>
>> Thanks,
>> -Siwei
>
> The only real advantage to a 2 netdev solution is that it looks like
> the netvsc solution; however, it doesn't behave like it, since there
> are some features like XDP that may not function correctly if they are
> left enabled in the virtio_net interface.
>
> As far as functionality, the advantage of not hiding the lower devices
> is that they are free to be managed. The problem with pushing all of
> the configuration into the upper device is that you are limited to the
> intersection of the features of the lower devices. This can be
> limiting for some setups, as some VFs support things like more queues
> or better interrupt moderation options than others, so trying to make
> everything work with one config would be ugly.

It depends on how you build it and the way you expect it to work. IMHO
the lower devices don't need to be directly managed at all; otherwise
it ends up with loss of configuration across migration, and it really
does not bring much more value than having a general team or bond
device. Users still have to reconfigure those queue settings and
interrupt moderation options after all. The new upper device could
take the assumption that the VF/PT lower device always has a superior
feature set to virtio-net in order to apply advanced configuration.
The upper device should remember all configurations previously done
and apply the supported ones to the active device automatically when
switching the datapath.

Regards,
-Siwei

> - Alex
Samudrala, Sridhar
2018-Feb-22 02:35 UTC
[RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On 2/21/2018 5:59 PM, Siwei Liu wrote:
> On Wed, Feb 21, 2018 at 4:17 PM, Alexander Duyck
> <alexander.duyck at gmail.com> wrote:
>> On Wed, Feb 21, 2018 at 3:50 PM, Siwei Liu <loseweigh at gmail.com> wrote:
>>> I haven't checked emails for days and did not realize the new revision
>>> had already come out. And thank you for the effort; this revision
>>> really looks to be a step forward towards our use case and is close to
>>> what we wanted to do. A few questions in line.
>>>
>>> On Sat, Feb 17, 2018 at 9:12 AM, Alexander Duyck
>>> <alexander.duyck at gmail.com> wrote:
>>>> On Fri, Feb 16, 2018 at 6:38 PM, Jakub Kicinski <kubakici at wp.pl> wrote:
>>>>> On Fri, 16 Feb 2018 10:11:19 -0800, Sridhar Samudrala wrote:
>>>>>> Patch 2 is in response to the community request for a 3 netdev
>>>>>> solution. However, it creates some issues we'll get into in a moment.
>>>>>> It extends virtio_net to use an alternate datapath when available and
>>>>>> registered. When the BACKUP feature is enabled, the virtio_net driver
>>>>>> creates an additional 'bypass' netdev that acts as a master device and
>>>>>> controls 2 slave devices. The original virtio_net netdev is registered
>>>>>> as the 'backup' netdev and a passthru/VF device with the same MAC gets
>>>>>> registered as the 'active' netdev. Both 'bypass' and 'backup' netdevs
>>>>>> are associated with the same 'pci' device. The user accesses the
>>>>>> network interface via the 'bypass' netdev. The 'bypass' netdev chooses
>>>>>> the 'active' netdev as the default for transmits when it is available
>>>>>> with link up and running.
>>>>>
>>>>> Thank you for doing this.
>>>>>
>>>>>> We noticed a couple of issues with this approach during testing.
>>>>>> - As both 'bypass' and 'backup' netdevs are associated with the same
>>>>>> virtio pci device, udev tries to rename both of them with the same
>>>>>> name and the 2nd rename will fail. This would be OK as long as the
>>>>>> first netdev to be renamed is the 'bypass' netdev, but the order in
>>>>>> which udev gets to rename the 2 netdevs is not reliable.
>>>>>
>>>>> Out of curiosity - why do you link the master netdev to the virtio
>>>>> struct device?
>>>>
>>>> The basic idea of all this is that we wanted this to work with an
>>>> existing VM image that was using virtio. As such we were trying to
>>>> make it so that the bypass interface takes the place of the original
>>>> virtio and get udev to rename the bypass to what the original
>>>> virtio_net was.
>>>
>>> Could it be made possible to take over the config from the VF instead
>>> of virtio on an existing VM image, and get udev to rename the bypass
>>> netdev to what the original VF was? I don't mean tightly binding the
>>> bypass master to only virtio or the VF, but I think we should provide
>>> both options to support different upgrade paths. Possibly we could
>>> tweak the device tree layout to reuse the same PCI slot for the master
>>> bypass netdev, such that udev would not get confused when renaming the
>>> device. The VF needs to use a different function slot afterwards.
>>> Perhaps we might need a special multiseat-like QEMU device for that
>>> purpose?
>>>
>>> In our case we'll upgrade the config from VF to virtio-bypass directly.
>>
>> So if I am understanding what you are saying, you are wanting to flip
>> the backup interface from the virtio to a VF. The problem is that this
>> becomes a bit of a vendor lock-in solution, since it would rely on a
>> specific VF driver. I would agree with Jiri that we don't want to go
>> down that path. We don't want every VF out there firing up its own
>> separate bond. Ideally you want the hypervisor to be able to manage
>> all of this, which is why it makes sense to have virtio manage this
>> and why this is associated with the virtio_net interface.
>
> No, that's not what I was talking about, of course. I thought you
> mentioned the upgrade scenario this patch would like to address is to
> use the bypass interface "to take the place of the original virtio,
> and get udev to rename the bypass to what the original virtio_net
> was". That is one of the possible upgrade paths for sure. However, the
> upgrade path I was seeking is to use the bypass interface to take the
> place of the original VF interface while retaining the name and
> network configs, which generally can be done simply with a kernel
> upgrade. It would become limiting, as this patch makes the bypass
> interface share the same virtio pci device with the virtio backup. Can
> this bypass interface be made general, to take the place of any pci
> device other than virtio-net? This would be more helpful, as cloud
> users who have an existing setup on a VF interface wouldn't have to
> recreate it on virtio-net and the VF separately again.

Yes. This sounds interesting. Looks like you want an existing VM image
with a VF-only configuration to get transparent live migration support
by adding virtio_net with the BACKUP feature? We may need another
feature bit to switch between these 2 options.

>> The other bits get into more complexity than we are ready to handle
>> for now. I think I might have talked about something similar that I
>> was referring to as a "virtio-bond" where you would have a PCI/PCIe
>> tree topology that makes this easier to sort out, and the "virtio-bond"
>> would be used to handle coordination/configuration of a much more
>> complex interface.
>
> That was one way to solve this problem, but I'd like to see simpler
> ways to sort it out.
>
>>>>> FWIW two solutions that immediately come to mind are to export "backup"
>>>>> as phys_port_name of the backup virtio link and/or assign a name to the
>>>>> master like you are doing already. I think team uses team%d and bond
>>>>> uses bond%d; soft naming of master devices seems quite natural in this
>>>>> case.
>>>>
>>>> I figured I had overlooked something like that. Thanks for pointing
>>>> this out. Okay, so I think the phys_port_name approach might resolve
>>>> the original issue. If I am reading things correctly, what we end up
>>>> with is the master showing up as "ens1" for example and the backup
>>>> showing up as "ens1nbackup". Am I understanding that right?
>>>>
>>>> The problem with the team/bond%d approach is that it creates a new
>>>> netdevice and so it would require guest configuration changes.
>>>>
>>>>> IMHO phys_port_name == "backup" if the BACKUP bit is set on the slave
>>>>> virtio link is quite neat.
>>>>
>>>> I agree. For non-"backup" virtio_net devices would it be okay for us
>>>> to just return -EOPNOTSUPP? I assume it would be, and that way the
>>>> legacy behavior could be maintained although the function still exists.
>>>>
>>>>>> - When the 'active' netdev is unplugged OR not present on a destination
>>>>>> system after live migration, the user will see 2 virtio_net netdevs.
>>>>>
>>>>> That's necessary and expected; all configuration applies to the master,
>>>>> so the master must exist.
>>>>
>>>> With the naming issue resolved this is the only item left outstanding.
>>>> This becomes a matter of form vs function.
>>>>
>>>> The main complaint about the "3 netdev" solution is that it is a bit
>>>> confusing to have the 2 netdevs present if the VF isn't there. The
>>>> idea is that having the extra "master" netdev there if there isn't
>>>> really a bond is a bit ugly.
>>>
>>> Is it uglier in terms of user experience rather than functionality? I
>>> don't want it dynamically changed between 2-netdev and 3-netdev
>>> depending on the presence of the VF. That gets back to my original
>>> question and suggestion earlier: why not just hide the lower netdevs
>>> from udev renaming and such? What important observability benefits
>>> would users get from exposing the lower netdevs?
>>>
>>> Thanks,
>>> -Siwei
>>
>> The only real advantage to a 2 netdev solution is that it looks like
>> the netvsc solution; however, it doesn't behave like it, since there
>> are some features like XDP that may not function correctly if they are
>> left enabled in the virtio_net interface.
>>
>> As far as functionality, the advantage of not hiding the lower devices
>> is that they are free to be managed. The problem with pushing all of
>> the configuration into the upper device is that you are limited to the
>> intersection of the features of the lower devices. This can be
>> limiting for some setups, as some VFs support things like more queues
>> or better interrupt moderation options than others, so trying to make
>> everything work with one config would be ugly.
>
> It depends on how you build it and the way you expect it to work. IMHO
> the lower devices don't need to be directly managed at all; otherwise
> it ends up with loss of configuration across migration, and it really
> does not bring much more value than having a general team or bond
> device. Users still have to reconfigure those queue settings and
> interrupt moderation options after all. The new upper device could
> take the assumption that the VF/PT lower device always has a superior
> feature set to virtio-net in order to apply advanced configuration.
> The upper device should remember all configurations previously done
> and apply the supported ones to the active device automatically when
> switching the datapath.

It should be possible to extend this patchset to support migration of
additional settings by enabling additional ndo_ops and ethtool_ops on
the upper dev, propagating them down to the lower devices, and
replaying the settings after the VF is replugged after migration.

Thanks
Sridhar
Stephen Hemminger
2018-Feb-24 00:03 UTC
[RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
(pruned to reduce thread)

On Wed, 21 Feb 2018 16:17:19 -0800
Alexander Duyck <alexander.duyck at gmail.com> wrote:

> >>> FWIW two solutions that immediately come to mind are to export "backup"
> >>> as phys_port_name of the backup virtio link and/or assign a name to the
> >>> master like you are doing already. I think team uses team%d and bond
> >>> uses bond%d; soft naming of master devices seems quite natural in this
> >>> case.
> >>
> >> I figured I had overlooked something like that. Thanks for pointing
> >> this out. Okay, so I think the phys_port_name approach might resolve
> >> the original issue. If I am reading things correctly, what we end up
> >> with is the master showing up as "ens1" for example and the backup
> >> showing up as "ens1nbackup". Am I understanding that right?
> >>
> >> The problem with the team/bond%d approach is that it creates a new
> >> netdevice and so it would require guest configuration changes.
> >>
> >>> IMHO phys_port_name == "backup" if the BACKUP bit is set on the slave
> >>> virtio link is quite neat.
> >>
> >> I agree. For non-"backup" virtio_net devices would it be okay for us
> >> to just return -EOPNOTSUPP? I assume it would be, and that way the
> >> legacy behavior could be maintained although the function still exists.
> >>
> >>>> - When the 'active' netdev is unplugged OR not present on a destination
> >>>> system after live migration, the user will see 2 virtio_net netdevs.
> >>>
> >>> That's necessary and expected; all configuration applies to the master,
> >>> so the master must exist.
> >>
> >> With the naming issue resolved this is the only item left outstanding.
> >> This becomes a matter of form vs function.
> >>
> >> The main complaint about the "3 netdev" solution is that it is a bit
> >> confusing to have the 2 netdevs present if the VF isn't there. The
> >> idea is that having the extra "master" netdev there if there isn't
> >> really a bond is a bit ugly.
> >
> > Is it uglier in terms of user experience rather than functionality? I
> > don't want it dynamically changed between 2-netdev and 3-netdev
> > depending on the presence of the VF. That gets back to my original
> > question and suggestion earlier: why not just hide the lower netdevs
> > from udev renaming and such? What important observability benefits
> > would users get from exposing the lower netdevs?
> >
> > Thanks,
> > -Siwei
>
> The only real advantage to a 2 netdev solution is that it looks like
> the netvsc solution; however, it doesn't behave like it, since there
> are some features like XDP that may not function correctly if they are
> left enabled in the virtio_net interface.
>
> As far as functionality, the advantage of not hiding the lower devices
> is that they are free to be managed. The problem with pushing all of
> the configuration into the upper device is that you are limited to the
> intersection of the features of the lower devices. This can be
> limiting for some setups, as some VFs support things like more queues
> or better interrupt moderation options than others, so trying to make
> everything work with one config would be ugly.

Let's not make XDP the blocker for doing the best solution from the
end user point of view. XDP is just yet another offload thing which
needs to be handled. The current backup device solution used in netvsc
doesn't handle the full range of offload options (things like flow
direction, DCB, etc.); no one but the HW vendors seems to care.
Alexander Duyck
2018-Feb-25 22:17 UTC
[RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a passthru device
On Fri, Feb 23, 2018 at 4:03 PM, Stephen Hemminger
<stephen at networkplumber.org> wrote:
> (pruned to reduce thread)
>
> On Wed, 21 Feb 2018 16:17:19 -0800
> Alexander Duyck <alexander.duyck at gmail.com> wrote:
>
>> >>> FWIW two solutions that immediately come to mind are to export "backup"
>> >>> as phys_port_name of the backup virtio link and/or assign a name to the
>> >>> master like you are doing already. I think team uses team%d and bond
>> >>> uses bond%d; soft naming of master devices seems quite natural in this
>> >>> case.
>> >>
>> >> I figured I had overlooked something like that. Thanks for pointing
>> >> this out. Okay, so I think the phys_port_name approach might resolve
>> >> the original issue. If I am reading things correctly, what we end up
>> >> with is the master showing up as "ens1" for example and the backup
>> >> showing up as "ens1nbackup". Am I understanding that right?
>> >>
>> >> The problem with the team/bond%d approach is that it creates a new
>> >> netdevice and so it would require guest configuration changes.
>> >>
>> >>> IMHO phys_port_name == "backup" if the BACKUP bit is set on the slave
>> >>> virtio link is quite neat.
>> >>
>> >> I agree. For non-"backup" virtio_net devices would it be okay for us
>> >> to just return -EOPNOTSUPP? I assume it would be, and that way the
>> >> legacy behavior could be maintained although the function still exists.
>> >>
>> >>>> - When the 'active' netdev is unplugged OR not present on a destination
>> >>>> system after live migration, the user will see 2 virtio_net netdevs.
>> >>>
>> >>> That's necessary and expected; all configuration applies to the master,
>> >>> so the master must exist.
>> >>
>> >> With the naming issue resolved this is the only item left outstanding.
>> >> This becomes a matter of form vs function.
>> >>
>> >> The main complaint about the "3 netdev" solution is that it is a bit
>> >> confusing to have the 2 netdevs present if the VF isn't there. The
>> >> idea is that having the extra "master" netdev there if there isn't
>> >> really a bond is a bit ugly.
>> >
>> > Is it uglier in terms of user experience rather than functionality? I
>> > don't want it dynamically changed between 2-netdev and 3-netdev
>> > depending on the presence of the VF. That gets back to my original
>> > question and suggestion earlier: why not just hide the lower netdevs
>> > from udev renaming and such? What important observability benefits
>> > would users get from exposing the lower netdevs?
>> >
>> > Thanks,
>> > -Siwei
>>
>> The only real advantage to a 2 netdev solution is that it looks like
>> the netvsc solution; however, it doesn't behave like it, since there
>> are some features like XDP that may not function correctly if they are
>> left enabled in the virtio_net interface.
>>
>> As far as functionality, the advantage of not hiding the lower devices
>> is that they are free to be managed. The problem with pushing all of
>> the configuration into the upper device is that you are limited to the
>> intersection of the features of the lower devices. This can be
>> limiting for some setups, as some VFs support things like more queues
>> or better interrupt moderation options than others, so trying to make
>> everything work with one config would be ugly.
>
> Let's not make XDP the blocker for doing the best solution from the
> end user point of view. XDP is just yet another offload thing which
> needs to be handled. The current backup device solution used in netvsc
> doesn't handle the full range of offload options (things like flow
> direction, DCB, etc.); no one but the HW vendors seems to care.

XDP isn't the blocker here. As far as I am concerned we can go either
way, with a 2 netdev or a 3 netdev solution. We just need to make sure
we are aware of all the trade-offs, and make a decision one way or the
other. This is quickly turning into a bikeshed, and I would prefer us
to all agree, or at least disagree and commit, on which way to go
before we burn more cycles on a patch set that seems to be getting
tied up in debate.

With the 2 netdev solution we have to limit the functionality so that
we don't break things when we bypass the guts of the driver to hand
traffic off to the VF. That ends up meaning that we are stuck with an
extra qdisc and Tx queue lock in the transmit path of the VF, and we
cannot rely on any in-driver Rx functionality to work, such as
in-driver XDP. However, the advantage here is that this is how netvsc
is already doing things.

The issue with the 3 netdev solution is that you are stuck with 2
netdevs ("ens1", "ens1nbackup") when the VF is not present. It could
be argued this isn't a very elegant looking solution, especially when
the VF is not present. With virtio this makes more sense though, as
you are still able to expose the full functionality of the lower
device, so you don't have to strip or drop any of the existing net
device ops if the "backup" bit is present.

Ultimately I would have preferred to have the 3 netdev solution go
with virtio only, as it would have allowed for healthy competition
between the two designs and we could have seen which one would have
ultimately won out. But if we have to decide this now, we need to do
so before we put too much more effort into the patches, as these end
up becoming two very different solutions, especially if we have to
apply the solution to both drivers. My preference would still be 3
netdevs, since we could apply this to netvsc without too many changes,
but I will agree with whatever conclusion we can come to in terms of
how this is supposed to work for both netvsc and virtio_bypass.

- Alex