Hi all,
I've put up a blog post with a summary of where network device failover stands and some open issues.
Not sure where best to host it, I just put it up on blogspot:
https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
Comments, corrections are welcome!

-- 
MST
Hi Michael,

Great blog post which summarises everything very well!

Some comments I have:

1) I think that when we use the term "1-netdev model" in community discussions, we tend to refer to what the blog post defines as the "3-device model with hidden slaves". Therefore, I would suggest removing the "1-netdev model" section and renaming the "3-device model with hidden slaves" section to "1-netdev model".

2) The userspace issues result from using both the "2-netdev model" and the "3-netdev model". However, the blog post describes them as if they only exist in the "3-netdev model". The reason these issues are not seen in the Azure environment is that they were partially handled by Microsoft for their specific 2-netdev model. Which leads me to the next comment.

3) I suggest that the blog post also elaborate on what exactly the userspace issues are that lead to models other than the "1-netdev model". The issues that I'm aware of are (please tell me if you are aware of others!):

(a) udev rename race condition: When the net-failover device is opened, it also opens its slaves. However, the order of KOBJ_ADD events to udev is first for the net-failover netdev and only then for the virtio-net netdev. This means that if userspace responds to the first event by opening the net-failover netdev, then any attempt by userspace to rename the virtio-net netdev in response to the second event will fail, because the virtio-net netdev is already opened. Also note that such a udev rename rule is useful because we would like to add rules that rename the virtio-net netdev to clearly signal that it is used as the standby interface of another net-failover netdev. (A toy reproduction of the rename failure is sketched after this list.)
The way this problem was worked around by Microsoft in NetVSC is to delay the open of the slave VF relative to the open of the NetVSC netdev. However, this is still a race and thus a hacky solution. It was accepted by the community only because it is internal to the NetVSC driver. A similar solution was rejected by the community for the net-failover driver.
The solution we currently proposed to address this (patch by Si-Wei) was to change the kernel's rename handling to allow a net-failover slave to be renamed even if it is already opened. The patch is still not accepted.

(b) Issues caused by various userspace components running DHCP on the net-failover slaves: DHCP should of course only be done on the net-failover netdev. Attempting to run DHCP on the net-failover slaves as well will cause networking issues. Therefore, userspace components should be taught to avoid doing DHCP on the net-failover slaves. The various userspace components include:

b.1) dhclient: If run without parameters, it by default just enumerates all netdevs and attempts to DHCP them all.
(I don't think Microsoft has handled this.)

b.2) initramfs / dracut: In order to mount the root file-system from iSCSI, these components need networking and therefore DHCP on all netdevs.
(Microsoft hasn't handled (b.2) because they don't have images which perform iSCSI boot in their Azure setup. Still an open issue.)

b.3) cloud-init: If configured to perform network configuration, it attempts to configure all available netdevs. It should, however, avoid doing so on net-failover slaves.
(Microsoft has handled this by adding a mechanism in cloud-init to blacklist a netdev from being configured in case it is owned by a specific PCI driver. Specifically, they blacklist the Mellanox VF driver. However, this technique doesn't work for the net-failover mechanism because both the net-failover netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)

b.4) The network managers of the various distros need to be updated to avoid DHCP on net-failover slaves? (Not sure. Asking...)
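To make (a) more concrete, here is a rough Python sketch of the failure mode: once the standby interface has been brought up (which is effectively what opening the net-failover master does), the kernel refuses to rename it. The interface name "eth0" is just a placeholder for the virtio-net standby netdev; this is only an illustration of the race, not the proposed fix.

    import subprocess

    def ip(*args):
        # Thin wrapper around the 'ip' utility; requires root.
        return subprocess.run(("ip",) + args, capture_output=True, text=True)

    iface = "eth0"  # placeholder: the virtio-net standby netdev

    # Opening the net-failover master also opens (brings up) its slaves:
    ip("link", "set", "dev", iface, "up")

    # A later udev rename of the already-opened slave now fails:
    r = ip("link", "set", "dev", iface, "name", "standby0")
    print(r.returncode, r.stderr.strip())
    # On kernels without the proposed change this prints a non-zero status and
    # "RTNETLINK answers: Device or resource busy".

Si-Wei's patch essentially relaxes this check for net-failover slaves, so that the rename can succeed even while the slave is already opened.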
4) Another interesting use-case where the net-failover mechanism is useful is handling NIC firmware failures or NIC firmware live-upgrade.
In both cases, there is a need to perform a full PCIe reset of the NIC, which loses all the NIC eSwitch configuration of the various VFs.
To handle these cases gracefully, one could just hot-unplug all VFs from the guests running on the host (which makes all guests fall back to the virtio-net netdev, which is backed by a netdev that is eventually on top of the PF). Networking is therefore restored to the guests once the PCIe reset is completed and the PF is functional again. To re-accelerate the guests' networking, the hypervisor can then just hot-plug new VFs into the guests. (A rough sketch of this flow, using libvirt, appears below the quoted message.)

P.S.:
I would very much appreciate this forum's help in closing on the pending items written in (3), which currently prevent using this net-failover mechanism in real production use-cases.

Regards,
-Liran

> On 17 Mar 2019, at 15:55, Michael S. Tsirkin <mst at redhat.com> wrote:
> 
> Hi all,
> I've put up a blog post with a summary of where network
> device failover stands and some open issues.
> Not sure where best to host it, I just put it up on blogspot:
> https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
> 
> Comments, corrections are welcome!
> 
> -- 
> MST
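As an illustration of the flow in (4), here is a minimal sketch using the libvirt Python bindings. The domain name, PCI address and hostdev XML are made up for illustration; in practice one detaches whatever VF hostdev the guest was given, and the VF hot-plugged afterwards should carry the same MAC as the virtio standby device so net-failover re-enslaves it.

    import libvirt

    # Hypothetical VF passed through to the guest; adjust the PCI address.
    VF_HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0xaf' slot='0x00' function='0x2'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("guest1")  # hypothetical guest name

    # 1) Hot-unplug the VF: the guest's failover netdev falls back to the
    #    virtio-net standby path (backed by the PF through the host's vswitch).
    dom.detachDeviceFlags(VF_HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)

    # 2) ...perform the PCIe reset / firmware upgrade of the NIC on the host...

    # 3) Hot-plug a VF again: the guest's failover netdev switches back to it.
    dom.attachDeviceFlags(VF_HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)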
On Tue, 19 Mar 2019 14:38:06 +0200 Liran Alon <liran.alon at oracle.com> wrote:

> b.3) cloud-init: If configured to perform network configuration, it attempts to configure all available netdevs. It should, however, avoid doing so on net-failover slaves.
> (Microsoft has handled this by adding a mechanism in cloud-init to blacklist a netdev from being configured in case it is owned by a specific PCI driver. Specifically, they blacklist the Mellanox VF driver. However, this technique doesn't work for the net-failover mechanism because both the net-failover netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)

Cloud-init should really just ignore all devices that have a master device. That would have been more general, and safer for other use cases.
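In the spirit of that suggestion, a minimal sketch of such a check might look like the following. It assumes a slave exposes a "master" link in sysfs, as bonding/team/bridge slaves do; if net-failover slaves do not expose that link, the check would have to look at the failover flags the kernel reports instead.

    import os

    def has_master(ifname: str) -> bool:
        # An enslaved netdev exposes /sys/class/net/<ifname>/master.
        return os.path.islink(f"/sys/class/net/{ifname}/master")

    def netdevs_to_configure():
        # Skip loopback and anything that already has a master above it.
        return [d for d in os.listdir("/sys/class/net")
                if d != "lo" and not has_master(d)]

    print(netdevs_to_configure())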
On Tue, Mar 19, 2019 at 02:38:06PM +0200, Liran Alon wrote:
> Hi Michael,
> 
> Great blog post which summarises everything very well!
> 
> Some comments I have:

Thanks! I'll try to update everything in the post when I'm not so jet-lagged.

> 1) I think that when we use the term "1-netdev model" in community discussions, we tend to refer to what the blog post defines as the "3-device model with hidden slaves".
> Therefore, I would suggest removing the "1-netdev model" section and renaming the "3-device model with hidden slaves" section to "1-netdev model".
> 
> 2) The userspace issues result from using both the "2-netdev model" and the "3-netdev model". However, the blog post describes them as if they only exist in the "3-netdev model".
> The reason these issues are not seen in the Azure environment is that they were partially handled by Microsoft for their specific 2-netdev model.
> Which leads me to the next comment.
> 
> 3) I suggest that the blog post also elaborate on what exactly the userspace issues are that lead to models other than the "1-netdev model".
> The issues that I'm aware of are (please tell me if you are aware of others!):
> (a) udev rename race condition: When the net-failover device is opened, it also opens its slaves. However, the order of KOBJ_ADD events to udev is first for the net-failover netdev and only then for the virtio-net netdev. This means that if userspace responds to the first event by opening the net-failover netdev, then any attempt by userspace to rename the virtio-net netdev in response to the second event will fail, because the virtio-net netdev is already opened. Also note that such a udev rename rule is useful because we would like to add rules that rename the virtio-net netdev to clearly signal that it is used as the standby interface of another net-failover netdev.
> The way this problem was worked around by Microsoft in NetVSC is to delay the open of the slave VF relative to the open of the NetVSC netdev. However, this is still a race and thus a hacky solution. It was accepted by the community only because it is internal to the NetVSC driver. A similar solution was rejected by the community for the net-failover driver.
> The solution we currently proposed to address this (patch by Si-Wei) was to change the kernel's rename handling to allow a net-failover slave to be renamed even if it is already opened. The patch is still not accepted.
> (b) Issues caused by various userspace components running DHCP on the net-failover slaves: DHCP should of course only be done on the net-failover netdev. Attempting to run DHCP on the net-failover slaves as well will cause networking issues. Therefore, userspace components should be taught to avoid doing DHCP on the net-failover slaves. The various userspace components include:
> b.1) dhclient: If run without parameters, it by default just enumerates all netdevs and attempts to DHCP them all.
> (I don't think Microsoft has handled this.)
> b.2) initramfs / dracut: In order to mount the root file-system from iSCSI, these components need networking and therefore DHCP on all netdevs.
> (Microsoft hasn't handled (b.2) because they don't have images which perform iSCSI boot in their Azure setup. Still an open issue.)
> b.3) cloud-init: If configured to perform network configuration, it attempts to configure all available netdevs. It should, however, avoid doing so on net-failover slaves.
> (Microsoft has handled this by adding a mechanism in cloud-init to blacklist a netdev from being configured in case it is owned by a specific PCI driver. Specifically, they blacklist the Mellanox VF driver. However, this technique doesn't work for the net-failover mechanism because both the net-failover netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)
> b.4) The network managers of the various distros need to be updated to avoid DHCP on net-failover slaves? (Not sure. Asking...)
> 
> 4) Another interesting use-case where the net-failover mechanism is useful is handling NIC firmware failures or NIC firmware live-upgrade.
> In both cases, there is a need to perform a full PCIe reset of the NIC, which loses all the NIC eSwitch configuration of the various VFs.

In this setup, how does the VF keep going? If it doesn't keep going, why is it helpful?

> To handle these cases gracefully, one could just hot-unplug all VFs from the guests running on the host (which makes all guests fall back to the virtio-net netdev, which is backed by a netdev that is eventually on top of the PF). Networking is therefore restored to the guests once the PCIe reset is completed and the PF is functional again. To re-accelerate the guests' networking, the hypervisor can then just hot-plug new VFs into the guests.
> 
> P.S.:
> I would very much appreciate this forum's help in closing on the pending items written in (3), which currently prevent using this net-failover mechanism in real production use-cases.
> 
> Regards,
> -Liran
> 
> > On 17 Mar 2019, at 15:55, Michael S. Tsirkin <mst at redhat.com> wrote:
> > 
> > Hi all,
> > I've put up a blog post with a summary of where network
> > device failover stands and some open issues.
> > Not sure where best to host it, I just put it up on blogspot:
> > https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
> > 
> > Comments, corrections are welcome!
> > 
> > -- 
> > MST