Samudrala, Sridhar
2018-Jan-03 18:14 UTC
[PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
On 1/3/2018 8:59 AM, Alexander Duyck wrote:
> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici at wp.pl> wrote:
>> On Tue, 2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>> This patch series enables virtio to switch over to a VF datapath when a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to setup a bond/team between a VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of the datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back into the guest to switch over to the VF datapath.
>>>
>>> It is based on the netvsc implementation and it may be possible to make this code generic and move it to a common location that can be shared by netvsc and virtio.
>>>
>>> This patch series is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
>> How does the notion of a device which is both a bond and a leg of a bond fit with Alex's recent discussions about feature propagation? Which propagation rules will apply to the VirtIO master? The meaning of the flags on a software upper device may be different. Why muddy the architecture like this and not introduce a synthetic bond device?
>
> It doesn't really fit with the notion I had. I think there may have been a bit of a disconnect, as I have been out for the last week or so for the holidays.
>
> My thought on this was that the feature bit should spawn a new para-virtual bond device, and that bond should have the virtio device and the VF as slaves. Also, I thought there was some discussion about trying to reuse as much of the netvsc code as possible for this so that we could avoid duplication of effort and have the two drivers use the same approach. It seems like it should be pretty straightforward since you would have the feature bit in the case of virtio, and netvsc just does this sort of thing by default if I am not mistaken.

This patch is mostly based on the netvsc implementation. The only change is avoiding the explicit dev_open() call of the VF netdev after a delay. I am assuming that the guest userspace will bring up the VF netdev and the hypervisor will update the MAC filters to switch to the right datapath.

We could commonize the code and make it shared between netvsc and virtio. Do we want to do this right away or later? If so, what would be a good location for these shared functions? Is it net/core/dev.c?

Also, if we want to go with a solution that creates a bond device, do we want the virtio_net/netvsc drivers to create an upper device? Such a solution is already possible via config scripts that can create a bond with virtio and a VF net device as slaves. netvsc and this patch series are trying to make it as simple as possible for the VM to use directly attached devices and support live migration by switching to the virtio datapath as a backup during the migration process when the VF device is unplugged.

Thanks
Sridhar
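[Editor's note] For readers following along, here is a minimal sketch of the netvsc-style mechanism being discussed: the paravirtual driver watches netdev events and treats a VF whose permanent MAC matches its own as the preferred datapath. This is only an illustration, not the posted patch code; virtnet_find_by_mac(), virtnet_register_vf() and virtnet_unregister_vf() are hypothetical helpers.

#include <linux/netdevice.h>
#include <linux/notifier.h>

static int virtnet_bypass_event(struct notifier_block *nb,
				unsigned long event, void *ptr)
{
	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
	struct net_device *virtio_dev;

	/* Find a virtio netdev whose MAC matches the new device's permanent
	 * MAC (virtnet_find_by_mac() is a hypothetical helper; the posted
	 * patch and netvsc each have their own lookup logic).
	 */
	virtio_dev = virtnet_find_by_mac(event_dev->perm_addr);
	if (!virtio_dev || virtio_dev == event_dev)
		return NOTIFY_DONE;

	switch (event) {
	case NETDEV_REGISTER:
		/* VF hot-added: record it as the preferred datapath */
		virtnet_register_vf(virtio_dev, event_dev);
		break;
	case NETDEV_UNREGISTER:
		/* VF hot-removed (e.g. before live migration): fall back to virtio */
		virtnet_unregister_vf(virtio_dev, event_dev);
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block virtnet_bypass_nb = {
	.notifier_call = virtnet_bypass_event,
};

/* register_netdevice_notifier(&virtnet_bypass_nb) at module init,
 * unregister_netdevice_notifier() at module exit.
 */

The NETDEV_UNREGISTER case is what makes the hypervisor-driven hot-unplug before migration transparent to the guest: traffic simply falls back to the virtio queues.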
Samudrala, Sridhar
2018-Jan-04 00:22 UTC
[PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
On 1/3/2018 10:28 AM, Alexander Duyck wrote:
> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar <sridhar.samudrala at intel.com> wrote:
>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici at wp.pl> wrote:
>>>> On Tue, 2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>> This patch series enables virtio to switch over to a VF datapath when a VF netdev is present with the same MAC address. It allows live migration of a VM with a direct attached VF without the need to setup a bond/team between a VF and virtio net device in the guest.
>>>>>
>>>>> The hypervisor needs to unplug the VF device from the guest on the source host and reset the MAC filter of the VF to initiate failover of the datapath to virtio before starting the migration. After the migration is completed, the destination hypervisor sets the MAC filter on the VF and plugs it back into the guest to switch over to the VF datapath.
>>>>>
>>>>> It is based on the netvsc implementation and it may be possible to make this code generic and move it to a common location that can be shared by netvsc and virtio.
>>>>>
>>>>> This patch series is based on the discussion initiated by Jesse on this thread.
>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>>
>>>> How does the notion of a device which is both a bond and a leg of a bond fit with Alex's recent discussions about feature propagation? Which propagation rules will apply to the VirtIO master? The meaning of the flags on a software upper device may be different. Why muddy the architecture like this and not introduce a synthetic bond device?
>>>
>>> It doesn't really fit with the notion I had. I think there may have been a bit of a disconnect, as I have been out for the last week or so for the holidays.
>>>
>>> My thought on this was that the feature bit should spawn a new para-virtual bond device, and that bond should have the virtio device and the VF as slaves. Also, I thought there was some discussion about trying to reuse as much of the netvsc code as possible for this so that we could avoid duplication of effort and have the two drivers use the same approach. It seems like it should be pretty straightforward since you would have the feature bit in the case of virtio, and netvsc just does this sort of thing by default if I am not mistaken.
>>
>> This patch is mostly based on the netvsc implementation. The only change is avoiding the explicit dev_open() call of the VF netdev after a delay. I am assuming that the guest userspace will bring up the VF netdev and the hypervisor will update the MAC filters to switch to the right datapath.
>>
>> We could commonize the code and make it shared between netvsc and virtio. Do we want to do this right away or later? If so, what would be a good location for these shared functions? Is it net/core/dev.c?
>
> No, I would think about starting a new driver file in "/drivers/net/". The idea is this driver would be utilized to create a bond automatically and set the appropriate registration hooks. If nothing else you could probably just call it something generic like virt-bond or vbond or whatever.

We are trying to avoid creating another driver or a device.
Can we look into consolidation of the two implementations (virtio & netvsc) as a later patch?

>> Also, if we want to go with a solution that creates a bond device, do we want the virtio_net/netvsc drivers to create an upper device? Such a solution is already possible via config scripts that can create a bond with virtio and a VF net device as slaves. netvsc and this patch series are trying to make it as simple as possible for the VM to use directly attached devices and support live migration by switching to the virtio datapath as a backup during the migration process when the VF device is unplugged.
>
> We all understand that. But you are making the solution very virtio specific. We want to see this be usable for other interfaces such as netvsc and whatever other virtual interfaces are floating around out there.
>
> Also, I haven't seen us address what happens as far as how we will handle this on the host. My thought was we should have a paired interface. Something like veth, but made up of a bond on each end. So in the host we should have one bond that has a tap/vhost interface and a VF port representor, and on the other we would be looking at the virtio interface and the VF. Attaching the tap/vhost to the bond could be a way of triggering the feature bit to be set in the virtio. That way communication between the guest and the host won't get too confusing, as you will see all traffic from the bonded MAC address always show up on the host-side bond instead of potentially showing up on two unrelated interfaces. It would also make for a good way to resolve the east/west traffic problem on hosts, since you could just send the broadcast/multicast traffic via the tap/vhost/virtio channel instead of having to send it back through the port representor and eat up all that PCIe bus traffic.

From the host point of view, here is a simple script that needs to be run to do the live migration. We don't need any bond configuration on the host.

virsh detach-interface $DOMAIN hostdev --mac $MAC
ip link set $PF vf $VF_NUM mac $ZERO_MAC

virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system

ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV --mac $MAC
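[Editor's note] The script above only works because the guest side already steers traffic based on whether the VF is plugged in. Below is a rough sketch of that transmit-path selection, assuming an RCU-protected vf_netdev pointer; my_priv, my_start_xmit and my_virtio_xmit are placeholder names, not the actual patch code.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Illustrative private state: the paravirtual netdev keeps an RCU-protected
 * pointer to the VF netdev while one is plugged in.
 */
struct my_priv {
	struct net_device __rcu *vf_netdev;
};

static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct my_priv *priv = netdev_priv(dev);
	struct net_device *vf = rcu_dereference_bh(priv->vf_netdev);

	/* While the VF is present and up, hand the packet to it ... */
	if (vf && netif_running(vf) && netif_carrier_ok(vf)) {
		skb->dev = vf;
		return dev_queue_xmit(skb);
	}

	/* ... otherwise fall back to the paravirtual (virtio) queues, e.g.
	 * during the window where the hypervisor has unplugged the VF for
	 * live migration. my_virtio_xmit() stands in for the normal
	 * virtio_net transmit path.
	 */
	return my_virtio_xmit(skb, dev);
}

netvsc follows essentially this pattern today, which is why the thread keeps coming back to sharing the code between the two drivers.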
Siwei Liu
2018-Jan-22 19:00 UTC
[PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
Apologies I didn't notice that the discussion was mistakenly taken offline. Post it back. -Siwei On Sat, Jan 13, 2018 at 7:25 AM, Siwei Liu <loseweigh at gmail.com> wrote:> On Thu, Jan 11, 2018 at 12:32 PM, Samudrala, Sridhar > <sridhar.samudrala at intel.com> wrote: >> On 1/8/2018 9:22 AM, Siwei Liu wrote: >>> >>> On Sat, Jan 6, 2018 at 2:33 AM, Samudrala, Sridhar >>> <sridhar.samudrala at intel.com> wrote: >>>> >>>> On 1/5/2018 9:07 AM, Siwei Liu wrote: >>>>> >>>>> On Thu, Jan 4, 2018 at 8:22 AM, Samudrala, Sridhar >>>>> <sridhar.samudrala at intel.com> wrote: >>>>>> >>>>>> On 1/3/2018 10:28 AM, Alexander Duyck wrote: >>>>>>> >>>>>>> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar >>>>>>> <sridhar.samudrala at intel.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 1/3/2018 8:59 AM, Alexander Duyck wrote: >>>>>>>>> >>>>>>>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici at wp.pl> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> On Tue, 2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote: >>>>>>>>>>> >>>>>>>>>>> This patch series enables virtio to switch over to a VF datapath >>>>>>>>>>> when >>>>>>>>>>> a >>>>>>>>>>> VF >>>>>>>>>>> netdev is present with the same MAC address. It allows live >>>>>>>>>>> migration >>>>>>>>>>> of >>>>>>>>>>> a VM >>>>>>>>>>> with a direct attached VF without the need to setup a bond/team >>>>>>>>>>> between >>>>>>>>>>> a >>>>>>>>>>> VF and virtio net device in the guest. >>>>>>>>>>> >>>>>>>>>>> The hypervisor needs to unplug the VF device from the guest on the >>>>>>>>>>> source >>>>>>>>>>> host and reset the MAC filter of the VF to initiate failover of >>>>>>>>>>> datapath >>>>>>>>>>> to >>>>>>>>>>> virtio before starting the migration. After the migration is >>>>>>>>>>> completed, >>>>>>>>>>> the >>>>>>>>>>> destination hypervisor sets the MAC filter on the VF and plugs it >>>>>>>>>>> back >>>>>>>>>>> to >>>>>>>>>>> the guest to switch over to VF datapath. >>>>>>>>>>> >>>>>>>>>>> It is based on netvsc implementation and it may be possible to >>>>>>>>>>> make >>>>>>>>>>> this >>>>>>>>>>> code >>>>>>>>>>> generic and move it to a common location that can be shared by >>>>>>>>>>> netvsc >>>>>>>>>>> and virtio. >>>>>>>>>>> >>>>>>>>>>> This patch series is based on the discussion initiated by Jesse on >>>>>>>>>>> this >>>>>>>>>>> thread. >>>>>>>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2 >>>>>>>>>> >>>>>>>>>> How does the notion of a device which is both a bond and a leg of a >>>>>>>>>> bond fit with Alex's recent discussions about feature propagation? >>>>>>>>>> Which propagation rules will apply to VirtIO master? Meaning of >>>>>>>>>> the >>>>>>>>>> flags on a software upper device may be different. Why muddy the >>>>>>>>>> architecture like this and not introduce a synthetic bond device? >>>>>>>>> >>>>>>>>> It doesn't really fit with the notion I had. I think there may have >>>>>>>>> been a bit of a disconnect as I have been out for the last week or >>>>>>>>> so >>>>>>>>> for the holidays. >>>>>>>>> >>>>>>>>> My thought on this was that the feature bit should be spawning a new >>>>>>>>> para-virtual bond device and that bond should have the virto and the >>>>>>>>> VF as slaves. Also I thought there was some discussion about trying >>>>>>>>> to >>>>>>>>> reuse as much of the netvsc code as possible for this so that we >>>>>>>>> could >>>>>>>>> avoid duplication of effort and have the two drivers use the same >>>>>>>>> approach. 
It seems like it should be pretty straight forward since >>>>>>>>> you >>>>>>>>> would have the feature bit in the case of virto, and netvsc just >>>>>>>>> does >>>>>>>>> this sort of thing by default if I am not mistaken. >>>>>>>> >>>>>>>> This patch is mostly based on netvsc implementation. The only change >>>>>>>> is >>>>>>>> avoiding the >>>>>>>> explicit dev_open() call of the VF netdev after a delay. I am >>>>>>>> assuming >>>>>>>> that >>>>>>>> the guest userspace >>>>>>>> will bring up the VF netdev and the hypervisor will update the MAC >>>>>>>> filters >>>>>>>> to switch to >>>>>>>> the right data path. >>>>>>>> We could commonize the code and make it shared between netvsc and >>>>>>>> virtio. >>>>>>>> Do >>>>>>>> we want >>>>>>>> to do this right away or later? If so, what would be a good location >>>>>>>> for >>>>>>>> these shared functions? >>>>>>>> Is it net/core/dev.c? >>>>>>> >>>>>>> No, I would think about starting a new driver file in "/drivers/net/". >>>>>>> The idea is this driver would be utilized to create a bond >>>>>>> automatically and set the appropriate registration hooks. If nothing >>>>>>> else you could probably just call it something generic like virt-bond >>>>>>> or vbond or whatever. >>>>>> >>>>>> >>>>>> We are trying to avoid creating another driver or a device. Can we >>>>>> look >>>>>> into >>>>>> consolidation of the 2 implementations(virtio & netvsc) as a later >>>>>> patch? >>>>>> >>>>>>>> Also, if we want to go with a solution that creates a bond device, do >>>>>>>> we >>>>>>>> want virtio_net/netvsc >>>>>>>> drivers to create a upper device? Such a solution is already >>>>>>>> possible >>>>>>>> via >>>>>>>> config scripts that can >>>>>>>> create a bond with virtio and a VF net device as slaves. netvsc and >>>>>>>> this >>>>>>>> patch series is trying to >>>>>>>> make it as simple as possible for the VM to use directly attached >>>>>>>> devices >>>>>>>> and support live migration >>>>>>>> by switching to virtio datapath as a backup during the migration >>>>>>>> process >>>>>>>> when the VF device >>>>>>>> is unplugged. >>>>>>> >>>>>>> We all understand that. But you are making the solution very virtio >>>>>>> specific. We want to see this be usable for other interfaces such as >>>>>>> netsc and whatever other virtual interfaces are floating around out >>>>>>> there. >>>>>>> >>>>>>> Also I haven't seen us address what happens as far as how we will >>>>>>> handle this on the host. My thought was we should have a paired >>>>>>> interface. Something like veth, but made up of a bond on each end. So >>>>>>> in the host we should have one bond that has a tap/vhost interface and >>>>>>> a VF port representor, and on the other we would be looking at the >>>>>>> virtio interface and the VF. Attaching the tap/vhost to the bond could >>>>>>> be a way of triggering the feature bit to be set in the virtio. That >>>>>>> way communication between the guest and the host won't get too >>>>>>> confusing as you will see all traffic from the bonded MAC address >>>>>>> always show up on the host side bond instead of potentially showing up >>>>>>> on two unrelated interfaces. It would also make for a good way to >>>>>>> resolve the east/west traffic problem on hosts since you could just >>>>>>> send the broadcast/multicast traffic via the tap/vhost/virtio channel >>>>>>> instead of having to send it back through the port representor and eat >>>>>>> up all that PCIe bus traffic. 
>>>>>> From the host point of view, here is a simple script that needs to be run to do the live migration. We don't need any bond configuration on the host.
>>>>>>
>>>>>> virsh detach-interface $DOMAIN hostdev --mac $MAC
>>>>>> ip link set $PF vf $VF_NUM mac $ZERO_MAC
>>>>>
>>>>> I'm not sure I understand how this script may work with regard to "live" migration.
>>>>>
>>>>> I'm confused; this script seems to require virtio-net to be configured on top of a different PF than where the migrating VF is seated. Or else, how does an identical MAC address filter get programmed on one PF with two (or more) child virtual interfaces (e.g. one macvtap for virtio-net plus one VF)? The coincidence of it being able to work on the NIC of one/some vendor(s) does not apply to the others AFAIK.
>>>>>
>>>>> If you're planning to use a different PF, I don't see how gratuitous ARP announcements are generated to make this a "live" migration.
>>>>
>>>> I am not using a different PF. virtio is backed by a tap/bridge with the PF attached to that bridge. When we reset the VF MAC after it is unplugged, all the packets for the guest MAC will go to the PF and reach virtio via the bridge.
>>>
>>> That is the limitation of this scheme: it only works for virtio backed by tap/bridge, rather than backed by macvtap on top of the corresponding *PF*. Nowadays more datacenter users prefer macvtap as opposed to bridge, simply because of better isolation and performance (e.g. host stack consumption on NIC promiscuity processing is not scalable for bridges). Additionally, the ongoing virtio receive zero-copy work will be tightly integrated with macvtap, the performance optimization of which is apparently difficult (if technically possible at all) to do on a bridge. Why do we limit the host backend support to only bridge at this point?
>>
>> No. This should work with virtio backed by macvtap over the PF too.
>>
>>>> If we want to use virtio backed by macvtap on top of another VF as the backup channel, we could set the guest MAC to that VF after unplugging the directly attached VF.
>>>
>>> I meant macvtap on the corresponding PF instead of another VF. You know, users shouldn't have to change the guest MAC back and forth. Live migration shouldn't involve any form of user intervention IMHO.
>>
>> Yes. macvtap on top of the PF should work too. The hypervisor doesn't need to change the guest MAC. The PF driver needs to program the HW MAC filters so that the frames reach the PF when the VF is unplugged.
>
> So the HW MAC filter is deferred and only gets programmed for virtio once the VF is unplugged, correct? This is not the regular plumbing order for macvtap. Unless I miss something obvious, how does this get reflected in the script below?
>
> virsh detach-interface $DOMAIN hostdev --mac $MAC
> ip link set $PF vf $VF_NUM mac $ZERO_MAC
>
> i.e. the commands above won't automatically trigger the programming of MAC filters for virtio.
>
> If you program two identical MAC address filters for both the VF and virtio at the same point, I'm sure it won't work at all. It is not clear to me how you propose to make it work if you don't plan to change the plumbing order?
>>>>>> virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system
>>>>>>
>>>>>> ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
>>>>>> ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV --mac $MAC
>>>>>
>>>>> How do you keep guest-side VF configurations, e.g. MTU and VLAN filters, around across the migration? More broadly, how do you make sure the new VF is still as performant as before, such that all hardware ring tunings and offload settings can be kept as much as possible? I'm afraid this simple script won't work for those real-world scenarios.
>>>>>
>>>>> I would agree with Alex that we'll soon need a host-side stub/entity with cached guest configurations that may make VF switching straightforward and transparent.
>>>>
>>>> The script is only adding a MAC filter to the VF on the destination. If the source host has done any additional tunings on the VF, they need to be done on the destination host too.
>>>
>>> I was mainly referring to the VF's run-time configuration in the guest rather than what is configured from the host side. Let's say the guest admin had changed the VF's MTU value, the default of which is 1500, to 9000 before the migration. How do you save and restore the old running config for the VF across the migration?
>>
>> Such optimizations should be possible on top of this patch. We need to sync up any changes/updates to VF configuration/features with virtio.
>
> This is possible but not the ideal way to build it. Virtio perhaps would not be the best place to stack this (VF specifics for live migration) up further. We need a new driver and do it right from the very beginning.
>
> Thanks,
> -Siwei
>
>>>> It is also possible that the VF on the destination is based on a totally different NIC which may be more or less performant. Or the destination may not even support a VF datapath at all.
>>>
>>> This argument is rather weak. In almost all real-world live migration scenarios, the hardware configurations on both source and destination are (required to be) identical. Being able to support heterogeneous live migration doesn't mean we can do nothing but throw all running configs or driver tunings away when it's done. Specifically, I don't find a reason not to apply the guest network configs, including NIC offload settings, if those are commonly supported on both ends, even on virtio-net. While for some of the configs the user might notice the loss or change and respond, complaints would still arise when issues are painful to troubleshoot and/or difficult to get detected and restored. This is why I say real-world scenarios are more complex than just switch and go.
>>
>> Sure. These patches by themselves don't enable live migration automatically. The hypervisor needs to do some additional setup before and after the migration.
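[Editor's note] To make the "sync up any changes/updates to VF configuration/features with virtio" point concrete, here is one possible shape such a sync could take on the guest side. It is only a sketch under the assumption that the paravirtual device stays in the guest across the migration; sync_vf_config() is a hypothetical helper, not something in the posted patches.

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: called with rtnl_lock() held when a VF netdev is
 * (re)enslaved on the destination host, so that run-time settings the guest
 * admin applied to the paravirtual device are carried over to the new VF.
 */
static void sync_vf_config(struct net_device *virtio_dev,
			   struct net_device *vf_dev)
{
	/* Example from the discussion: the admin raised the MTU to 9000
	 * before migration; push it onto the freshly plugged VF.
	 */
	if (vf_dev->mtu != virtio_dev->mtu)
		dev_set_mtu(vf_dev, virtio_dev->mtu);

	/* Unicast/multicast filter lists can be mirrored the same way
	 * (netvsc does this from its rx_mode handler via dev_uc_sync()/
	 * dev_mc_sync()); offload/feature settings would need similar
	 * treatment.
	 */
}

Whether logic like this belongs in virtio_net or in a separate generic driver is exactly the open question in this thread.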