Julia Suvorova
2022-Dec-08 16:15 UTC
Predictable and consistent net interface naming in guests
On Thu, Nov 3, 2022 at 9:26 AM Amnon Ilan <ailan at redhat.com> wrote:
>
> On Thu, Nov 3, 2022 at 12:13 AM Amnon Ilan <ailan at redhat.com> wrote:
>>
>> On Wed, Nov 2, 2022 at 6:47 PM Laine Stump <laine at redhat.com> wrote:
>>>
>>> On 11/2/22 11:58 AM, Igor Mammedov wrote:
>>> > On Wed, 2 Nov 2022 15:20:39 +0000
>>> > Daniel P. Berrangé <berrange at redhat.com> wrote:
>>> >
>>> >> On Wed, Nov 02, 2022 at 04:08:43PM +0100, Igor Mammedov wrote:
>>> >>> On Wed, 2 Nov 2022 10:43:10 -0400
>>> >>> Laine Stump <laine at redhat.com> wrote:
>>> >>>
>>> >>>> On 11/1/22 7:46 AM, Igor Mammedov wrote:
>>> >>>>> On Mon, 31 Oct 2022 14:48:54 +0000
>>> >>>>> Daniel P. Berrangé <berrange at redhat.com> wrote:
>>> >>>>>
>>> >>>>>> On Mon, Oct 31, 2022 at 04:32:27PM +0200, Edward Haas wrote:
>>> >>>>>>> Hi Igor and Laine,
>>> >>>>>>>
>>> >>>>>>> I would like to revive a 2-year-old discussion [1] about consistent network
>>> >>>>>>> interfaces in the guest.
>>> >>>>>>>
>>> >>>>>>> That discussion mentioned that a guest PCI address may change in two cases:
>>> >>>>>>> - The PCI topology changes.
>>> >>>>>>> - The machine type changes.
>>> >>>>>>>
>>> >>>>>>> Usually, the machine type is not expected to change, especially if one
>>> >>>>>>> wants to allow migrations between nodes.
>>> >>>>>>> I would hope to argue this should not be problematic in practice, because
>>> >>>>>>> guest images would be made per a specific machine type.
>>> >>>>>>>
>>> >>>>>>> Regarding the PCI topology, I am not sure I understand what changes
>>> >>>>>>> need to occur to the domxml for a defined guest PCI address to change.
>>> >>>>>>> The only thing that I can think of is a scenario where hotplug/unplug is used,
>>> >>>>>>> but even then I would expect existing devices to preserve their PCI address
>>> >>>>>>> and the plug/unplug device to have a reserved address managed by the one
>>> >>>>>>> acting on it (the management system).
>>> >>>>>>>
>>> >>>>>>> Could you please help clarify in which scenarios the PCI topology can cause
>>> >>>>>>> a mess to the naming of interfaces in the guest?
>>> >>>>>>>
>>> >>>>>>> Are there any plans to add the acpi_index support?
>>> >>>>>>
>>> >>>>>> This was implemented a year & a half ago
>>> >>>>>>
>>> >>>>>> https://libvirt.org/formatdomain.html#network-interfaces
>>> >>>>>>
>>> >>>>>> though due to QEMU limitations this only works for the old
>>> >>>>>> i440fx chipset, not Q35 yet.
>>> >>>>>
>>> >>>>> Q35 should work partially too. In its case acpi-index support
>>> >>>>> is limited to hotplug enabled root-ports and PCIe-PCI bridges.
>>> >>>>> One also has to enable ACPI PCI hotplug (it's enabled by default
>>> >>>>> on recent machine types) for it to work (i.e. it's not supported
>>> >>>>> in native PCIe hotplug mode).
>>> >>>>>
>>> >>>>> So if mgmt can put nics on root-ports/bridges, then acpi-index
>>> >>>>> should just work on Q35 as well.
>>> >>>>
>>> >>>> With only a few exceptions (e.g. the first ich9 audio device, which is
>>> >>>> placed directly on the root bus at 00:1B.0 because that is where the
>>> >>>> ich9 audio device is located on actual Q35 hardware), libvirt will
>>> >>>> automatically put all PCI devices (including network interfaces) on a
>>> >>>> pcie-root-port.
>>> >>>>
>>> >>>> After seeing reports that "acpi index doesn't work with Q35
>>> >>>> machinetypes" I just assumed that was correct and didn't try it. But
>>> >>>> after seeing the "should work partially" statement above, I tried it
>>> >>>> just now and an <interface> of a Q35 guest that had its PCI address
>>> >>>> auto-assigned by libvirt (and so was placed on a pcie-root-port), and
>>> >>>> had <acpi index='4'/> was given the name "eno4". So what exactly is it
>>> >>>> that *doesn't* work?
>>> >>>
>>> >>> From QEMU side:
>>> >>> acpi-index requires:
>>> >>>   1. acpi pci hotplug enabled (which is default on relatively new q35 machine types)
>>> >>>   2. hotpluggable pci bus (root-port, various pci bridges)
>>> >>>   3. NIC can be cold or hotplugged, guest should pick up acpi-index of the device
>>> >>>      currently plugged into slot
>>> >>> what doesn't work:
>>> >>>   1. device attached to host-bridge directly (work in progress)
>>> >>>      (q35)
>>> >>>   2. devices attached to any PXB port and any hierarchy hanging off it (there are no plans to make it work)
>>> >>>      (q35, pc)
>>> >>
>>> >> I'd say this is still relatively important, as the PXBs are needed
>>> >> to create a NUMA placement aware topology for guests, and I'd say it
>>> >> is undesirable to lose acpi-index if a guest is updated to be NUMA
>>> >> aware, or if a guest image can be deployed in either normal or NUMA
>>> >> aware setups.
>>> >
>>> > it's not only Q35 but also PC.
>>> > We basically do not generate ACPI hierarchy for PXBs at all,
>>> > so neither ACPI hotplug nor the dependent acpi-index would work.
>>> > It's been so for many years and no one has asked to enable
>>> > ACPI hotplug on them so far.
>>>
>>> I'm guessing (based on absolutely 0 information :-)) that there would be
>>> more demand for acpi-index (and the resulting predictable interface
>>> names) than for acpi hotplug for NUMA-aware setup.
>>
>> My guess is similar, but it is still desirable to have both (i.e. support ACPI-indexing/hotplug with Numa-aware)
>> Adding @Peter Xu to check if our setups for SAP require NUMA-aware topology
>>
>> How big of a project would it be to enable ACPI-indexing/hotplug with PXB?

Why would you need to add acpi hotplug on pxb?

> Adding +Julia Suvorova and +Tsirkin, Michael to help answer this question
>
> Thanks,
> Amnon
>
>>
>> Since native PCI was improved, we can still compromise on switching to native-PCI-hotplug when PXB is required (and no fixed indexing)

Native hotplug works on pxb as is, without disabling acpi hotplug.

>> Thanks,
>> Amnon
>>
>>>
>>> Anyway, it sounds like (*within the confines of how libvirt constructs
>>> the PCI topology*) we actually have functional parity of acpi-index
>>> between 440fx and Q35.
>>>
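
Editorial sketch of the acpi-index conditions Igor lists in the quoted thread above. This is only one reading of that list, not QEMU code: the `NicPlacement` type, its field names, and the bus labels are hypothetical names chosen for illustration. Treating a device directly on the host bridge as working on "pc" is an assumption, based on the thread listing that case as broken only for q35.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicPlacement:
    """Hypothetical description of where a NIC sits in the guest PCI topology."""
    machine: str            # "q35" or "pc" (i440fx)
    parent_bus: str         # "root-port", "pci-bridge" or "host-bridge"
    behind_pxb: bool        # True if any ancestor is a PXB
    acpi_pci_hotplug: bool  # ACPI PCI hotplug enabled for the machine


def acpi_index_expected_to_work(p: NicPlacement) -> bool:
    """Apply the conditions from the thread, in order."""
    # Requirement 1: ACPI PCI hotplug must be enabled
    # (not supported in native PCIe hotplug mode).
    if not p.acpi_pci_hotplug:
        return False
    # "Doesn't work" 2: anything hanging off a PXB (q35 and pc),
    # since no ACPI hierarchy is generated for PXBs.
    if p.behind_pxb:
        return False
    # "Doesn't work" 1: directly on the host bridge, listed for q35 only.
    # Assumption: on pc this placement is treated as working.
    if p.parent_bus == "host-bridge":
        return p.machine == "pc"
    # Requirement 2: a hotpluggable bus such as a root port or a PCI bridge.
    return p.parent_bus in {"root-port", "pci-bridge"}


if __name__ == "__main__":
    nic = NicPlacement("q35", "root-port", behind_pxb=False, acpi_pci_hotplug=True)
    print(acpi_index_expected_to_work(nic))
```

The check order mirrors the thread: the hotplug mode is tested first because it applies to every placement, and the PXB case is tested before the bus type because it overrides it.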
Laine Stump
2022-Dec-08 16:44 UTC
Predictable and consistent net interface naming in guests
On 12/8/22 11:15 AM, Julia Suvorova wrote:
> On Thu, Nov 3, 2022 at 9:26 AM Amnon Ilan <ailan at redhat.com> wrote:
>>
>> On Thu, Nov 3, 2022 at 12:13 AM Amnon Ilan <ailan at redhat.com> wrote:
>>>
>>> On Wed, Nov 2, 2022 at 6:47 PM Laine Stump <laine at redhat.com> wrote:
>>>>
>>>> I'm guessing (based on absolutely 0 information :-)) that there would be
>>>> more demand for acpi-index (and the resulting predictable interface
>>>> names) than for acpi hotplug for NUMA-aware setup.
>>>
>>> My guess is similar, but it is still desirable to have both (i.e. support ACPI-indexing/hotplug with Numa-aware)
>>> Adding @Peter Xu to check if our setups for SAP require NUMA-aware topology
>>>
>>> How big of a project would it be to enable ACPI-indexing/hotplug with PXB?
>
> Why would you need to add acpi hotplug on pxb?
>
>> Adding +Julia Suvorova and +Tsirkin, Michael to help answer this question
>>
>> Thanks,
>> Amnon
>>
>>>
>>> Since native PCI was improved, we can still compromise on switching to native-PCI-hotplug when PXB is required (and no fixed indexing)
>
> Native hotplug works on pxb as is, without disabling acpi hotplug.

Are you saying you can add an acpi-index to a device plugged into a pxb,
that index will be recognized (and used to name the device), but it will
still do native hotplug? That sounds okay to me, since it ticks all the
functional marks (hotplug, consistent device names, NUMA-aware). It's
possible there are some things I'm misunderstanding or haven't thought
of though...

>
>>> Thanks,
>>> Amnon
>>>
>>>>
>>>> Anyway, it sounds like (*within the confines of how libvirt constructs
>>>> the PCI topology*) we actually have functional parity of acpi-index
>>>> between 440fx and Q35.
>>>>
>
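
Editorial sketch of the libvirt configuration discussed in this thread: an <interface> carrying <acpi index='4'/>, which Laine reports showed up in a Q35 guest as "eno4". The helper names, the "default" network, and the virtio model are assumptions made for illustration. The "eno" + index mapping is the result reported in the thread; the actual name depends on the guest's udev/systemd naming policy.

```python
import xml.etree.ElementTree as ET


def interface_with_acpi_index(index: int, network: str = "default") -> str:
    """Build a libvirt <interface> element carrying an <acpi index=.../> child.

    The network name and the virtio model are illustrative assumptions.
    """
    iface = ET.Element("interface", {"type": "network"})
    ET.SubElement(iface, "source", {"network": network})
    ET.SubElement(iface, "model", {"type": "virtio"})
    ET.SubElement(iface, "acpi", {"index": str(index)})
    return ET.tostring(iface, encoding="unicode")


def expected_guest_name(index: int) -> str:
    """Name observed in the thread for an onboard-index NIC (index 4 gave eno4).

    Assumption: the guest uses the usual predictable-naming policy.
    """
    return f"eno{index}"


if __name__ == "__main__":
    print(interface_with_acpi_index(4))
    print(expected_guest_name(4))
```

No PCI address is set here, so libvirt would auto-assign one; per the thread, on Q35 that places the device on a pcie-root-port, which is one of the placements where acpi-index is reported to work.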