In attempting to isolate vfio-pci problems between two different guest
instances, the creation of a second guest (with the existing guest shut
down) resulted in:

Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3 is already in use
Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3 is already in use
Aug 09 12:43:23 grit libvirtd[6716]: Failed to allocate PCI device list: internal error: Device 0000:01:00.3 is already in use

Compiled against library: libvirt 6.1.0
Using library: libvirt 6.1.0
Using API: QEMU 6.1.0
Running hypervisor: QEMU 4.2.1
(fc32 default install)

The upstream code also seems to test definitions rather than active uses
of the PCI device.

My potentially naive patch to correct this (but not the failing test
cases) would be:

diff --git a/src/util/virpci.c b/src/util/virpci.c
index 47c671daa0..a00c5e6f44 100644
--- a/src/util/virpci.c
+++ b/src/util/virpci.c
@@ -1597,7 +1597,7 @@ int
 virPCIDeviceListAdd(virPCIDeviceListPtr list,
                     virPCIDevicePtr dev)
 {
-    if (virPCIDeviceListFind(list, dev)) {
+    if (virPCIDeviceBusContainsActiveDevices(dev, list)) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("Device %s is already in use"), dev->name);
         return -1;

Is this too simplistic or undesirable a feature request/implementation?

I'd be grateful if someone carries this through, as I'm unsure when I may
get time for this (busy handling other open source ecosystems). With an
install and test mechanism that doesn't create a mess of overwritten
packages, I'm happy to test a git branch or patches.
On 8/8/20 11:53 PM, Daniel Black wrote:
>
> In attempting to isolate vfio-pci problems between two different guest
> instances, the creation of a second guest (with existing guest shutdown)
> resulted in:
>
> Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
> is already in use
> Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
> is already in use
> Aug 09 12:43:23 grit libvirtd[6716]: Failed to allocate PCI device list:
> internal error: Device 0000:01:00.3 is already in use

Hmm. Normally the error that would be logged if a device is already in
use would say something like this:

error: Failed to start domain Win10-GPU
error: Requested operation is not valid: PCI device 0000:05:00.0 is in
use by driver QEMU, domain F30

So you're encountering this in an unexpected place.

>
> Compiled against library: libvirt 6.1.0
> Using library: libvirt 6.1.0
> Using API: QEMU 6.1.0
> Running hypervisor: QEMU 4.2.1
> (fc32 default install)
>
> The upstream code seems also to test definitions rather than active
> uses of the PCI device.

That isn't the case. You're misunderstanding what devices are on the
list. (see below for details)

>
> My potentially naive patch to correct this (but not the failing test
> cases) would be:
>
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 47c671daa0..a00c5e6f44 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1597,7 +1597,7 @@ int
>  virPCIDeviceListAdd(virPCIDeviceListPtr list,
>                      virPCIDevicePtr dev)
>  {
> -    if (virPCIDeviceListFind(list, dev)) {
> +    if (virPCIDeviceBusContainsActiveDevices(dev, list)) {
>          virReportError(VIR_ERR_INTERNAL_ERROR,
>                         _("Device %s is already in use"), dev->name);
>          return -1;
>
> Is this too simplistic or undesirable a feature request/implementation?

Only devices that are currently in use by a guest (activePCIHostdevs),
or that libvirt is in the process of detaching from the guest + vfio and
rebinding to the device's host driver (inactivePCIHostdevs), are on
either list of PCI devices maintained by libvirt. Once a device is
completely detached from the guest and (if "managed='yes'" was set in
the XML config) rebound to the natural host driver for the device, it
is removed from the list and can be used elsewhere.

I just tested this with an assigned GPU + soundcard on two guests to
verify that it works properly. (I'm running the latest upstream master
though, so it's not an exact replication of your test.)

>
> I'd be more than grateful if someone carries this through as I'm unsure
> when I may get time for this.

Can you provide the XML for your <hostdev> in the two guests, and the
exact sequence of commands that leads to this error? There is definitely
either a bug in the code or a bug in what you're doing. By seeing the
sequence of events, we can either attempt to replicate it or let you
know what change you need to make to your workflow to eliminate the error.
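For reference, a managed PCI <hostdev> of the kind being discussed
typically looks like the sketch below. The host and guest addresses are
illustrative only, not taken from either guest in this thread; only the
structure is what matters:

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <!-- host (physical) PCI address of the device being passed through -->
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
    </source>
    <!-- guest-side PCI address; if this element is omitted,
         libvirt picks a free guest slot automatically -->
    <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
  </hostdev>

With managed='yes', libvirt unbinds the device from its host driver and
binds it to vfio-pci when the guest starts, then rebinds it to the host
driver once the guest releases it, which is the point at which it drops
off the active/inactive lists described above.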
Thanks Laine,

Mea culpa. Couldn't reproduce it; rather, I found the multiple duplicate
entries in the guest PCI domain/bus/slot/function space, which got
resolved easily.

[root@grit tmp]# virsh list
 Id   Name   State
--------------------

[root@grit tmp]# virsh list --all
 Id   Name          State
------------------------------
 -    openbsd6.7    shut off
 -    openindiana   shut off
 -    proxmox6.2    shut off
 -    ubuntu20.04   shut off
 -    win2k19       shut off

[root@grit tmp]# virsh version
Compiled against library: libvirt 6.1.0
Using library: libvirt 6.1.0
Using API: QEMU 6.1.0
Running hypervisor: QEMU 4.2.1

[root@grit tmp]# virsh dumpxml ubuntu20.04 | gzip -c > ubunt2004.xml.gz
[root@grit tmp]# chown dan: ubunt2004.xml.gz
[root@grit tmp]# virsh edit win2k19

(paste the 4 <hostdev mode='subsystem' type='pci' managed='yes'> devices
from ubuntu20.04 into the devices section, save and exit)

error: XML error: Attempted double use of PCI Address 0000:08:00.0
Failed. Try again? [y,n,i,f,?]:

[root@grit tmp]# virsh dumpxml win2k19 | gzip -c > win2k19.xml.gz

Looking closer, it seemed 0000:08:00.0 was the guest address that already
had a device defined on it (see the illustrative sketch at the end of this
message). Removing those in virt-manager and re-adding them generated a
warning that they were in another definition; however, it did accept them.
Doing this for 4 PCI devices confused libvirtd in the meantime, but it
remained functional:

Aug 18 10:31:27 grit libvirtd[106082]: internal error: failed to get number of host interfaces: unspecified error - errors in loading some config files
Aug 18 10:31:55 grit libvirtd[106082]: internal error: failed to get number of host interfaces: unspecified error - errors in loading some config files
Aug 18 10:32:17 grit libvirtd[106082]: internal error: failed to get number of host interfaces: unspecified error - errors in loading some config files
Aug 18 10:32:32 grit libvirtd[106082]: internal error: failed to get number of host interfaces: unspecified error - errors in loading some config files

On Tue, Aug 18, 2020 at 1:00 AM Laine Stump <laine@redhat.com> wrote:

> On 8/8/20 11:53 PM, Daniel Black wrote:
> >
> > In attempting to isolate vfio-pci problems between two different guest
> > instances, the creation of a second guest (with existing guest shutdown)
> > resulted in:
> >
> > Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
> > is already in use
> > Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3
> > is already in use
> > Aug 09 12:43:23 grit libvirtd[6716]: Failed to allocate PCI device list:
> > internal error: Device 0000:01:00.3 is already in use
>
> Hmm. Normally the error that would be logged if a device is already in
> use would say something like this:
>
> error: Failed to start domain Win10-GPU
> error: Requested operation is not valid: PCI device 0000:05:00.0 is in
> use by driver QEMU, domain F30
>
> So you're encountering this in an unexpected place.
>
> >
> > Compiled against library: libvirt 6.1.0
> > Using library: libvirt 6.1.0
> > Using API: QEMU 6.1.0
> > Running hypervisor: QEMU 4.2.1
> > (fc32 default install)
> >
> > The upstream code seems also to test definitions rather than active
> > uses of the PCI device.
>
> That isn't the case. You're misunderstanding what devices are on the
> list.
> (see below for details)
>
> >
> > My potentially naive patch to correct this (but not the failing test
> > cases) would be:
> >
> > diff --git a/src/util/virpci.c b/src/util/virpci.c
> > index 47c671daa0..a00c5e6f44 100644
> > --- a/src/util/virpci.c
> > +++ b/src/util/virpci.c
> > @@ -1597,7 +1597,7 @@ int
> >  virPCIDeviceListAdd(virPCIDeviceListPtr list,
> >                      virPCIDevicePtr dev)
> >  {
> > -    if (virPCIDeviceListFind(list, dev)) {
> > +    if (virPCIDeviceBusContainsActiveDevices(dev, list)) {
> >          virReportError(VIR_ERR_INTERNAL_ERROR,
> >                         _("Device %s is already in use"), dev->name);
> >          return -1;
> >
> > Is this too simplistic or undesirable a feature request/implementation?
>
> Only devices that are currently in use by a guest (activePCIHostdevs),
> or that libvirt is in the process of detaching from the guest + vfio and
> rebinding to the device's host driver (inactivePCIHostdevs), are on
> either list of PCI devices maintained by libvirt. Once a device is
> completely detached from the guest and (if "managed='yes'" was set in
> the XML config) rebound to the natural host driver for the device, it
> is removed from the list and can be used elsewhere.
>
> I just tested this with an assigned GPU + soundcard on two guests to
> verify that it works properly. (I'm running the latest upstream master
> though, so it's not an exact replication of your test.)
>
> >
> > I'd be more than grateful if someone carries this through as I'm unsure
> > when I may get time for this.
>
> Can you provide the XML for your <hostdev> in the two guests, and the
> exact sequence of commands that leads to this error? There is definitely
> either a bug in the code or a bug in what you're doing. By seeing the
> sequence of events, we can either attempt to replicate it or let you
> know what change you need to make to your workflow to eliminate the error.
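To make the "Attempted double use of PCI Address 0000:08:00.0" failure
above concrete, here is a hypothetical sketch of the kind of clash
involved. The addresses are made up (the device already occupying the
guest slot could just as well be a non-hostdev device), but the pattern is
two devices pinned to the same guest-side <address>:

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
    </source>
    <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
  </hostdev>
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </source>
    <!-- same guest-side address as the entry above: this is the double use -->
    <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
  </hostdev>

Dropping the guest-side <address> elements when pasting <hostdev> entries
from one domain definition into another sidesteps the clash, since libvirt
then assigns free guest slots itself.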