(Alex - search for your name down in the middle of this - there is one
question for you. You can probably save your neurons the trouble of
reading the rest)
On 8/28/21 6:56 AM, daggs wrote:
> Greetings Laine,
>
>> Sent: Wednesday, August 25, 2021 at 7:53 PM
>> From: "Laine Stump" <laine at redhat.com>
>> To: "daggs" <daggs at gmx.com>
>> Cc: "Martin Kletzander" <mkletzan at redhat.com>, libvirt-users at redhat.com
>> Subject: Re: issues with vm after upgrade
>>
>> On 8/20/21 12:07 PM, daggs wrote:
>>> Greetings Laine,
>>>
>>>> Sent: Monday, August 16, 2021 at 12:57 AM
>>>> From: "Laine Stump" <laine at redhat.com>
>>>> To: "daggs" <daggs at gmx.com>
>>>> Cc: "Martin Kletzander" <mkletzan at redhat.com>, libvirt-users at redhat.com
>>>> Subject: Re: issues with vm after upgrade
>>>>
>>>>
>>>>
>>>> On 8/14/21 6:05 AM, daggs wrote:
>>>>> Greetings Martin,
>>>>>
>>>>>> Sent: Thursday, August 12, 2021 at 2:07 PM
>>>>>> From: "daggs" <daggs at gmx.com>
>>>>>> To: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>
>>>>>>> Sent: Thursday, August 12, 2021 at 11:49 AM
>>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>>> To: "daggs" <daggs at gmx.com>
>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>
>>>>>>> On Wed, Aug 11, 2021 at 08:53:10PM +0200, daggs wrote:
>>>>>>>> Greetings Martin,
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sent: Wednesday, August 11, 2021 at 6:08 PM
>>>>>>>>> From: "daggs" <daggs at gmx.com>
>>>>>>>>> To: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>>>
>>>>>>>>> Greetings Martin,
>>>>>>>>>
>>>>>>>>>> Sent: Wednesday, August 11, 2021 at 4:13 PM
>>>>>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>>>>>> To: "daggs" <daggs at gmx.com>
>>>>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>>>>
>>>>>>>>>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>>>>>>>>> Greetings Martin,
>>>>>>>>>>>
>>>>>>>>>>>> Sent: Wednesday, August 11, 2021 at 10:14 AM
>>>>>>>>>>>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>>>>>>>>>> To: "daggs" <daggs at gmx.com>
>>>>>>>>>>>> Cc: dan at berrange.com, libvirt-users at redhat.com
>>>>>>>>>>>> Subject: Re: issues with vm after upgrade
>>>>>>>>>>>>
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Regarding your issue with starting the domain, it would be good to
>>>>>>>>>>>> know what error you get from virsh (or however you are starting the
>>>>>>>>>>>> domain) and the debug logs of libvirtd, ideally just for the part of
>>>>>>>>>>>> the domain starting.
>>>>>>>>>>> That is the issue: there wasn't any error. The VM just didn't boot.
>>>>>>>>>>
>>>>>>>>>> Oh, so I misunderstood. What was the state of the VM in libvirt?
>>>>>>>>>> "paused" or "running"? Was the serial console working?
>>>>>>>>> It was marked as running and there was no serial console output.
>>>>>>>>>
>>>>>>>
>>>>>>> It's a pity we could not examine what was actually happening.
>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I can diff the original XML against the new one and post the differences here if you wish.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It would be nice to see if there are any differences. The newly created
>>>>>>>>>> one works then?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'll send it later today.
>>>>>>>>>
>>>>>>>>
>>>>>>>> here: https://dpaste.com/5VBUU8Z9W
>>>>>>>>
>>>>>>>
>>>>>>> Unfortunately there are many differences there. The machine type
>>>>>>> changes _something_ in qemu, there is a different PCI(e) topology, and
>>>>>>> I do not think I will be able to figure this out without the
>>>>>>> non-working machine.
>>>>>>>
>>>>>>> So if your current setup works for you right now I'd leave figuring
>>>>>>> out the previous issue to others, if there is anyone wanting to figure
>>>>>>> out if there is some libvirt issue.
>>>>>>>
>>>>>>> Have a nice day
>>>>>>>
>>>>>>
>>>>>> My current setup works except for the HDMI audio, which I still need to investigate.
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> Dagg
>>>>>>
>>>>>
>>>>> Just to update: I've solved the sound issue. Frankly, I don't understand how the guest showed a sound card in the first place.
>>>>> From what I gather, libvirt sets the -nodefaults flag to build the VM's properties from scratch.
>>>>> In this situation, the sound card is a function in the host machine's PCI tree.
>>>>> When libvirt created the PCI tree for the guest, it placed the card as a function of a device as well, in my case 02:00.2;
>>>>> however, it didn't create a device at 02:00.0.
>>>>
>>>> Are you basing this claim on the libvirt XML? Or on what you see with
>>>> lspci in the guest?
>>>>
>>>> When libvirt is assigning PCI addresses to devices in a guest, it will
>>>> never auto-assign a non-0 function. This will only happen if the user
>>>> explicitly requests it (and even then, iirc, libvirt should generate an
>>>> error if function 0 of the same slot has no device - something to the
>>>> effect of "no device on function 0 of a multifunction device").
>>>>
>>>> Anyway, when I looked back at the XML diff you posted earlier (see
>>>> below), I didn't see any hostdev device assigned to 02:00.2. What I
>>>> *did* see was that in both the old and the new version of the diff, the
>>>> hostdev devices were assigned to function 0 of different *slots* on a
>>>> dmi-to-pci-bridge controller, which should cause no problems (unless
>>>> there is a bug in QEMU's dmi-to-pci-bridge). (The important thing,
>>>> though, is that there is no hostdev device on a non-0 function, and when
>>>> it is on a non-0 slot, that's because it's on a dmi-to-pci-bridge (which
>>>> has 32 slots).)
>>
>>> I saw it in the guest,
>>
>> But I didn't see it in the XML diffs that you had posted.
> As mentioned below, here is the XML of the new VM, but with the sound problem: https://dpaste.com/BB9EDY6BK
> The relevant entry is at https://dpaste.com/BB9EDY6BK#line-130
> You can see that the BDF is 08:01.0; note that there is no device defined at 08:00.0.
> I may be wrong and there may be no need for such a device, but when I run lspci on both of my Linux systems, I don't see a device in such a scenario.
Ah, that is a device at a non-0 *slot*, not a non-zero function - that
is bus 8, slot 1, function 0. You would never see that on a
*pcie-root-port* (where only slot 0 is usable), but your bus 8 is a
pcie-to-pci-bridge, where it is normal - they have 32 slots like the
root bus, and slot 0 is reserved for SHPC hotplug. Making slot 0 usable
requires disabling SHPC hotplug as it is enabled by default, and libvirt
doesn't even have a way to turn it on/off (since the only advantage is
that you have 32 slots instead of 31). So this config is correct (at
least if the device is to be put on a conventional PCI slot rather than
PCIe). It's possible that there is a bug in the pcie-to-pci-bridge in
QEMU, or in the guest's handling of devices on such a bridge though.
Since you say it worked before the upgrade, I'm inclined to think it's
the former.
(the thing about requiring something on "0" in order to have something
on a "non-0" is for functions, not slots - most (all?) OSes will not
scan the non-0 functions of a slot if they see no device on function 0.
That's not the case for the 32 (31 usable) slots on a conventional PCI
bus though; they are all scanned regardless of any empty gaps).
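To make the function-vs-slot distinction concrete, here is a toy Python model of how a guest might scan one conventional PCI bus. It is only a sketch under simplifying assumptions: a bus is modeled as a set of populated (slot, function) pairs, and functions 1-7 are probed whenever function 0 exists (real firmware and kernels read vendor IDs from config space and also check the multifunction bit of function 0). It is not taken from any particular OS.

```python
from __future__ import annotations


def visible_devices(populated: set[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the (slot, function) pairs a guest would discover on one bus.

    Simplified model: real code reads vendor IDs from config space and
    honours the multifunction bit in function 0's header type.
    """
    found = []
    for slot in range(32):  # all 32 slots are probed; empty slots don't stop the scan
        if (slot, 0) not in populated:
            # Nothing on function 0, so functions 1-7 of this slot are never probed.
            continue
        for function in range(8):
            if (slot, function) in populated:
                found.append((slot, function))
    return found


# 08:01.0 behind a pcie-to-pci-bridge, with slot 0 empty: still discovered.
print(visible_devices({(1, 0)}))  # -> [(1, 0)]
# 02:00.2 with nothing at 02:00.0: never discovered.
print(visible_devices({(0, 2)}))  # -> []
```

The gap at slot 0 is harmless, while a missing function 0 hides the whole slot, which is the difference between the 08:01.0 case and a hypothetical 02:00.2 without 02:00.0.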
> Note that the new VM was created using virt-manager, so the address wasn't allocated by me.
>
>>
>>> I'd assume that if libvirt defines a device on a specific bdf, the guest will not change it.
>>
>> That's not exactly true - the bus "number" in libvirt isn't given to
>> qemu as an actual number, but as an alphanumeric device id (called
>> "alias name" in libvirt XML). QEMU doesn't have any concept of "bus
>> number", because (afaiu) there is no way to convey such info to the
>> guest firmware/OS; instead, QEMU creates a topology of interconnected
>> controllers, the firmware and/or OS traverses this topology and assigns
>> numbers to the encountered controllers as it sees fit.
> So, bottom line: libvirt defines the "order" of the devices, and QEMU creates them however it wants but maintains libvirt's "order"?
Well, libvirt does put them on the qemu commandline in a specific order,
but AFAIU that ordering can't be propagated to the guest - the guest is
presented with the "root" of the PCI topology ("pci-root" on 440fx and
"pcie-root" on Q35), and discovers all the other buses by traversing the
entire tree in whatever order it chooses (I would guess that it would
start looking in the 1st slot of pcie-root, but don't know if it does a
depth-first traversal, or breadth-first traversal).
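A toy Python model of that point: QEMU only hands the guest the shape of the controller tree, and the guest derives bus numbers from the order in which it discovers bridges. The sketch assumes a depth-first walk in (slot, function) order, which is only one of the two possibilities mentioned above and not a confirmed description of any firmware. The topologies are hypothetical examples modeled on the `pci.N` controller ids that libvirt puts on the QEMU commandline.

```python
from __future__ import annotations


def number_buses(topology: dict[str, list[tuple[int, int, str]]],
                 root: str = "pcie.0") -> dict[str, int]:
    """Assign guest-side bus numbers to controllers by depth-first traversal.

    topology maps a controller id to the (slot, function, child controller id)
    entries plugged into it. The ids never reach the guest; only the tree does.
    """
    numbers = {root: 0}
    next_bus = 1

    def walk(controller: str) -> None:
        nonlocal next_bus
        for _slot, _function, child in sorted(topology.get(controller, [])):
            numbers[child] = next_bus
            next_bus += 1
            walk(child)  # descend into the child's subtree before its siblings

    walk(root)
    return numbers


# pci.1/2/3 on 00:01.0, 00:01.1 and 00:01.2
original = {"pcie.0": [(1, 0, "pci.1"), (1, 1, "pci.2"), (1, 2, "pci.3")]}
# Same ids, but pci.2 and pci.1 trade places on slot 1
swapped = {"pcie.0": [(1, 0, "pci.2"), (1, 1, "pci.1"), (1, 2, "pci.3")]}
print(number_buses(original))  # pci.1 -> 1, pci.2 -> 2, pci.3 -> 3
print(number_buses(swapped))   # pci.2 -> 1, pci.1 -> 2, pci.3 -> 3
```

If a bridge hangs off pci.1, a depth-first walk numbers it before pci.2, while a breadth-first walk would number pci.2 first; either way the guest's numbering is derived, not dictated by libvirt.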
>
>>
>> So you may have PCI controllers with indexes 1, 2, and 3 in your libvirt
>> config, but those will be described on the QEMU commandline as
>> controllers "pci.1", "pci.2", and "pci.3":
>>
>> -device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
>> -device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
>> -device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
>>
>> and when a PCI device is attached to one of these controllers, the QEMU
>> commandline uses the id name of the controller, not a bus number:
>>
>> -device vfio-pci,host=0000:05:00.0,bus=pci.1,multifunction=on,addr=0x0 \
>> -device vfio-pci,host=0000:05:00.1,id=hostdev1,bus=pci.1,addr=0x0.0x1 \
>>
>> It is a nice coincidence that the OSes I've seen happen to traverse the
>> PCI topology in a manner that results in the guest OS numbering the
>> buses the same as they are numbered in libvirt XML that has had PCI
>> addresses auto-assigned by libvirt, but it is trivial to make this *not*
>> happen. For example, if you changed the config so that the bus with
>> index='2' (pci.2) was attached to pcie.0, addr=0x1 (i.e. change its PCI
>> address to <address type='pci' bus='0' slot='1' function='0'/>), and the
>> bus with index='1' (pci.1) was attached to pcie.0, addr=0x1.0x1, then the
>> guest would number "pci.2" as bus 1, and "pci.1" as bus 2.
>>
>> And of course the guest OS is free to traverse the controller topology
>> in any manner it wants, so the bus numbering in the guest could be
>> different even if the libvirt-generated QEMU commandline was the same.
>>
>> *HOWEVER*, slot (device) and function number are specified on the QEMU
>> commandline numerically, and they will appear in the guest exactly as
>> they are in the libvirt XML.
>>
>>
>>
>>> In fact, over the last 10 years I've booted thousands of systems, both bare metal and virtualized, and never encountered such a scenario.
>>> That said, it might be a bug in QEMU.
>>> What I did see is that in the guest of the old VM, after the upgrade, the sound card was defined as a function of the SCSI virtblk controller, and the new VM placed
>>> it as a function of a nonexistent device.
>>
>> I would be very interested in seeing the libvirt XML, QEMU commandline,
>> and guest-side output of "lspci" for this. I can't think of any way this
>> could happen without a serious bug *somewhere* (or manual intervention
>> in the PCI addresses in the guest).
> Of the system with the sound issue? If so, I'll need to dig through my logs; the most problematic part is the guest lspci.
Now that we've determined the device was assigned to function 0 of a
non-zero slot on a standard PCI bridge, it all makes sense, so that is
no longer necessary.
>
>
>>
>>>
>>>>
>>>>
>>>> On the topic of having a dmi-to-pci-bridge show up in your XML: I don't
>>>> remember what versions the changes were in (it was at least a year or
>>>> two ago), but only a fairly old version of libvirt would do that - 1)
>>>> recent libvirt will assume that any hostdev PCI device is a PCIe device,
>>>> so it will add a pcie-root-port and assign the hostdev device to slot 0
>>>> of that root-port, and even before that 2) we switched from using
>>>> dmi-to-pci-bridge to using pcie-to-pci-bridge quite some time ago as well.
>>> As stated in the original mail, the issue started after a major version upgrade of both libvirt and QEMU.
>>> I'm currently using the latest stable versions, afaik.
>>
>> Right. If your guest was defined the first time using a much older
>> libvirt, then devices would have been assigned to an auto-created
>> dmi-to-pci-bridge at that time, and if you don't change (or remove) the
>> PCI addresses of the devices or the bridge, then that will all be
>> maintained whenever you restart the guest, regardless of libvirt
>> upgrades. But this again points out that the guest-side PCI addresses
>> (which are determined by the PCI addresses in the libvirt config) should
>> not change when upgrading libvirt (NOTE: 1) libvirt will only
>> auto-assign a new PCI address to a device if it doesn't already have a
>> PCI address assigned to it, and 2) libvirt *never* auto-assigns a non-0
>> function except when adding a pcie-root-port (and in that case it will
>> always first assign something to function 0))
> Then I wonder how the upgrade broke the system. In contrast, the other VM I'm running (a router with 5 NICs in passthrough) worked out of the box.
If I'm remembering correctly, you got it to work by manually putting the
audio device on 00:1F.4 (other functions of that slot are used by
integrated chipset devices, e.g. the SATA controller). I'd be curious if
it worked properly if the audio card was manually assigned to:
1) function 0 of an otherwise unused slot on bus 0. (reading further, it
looks like you tried that (00:02.0) and it did work)
2) slot 0, function 0 of a new pcie-root-port
I'm guessing that both of these would work properly.
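For option 2, a sketch of the libvirt XML involved, generated with Python's xml.etree.ElementTree to make the relationship explicit: the hostdev's guest-side `bus` attribute refers to the `index` of the new pcie-root-port, and slot and function are both 0. The controller index 9 is a hypothetical, otherwise unused index; the host address 00:1f.3 is the host-side audio device discussed in this thread.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def root_port_hostdev_xml(port_index: int, host_bus: int, host_slot: int,
                          host_function: int) -> tuple[str, str]:
    """Build a pcie-root-port <controller> and a <hostdev> placed on its slot 0.

    port_index is assumed to be an otherwise unused PCI controller index.
    """
    controller = ET.Element("controller", type="pci", index=str(port_index),
                            model="pcie-root-port")

    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    # Host-side address of the device being assigned.
    ET.SubElement(source, "address", domain="0x0000", bus=f"0x{host_bus:02x}",
                  slot=f"0x{host_slot:02x}", function=f"0x{host_function:x}")
    # Guest-side address: bus = the root port's index, slot 0, function 0.
    ET.SubElement(hostdev, "address", type="pci", domain="0x0000",
                  bus=f"0x{port_index:02x}", slot="0x00", function="0x0")

    return (ET.tostring(controller, encoding="unicode"),
            ET.tostring(hostdev, encoding="unicode"))


controller_xml, hostdev_xml = root_port_hostdev_xml(9, 0x00, 0x1f, 0x3)
print(controller_xml)
print(hostdev_xml)
```

Both elements belong in the domain's `<devices>` section; ElementTree writes double-quoted attributes, which is equivalent XML to the single-quoted style libvirt uses in its own output.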
I also wonder if other devices manually assigned to different slots on
bus 8 (the pcie-to-pci-bridge) would work properly. My top suspicions
are that 1) there is some sort of bug wrt the pcie-to-pci-bridge in
general, or 2) there is some sort of bug wrt doing vfio assignment of a
device onto a pcie-to-pci-bridge (or maybe it's even specific to
assigning an integrated chipset device from the host onto the
pcie-to-pci-bridge).
I just looked back at the code that decides whether a device is
conventional PCI or PCIe for the first time in a few years
(virPCIDeviceInit) and it looks like it might consider integrated
chipset devices to be conventional PCI (since they don't have any
"Express Capabilities Data"); this would explain why it's wanting to
assign it to a pcie-to-pci-bridge. Maybe we should just always assign
devices to PCIe slots when the guest is Q35 though. Alex - what's your
opinion about this?
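For reference, a Python sketch of the kind of check being described: walk the standard capability list in a device's config space and look for the PCI Express capability (ID 0x10). This is not libvirt's virPCIDeviceInit, only an illustration using the register offsets defined by the PCI specification (status register at 0x06, first capability pointer at 0x34); the config-space bytes in the example are synthetic.

```python
from __future__ import annotations

PCI_STATUS = 0x06           # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10  # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34  # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10       # capability ID of "PCI Express"


def has_pcie_capability(config: bytes) -> bool:
    """Return True if the config space advertises a PCI Express capability."""
    if len(config) < 0x40:
        return False
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return False
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while ptr and ptr not in seen and ptr + 1 < len(config):
        seen.add(ptr)  # guard against malformed, looping lists
        if config[ptr] == PCI_CAP_ID_EXP:
            return True
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return False


# Synthetic 256-byte config space: Power Management at 0x40 -> MSI at 0x50.
cfg = bytearray(256)
cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
cfg[PCI_CAPABILITY_LIST] = 0x40
cfg[0x40:0x42] = bytes([0x01, 0x50])  # PM capability, next at 0x50
cfg[0x50:0x52] = bytes([0x05, 0x00])  # MSI capability, end of list
print(has_pcie_capability(bytes(cfg)))  # False: no Express capability
cfg[0x51] = 0x60                        # chain an Express capability after MSI
cfg[0x60:0x62] = bytes([PCI_CAP_ID_EXP, 0x00])
print(has_pcie_capability(bytes(cfg)))  # True
```

On a Linux host the bytes could come from /sys/bus/pci/devices/0000:00:1f.3/config, read as root: unprivileged reads of that file return only the first 64 bytes, so this sketch would report False. `lspci -vv` shows the same information as a "Capabilities: [..] Express" line when the capability is present.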
>
>>
>>
>>>
>>>>
>>>> So if you're generating new XML based on config that doesn't have pci
>>>> controllers already in it, and you're seeing hostdevs (or any other PCI
>>>> devices) assigned to an automatically-added dmi-to-pci-bridge, then your
>>>> libvirt version is severely out of date.
>>> Here are the versions I'm using:
>>> # emerge --search app-emulation/libvirt app-emulation/qemu
>>>
>>> [ Results for search key : app-emulation/libvirt ]
>>> Searching...
>>>
>>> * app-emulation/libvirt
>>> Latest version available: 7.5.0
>>> Latest version installed: 7.5.0
>>> Size of files: 9749 KiB
>>> Homepage: https://www.libvirt.org/ https://gitlab.com/libvirt/libvirt/
>>> Description: C toolkit to manipulate virtual machines
>>> License: LGPL-2.1
>>>
>>> [ Applications found : 1 ]
>>>
>>>
>>> [ Results for search key : app-emulation/qemu ]
>>> Searching...
>>>
>>> * app-emulation/qemu
>>> Latest version available: 6.0.0-r52
>>> Latest version installed: 6.0.0-r52
>>> Size of files: 22724 KiB
>>> Homepage: http://www.qemu.org http://www.linux-kvm.org
>>> Description: QEMU + Kernel-based Virtual Machine userland tools
>>> License: GPL-2 LGPL-2 BSD-2
>>>
>>> [ Applications found : 1 ]
>>>
>>>>
>>>>
>>>> On 8/11/21 2:53 PM, daggs wrote:
>>>> >> From: "daggs" <daggs at gmx.com>
>>>> >>> From: "Martin Kletzander" <mkletzan at redhat.com>
>>>> >>> On Wed, Aug 11, 2021 at 03:09:34PM +0200, daggs wrote:
>>>> >>>> I can diff the original xml with the new one to see the diffs and
>>>> >>>> post them here if you wish
>>>> >>>>
>>>> >>>
>>>> >>> Would be nice to see if there are any differences. The newly created
>>>> >>> one works then?
>>>> >>
>>>> >> I'll send it later today
>>>> >>
>>>> >
>>>> > here: https://dpaste.com/5VBUU8Z9W
>>>>
>>>>
>>>>> my fix was to move the device to 00:1f.4 in the guest.
>>>>
>>>> That's an interesting choice :-). You could have just put it on function
>>>> 0 of some other unused slot (or a non-0 function of the slot the GPU is
>>>> assigned to). 00:1f is used for integrated devices on the Q35 chipset -
>>>> it's nice that QEMU's emulation code was written to allow adding more
>>>> devices on that slot, but I wouldn't have been surprised if it had
>>>> caused problems...
>>> 10 years of working at a virtualization company has taught me that sometimes keeping the PCI structure as close as possible
>>> to the original is the best way to go.
>>> That is why I chose it as a function; it is a function on the host machine.
>>
>> It wasn't a function of a slot that also contains integrated chipset
>> devices though.... Oh, wait. According to the XML you reference down
>> below, it looks like the audio device you're assigning to the guest *is
>> itself* integrated on the chipset of the host, is that right?
>>
>> (It's interesting that this function of slot 1F is apparently in a
>> different IOMMU group than the other functions of slot 1F. I would have
>> guessed they would all be in the same IOMMU group, resulting in an
>> inability to assign this one function to the guest without at least
>> disabling the other devices on slot 1F (by binding them to the vfio-pci
>> driver).)
> I'm using the PCIe ACS override patch to split the sound card and the NIC, as they belong in different VMs.
Is the NIC in the other guest assigned to a pcie-to-pci-bridge? Or is it
assigned to a pcie-root-port?
>
>>
>>>
>>>>
>>>>
>>>>> I wouldn't be surprised if this was why the VM didn't boot after the upgrade with the old XML.
>>>>
>>>> Well, if your XML had a device assigned to a non-0 function of a slot
>>>> and no device in function 0 of that slot, it would have failed to work
>>>> previously as well (my recollection is that in this case it's more a
>>>> problem of the guest OS not probing non-0 functions when there is
>>>> nothing on function 0, and not with anything done by QEMU).
>>>>
>>>>
>>> Here is the XML of the machine after I recreated it; it worked, but there was no sound: https://dpaste.com/BB9EDY6BK
>>> I used virt-manager. Note that the passed-through sound card is placed as a function on bus 0x8, which doesn't exist.
>>
>> This doesn't show any devices assigned to non-0 functions in the guest
>> (which is the part of what you said in previous messages that sounded
>> wrong to me). (except for the SATA controller, which is listed in the
>> libvirt config only for informational purposes, as it is hardcoded into
>> the basic q35 virtual machine and can't be removed).
>>
>> What it does show is that there is a device at 00:1F.3 *on the host* that
>> is being assigned to 08:01.0 (slot 1, function 0 of the
>> pcie-to-pci-bridge) in the guest. I'm guessing this is the audio device?
>> Also in this version of the XML, there is no longer a dmi-to-pci-bridge,
>> but there is instead a pcie-to-pci-bridge, implying that you've
>> redefined the guest config, resulting in PCI address auto-assignment
>> being re-run (at least relative to the config you referenced last week
>> that had a dmi-to-pci-bridge).
>>
>> It's possible that the audio device's driver just doesn't like the
>> device being on a standard PCI (i.e. non-PCIe) slot in the guest
>> somehow, since it's a chipset-integrated PCIe device on the host. I
>> haven't heard of that being the case in the past, but it's possible.
>>
>>
>> Anyway, at this point I've lost track of all the changes that have
>> happened (your update entailed much more than just updating the libvirt
>> package - your guest config was also changed/redefined) so I don't know
>> how much more effort should be expended with post-mortem, especially
>> since you now have it working. One thing that I would note is that we
>> should probably be auto-assigning integrated chipset devices to
>> pcie-root-ports rather than to a pci-bridge (I thought we already did
>> that, but I can see how we might not).
>>
>>
> I'll try to sum it up: prior to the upgrade, both VMs worked.
> After the upgrade, only the router VM worked; the streamer one started but never ran.
> I started experimenting with invoking QEMU directly from the command line, and got to a situation where the VM was up and running with everything defined except an active NIC link.
> This led me to believe that the issue was with my config.
> I tried downgrading to previous versions; however, the VM still didn't boot.
> I then reached the assumption that, following the upgrade, the VM's XML had been changed. Because I didn't have the previous XML, I cannot verify this.
> My next step was to recreate the VM's XML using virt-manager, under the assumption that a new config defined by virt-manager would work.
> That was partially correct: after the recreation was completed, the VM booted, but the HDMI sound wasn't found.
> This sent me back to the QEMU command line, as I had a working command line invocation.
> I started adding and removing entries from the generated QEMU command line to the working one and found out that the issue was caused by the -nodefaults switch in QEMU.
> This led me to inspect the PCI tree that was created, and I found out that in my working scenario, the sound card was placed at 00:02.0 and in the malfunctioning scenario,
How did this happen? Was it a case where you created the qemu
commandline yourself and didn't provide a PCI address for the soundcard
on the commandline? (that must be what happened since, with only a few
specific exceptions with emulated devices that are part of the Q35
chipset, libvirt wouldn't auto-assign a device to anywhere on bus 0 of a
Q35 guest).
> the sound card was placed at 02:01.0 (I have screenshots of it).
> This led me to the conclusion that the PCI config might be the cause; moving it to 1f.x worked.
I remember there being some trouble with the pcie-to-pci-bridge in the
past (seems like it was devices being automatically unplugged after a
short time). I thought that was long ago fixed (in QEMU) but maybe I'm
wrong. In the end I think it's best to avoid pcie-to-pci-bridges (which
is why I asked Alex above if he thought it would be okay to just assign
host integrated devices to pcie-root-ports rather than treating them
like conventional PCI).