Richard W.M. Jones
2021-Dec-16 13:10 UTC
[Libguestfs] [v2v PATCH v2] convert_linux: translate the first CD-ROM's references in boot conf files
On Thu, Dec 16, 2021 at 12:12:33PM +0100, Laszlo Ersek wrote:
> On 12/16/21 12:08, Laszlo Ersek wrote:
>
> > But, I'm attaching the full conversion log too. (NB this log was
> > generated with my patch applied; however, the patch itself makes no
> > difference regarding the boot failure, as stated before.)
> >
> > FWIW, I don't know if this warning (i.e., the failed attempt to
> > mount /dev/sda) has anything to do with the boot failure in the
> > converted guest.
>
> Apologies for conversing with myself :/
>
> Noticed this (a bunch of it) in the log:
>
> > WARNING: PV /dev/sda2 in VG VolGroup00 is using an old PV header,
> > modify the VG to update.
>
> I don't know if we ultimately perform any operation that modifies the
> volume group, but if we do, it seems plausible that that update
> prevents the LVM driver in RHEL5 from recognizing the volume group
> again :(

This does indeed seem like a new problem (in LVM). I don't think
we've seen this one before so it needs a bug to track it.

About the patch: The fact that there's an existing bug does not block
this patch from going upstream. It doesn't seem as if this patch is
the cause.

About device name translation: This is something of a mess at the
moment. There's a lot of history ...

Originally I didn't think very deeply about how device names should be
represented in the API. For example if you had an API like

  int guestfs_mount (guestfs_h *g, char *device, char *mp)

how should the device name be represented? In the original version it
was basically the QEMU / kernel name, probably "/dev/vda1".

When we switched to using virtio-scsi the device names changed
("/dev/sda1"), but that was fine I just added some code which spotted
/dev/[hsv]... and translated it inside the daemon. (This is what
"device name translation" does).

Informally I said that "/dev/sda" in the API means the "first disk",
(in the sense of "added first using guestfs_add_drive") "/dev/sdb"
means second disk, "/dev/sda4" means the fourth partition of the first
disk and so on.
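To make the informal naming convention concrete, here is a minimal Python sketch of how an API-level name could be decoded into "Nth disk added, Mth partition". It is only an illustration of the convention as described in this message: it is not the code in daemon/device-name-translation.c, and the function name parse_api_device is hypothetical.

```python
import re

# Matches /dev/hdX, /dev/sdX, /dev/vdX with an optional partition number.
# Anything else (LVs, /dev/md*, ...) is not a plain disk name.
_DEVICE_RE = re.compile(r"/dev/[hsv]d([a-z]+)(\d+)?")


def parse_api_device(name):
    """Return (disk_index, partition) for an API device name.

    disk_index is 0 for the first disk added, 1 for the second, etc.
    partition is None when the name refers to the whole disk.
    Returns None if the name does not follow the /dev/[hsv]d* pattern.
    (Hypothetical helper, for illustration only.)
    """
    m = _DEVICE_RE.fullmatch(name)
    if m is None:
        return None
    letters, part = m.groups()
    # Drive letters count like spreadsheet columns: a..z, then aa, ab, ...
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("a") + 1)
    return index - 1, (int(part) if part else None)
```

Under this convention "/dev/sda4" and "/dev/vda4" decode to the same thing (first disk, fourth partition), which is the property the translation relies on. Mapping the disk index back to whatever name the appliance kernel actually assigned is the hard part, and it is not covered by this sketch.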
This was a convention but it wasn't encoded or enforced anywhere.

That all worked for quite a long time, but then the kernel started to
enumerate devices in parallel, so now if you have two disks, you can
no longer assume that the "first disk" is /dev/sda: it might randomly
appear as /dev/sda or /dev/sdb. We're still sending these strings
across from the library to the daemon, but now we do some additional
and very hokey translation using device serial numbers which you can
check in the code (daemon/device-name-translation.c).

I think what we _should_ be doing is translating the strings into an
internal representation, which we'd use inside the library and
serialise into the daemon. This is, naturally, an awful lot of work.

Note that the ABI cannot be changed, so whatever happens we're still
going to be passing strings to guestfs_* APIs; this is all about what
happens internally after that point in the generated code.

There's the additional complication that not everything mountable is a
disk (e.g. LVs, /dev/md*). Matt Booth already separated that out with
a distinction between disks and "mountables".

There's also the issue that we always add a hidden disk for the
appliance, which is added after the other drives. We need to ignore
that.

There's also disk hotplugging, which we should deprecate. It was a
failed experiment, and indeed has never worked for the direct backend,
which is enough justification to replace those APIs with ones which
print an error and return -1.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
Laszlo Ersek
2021-Dec-16 16:53 UTC
[Libguestfs] [v2v PATCH v2] convert_linux: translate the first CD-ROM's references in boot conf files
On 12/16/21 14:10, Richard W.M. Jones wrote:
> On Thu, Dec 16, 2021 at 12:12:33PM +0100, Laszlo Ersek wrote:
>> On 12/16/21 12:08, Laszlo Ersek wrote:
>>
>>> But, I'm attaching the full conversion log too. (NB this log was
>>> generated with my patch applied; however, the patch itself makes no
>>> difference regarding the boot failure, as stated before.)
>>>
>>> FWIW, I don't know if this warning (i.e., the failed attempt to
>>> mount /dev/sda) has anything to do with the boot failure in the
>>> converted guest.
>>
>> Apologies for conversing with myself :/
>>
>> Noticed this (a bunch of it) in the log:
>>
>>> WARNING: PV /dev/sda2 in VG VolGroup00 is using an old PV header,
>>> modify the VG to update.
>>
>> I don't know if we ultimately perform any operation that modifies the
>> volume group, but if we do, it seems plausible that that update
>> prevents the LVM driver in RHEL5 from recognizing the volume group
>> again :(
>
> This does indeed seem like a new problem (in LVM). I don't think
> we've seen this one before so it needs a bug to track it.

[... snipping the other explanations -- I've read them and they're
great, but there's only so much I can internalize so quickly, and right
now something else seems to be the issue after all ...]

So, I reinstalled the RHEL-5.11 guest from scratch, and made sure that
it did not use LVM at all, only an MBR partition table and three
primary partitions (/boot, swap, and /).

The issue reproduces just the same (just with different names).

Then, it dawned on me. RHEL-5 does not have virtio-1.0 drivers, only
virtio-0.9 drivers. And a QEMU monitor command (info qtree) confirmed
that, for all virtio devices in the output domain, the
"disable_legacy" property was "on". Meaning, these were not
transitional virtio devices, but purely virtio-1.0 devices.

I've looked at both the domain XML directly generated by virt-v2v, and
the variant that libvirtd actually maintains after the "virsh define".
In the former, there is no PCI topology information (no <address>
elements), and also no "virtio-transitional" model specifications:

  https://libvirt.org/formatdomain.html#virtio-transitional-devices

(1) <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='default' volume='rhel5.11.out-sda'/>
      <target dev='vda' bus='virtio'/>
    </disk>

(2) <interface type='bridge'>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <mac address='52:54:00:6c:97:64'/>
    </interface>

(3) <memballoon model='virtio'/>

Accordingly, because the Q35 machine type is a PCI Express, not
traditional PCI, board, libvirtd generates the PCIe topology such that
each virtio device lands in a separate PCI Express Root Port:

(1) <disk type='volume' device='disk'>
      <driver name='qemu' type='raw'/>
      <source pool='default' volume='rhel5.11.out-sda'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
                                          ^^^^^^^^^^
    </disk>

    with

    <controller type='pci' index='5' model='pcie-root-port'>
                           ^^^^^^^^^

(2) <interface type='bridge'>
      <mac address='52:54:00:6c:97:64'/>
      <source bridge='virbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
                                          ^^^^^^^^^^
    </interface>

    with

    <controller type='pci' index='2' model='pcie-root-port'>
                           ^^^^^^^^^

(3) <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
                                          ^^^^^^^^^^
    </memballoon>

    with

    <controller type='pci' index='6' model='pcie-root-port'>
                           ^^^^^^^^^

In turn, the virtio-pci code in QEMU works such that, if the
"disable_legacy" device property is not explicitly specified, then the
virtio device's placement in the PCI(e) topology decides whether the
device is transitional, or purely virtio-1.0. See function
virtio_pci_realize() in "hw/virtio/virtio-pci.c":

    bool pcie_port = pci_bus_is_express(pci_get_bus(pci_dev)) &&
                     !pci_bus_is_root(pci_get_bus(pci_dev));
    ...
    if (proxy->disable_legacy == ON_OFF_AUTO_AUTO) {
        proxy->disable_legacy = pcie_port ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
    }

The particular "pcie_port" initialization means that PCI Express Root
Ports will set "pcie_port" to true, but the PCI Express *root complex*
-- that is, when the virtio devices are endpoints integrated into the
PCIe root complex -- will set "pcie_port" to "false". (The difference
is made by pci_bus_is_root() -- returns "false" for the root ports, but
"true" for the root complex.)

In other words, if all these virtio devices had been placed on:

    <controller type='pci' index='0' model='pcie-root'/>
                           ^^^^^^^^^

then the devices would have been transitional.

In order to verify this theory, I updated the addresses like that
(moving each virtio device to a different slot of bus='0x00') in the
output domain. This way, the RHEL5 guest booted -- no kernel panic.
(The X11 server did not start up, but I chose to ignore that.)

At this point, I managed to verify my patch -- the /dev/hda entry in
/etc/fstab had indeed been replaced with /dev/cdrom, and the /dev/cdrom
symlink pointed to "sr0". So mounting the "/mnt2" mount point worked
fine. I think this is enough evidence to push the patch.

Anyway, back to the larger non-transitional problem. How can we
convince libvirtd to set up transitional devices?
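As an aside, the virtio_pci_realize() logic quoted above can be modelled in a few lines of Python, which makes the three bus placements discussed in this thread easy to compare. This is a simplified sketch, not QEMU code: the OnOffAuto enum and resolve_disable_legacy() are hypothetical names, and the model ignores every other property that influences the device.

```python
from enum import Enum


class OnOffAuto(Enum):
    """Stand-in for QEMU's on/off/auto tri-state (illustrative only)."""
    AUTO = "auto"
    ON = "on"
    OFF = "off"


def resolve_disable_legacy(disable_legacy, bus_is_express, bus_is_root):
    """Model of how an unset "disable_legacy" is resolved.

    bus_is_express / bus_is_root describe the bus the virtio device is
    plugged into, mirroring pci_bus_is_express() and pci_bus_is_root().
    """
    pcie_port = bus_is_express and not bus_is_root
    if disable_legacy is OnOffAuto.AUTO:
        return OnOffAuto.ON if pcie_port else OnOffAuto.OFF
    # An explicit on/off setting is left untouched.
    return disable_legacy


# Behind a pcie-root-port: express bus, not the root bus.
root_port = resolve_disable_legacy(OnOffAuto.AUTO, True, False)
# Integrated endpoint on pcie-root: express bus, and it is the root bus.
root_complex = resolve_disable_legacy(OnOffAuto.AUTO, True, True)
# Behind a pcie-to-pci-bridge: a conventional (non-express) bus.
pci_bridge = resolve_disable_legacy(OnOffAuto.AUTO, False, False)
```

Only the root-port placement resolves to "on", that is, a purely virtio-1.0 device that a virtio-0.9-only guest cannot drive.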
Based on this article again:
<https://libvirt.org/formatdomain.html#virtio-transitional-devices>, I
went back to the XML generated by virt-v2v, and tried to update every
mention of "virtio" to "virtio-transitional", as follows:

@@ -21,7 +21,7 @@
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
-    <disk type='volume' device='disk'>
+    <disk type='volume' device='disk' model='virtio-transitional'>
       <driver name='qemu' type='raw'/>
       <source pool='default' volume='rhel5.11.out-sda'/>
       <target dev='vda' bus='virtio'/>
@@ -32,7 +32,7 @@
     </disk>
     <interface type='bridge'>
       <source bridge='virbr0'/>
-      <model type='virtio'/>
+      <model type='virtio-transitional'/>
       <mac address='52:54:00:6c:97:64'/>
     </interface>
     <video>
@@ -40,7 +40,7 @@
     </video>
     <graphics type='spice' autoport='yes' port='-1'/>
     <sound model='ich6'/>
-    <memballoon model='virtio'/>
+    <memballoon model='virtio-transitional'/>
     <viosock model='none'/>
     <input type='tablet' bus='usb'/>
     <input type='mouse' bus='ps2'/>

When running "virsh define" on this, the effect is that libvirtd
places the virtio devices on the separate "pcie-to-pci-bridge"
controller (not on the root complex, as integrated endpoints, but still
good -- this way, the pci_bus_is_express() call retval is what makes
the difference):

(1) <disk type='volume' device='disk' model='virtio-transitional'>
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      <driver name='qemu' type='raw'/>
      <source pool='default' volume='rhel5.11.out-sda' index='2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
                                          ^^^^^^^^^^
    </disk>

    with

    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
                           ^^^^^^^^^

(2) <interface type='bridge'>
      <mac address='52:54:00:6c:97:64'/>
      <source bridge='virbr0'/>
      <target dev='tap0'/>
      <model type='virtio-transitional'/>
                  ^^^^^^^^^^^^^^^^^^^^^
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
                                          ^^^^^^^^^^
    </interface>

(3) <memballoon model='virtio-transitional'>
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x04' function='0x0'/>
                                          ^^^^^^^^^^
    </memballoon>

This way, the guest boots as well.

Summary: libosinfo should have told virt-v2v that RHEL-5.11 does not
have virtio-1.0 drivers, only virtio-0.9 drivers. Accordingly,
virt-v2v should have generated "virtio-transitional" model attributes,
so that libvirtd would position the virtio devices in traditional PCI
bus slots.

Do we need a new BZ about this?

Thanks!
Laszlo
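For anyone wanting to reproduce the manual edit from the diff above on another domain XML, the same three substitutions can be scripted. The following is a rough Python sketch using only the standard library. It is not virt-v2v code (virt-v2v is written in OCaml), the function name make_virtio_transitional is hypothetical, and it assumes the XML has the same shape as the virt-v2v output shown in this message.

```python
import xml.etree.ElementTree as ET

TRANSITIONAL = "virtio-transitional"


def make_virtio_transitional(domain_xml):
    """Return domain_xml with virtio disk, NIC and balloon devices
    marked as virtio-transitional (illustrative helper only)."""
    root = ET.fromstring(domain_xml)

    # Disks: the bus stays 'virtio' on <target>; the model attribute
    # goes on the <disk> element itself.
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is not None and target.get("bus") == "virtio":
            disk.set("model", TRANSITIONAL)

    # Network interfaces: <model type='virtio'/> inside <interface>.
    for model in root.findall(".//interface/model"):
        if model.get("type") == "virtio":
            model.set("type", TRANSITIONAL)

    # Memory balloon: the model attribute on <memballoon>.
    for balloon in root.iter("memballoon"):
        if balloon.get("model") == "virtio":
            balloon.set("model", TRANSITIONAL)

    return ET.tostring(root, encoding="unicode")
```

The result can then be fed to "virsh define" as before. Devices on other buses are left alone, and so are other virtio device types (for example virtio-scsi controllers or rng devices), which a real fix would also have to consider.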