lejeczek
2022-Jan-11 17:14 UTC
'migrate' says it worked but in reality it did not - centOS 9
On 11/01/2022 16:36, Daniel P. Berrangé wrote:
> On Tue, Jan 11, 2022 at 04:30:11PM +0000, lejeczek wrote:
>> Hi guys.
>>
>> I have a peculiar situation where between boxes:
>> C->A
>> -> $ virsh migrate --unsafe --live c8kubermaster1 qemu+ssh://10.1.1.99/system
>> -> $ echo $?
>> 0
>> but the above does _not_ happen; instead the VM was stopped and then started,
>> but _not_ migrated LIVE.
>>
>> A->C
>> -> $ virsh migrate --unsafe --live c8kubermaster1 qemu+ssh://10.1.1.100/system
>> -> $ echo $?
>> 0
>> indeed the VM migrates live.
>>
>> Boxes A & C have a virtually identical OS stack,
>> the HW difference is:
>> C = Ryzen 5 5600G
>> A = Ryzen 5 3600
>>
>> domain XML snippet where I think it matters:
>> ...
>>   </metadata>
>>   <memory unit='GiB'>4</memory>
>>   <currentMemory unit='GiB'>4</currentMemory>
>>   <vcpu placement='static'>2</vcpu>
>>   <resource>
>>     <partition>/machine</partition>
>>   </resource>
>>   <os>
>>     <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
>>     <boot dev='hd'/>
>>   </os>
>>   <features>
>>     <acpi/>
>>     <apic/>
>>   </features>
>>   <cpu mode='custom' match='exact' check='full'>
>>     <model fallback='forbid'>EPYC-IBPB</model>
>>     <feature policy='require' name='ibpb'/>
>>     <feature policy='require' name='ssbd'/>
>>     <feature policy='require' name='virt-ssbd'/>
>>     <feature policy='disable' name='monitor'/>
>>     <feature policy='require' name='x2apic'/>
>>     <feature policy='require' name='hypervisor'/>
>>     <feature policy='disable' name='svm'/>
>>     <feature policy='require' name='topoext'/>
>>   </cpu>
>>   <clock offset='utc'>
>>     <timer name='rtc' tickpolicy='catchup'/>
>>     <timer name='pit' tickpolicy='delay'/>
>>     <timer name='hpet' present='no'/>
>>   </clock>
>>   <on_poweroff>destroy</on_poweroff>
>>   <on_reboot>restart</on_reboot>
>>   <on_crash>destroy</on_crash>
>>   <pm>
>>     <suspend-to-mem enabled='no'/>
>>     <suspend-to-disk enabled='no'/>
>>   </pm>
>>   <devices>
>>     <emulator>/usr/libexec/qemu-kvm</emulator>
>>     <disk type='file' device='disk'>
>> ...
>>
>> Initially I submitted a BZ against 'PCS' but continued to fiddle with it and
>> I find 'libvirt' might be the culprit (also?) here.
>> There is not much in the logs, certainly nothing (with default verbosity) in
>> virtqemud.service
>> Is it that the VM gets migrated but then is restarted on the 'migrate_to' host?
>> If so, then why?
>> How to start troubleshooting such a 'monstrosity'? - all suggestions
>> appreciated.
> /var/log/libvirt/qemu/$GUEST.log on both hosts should have more info

What if there is not much there either?
migrate_to (host A) seems to show only the config for qemu, no errors, no warnings.
migrate_from (host C) shows only:
...
2022-01-11 17:00:40.687+0000: initiating migration
2022-01-11 17:00:43.413+0000: shutting down, reason=migrated
2022-01-11T17:00:43.414063Z qemu-kvm: terminating on signal 15 from pid 24022 (<unknown process>)

no errors/warnings but that 2nd line - ??

Again, migrating back between the same two hosts - where LIVE succeeds -
migrate_from (host A) also shows:
...
2022-01-11 17:10:27.921+0000: initiating migration
2022-01-11 17:10:30.459+0000: shutting down, reason=migrated
2022-01-11T17:10:30.460528Z qemu-kvm: terminating on signal 15 from pid 73193 (<unknown process>)

thanks, L
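One way to approach the question above (was the guest live-migrated, or was it restarted on the destination?) is the state reason that libvirt reports through "virsh domstate --reason". The sketch below is not from the thread: classify_domstate is a hypothetical helper name, and it assumes the command prints the state and reason as "running (migrated)" for a guest that arrived via migration and "running (booted)" for one that was started fresh.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): classify the one-line output
# of `virsh domstate --reason DOMAIN` on the destination host.
# Assumption: the output looks like "running (migrated)" or "running (booted)".
classify_domstate() {
    case "$1" in
        "running (migrated)") echo "arrived via migration" ;;
        "running (booted)")   echo "started fresh on this host" ;;
        *)                    echo "other: $1" ;;
    esac
}

# Intended use on the destination host, with the domain name from the post:
#   classify_domstate "$(virsh -c qemu:///system domstate --reason c8kubermaster1)"
```

If the destination reports a fresh boot right after a migration that returned 0, that would be consistent with the guest having been restarted rather than moved live.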
Daniel P. Berrangé
2022-Jan-11 17:33 UTC
'migrate' says it worked but in reality it did not - centOS 9
On Tue, Jan 11, 2022 at 05:14:53PM +0000, lejeczek wrote:
> On 11/01/2022 16:36, Daniel P. Berrangé wrote:
> > On Tue, Jan 11, 2022 at 04:30:11PM +0000, lejeczek wrote:
> > > [...]
> > /var/log/libvirt/qemu/$GUEST.log on both hosts should have more info
>
> What if there is not much there either?
> migrate_to (host A) seems to show only the config for qemu, no errors, no
> warnings.
> migrate_from (host C) shows only:
> ...
> 2022-01-11 17:00:40.687+0000: initiating migration
> 2022-01-11 17:00:43.413+0000: shutting down, reason=migrated
> 2022-01-11T17:00:43.414063Z qemu-kvm: terminating on signal 15 from pid
> 24022 (<unknown process>)
>
> no errors/warnings but that 2nd line - ??
>
> Again, migrating back between the same two hosts - where LIVE succeeds -
> migrate_from (host A) also shows:
> ...
> 2022-01-11 17:10:27.921+0000: initiating migration
> 2022-01-11 17:10:30.459+0000: shutting down, reason=migrated
> 2022-01-11T17:10:30.460528Z qemu-kvm: terminating on signal 15 from pid
> 73193 (<unknown process>)

Both of those logs only show the state of the src QEMU during a
migration op. There should be a corresponding log for the dst QEMU
at the same point in time.

All these messages show that migration was successful from libvirt's
and QEMU's POV on the src. So I expect what's happening is that QEMU
is crashing on the target host after migration has finished.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
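
The advice to look at the dst QEMU log "at the same point in time" can be sketched as a small filter over the per-guest log. This is an illustration rather than anything posted in the thread: lines_in_minute is a hypothetical helper, and the log path in the usage comment assumes $GUEST is c8kubermaster1. It accepts both timestamp styles seen in the excerpts above (a space separator on the libvirt lines, a "T" separator on the qemu-kvm lines).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read a QEMU guest log on stdin
# and print only the lines stamped within one given minute.
#   $1 = date,   e.g. 2022-01-11
#   $2 = HH:MM,  e.g. 17:00
# Matches "2022-01-11 17:00:40.687+0000: ..." as well as
# "2022-01-11T17:00:43.414063Z qemu-kvm: ...".
lines_in_minute() {
    grep -E "^$1[ T]$2:"
}

# Intended use on the destination host (host A for the failing C->A case),
# assuming the guest name from the post:
#   lines_in_minute 2022-01-11 17:00 < /var/log/libvirt/qemu/c8kubermaster1.log
```

Running the same filter on both hosts for the minute of the failed migration puts the src and dst views side by side, which is where a crash of the destination QEMU would be expected to show up if that is what is happening.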