Hans van Kranenburg
2018-Oct-09 22:52 UTC
[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2
I'm just dumping all I got in here, after initial feedback we can see
how to organize todo's around it.

tl;dr:
* Does not upgrade cleanly from 4.8 packages, so we have to prevent this
  from entering testing until we fix that.
* Live migration is broken, explodes with memory allocation errors.

---- >8 ----

1. Build packages

* I have built salsa/master using pbuilder targeting sid. Great success...
* I have built packages for stretch-backports by adding a changelog
  entry and building with pbuilder targeting stretch. Great success.

---- >8 ----

2. Put the packages in a repository

I use reprepro for our own package repos at work. I have a small repo
named 'xen' on http://packages.knorrie.org/ that I use for testing xen.

When adding the result with reprepro include, this happens:

No section specified for
'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in
'/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!

commit e996c09e2f "debian/: Completely rework the packaging" drops the
Section line for the source package. Is this intentional? I'd like to be
able to put packages in reprepro.

I used reprepro -S misc as workaround to override the sections.

---- >8 ----

3. i386 and amd64 packages?

After adding the new packages, I see that my reprepro has content left
for i386. E.g.:

-$ reprepro ls xen-utils-common
xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 | stretch-backports | i386
xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 | stretch-backports | amd64
xen-utils-common | 4.10.1~pre+4.0f92968bcf-1~ | unstable | i386
xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2 | unstable | amd64

Why is this? Were the i386 things built before and not any more? I never
really noticed these. Is this a problem? How does the Debian archive
deal with this?

---- >8 ----

4. Install the packages.

At first I did an upgrade from previous 4.11 package to the new ones,
and ran in a problem. So later I did downgrade to 4.8 from stretch and
then redid the upgrade test. There it also occurs:

-# apt-get dist-upgrade
[...]
Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
dpkg: error processing archive
/tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
(--unpack):
 trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
[...]
Errors were encountered while processing:
 /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

If I simply run it again:

-# apt-get dist-upgrade
Preparing to unpack
.../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
...
Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...

So it seems a file has moved to another package, and the order in which
they are upgraded matters.

In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
other packages are 4.11-blah.

---- >8 ----

5. Try to still use 4.8

-# xen create -c blaat.bofh.dpl.mendix.net
Parsing config from blaat.bofh.dpl.mendix.net
libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault:
qemu-xen is unavailable, using qemu-xen-traditional instead: No such
file or directory
xenconsole: Could not read tty from store: Success

There's no xenconsoled process any more now.
/usr/lib/xen-4.8/bin/xenstored is still running however.

-# /etc/init.d/xen restart
[ ok ] Restarting xen (via systemctl): xen.service.

Now I got a xenconsoled process back.

After this, I can still start/stop a domU with this, so that's good. The
only 4.8 (real package version) things I still have are libxen-4.8,
xen-hypervisor-4.8-amd64 and xen-utils-4.8, so looks good.

Also, I have seen xenconsoled randomly disappear with all the previous
4.11 packages already. From syslog it seems it has something to do with
systemd, which is shutting it down during some nightly action.

---- >8 ----

5. Reboot into 4.11

Ah, 4.8 again. Grub config was not updated.

-# update-grub

Reboot again...

---- >8 ----

6. Now really reboot into 4.11

Yay.

---- >8 ----

7. Live migrate a domU to it.

At least it keeps running, but this is quite weird:

dmesg:

[ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 3666.840734] OOM killer disabled.
[ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 3666.842265] suspending xenstore...
[ 3666.856559] xen:grant_table: Grant tables using version 1 layout
[18443294892.646187] OOM killer enabled.
[18443294892.646200] Restarting tasks ... done.
[18443294892.684093] Setting capacity to 41943040

or with -T:

[Wed Oct 10 00:34:54 2018] Freezing user space processes ... (elapsed 0.001 seconds) done.
[Wed Oct 10 00:34:54 2018] OOM killer disabled.
[Wed Oct 10 00:34:54 2018] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[Wed Oct 10 00:34:54 2018] suspending xenstore...
[Wed Oct 10 00:34:54 2018] xen:grant_table: Grant tables using version 1 layout
[Tue Mar 22 00:02:00 2603] OOM killer enabled.
[Tue Mar 22 00:02:00 2603] Restarting tasks ... done.
[Tue Mar 22 00:02:00 2603] Setting capacity to 41943040

2603?

Ok, I can confirm that this also happens with the previous 4.11
packages. Also, I lose the tcp connection to the domU while live
migrating. Any process is still active, but my ssh session hangs totally.

Sigh, not more live migrate problems please.

---- >8 ----

8. Live migrate it away again

(manual reproduction with debug options):

-# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s "" blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
Saving to migration stream new xl format (info 0x3/0x0/1254)
libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL flags for fd 13 are 0x1
libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of fd 13 to 0x1
libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
libxl-save-helper: debug: starting save: Success
xc: detail: fd 13, dom 1, flags 1, hvm 0
xc: info: Saving domain 1, type x86 PV
xc: detail: 64 bits, 4 levels
xc: detail: max_mfn 0x1b1ffff
xc: detail: p2m list from 0xffffc90000000000 to 0xffffc90000ffffff, root at 0xd9408f
xc: detail: max_pfn 0x1fffff, p2m_frames 4096
xencall: error: alloc_pages: mmap failed: Invalid argument
xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and deferred pages: Internal error
xc: error: Save failed (12 = Cannot allocate memory): Internal error
libxl-save-helper: debug: complete r=-1: Cannot allocate memory
libxl: error: libxl_stream_write.c:350:libxl__xc_domain_save_done: Domain 1:saving domain: domain did not respond to suspend request: Cannot allocate memory
libxl: debug: libxl.c:746:libxl__fd_flags_restore: fnctl F_SETFL of fd 13 to 0x1
libxl: debug: libxl_event.c:1869:libxl__ao_complete: ao 0x56303d91b050: complete, rc=-8
libxl: debug: libxl_event.c:1838:libxl__ao__destroy: ao 0x56303d91b050: destroy
migration sender: libxl_domain_suspend failed (rc=-8)
Migration failed, failed to suspend at sender.
xencall:buffer: debug: total allocations:20 total releases:20
xencall:buffer: debug: current allocations:0 maximum allocations:2
xencall:buffer: debug: cache current size:2
xencall:buffer: debug: cache hits:14 misses:2 toobig:4
xencall:buffer: debug: total allocations:0 total releases:0
xencall:buffer: debug: current allocations:0 maximum allocations:0
xencall:buffer: debug: cache current size:0
xencall:buffer: debug: cache hits:0 misses:0 toobig:0

That's not good, and a show stopper for me to do anything with it beyond
this first test machine.

---- >8 ----

9. by/domain info

-# xl info
[...]
cc_compiler            : gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
cc_compile_by          :
cc_compile_domain      :
cc_compile_date        : Sat Oct 6 00:24:06 UTC 2018
[...]

Previously the email address of the most recent debian/changelog entry
appeared here. Apparently this is gone.

-# xl dmesg
[...]
(XEN) Xen version 4.11.1-pre (Debian ) (@) (gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516) debug=n Sat Oct 6 00:24:06 UTC 2018
[...]

Maybe it makes sense to 'hard'code the team list email address in here
instead.

---- >8 ----

10. xl/xen tab completion

-# xl .
./  .bash_aliases  .bashrc   .profile  .vim/
../ .bash_history  .lesshst  .ssh/     .vimrc

-# xen .
./  .bash_aliases  .bashrc   .profile  .vim/
../ .bash_history  .lesshst  .ssh/     .vimrc

xl and xen now tab-complete filenames in the local directory.

---- >8 ----

So far my initial test report.

Hans
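[Editorial aside on item 7, not part of the original mail.] The jump
from 3666 to 18443294892 seconds can be put into numbers. The sketch
below is plain arithmetic on the two timestamps from the dmesg output
above. It assumes, and this is only an assumption that nobody in the
thread confirms, that the printed value comes from an unsigned 64-bit
nanosecond counter. Under that assumption the post-resume stamp sits
just below 2^64 ns, which is how a small negative value would look after
wrapping. The function name is made up for illustration.

```python
from decimal import Decimal

NS_PER_S = 10**9
WRAP = 2**64  # assumption: timestamps are unsigned 64-bit nanoseconds


def as_signed_seconds(dmesg_stamp: str) -> Decimal:
    """Reinterpret a dmesg timestamp (in seconds) as a signed 64-bit ns value."""
    ns = int(Decimal(dmesg_stamp) * NS_PER_S)
    if ns >= WRAP // 2:
        # Values in the upper half of the range are read as negative.
        ns -= WRAP
    return Decimal(ns) / NS_PER_S


before = as_signed_seconds("3666.856559")         # last stamp before suspend
after = as_signed_seconds("18443294892.646187")   # first stamp after resume

print("before resume:", before, "s")
print("after resume: ", after, "s")
print("after resume: ", after / 86400, "days")
```

Read this way, the post-resume stamp is roughly 40 days negative rather
than 584 years in the future. That is only one possible reading of the
numbers; it says nothing about where such an offset would come from.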
Ian Jackson
2018-Oct-10 14:42 UTC
[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2
Hans van Kranenburg writes ("Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
> tl;dr:
> * Does not upgrade cleanly from 4.8 packages, so we have to prevent this
> from entering testing until we fix that.

I suggest we take the approach of fixing the bugs in git and then
uploading a new version as soon as what we have uploaded passes NEW.

> * Live migration is broken, explodes with memory allocation errors.

WFM, I'm afraid.

> ---- >8 ----
>
> 2. Put the packages in a repository
>
> I use reprepro for our own package repos at work. I have a small repo
> named 'xen' on http://packages.knorrie.org/ that I use for testing xen.
>
> When adding the result with reprepro include, this happens:
>
> No section specified for
> 'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in
> '/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!
>
> commit e996c09e2f "debian/: Completely rework the packaging" drops the
> Section line for the source package. Is this intentional? I'd like to be
> able to put packages in reprepro.
>
> I used reprepro -S misc as workaround to override the sections.

Hrm. Mostly I deleted the Section from the .dsc because I wanted to
spot if I didn't explicitly set the Section in one of the .debs. I
trusted lintian (which does not complain about this) too much - I see
that Section is Recommended by policy 5.2 for the source stanza.

I have added `Section: admin' in my working tree.

> 3. i386 and amd64 packages?
>
> After adding the new packages, I see that my reprepro has content left
> for i386. E.g.:
>
> -$ reprepro ls xen-utils-common
> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 |
> stretch-backports | i386
> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 |
> stretch-backports | amd64
> xen-utils-common | 4.10.1~pre+4.0f92968bcf-1~ |
> unstable | i386
> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2 |
> unstable | amd64
>
> Why is this? Were the i386 things built before and not any more? I never
> really noticed these. Is this a problem? How does the Debian archive
> deal with this?

The package should build fine for i386 as well as amd64. I assume you
must have done an i386 build in the past.

> ---- >8 ----
>
> 4. Install the packages.
>
> At first I did an upgrade from previous 4.11 package to the new ones,
> and ran in a problem. So later I did downgrade to 4.8 from stretch and
> then redid the upgrade test. There it also occurs:
>
> -# apt-get dist-upgrade
> [...]
> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
> dpkg: error processing archive
> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> (--unpack):
> trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
> also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
> [...]
> Errors were encountered while processing:
> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> E: Sub-process /usr/bin/dpkg returned an error code (1)
>
> If I simply run it again:
>
> -# apt-get dist-upgrade
> Preparing to unpack
> .../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
> ...
> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
> Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...
>
> So it seems a file has moved to another package, and the order in which
> they are upgraded matters.

This is a missing Replaces. I have fixed that in my working tree too.

> In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
> other packages are 4.11-blah.

Right.

> ---- >8 ----
>
> 5. Try to still use 4.8
>
> -# xen create -c blaat.bofh.dpl.mendix.net
> Parsing config from blaat.bofh.dpl.mendix.net
> libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault:
> qemu-xen is unavailable, using qemu-xen-traditional instead: No such
> file or directory
> xenconsole: Could not read tty from store: Success
>
> There's no xenconsoled process any more now.

Can you investigate why this happens ? It sounds like upgrading the
packages somehow stopped the old xenconsoled but didn't start a new
one.

> Also, I have seen xenconsoled randomly disappear with all the previous
> 4.11 packages already. From syslog it seems it has something to do with
> systemd, which is shutting it down during some nightly action.

Oh. systemd. I have been testing with sysvinit.

> ---- >8 ----
>
> 5. Reboot into 4.11
>
> Ah, 4.8 again. Grub config was not updated.

I encountered that too. I thought I had fixed that.
xen-hypversor-F-V.postinst.vsn-in turns into ...
... oh wait it is missing the .vsn-in in the filename.

Fixed in my working tree.

> 7. Live migrate a domU to it.
>
> At least it keeps running, but this is quite weird:
>
> dmesg:
>
> [ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds)
> done.
> [ 3666.840734] OOM killer disabled.
> [ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001
> seconds) done.
> [ 3666.842265] suspending xenstore...
> [ 3666.856559] xen:grant_table: Grant tables using version 1 layout
> [18443294892.646187] OOM killer enabled.
> [18443294892.646200] Restarting tasks ... done.
> [18443294892.684093] Setting capacity to 41943040

I think during early resume the timestamps may be wrong ?

> Ok, I can confirm that this also happens with the previous 4.11
> packages. Also, I lose the tcp connection to the domU while live
> migrating. Any process is still active, but my ssh session hangs totally.
>
> Sigh, not more live migrate problems please.
>
> ---- >8 ----
>
> 8. Live migrate it away again

Is that from 4.11 to 4.8 ? That's not necessarily expected to work.

On my test machine (stretch) I can localhost migrate both PV and HVM
guests. The VM stays up. My ssh session to it (tested with HVM only,
but no doubt PV works too) survives.

> (manual reproduction with debug options):
>
> -# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s ""
> blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
> Saving to migration stream new xl format (info 0x3/0x0/1254)
> libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao
> 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
> libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL
> flags for fd 13 are 0x1
> libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of
> fd 13 to 0x1
> libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao
> 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
> libxl-save-helper: debug: starting save: Success
...
> xencall: error: alloc_pages: mmap failed: Invalid argument
> xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and
> deferred pages: Internal error

I'm afraid IDK what this means.

> So far my initial test report.

Thanks.

Ian.

-- 
Ian Jackson <ijackson at chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
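[Editorial aside, not part of the original mail.] The "missing Replaces"
diagnosis for item 4 comes down to one path being shipped by both the
old xen-utils-common and the new xenstore-utils. A minimal sketch of
spotting such overlaps, assuming the two file lists have already been
collected (for instance with `dpkg -L` on the installed package and
`dpkg-deb -c` on the new .deb). The function name and the placeholder
paths are illustrative; only the xenstore-chmod man page path comes from
the dpkg error quoted above.

```python
def overlapping_paths(old_pkg_files, new_pkg_files):
    """Return the paths that appear in both file lists, sorted.

    Both arguments are iterables of absolute paths of regular files;
    directories are expected to have been filtered out by the caller.
    """
    return sorted(set(old_pkg_files) & set(new_pkg_files))


# Path taken from the dpkg error in the report; the others are placeholders.
old_xen_utils_common = [
    "/usr/share/man/man1/xenstore-chmod.1.gz",
    "/placeholder/only-in-old-package",
]
new_xenstore_utils = [
    "/usr/share/man/man1/xenstore-chmod.1.gz",
    "/placeholder/only-in-new-package",
]

for path in overlapping_paths(old_xen_utils_common, new_xenstore_utils):
    print("shipped by both packages:", path)
```

Any path reported this way is a candidate for the kind of relationship
Ian describes; which package names and versions belong in the actual
control fields is a packaging decision the sketch does not make.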
Ian Jackson
2018-Oct-10 14:46 UTC
[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2
Ian Jackson writes ("Re: Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
> Hans van Kranenburg writes ("Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
> > tl;dr:
> > * Does not upgrade cleanly from 4.8 packages, so we have to prevent this
> > from entering testing until we fix that.
>
> I suggest we take the approach of fixing the bugs in git and then
> uploading a new version as soon as what we have uploaded passes NEW.

For the things with obvious fixes, I have pushed the fixes to
salsa/master.

Regards,
Ian.

-- 
Ian Jackson <ijackson at chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.
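[Editorial aside, not part of the original mail.] One of the fixes
discussed in this thread is restoring the Section field in the source
stanza, whose absence made reprepro refuse the upload. A rough sketch of
a pre-upload check for that, assuming the text of debian/control is
available as a string. The parser is deliberately simplistic and is not
a full deb822 implementation; the function name and the sample control
text are made up for illustration, with only `Section: admin` taken from
Ian's reply.

```python
def source_stanza_fields(control_text):
    """Return the field names found in the first (source) stanza."""
    fields = []
    for line in control_text.splitlines():
        if not line.strip():
            if fields:
                break      # a blank line ends the first stanza
            continue       # skip leading blank lines
        if line[0] in " \t#":
            continue       # continuation line or comment
        name, sep, _ = line.partition(":")
        if sep:
            fields.append(name)
    return fields


# Illustrative sample, not the real debian/control of the xen package.
without_section = "Source: xen\nPriority: optional\n\nPackage: xenstore-utils\nSection: admin\n"
with_section = "Source: xen\nSection: admin\nPriority: optional\n\nPackage: xenstore-utils\n"

for label, text in (("before", without_section), ("after", with_section)):
    has_section = "Section" in source_stanza_fields(text)
    print(label, "fix: source stanza has Section:", has_section)
```

A Section set only on a binary package stanza, as in the first sample,
does not count here, which matches the situation described in item 2.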
Hans van Kranenburg
2018-Oct-10 14:56 UTC
[Pkg-xen-devel] Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2
On 10/10/2018 04:42 PM, Ian Jackson wrote:
> Hans van Kranenburg writes ("Test report xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2"):
>> tl;dr:
>> * Does not upgrade cleanly from 4.8 packages, so we have to prevent this
>> from entering testing until we fix that.
>
> I suggest we take the approach of fixing the bugs in git and then
> uploading a new version as soon as what we have uploaded passes NEW.
>
>> * Live migration is broken, explodes with memory allocation errors.
>
> WFM, I'm afraid.
>
>> ---- >8 ----
>>
>> 2. Put the packages in a repository
>>
>> I use reprepro for our own package repos at work. I have a small repo
>> named 'xen' on http://packages.knorrie.org/ that I use for testing xen.
>>
>> When adding the result with reprepro include, this happens:
>>
>> No section specified for
>> 'xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1.dsc' in
>> '/home/knorrie/pbuilder/result/4.11-stretch-backports/xen_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.changes'!
>>
>> commit e996c09e2f "debian/: Completely rework the packaging" drops the
>> Section line for the source package. Is this intentional? I'd like to be
>> able to put packages in reprepro.
>>
>> I used reprepro -S misc as workaround to override the sections.
>
> Hrm. Mostly I deleted the Section from the .dsc because I wanted to
> spot if I didn't explicitly set the Section in one of the .debs. I
> trusted lintian (which does not complain about this) too much - I see
> that Section is Recommended by policy 5.2 for the source stanza.
>
> I have added `Section: admin' in my working tree.
>
>> 3. i386 and amd64 packages?
>>
>> After adding the new packages, I see that my reprepro has content left
>> for i386. E.g.:
>>
>> -$ reprepro ls xen-utils-common
>> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-1~exp1~bpo9+1 |
>> stretch-backports | i386
>> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1 |
>> stretch-backports | amd64
>> xen-utils-common | 4.10.1~pre+4.0f92968bcf-1~ |
>> unstable | i386
>> xen-utils-common | 4.11.1~pre.20180911.5acdd26fdc+dfsg-2 |
>> unstable | amd64
>>
>> Why is this? Were the i386 things built before and not any more? I never
>> really noticed these. Is this a problem? How does the Debian archive
>> deal with this?
>
> The package should build fine for i386 as well as amd64. I assume you
> must have done an i386 build in the past.

Nope, these are remainders of the output of the previous packaging. I've
never explicitly done something about i386.

But, it's not important, let's not spend time on th

>> ---- >8 ----
>>
>> 4. Install the packages.
>>
>> At first I did an upgrade from previous 4.11 package to the new ones,
>> and ran in a problem. So later I did downgrade to 4.8 from stretch and
>> then redid the upgrade test. There it also occurs:
>>
>> -# apt-get dist-upgrade
>> [...]
>> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
>> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
>> dpkg: error processing archive
>> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> (--unpack):
>> trying to overwrite '/usr/share/man/man1/xenstore-chmod.1.gz', which is
>> also in package xen-utils-common 4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10
>> [...]
>> Errors were encountered while processing:
>> /tmp/apt-dpkg-install-WhZg6K/11-xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> E: Sub-process /usr/bin/dpkg returned an error code (1)
>>
>> If I simply run it again:
>>
>> -# apt-get dist-upgrade
>> Preparing to unpack
>> .../xenstore-utils_4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1_amd64.deb
>> ...
>> Unpacking xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1)
>> over (4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10) ...
>> Setting up xenstore-utils (4.11.1~pre.20180911.5acdd26fdc+dfsg-2~bpo9+1) ...
>>
>> So it seems a file has moved to another package, and the order in which
>> they are upgraded matters.
>
> This is a missing Replaces. I have fixed that in my working tree too.
>
>> In the end I still have xen-hypervisor-4.8-amd64 and libxen-4.8, all
>> other packages are 4.11-blah.
>
> Right.
>
>> ---- >8 ----
>>
>> 5. Try to still use 4.8
>>
>> -# xen create -c blaat.bofh.dpl.mendix.net
>> Parsing config from blaat.bofh.dpl.mendix.net
>> libxl: info: libxl_create.c:105:libxl__domain_build_info_setdefault:
>> qemu-xen is unavailable, using qemu-xen-traditional instead: No such
>> file or directory
>> xenconsole: Could not read tty from store: Success
>>
>> There's no xenconsoled process any more now.
>
> Can you investigate why this happens ? It sounds like upgrading the
> packages somehow stopped the old xenconsoled but didn't start a new
> one.

Yes, I have to investigate. I suspect it's not a problem that has been
introduced now.

>> Also, I have seen xenconsoled randomly disappear with all the previous
>> 4.11 packages already. From syslog it seems it has something to do with
>> systemd, which is shutting it down during some nightly action.
>
> Oh. systemd. I have been testing with sysvinit.
>
>> ---- >8 ----
>>
>> 5. Reboot into 4.11
>>
>> Ah, 4.8 again. Grub config was not updated.
>
> I encountered that too. I thought I had fixed that.
> xen-hypversor-F-V.postinst.vsn-in turns into ...
> ... oh wait it is missing the .vsn-in in the filename.
>
> Fixed in my working tree.
>
>> 7. Live migrate a domU to it.
>>
>> At least it keeps running, but this is quite weird:
>>
>> dmesg:
>>
>> [ 3666.838699] Freezing user space processes ... (elapsed 0.001 seconds)
>> done.
>> [ 3666.840734] OOM killer disabled.
>> [ 3666.840738] Freezing remaining freezable tasks ... (elapsed 0.001
>> seconds) done.
>> [ 3666.842265] suspending xenstore...
>> [ 3666.856559] xen:grant_table: Grant tables using version 1 layout
>> [18443294892.646187] OOM killer enabled.
>> [18443294892.646200] Restarting tasks ... done.
>> [18443294892.684093] Setting capacity to 41943040
>
> I think during early resume the timestamps may be wrong ?

Just caused a new logline to happen:

[18446422056.096266] OOM killer enabled.
[18446422056.096276] Restarting tasks ... done.
[18446422056.169628] Setting capacity to 41943040
[18446479746.168280] EXT4-fs (xvdb): mounted filesystem with ordered data mode. Opts: (null)

-$ date
Wed Oct 10 16:51:41 CEST 2018

>> Ok, I can confirm that this also happens with the previous 4.11
>> packages. Also, I lose the tcp connection to the domU while live
>> migrating. Any process is still active, but my ssh session hangs totally.
>>
>> Sigh, not more live migrate problems please.
>>
>> ---- >8 ----
>>
>> 8. Live migrate it away again
>
> Is that from 4.11 to 4.8 ? That's not necessarily expected to work.

No, 4.11 to 4.11. Exactly same failure reproduced in 100% of the cases
where I tried to live migrate away the domU. Attempt to live migrate to
the same machine also fails.

dom0:
Linux omega 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) x86_64 GNU/Linux

domU:
Linux blaat 4.18.0-0.bpo.1-amd64 #1 SMP Debian 4.18.6-1~bpo9+1 (2018-09-13) x86_64 GNU/Linux

> On my test machine (stretch) I can localhost migrate both PV and HVM
> guests. The VM stays up. My ssh session to it (tested with HVM only,
> but no doubt PV works too) survives.
>
>> (manual reproduction with debug options):
>>
>> -# xl -vvv migrate -C /etc/xen/guests/blaat.bofh.dpl.mendix.net -s ""
>> blaat.bofh.dpl.mendix.net "socat - TCP:10.140.221.7:8002"
>> Saving to migration stream new xl format (info 0x3/0x0/1254)
>> libxl: debug: libxl_domain.c:492:libxl_domain_suspend: Domain 1:ao
>> 0x56303d91b050: create: how=(nil) callback=(nil) poller=0x56303d91ab50
>> libxl: debug: libxl.c:719:libxl__fd_flags_modify_save: fnctl F_GETFL
>> flags for fd 13 are 0x1
>> libxl: debug: libxl.c:727:libxl__fd_flags_modify_save: fnctl F_SETFL of
>> fd 13 to 0x1
>> libxl: debug: libxl_domain.c:520:libxl_domain_suspend: Domain 1:ao
>> 0x56303d91b050: inprogress: poller=0x56303d91ab50, flags=i
>> libxl-save-helper: debug: starting save: Success
> ...
>> xencall: error: alloc_pages: mmap failed: Invalid argument
>> xc: error: Unable to allocate memory for dirty bitmaps, batch pfns and
>> deferred pages: Internal error
>
> I'm afraid IDK what this means.

D:

>
>> So far my initial test report.
>
> Thanks.
>
> Ian.
>