Andreas Olsowski
2011-Sep-20 09:41 UTC
[Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)
A PV guest will not reboot after migration. The guest itself does everything right, including the shutdown, but xl does not recreate the guest; it just shuts it down.

This goes for 2.6.39 and 3.0.4 guest kernels; I haven't tried different ones. I also haven't tried different Xen versions.

I don't know if this would affect HVM, probably not, since qemu leaves the guest running and does a "proper" restart.

I guess this behavior has always been that way and no one ever bothered to bring it up. Well, now I do :)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ian Campbell
2011-Sep-20 19:23 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)
On Tue, 2011-09-20 at 10:41 +0100, Andreas Olsowski wrote:
> A pv guest will not reboot after migration, the guest itself does
> everything right, including the shutdown, but xl does not recreate the
> guest, it just shuts it down.

After the migrate but before the shutdown, is there an xl process associated with the guest?

Please take a look at http://wiki.xen.org/xenwiki/ReportingBugs. It includes a useful list of bits of info (logfiles etc); including those in your report will help us to help you. For example:

What exact commands are you running on the host and in the guest?
What is in your guest cfg file?
What does /var/log/xen/*-$GUESTNAME* contain?

Ian.

> This goes for 2.6.39 and 3.0.4 guest kernels, haven't tried different
> ones. I also haven't tried different xen versions.
>
> Don't know if this would affect hvm, probably not since qemu leaves the
> guest running and does a "proper" restart.
>
> I guess this behavior has always been that way and no one ever bothered
> to bring it up, well now i do :)
Andreas Olsowski
2011-Sep-23 07:40 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)
On 09/20/2011 09:23 PM, Ian Campbell wrote:
> On Tue, 2011-09-20 at 10:41 +0100, Andreas Olsowski wrote:
>> A pv guest will not reboot after migration, the guest itself does
>> everything right, including the shutdown, but xl does not recreate the
>> guest, it just shuts it down.
>
> After the migrate but before the shutdown is there an xl process
> associated with the guest?

Yes, xl migrate-receive is running, but check this out:

root@xenturio1:/var/log/xen# cat xl-thiswillfail.log
Waiting for domain thiswillfail (domid 7) to die [pid 7475]

root@xenturio1:/usr/src/linux-2.6-xen# xl -vvv migrate thiswillfail xenturio2
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/380)
Loading new save file incoming migration stream (new xl fmt info 0x0/0x0/380)
Savefile contains xl domain config
xc: detail: Had 0 unexplained entries in p2m table
xc: Saving memory: iter 0 (last sent 0 skipped 0): 133120/133120 100%
xc: detail: delta 9519ms, dom0 94%, target 1%, sent 449Mb/s, dirtied 1Mb/s 533 pages
xc: Saving memory: iter 1 (last sent 130565 skipped 507): 133120/133120 100%
xc: detail: delta 39ms, dom0 92%, target 2%, sent 447Mb/s, dirtied 28Mb/s 34 pages
xc: Saving memory: iter 2 (last sent 533 skipped 0): 133120/133120 100%
xc: detail: Start last iteration
libxl: debug: libxl_dom.c:384:libxl__domain_suspend_common_callback issuing PV suspend request via XenBus control node
libxl: debug: libxl_dom.c:389:libxl__domain_suspend_common_callback wait for the guest to acknowledge suspend request
libxl: debug: libxl_dom.c:434:libxl__domain_suspend_common_callback guest acknowledged suspend request
libxl: debug: libxl_dom.c:438:libxl__domain_suspend_common_callback wait for the guest to suspend
libxl: debug: libxl_dom.c:450:libxl__domain_suspend_common_callback guest has suspended
xc: detail: SUSPEND shinfo 0007fafc
xc: detail: delta 205ms, dom0 3%, target 0%, sent 5Mb/s, dirtied 25Mb/s 160 pages
xc: Saving memory: iter 3 (last sent 34 skipped 0): 133120/133120 100%
xc: detail: delta 3ms, dom0 0%, target 0%, sent 1747Mb/s, dirtied 1747Mb/s 160 pages
xc: detail: Total pages sent= 131292 (0.99x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
xc: detail: Save exit rc=0
migration target: Transfer complete, requesting permission to start domain.
migration sender: Target has acknowledged transfer.
migration sender: Giving target permission to start.
migration target: Got permission, starting domain.
migration target: Domain started successsfully.
migration sender: Target reports successful startup.
Migration successful.

root@xenturio1:/var/log/xen# cat xl-thiswillfail.log
Waiting for domain thiswillfail (domid 7) to die [pid 7475]
Domain 7 is dead
Done. Exiting now

root@xenturio2:/var/log/xen# cat xl-thiswillfail--incoming.log
Waiting for domain thiswillfail--incoming (domid 10) to die [pid 5162]

root@xenturio2:/var/log/xen# ps auxww |grep -v grep |grep "migrate-rec"
root 5162 0.0 0.0 36128 1592 ? Ssl 09:30 0:00 xl migrate-receive

root@xenturio2:/var/log/xen# xl console thiswillfail
PM: early restore of devices complete after 0.071 msecs
PM: restore of devices complete after 14.727 msecs
Setting capacity to 10485760
Setting capacity to 2097152
root@thiswillfail:~# init 6
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Using makefile-style concurrent boot in runlevel 6.
Asking all remaining processes to terminate...done.
All processes ended within 1 seconds....done.
Stopping enhanced syslogd: rsyslogd.
Saving the system clock.
Cannot access the Hardware Clock via any known method.
Use the --debug option to see the details of our search for an access method.
Deconfiguring network interfaces...Internet Systems Consortium DHCP Client 4.1.1-P1
Copyright 2004-2010 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Listening on LPF/eth0/00:16:3e:7e:38:fb
Sending on LPF/eth0/00:16:3e:7e:38:fb
Sending on Socket/fallback
DHCPRELEASE on eth0 to 10.19.46.16 port 67
done.
Cleaning up ifupdown....
Deactivating swap...done.
Will now restart.
md: stopping all md devices.
Restarting system.

root@xenturio2:/var/log/xen# xl list
Name                 ID   Mem VCPUs  State   Time(s)
Domain-0              0  4661     8  r-----  77471.3

root@xenturio2:/var/log/xen# ps auxww |grep -v grep |grep xl

root@xenturio2:/var/log/xen# cat xl-thiswillfail--incoming.log
Waiting for domain thiswillfail--incoming (domid 10) to die [pid 5162]
Domain 10 is dead
Action for shutdown reason code 1 is restart
Domain 10 needs to be cleaned up: destroying the domain
Done. Rebooting now
xc: error: 0-length read: Internal error
xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
xc: error: read: p2m_size (0 = Success): Internal error

######
# domU config
root@xenturio2:/var/log/xen# cat /mnt/vmctrl/xenconfig/thiswillfail.sxp
# generated using xen-tool
kernel = "/boot/vmlinuz-3.0-xenU"
ramdisk = "/boot/initrd.img-3.0-xenU"
name = "thiswillfail"
memory = "512"
vcpus = "2"
vif = [ 'bridge=vlanbr27','mac=fe:ff:00:1b:00:06,bridge=mgmtbr27' ]
disk = [ 'phy:/dev/xen-data/thiswillfail-root,xvda1,w','phy:/dev/xen-data/thiswillfail-swap,xvda2,w' ]
root = "/dev/xvda1"
extra = "xencons=hvc0 console=hvc0"

This again goes for 2.6.39-xenU and 3.0.4-xenU.

I guess the core of the problem is somewhere around this:
>xc: error: 0-length read: Internal error
>xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
>xc: error: read: p2m_size (0 = Success): Internal error

with best regards

andreas
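The three xc errors are what a "read exactly N bytes" helper would plausibly report when its input stream has already ended: read() returns 0 at end of file and does not touch errno, which would account for "read rc: 0, errno: 0" and "0 = Success". The C sketch below illustrates only that mechanism. The names read_exact_sketch and eof_errno_demo are hypothetical and this is not the libxc implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not libxc): read exactly len bytes from fd.
 * Returns 0 on success, -1 on error or premature end of stream. */
int read_exact_sketch(int fd, void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n == 0)
            return -1;      /* EOF: a 0-length read, errno is left alone */
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, retry */
            return -1;      /* real error, errno says why */
        }
        done += (size_t)n;
    }
    return 0;
}

/* Reads from a pipe whose write end is already closed and returns the
 * errno seen after the failed read, or -1 if the demo could not run. */
int eof_errno_demo(void)
{
    int p[2];
    char buf[8];
    int rc, saved;

    if (pipe(p) != 0)
        return -1;
    close(p[1]);            /* no writer left: the reader hits EOF at once */
    errno = 0;
    rc = read_exact_sketch(p[0], buf, sizeof buf);
    saved = errno;
    close(p[0]);
    return rc == -1 ? saved : -1;
}
```

Under these assumptions the failure carries no errno, so an error message built from strerror(errno) would print "Success", much like the log above.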
Ian Campbell
2011-Sep-23 08:00 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)
On Fri, 2011-09-23 at 08:38 +0100, Andreas Olsowski wrote:
> I guess the core of the problem is somewhere around this:
> >xc: error: 0-length read: Internal error
> >xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
> >xc: error: read: p2m_size (0 = Success): Internal error

It smells like on reboot it is trying to receive another incoming migration, instead of restarting the domain it already has.

This (untested) might help:

diff -r d7b14b76f1eb tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c Thu Sep 22 14:26:08 2011 +0100
+++ b/tools/libxl/xl_cmdimpl.c Fri Sep 23 08:59:36 2011 +0100
@@ -1516,6 +1516,11 @@ start:
         ret = libxl_domain_create_restore(ctx, &d_config,
                                           cb, &child_console_pid,
                                           &domid, restore_fd);
+        /*
+         * On subsequent reboot etc we should create the domain, not
+         * restore/migrate-receive it again.
+         */
+        restore_file = NULL;
     }else{
         ret = libxl_domain_create_new(ctx, &d_config,
                                       cb, &child_console_pid, &domid);

Ian.
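The idea behind the patch can be modelled in a few lines of C: xl jumps back to the start: label on every guest reboot, and as long as restore_file stays non-NULL each pass takes the restore path again and tries to read a migration stream that is long gone. The sketch below is a toy model under that assumption. path_on_last_boot and its parameters are hypothetical names and do not appear in xl_cmdimpl.c.

```c
#include <assert.h>
#include <stddef.h>

enum create_path { PATH_RESTORE, PATH_CREATE_NEW };

/* Toy model of the loop around the start: label. Returns the path taken
 * on the last of `boots` passes. `clear_after_restore` models the patch:
 * forget the restore source once it has been consumed. */
enum create_path path_on_last_boot(const char *restore_file,
                                   int clear_after_restore, int boots)
{
    enum create_path last = PATH_CREATE_NEW;
    int i;

    assert(boots >= 1);
    for (i = 0; i < boots; i++) {
        if (restore_file != NULL) {
            last = PATH_RESTORE;         /* stands in for create_restore */
            if (clear_after_restore)
                restore_file = NULL;     /* the one-line fix */
        } else {
            last = PATH_CREATE_NEW;      /* stands in for create_new */
        }
    }
    return last;
}
```

With two passes (initial migrate-receive, then one reboot), the unpatched model ends on PATH_RESTORE while the patched model ends on PATH_CREATE_NEW.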
Andreas Olsowski
2011-Sep-23 09:15 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)
On 09/23/2011 10:00 AM, Ian Campbell wrote:
> It smells like on reboot it is trying to receive another incoming
> migration, instead of restarting the domain it already has.
>
> This (untested) might help:
>
> diff -r d7b14b76f1eb tools/libxl/xl_cmdimpl.c
> --- a/tools/libxl/xl_cmdimpl.c Thu Sep 22 14:26:08 2011 +0100
> +++ b/tools/libxl/xl_cmdimpl.c Fri Sep 23 08:59:36 2011 +0100
> @@ -1516,6 +1516,11 @@ start:
>          ret = libxl_domain_create_restore(ctx,&d_config,
>                                            cb,&child_console_pid,
>                                            &domid, restore_fd);
> +        /*
> +         * On subsequent reboot etc we should create the domain, not
> +         * restore/migrate-receive it again.
> +         */
> +        restore_file = NULL;
>      }else{
>          ret = libxl_domain_create_new(ctx,&d_config,
>                                        cb,&child_console_pid,&domid);
>
> Ian.

Patching works.

root@xenturio2:/usr/src/xen-4.1-testing.hg# patch -p1 < ../xl-migration-reboot.ian.patch
patching file tools/libxl/xl_cmdimpl.c
Hunk #1 succeeded at 1520 with fuzz 2 (offset 4 lines).

Compilation (clean/make/install) worked fine too.

The patch did what you intended it to do; the guest reboots:

##############
root@xenturio2:/usr/src/xen-4.1-testing.hg# xl console thishopefullywontfail
PM: early restore of devices complete after 0.068 msecs
PM: restore of devices complete after 13.033 msecs
Setting capacity to 10485760
Setting capacity to 2097152
root@thishopefullywontfail:~# init 6
INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
... usual shutdown ...
Restarting system.

root@xenturio2:/usr/src/xen-4.1-testing.hg# xl list
Name                    ID   Mem VCPUs  State   Time(s)
Domain-0                 0  4661     8  r-----  78258.3
thishopefullywontfail   14   512     2  -b----      2.6

root@xenturio2:/usr/src/xen-4.1-testing.hg# xl console thishopefullywontfail
Linux version 3.0.4-xenU (root@xenturio1) (gcc version 4.4.5 (Debian 4.4.5-8) ) #6 SMP Wed Aug 31 17:04:24 CEST 2011
... usual bootup ....
root@thishopefullywontfail:~#
#####################

Here is the output of the log:

root@xenturio2:/var/log/xen# cat xl-thishopefullywontfail--incoming.log
Waiting for domain thishopefullywontfail--incoming (domid 13) to die [pid 14668]
Domain 13 is dead
Action for shutdown reason code 1 is restart
Domain 13 needs to be cleaned up: destroying the domain
Done. Rebooting now
Waiting for domain thishopefullywontfail (domid 14) to die [pid 14668]

with best regards

andreas
Ian Jackson
2011-Sep-27 17:39 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre) [and 1 more messages]
Ian Campbell writes ("Re: [Xen-devel] XL: pv guests dont reboot after migration (xen4.1.2-rc2-pre)"):
> It smells like on reboot it is trying to receive another incoming
> migration, instead of restarting the domain it already has.
>
> This (untested) might help:

Thanks both, I've applied this patch.

Ian.
Andreas Olsowski
2011-Oct-15 01:01 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen-4.1.2-rc3) libc-2.11.2 segfault
Hi all.

If you recall, I started a similar discussion last month, and a working patch was posted and tested.

Now I finally got around to updating my systems to the latest testing release, and it would seem something else is preventing a clean reboot now.

PV guests don't reboot after migration. Just when xl should reboot the machine, syslog shows:

Oct 15 02:46:32 netcatarina kernel: xl[14986]: segfault at 7f0ec70a3008 ip 00007f0ec7d517f9 sp 00007fff366cf100 error 4 in libc-2.11.2.so[7f0ec7cdb000+158000]

I am running Debian squeeze and haven't made any changes to it since 4.1.2-rc2-pre, which worked fine with the patch from my previous thread.

root@netcatarina:~# locate libc-2.
/lib/libc-2.11.2.so
/lib32/libc-2.11.2.so

root@netcatarina:~# dpkg -l |grep libc6
ii libc6 2.11.2-10 Embedded GNU C Library: Shared libraries
ii libc6-dev 2.11.2-10 Embedded GNU C Library: Development Libraries and Header Files
ii libc6-i386 2.11.2-10 Embedded GNU C Library: 32-bit shared libraries for AMD64

root@netcatarina:~# ldd /usr/sbin/xl
linux-vdso.so.1 => (0x00007fffa33c9000)
libxlutil.so.1.0 => /usr/lib/libxlutil.so.1.0 (0x00007f6815a23000)
libxenlight.so.1.0 => /usr/lib/libxenlight.so.1.0 (0x00007f68157fb000)
libxenctrl.so.4.0 => /usr/lib/libxenctrl.so.4.0 (0x00007f68155d8000)
libdl.so.2 => /lib/libdl.so.2 (0x00007f68153d4000)
libxenguest.so.4.0 => /usr/lib/libxenguest.so.4.0 (0x00007f68151af000)
libxenstore.so.3.0 => /usr/lib/libxenstore.so.3.0 (0x00007f6814fa5000)
libblktapctl.so.1.0 => /usr/lib/libblktapctl.so.1.0 (0x00007f6814d9e000)
libutil.so.1 => /lib/libutil.so.1 (0x00007f6814b9b000)
libuuid.so.1 => /lib/libuuid.so.1 (0x00007f6814996000)
libc.so.6 => /lib/libc.so.6 (0x00007f6814635000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007f6814419000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6815c32000)
libz.so.1 => /usr/lib/libz.so.1 (0x00007f6814201000)

root@netcatarina:~# ls -la /lib/libc.so.6
lrwxrwxrwx 1 root root 14 May 23 13:04 /lib/libc.so.6 -> libc-2.11.2.so

So it's not the fact that I have an additional 32-bit version installed (this is a 64-bit system).

This patch is the main reason I even bothered to update the servers, so it would be nice if you could post a patch for this problem as well.

It happens more frequently than I would like that people reboot their servers. And since I migrate them to a different server when I do maintenance, this problem pretty much affects all of my 60 virtual machines.

With best regards

---
Andreas
Ian Campbell
2011-Oct-15 05:45 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen-4.1.2-rc3) libc-2.11.2 segfault
On Sat, 2011-10-15 at 02:01 +0100, Andreas Olsowski wrote:
> pv guests dont reboot after migration,
> just when xl should reboot the machine syslog shows:
>
> Oct 15 02:46:32 netcatarina kernel: xl[14986]: segfault at 7f0ec70a3008
> ip 00007f0ec7d517f9 sp 00007fff366cf100 error 4 in
> libc-2.11.2.so[7f0ec7cdb000+158000]

Can you run it under gdb and get a backtrace? Or perhaps a core file is dropped somewhere?

Ian.
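Even before a backtrace is available, the kernel segfault line can be unpacked a little. On x86, the number after "error" is the page-fault error code: bit 0 clear means the page was not present, bit 1 clear means the access was a read, and bit 2 set means it happened in user mode. Subtracting the mapping base shown in brackets from ip gives the faulting offset inside libc-2.11.2.so. The C sketch below does that arithmetic; the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed after "error" in the log */
#define PF_PROT  0x1u   /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u   /* 0: read access,      1: write access         */
#define PF_USER  0x4u   /* 1: fault was taken in user mode              */

/* Offset of the faulting instruction inside the mapped object. */
uint64_t offset_in_object(uint64_t ip, uint64_t base)
{
    assert(ip >= base);
    return ip - base;
}

/* True (1) if the code describes a user-mode read of an unmapped page. */
int is_user_read_of_unmapped_page(unsigned err)
{
    return (err & PF_USER) && !(err & PF_WRITE) && !(err & PF_PROT);
}
```

For the line above, ip 0x7f0ec7d517f9 minus base 0x7f0ec7cdb000 is 0x767f9, which lies inside the 0x158000-byte mapping, and error 4 is a user-mode read of a page that is not mapped. The offset can then be matched against the library's symbols to see which libc function faulted.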
Andreas Olsowski
2011-Oct-15 10:47 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen-4.1.2-rc3) libc-2.11.2 segfault
On 10/15/2011 07:45 AM, Ian Campbell wrote:
> On Sat, 2011-10-15 at 02:01 +0100, Andreas Olsowski wrote:
>
>> pv guests dont reboot after migration,
>> just when xl should reboot the machine syslog shows:
>>
>> Oct 15 02:46:32 netcatarina kernel: xl[14986]: segfault at 7f0ec70a3008
>> ip 00007f0ec7d517f9 sp 00007fff366cf100 error 4 in
>> libc-2.11.2.so[7f0ec7cdb000+158000]
>
> Can you run under gdb and get a backtrace? Or perhaps core file is
> dropped somewhere?

How? xl migrate-receive is not started by hand. Can you point me to the location within the code that calls it so I can put a "gdb" in front of it?

> Or perhaps core file is dropped somewhere?

Wouldn't I have to run a debugging-enabled build of Xen for that?

I found this in the log dir:

root@netcatarina:/var/log/xen# cat xl-testmig--incoming.log
Waiting for domain testmig--incoming (domid 67) to die [pid 3429]
Domain 67 is dead
Action for shutdown reason code 1 is restart
Domain 67 needs to be cleaned up: destroying the domain
Done. Rebooting now
xc: error: 0-length read: Internal error
xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
xc: error: read: p2m_size (0 = Success): Internal error

--
Andreas
Ian Campbell
2011-Oct-15 13:07 UTC
Re: [Xen-devel] XL: pv guests dont reboot after migration (xen-4.1.2-rc3) libc-2.11.2 segfault
On Sat, 2011-10-15 at 11:47 +0100, Andreas Olsowski wrote:
> On 10/15/2011 07:45 AM, Ian Campbell wrote:
> > On Sat, 2011-10-15 at 02:01 +0100, Andreas Olsowski wrote:
> >
> >> pv guests dont reboot after migration,
> >> just when xl should reboot the machine syslog shows:
> >>
> >> Oct 15 02:46:32 netcatarina kernel: xl[14986]: segfault at 7f0ec70a3008
> >> ip 00007f0ec7d517f9 sp 00007fff366cf100 error 4 in
> >> libc-2.11.2.so[7f0ec7cdb000+158000]
> >
> > Can you run under gdb and get a backtrace? Or perhaps core file is
> > dropped somewhere?
> How? xl migrate-receive is not started by hand. Can you point me to the
> location within the code that calls it so i can put a "gdb" infront of it?

tools/libxl/xl_cmdimpl.c, main_migrate(). Or you can attach gdb to a running xl migrate-receive ("gdb -p <pid> /path/xl"?).

I think you can also control the remote command which is run using the -e option to "xl migrate", maybe. Not so sure about that last one.

> > Or perhaps core file is dropped somewhere?
> Wouldnt i have to run a debugging enabled build of xen for that?
>
> I found this in the log dir:
>
> root@netcatarina:/var/log/xen# cat xl-testmig--incoming.log
> Waiting for domain testmig--incoming (domid 67) to die [pid 3429]
> Domain 67 is dead
> Action for shutdown reason code 1 is restart
> Domain 67 needs to be cleaned up: destroying the domain
> Done. Rebooting now
> xc: error: 0-length read: Internal error

Interesting. That suggests we've gone back round to the migrate/restore path, but all the uses after the start: label (where we go back to on reboot) in create_domain seem to be gated on restore_file != NULL. I must be missing something...

Adding some logging in create_domain wherever a *fd variable is used might be interesting, perhaps on the exit paths too.

I notice that we don't appear to close restore_fd in the child process. That probably isn't related to this problem, but would be worth doing, I suspect.

> xc: error: read_exact_timed failed (read rc: 0, errno: 0): Internal error
> xc: error: read: p2m_size (0 = Success): Internal error
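The remark about restore_fd rests on general fork() behaviour: a child inherits a copy of every open descriptor, and closing that copy in the child does not affect the parent. The C sketch below shows only that pattern with a throwaway pipe. demo_child_close is a hypothetical name, and which xl child process is meant is not spelled out in the thread, so this is not a patch for xl.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, lets the child close its inherited copy of a descriptor, and
 * checks that the parent's copy stays usable.
 * Returns 1 if both hold, 0 if not, -1 if the demo could not run. */
int demo_child_close(void)
{
    int p[2];
    int status, parent_still_open;
    pid_t pid;

    if (pipe(p) != 0)
        return -1;
    assert(p[0] >= 0 && p[1] >= 0);

    pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: drop the inherited copy, then confirm it is gone. */
        close(p[0]);
        _exit(fcntl(p[0], F_GETFD) == -1 ? 0 : 1);
    }

    /* Parent: wait for the child, then check our own copy. */
    if (waitpid(pid, &status, 0) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    parent_still_open = fcntl(p[0], F_GETFD) != -1;
    close(p[0]);
    close(p[1]);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 && parent_still_open;
}
```

Closing unneeded descriptors in a child mainly avoids keeping a pipe or socket open longer than intended, which matters when the other end is waiting for end of file.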