Hi,
I'm trying to run Xen-unstable but I have a problem with the save functionality; apparently some symbol is missing there:

[2010-06-21 13:02:14 4333] DEBUG (XendCheckpoint:124) [xc_save]: /usr/lib64/xen/bin/xc_save 56 1 0 0 4
[2010-06-21 13:02:14 4333] INFO (XendCheckpoint:408) /usr/lib64/xen/bin/xc_save: symbol lookup error: /usr/lib64/xen/bin/xc_save: undefined symbol: xs_suspend_evtchn_port
[2010-06-21 13:02:14 4333] ERROR (XendCheckpoint:178) Save failed on domain rhel5-32fv-stubdom (1) - resuming.
Traceback (most recent call last):
  File "usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 146, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 394, in forkHelper
    raise XendError("%s failed: popen failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 56 1 0 0 4 failed: popen failed
[2010-06-21 13:02:14 4333] DEBUG (XendDomainInfo:3131) XendDomainInfo.resumeDomain(1)

Any ideas?

Michal

-- 
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
lists.xensource.com/xen-devel
On 21/06/2010 12:15, "Michal Novotny" <minovotn@redhat.com> wrote:
> Hi,
> I'm trying to run Xen-unstable but I have a problem for save
> functionality, apparently some symbol is missing there:

I don't think it's a missing symbol. Looks like popen threw an error. Perhaps the xc_save binary couldn't be found, or something like that?

 -- Keir
On 06/21/2010 02:05 PM, Keir Fraser wrote:
> I don't think it's a missing symbol. Looks like popen threw an error.
> Perhaps xc_save binary couldn't be found, or something like that?

Well, Keir, it does exist:

# ls -al /usr/lib64/xen/bin/xc_save
-rwxr-xr-x 1 root root 25245 Jun 18 21:37 /usr/lib64/xen/bin/xc_save
# /usr/lib64/xen/bin/xc_save
xc_save: usage: /usr/lib64/xen/bin/xc_save iofd domid maxit maxf flags

Also, when I run it manually while the domain is running, to test the save, it returns:

# /usr/lib64/xen/bin/xc_save 56 1 0 0 4
/usr/lib64/xen/bin/xc_save: symbol lookup error: undefined symbol: xs_suspend_evtchn_port

When I run it against a domain that is not running (or when the iofd is not opened yet), it segfaults:

# /usr/lib64/xen/bin/xc_save 56 1 0 0 4
Segmentation fault
# gdb -arg /usr/lib64/xen/bin/xc_save 56 1 0 0 4
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/lib64/xen/bin/xc_save 56 1 0 0 4
[Thread debugging using libthread_db enabled]
[New Thread 0x7feb772b46e0 (LWP 6411)]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00000000004014f5 in main (argc=<value optimized out>, argv=0x7fff6a1f9868) at xc_save.c:192
(gdb) disassemble
No function contains program counter for selected frame.
(gdb)

Line 192 of xc_save.c is closing the xc_interface with "xc_interface_close(xc_fd);". Any ideas?

Michal
On 21/06/2010 13:38, "Michal Novotny" <minovotn@redhat.com> wrote:
> Also, when I try to run it manually with the domain running and testing
> the save it returns:
>
> # /usr/lib64/xen/bin/xc_save 56 1 0 0 4
> /usr/lib64/xen/bin/xc_save: symbol lookup error: undefined symbol:
> xs_suspend_evtchn_port

Ah, the dynamic linker is picking up an old version of libxenstore. xs_suspend_evtchn_port was introduced in Xen 3.4.0. I missed that in your log-file snippet.

 -- Keir
On 06/21/2010 02:44 PM, Keir Fraser wrote:
> Ah, the dynamic linker is picking up an old version of libxenstore.
> xs_suspend_evtchn_port was introduced in Xen 3.4.0. I missed that in your
> log-file snippet.

Oh, OK. Should it be in /usr/lib then, or how do I fix it?

Thanks,
Michal
On 21/06/2010 13:47, "Michal Novotny" <minovotn@redhat.com> wrote:
>> Ah, the dynamic linker is picking up an old version of libxenstore.
>> xs_suspend_evtchn_port was introduced in Xen 3.4.0. I missed that in your
>> log-file snippet.
>
> Oh, ok, should it be in /usr/lib then or how to fix it ?

It should be where your dynamic linker will find it. :-)

 -- Keir
On 06/21/2010 02:44 PM, Keir Fraser wrote:
> Ah, the dynamic linker is picking up an old version of libxenstore.
> xs_suspend_evtchn_port was introduced in Xen 3.4.0. I missed that in your
> log-file snippet.

Well, here are the objdumps:

$ objdump -x /xen-unstable.hg/tools/xcutils/xc_save | grep xs_sus
0000000000000000 F *UND* 00000000000000ad xs_suspend_evtchn_port
$ objdump -x /usr/lib64/xen/bin/xc_save | grep xs_suspend
0000000000000000 F *UND* 00000000000000ad xs_suspend_evtchn_port
$ ls -al /usr/lib64/libxenctrl.so
lrwxrwxrwx 1 root root 17 Jun 21 13:52 /usr/lib64/libxenctrl.so -> libxenctrl.so.4.0

I had some libxenctrl* files in /lib64, so I removed them to make the linker pick up the ones from /usr/lib64, which it now does. However, the error now is:

[2010-06-21 14:52:22 6151] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 57 1 0 0 4
[2010-06-21 14:52:22 6151] ERROR (XendCheckpoint:180) Save failed on domain rhel5-32fv-stubdom (1) - resuming.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 148, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 398, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 57 1 0 0 4 failed
[2010-06-21 14:52:22 6151] DEBUG (XendDomainInfo:3131) XendDomainInfo.resumeDomain(1)

and when I run xc_save manually it segfaults:

# gdb -arg /usr/lib64/xen/bin/xc_save 57 1 0 0 4
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/lib64/xen/bin/xc_save 57 1 0 0 4
[Thread debugging using libthread_db enabled]
[New Thread 0x7f6490b9d6e0 (LWP 6785)]

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00000000004014f5 in main (argc=<value optimized out>, argv=0x7fffb5942ac8) at xc_save.c:192
(gdb) up
#1 0x00000000004014f5 in main (argc=<value optimized out>, argv=0x7fffb5942ac8) at xc_save.c:192
192 port = xs_suspend_evtchn_port(si.domid);

So I'm sorry, I was wrong about the line; the one I quoted earlier came from some other file. According to strace, "/usr/lib64/libxenctrl.so.4.0" (which was built today) is loaded, but "objdump -x /usr/lib64/libxenctrl.so.4.0 | grep xs_" doesn't return anything. Any ideas now?

Michal
On 21/06/2010 13:57, "Michal Novotny" <minovotn@redhat.com> wrote:
> Well, those are objdumps:
>
> $ objdump -x /xen-unstable.hg/tools/xcutils/xc_save | grep xs_sus
> 0000000000000000 F *UND* 00000000000000ad xs_suspend_evtchn_port
> $ objdump -x /usr/lib64/xen/bin/xc_save | grep xs_suspend
> 0000000000000000 F *UND* 00000000000000ad xs_suspend_evtchn_port
> $ ls -al /usr/lib64/libxenctrl.so
> lrwxrwxrwx 1 root root 17 Jun 21 13:52 /usr/lib64/libxenctrl.so ->
> libxenctrl.so.4.0
>
> I was having some libxenctrl* files at /lib64 so I removed them in order
> to make linked link those from /usr/lib64 so it did. However, the error
> now is:

Well, what was the above supposed to achieve? xs_suspend_evtchn_port is provided by libxenstore. It's that library which is getting mis-linked. Moving the xc_save binary itself, and/or libxenctrl, isn't going to change that. Use ldd to find which libxenstore is being linked against; work out why it's the wrong one; put the right one in its place.

 -- Keir
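The "use ldd" step above could be scripted roughly as follows. This is only a sketch in Python 3: the helper name resolved_library and the sample ldd output (including the /lib64/libxenstore.so.3.0 path and the load addresses) are made up for illustration and are not taken from the machine in this thread.

```python
import re

# Hypothetical `ldd /usr/lib64/xen/bin/xc_save` output, invented for this sketch.
SAMPLE_LDD_OUTPUT = """\
    linux-vdso.so.1 =>  (0x00007fff275ff000)
    libxenctrl.so.4.0 => /usr/lib64/libxenctrl.so.4.0 (0x00007f0000000000)
    libxenstore.so.3.0 => /lib64/libxenstore.so.3.0 (0x00007f0000200000)
    libc.so.6 => /lib64/libc.so.6 (0x000000391e000000)
"""


def resolved_library(ldd_output, name_prefix):
    """Map soname -> resolved path for sonames starting with name_prefix.

    Expects the usual ldd line shape "soname => path (address)"; a library
    the linker cannot find is reported with the path "not found".
    """
    found = {}
    for line in ldd_output.splitlines():
        match = re.match(r"\s*(\S+)\s+=>\s+(not found|\S+)", line)
        if match and match.group(1).startswith(name_prefix):
            found[match.group(1)] = match.group(2)
    return found


# Show which file the dynamic linker would load for libxenstore.
print(resolved_library(SAMPLE_LDD_OUTPUT, "libxenstore"))
```

On a real system the input would come from running ldd on the xc_save binary, for example via subprocess.run(["ldd", path], capture_output=True, text=True).stdout. If the reported path is not the freshly built library, that is the stale copy to replace.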
On 06/21/2010 02:56 PM, Keir Fraser wrote:
>> Oh, ok, should it be in /usr/lib then or how to fix it ?
>
> It should be where your dynamic linker will find it. :-)

One more thing: should the xenstore daemon be running or not? Isn't it in the kernel and xenfs now?

I saw there's a file

/usr/src/redhat/BUILD/kernel-2.6.32.15/drivers/xen/xenfs/xenstored.c

in the kernel sources, but I still have /usr/sbin/xenstored. Should it be running or not? Also, when I kill it, xend fails to start with this error:

[2010-06-21 15:08:59 6953] ERROR (SrvDaemon:349) Exception starting xend ((111, 'Connection refused'))
Traceback (most recent call last):
  File "usr/lib64/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 341, in run
    servers = SrvServer.create()
  File "usr/lib64/python2.4/site-packages/xen/xend/server/SrvServer.py", line 258, in create
    root.putChild('xend', SrvRoot())
  File "usr/lib64/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__
    self.get(name)
  File "usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 84, in get
    val = val.getobj()
  File "usr/lib64/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
    self.obj = klassobj()
  File "usr/lib64/python2.4/site-packages/xen/xend/server/SrvNode.py", line 30, in __init__
    self.xn = XendNode.instance()
  File "usr/lib64/python2.4/site-packages/xen/xend/XendNode.py", line 1176, in instance
    inst = XendNode()
  File "usr/lib64/python2.4/site-packages/xen/xend/XendNode.py", line 163, in __init__
    self._init_cpu_pools()
  File "usr/lib64/python2.4/site-packages/xen/xend/XendNode.py", line 377, in _init_cpu_pools
    XendCPUPool.recreate_active_pools()
  File "usr/lib64/python2.4/site-packages/xen/xend/XendCPUPool.py", line 754, in recreate_active_pools
    uuid = xstransact.Read(path, 'uuid')
  File "usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 307, in Read
    return complete(path, lambda t: t.read(*args))
  File "usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 361, in complete
    t = xstransact(path)
  File "usr/lib64/python2.4/site-packages/xen/xend/xenstore/xstransact.py", line 29, in __init__
    self.transaction = xshandle().transaction_start()
  File "usr/lib64/python2.4/site-packages/xen/xend/xenstore/xsutil.py", line 18, in xshandle
    xs_handle = xen.lowlevel.xs.xs()
Error: (111, 'Connection refused')

Did the xenstore location change or something?

# ls -al /usr/sbin/xenstored
-rwxr-xr-x 1 root root 257580 Jun 18 21:37 /usr/sbin/xenstored

Michal
On 21/06/2010 14:10, "Michal Novotny" <minovotn@redhat.com> wrote:
> One more thing, should the xenstore daemon be running or not? Isn't it
> in kernel and xenfs now ?
>
> I saw there's a file:
> /usr/src/redhat/BUILD/kernel-2.6.32.15/drivers/xen/xenfs/xenstored.c

That will be the kernel's own interface to the xenstore daemon. Or the interface it exposes to userspace to allow it to communicate with the xenstore daemon via the kernel's xenstore connection.

> in the kernel source codes. But I'm still having /usr/sbin/xenstored.
> Should this be running or not?

It should still run. The xenstore daemon remains a user-space daemon.

 -- Keir
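For anyone hitting the same "(111, 'Connection refused')" from xend, a quick way to tell "daemon not listening" from "socket file absent" is to probe the daemon's Unix socket directly. The sketch below is Python 3 and makes two assumptions that are not confirmed anywhere in this thread: that xenstored listens on a Unix stream socket, and that its path is /var/run/xenstored/socket. The helper name probe_unix_socket is made up.

```python
import errno
import socket

# Assumption: default xenstored socket location; adjust for your installation.
ASSUMED_XENSTORED_SOCKET = "/var/run/xenstored/socket"


def probe_unix_socket(path):
    """Classify a Unix stream socket path as 'ok', 'missing', 'refused' or 'error:<NAME>'."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"   # no socket file at that path
        if exc.errno == errno.ECONNREFUSED:
            return "refused"   # file exists but nothing is accepting (errno 111 on Linux)
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    print(ASSUMED_XENSTORED_SOCKET, "->", probe_unix_socket(ASSUMED_XENSTORED_SOCKET))
```

A "refused" result would match the traceback above, where the daemon had been killed, and "ok" would mean something is listening again.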
On 06/21/2010 03:08 PM, Keir Fraser wrote:
> Well what was the above supposed to achieve? xs_suspend_evtchn_port is
> provided by libxenstore. It's that library which is getting mis-linked.
> Moving the xc_save binary itself, and/or libxenctrl, isn't going to change
> that. Go use ldd to find what libxenstore is being linked against; work out
> why it's the wrong one; put the right one in its place.

Well, I was trying to show that the version being linked is the 4.0 version and not 3.4 or an older one. Also:

# ldd /usr/lib64/libxenctrl.so
        linux-vdso.so.1 => (0x00007fff275ff000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000000391ec00000)
        libc.so.6 => /lib64/libc.so.6 (0x000000391e000000)
        /lib64/ld-linux-x86-64.so.2 (0x000000391dc00000)
# objdump -x /usr/lib64/libxenctrl.so | grep xs_
#

So there's no xs_suspend_evtchn_port (or any xs_*) function exported by /usr/lib64/libxenctrl.so (which is a symlink to /usr/lib64/libxenctrl.so.4.0.0), and likewise:

# objdump -x /usr/lib64/libxenctrl.so.4.0.0 | grep xs_
# objdump -x /xen-unstable.hg/tools/libxc/libxenctrl.so.4.0.0 | grep xs_
#

So the problem here is the missing xs_*.

Note: /xen-unstable.hg/* is the path with the Xen-4.1-unstable sources downloaded from Mercurial and compiled.

Michal
On 21/06/2010 14:18, "Michal Novotny" <minovotn@redhat.com> wrote:
> So there's no xs_suspend_evtchn_port (or anything xs_*) function being
> exported by /usr/lib64/libxenctrl.so (which is the symlink to
> /usr/lib64/libxenctrl.so.4.0.0), therefore:
>
> #objdump -x /usr/lib64/libxenctrl.so.4.0.0 | grep xs_
> #objdump -x /xen-unstable.hg/tools/libxc/libxenctrl.so.4.0.0 | grep xs_

Libxenstore, libxenstore, libxen**STORE**.

#nm /my/path/to/libxenstore.so.3.0.0 | grep xs_sus
000000000000346a T xs_suspend_evtchn_port

 -- Keir
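The nm check above can be turned into a small yes/no test. This is a sketch in Python 3 with a made-up helper name (defines_symbol). It relies only on the default nm line shape: a defined symbol is printed as "address type name", while an undefined one has no address and shows type U.

```python
def defines_symbol(nm_output, symbol):
    """Return True if nm output lists `symbol` with an address, i.e. as defined."""
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields (address, type letter, name);
        # undefined ones print only "U name", so they never match here.
        if len(fields) == 3 and fields[2] == symbol:
            return True
    return False


# The line Keir quotes from libxenstore: defined in the text section.
print(defines_symbol("000000000000346a T xs_suspend_evtchn_port",
                     "xs_suspend_evtchn_port"))
# The shape nm uses for an undefined reference, as a binary importing it would show.
print(defines_symbol("                 U xs_suspend_evtchn_port",
                     "xs_suspend_evtchn_port"))
```

Running nm against whichever libxenstore file ldd reports and feeding the output through this check shows whether that particular copy is new enough to provide the symbol.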
On 06/21/2010 03:24 PM, Keir Fraser wrote:
> Libxenstore, libxenstore, libxen**STORE**.
>
> #nm /my/path/to/libxenstore.so.3.0.0 | grep xs_sus
> 000000000000346a T xs_suspend_evtchn_port

Oh, sorry about that, and thanks for noticing my mistake; I overlooked this one. Nevertheless, it's still not working:

[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 56 1 0 0 4
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:410) xc_save: failed to get the suspend evtchn port
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:410)
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:381) suspend
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:129) In saveInputHandler suspend
[2010-06-21 17:27:39 4305] DEBUG (XendCheckpoint:131) Suspending 1 ...
[2010-06-21 17:27:39 4305] DEBUG (XendDomainInfo:521) XendDomainInfo.shutdown(suspend)
[2010-06-21 17:27:39 4305] DEBUG (XendDomainInfo:1877) XendDomainInfo.handleShutdownWatch
[2010-06-21 17:27:39 4305] INFO (XendDomainInfo:538) HVM save:remote shutdown dom 1!
[2010-06-21 17:27:39 4305] INFO (XendCheckpoint:137) Domain 1 suspended.
[2010-06-21 17:27:39 4305] INFO (XendDomainInfo:2074) Domain has shutdown: name=migrating-rhel5-32fv-stubdom id=1 reason=suspend.
[2010-06-21 17:27:40 4305] INFO (image:538) signalDeviceModel:restore dm state to running
[2010-06-21 17:27:40 4305] DEBUG (XendCheckpoint:146) Written done
[2010-06-21 17:27:46 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=1
[2010-06-21 17:27:46 4305] DEBUG (XendDomainInfo:2397) Destroying device model
[2010-06-21 17:27:47 4305] INFO (image:615) migrating-rhel5-32fv-stubdom device model terminated

# ls -al rhel5-32fv.sav
-rwxr-xr-x 1 root root 54657427 Jun 21 17:27 rhel5-32fv.sav
# xm restore rhel5-32fv.sav
Error: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
Usage: xm restore <CheckpointFile> [-p]

Restore a domain from a saved state.
  -p, --paused    Do not unpause domain after restoring it

# tail /var/log/xen/xend.log
[2010-06-21 17:29:21 4305] INFO (image:822) Need to create platform device.[domid:2]
[2010-06-21 17:29:21 4305] DEBUG (XendCheckpoint:273) restore:shadow=0x9, _static_max=0x40000000, _static_min=0x0,
[2010-06-21 17:29:21 4305] DEBUG (XendCheckpoint:292) [xc_restore]: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: Error when reading batch size (0 = Success): Internal error
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: error when buffering batch, finishing (0 = Success): Internal error
[2010-06-21 17:29:22 4305] INFO (XendCheckpoint:410) xc: error: error zeroing magic pages (22 = Invalid argument): Internal error
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=2
[2010-06-21 17:29:22 4305] ERROR (XendDomainInfo:3081) XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
  File "usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 3074, in destroy
    xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2402) No device model
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2404) Releasing devices
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vif/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vbd/768
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vbd/2048
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing vfb/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:2410) Removing console/0
[2010-06-21 17:29:22 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2010-06-21 17:29:22 4305] ERROR (XendCheckpoint:344) /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 296, in restore
    forkHelper(cmd, fd, handler.handler, True)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 398, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed
[2010-06-21 17:29:22 4305] ERROR (XendDomain:1182) Restore failed
Traceback (most recent call last):
  File "usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 1166, in domain_restore_fd
    dominfo = XendCheckpoint.restore(self, fd, paused=paused, relocating=relocating)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendCheckpoint.py", line 345, in restore
    raise exn
XendError: /usr/lib64/xen/bin/xc_restore 4 2 2 3 1 1 1 0 failed

So now it seems to be linked with the correct library, but it can't get the suspend port:

>> xc_save: failed to get the suspend evtchn port

This is being called from xc_save.c (snippet from line 192):

...
port = xs_suspend_evtchn_port(si.domid);
if (port < 0)
    warnx("failed to get the suspend evtchn port\n");
else {
    ... suspend...
}

I had a look at the code in xenstore/xs.c and saw that it reads the value at:

/local/domain/%d/device/suspend/event-channel

but when I try to read it using:

xenstore-ls /local/domain/3/device/suspend

where 3 is my domid, I see nothing. I see just:

# xenstore-ls /local/domain/3/device
vfb = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/vfb/3/0"
vbd = ""
 768 = ""
  backend-id = "0"
  virtual-device = "768"
  device-type = "disk"
  state = "1"
  backend = "/local/domain/0/backend/vbd/3/768"
 2048 = ""
  backend-id = "0"
  virtual-device = "2048"
  device-type = "disk"
  state = "1"
  backend = "/local/domain/0/backend/vbd/3/2048"
vif = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/vif/3/0"
console = ""
 0 = ""
  state = "1"
  backend-id = "0"
  backend = "/local/domain/0/backend/console/3/0"
#

My guest is a RHEL-5 i386 guest, but it seems the suspend port is missing. AFAIK, you started using SUSPEND_CANCEL some time ago, which requires a modified kernel.

Could that be the issue? How does the SUSPEND_CANCEL functionality work here?

Thanks,
Michal
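The lookup described above can be modelled in a few lines. This is a toy model in Python 3, not the libxenstore implementation: the xenstore is replaced by a plain dict, the function names are made up, and the only details taken from the thread are the path template and the fact that xc_save treats a negative return value as "no suspend port".

```python
def suspend_evtchn_path(domid):
    """Build the xenstore path that the message above says xs.c reads."""
    return "/local/domain/%d/device/suspend/event-channel" % domid


def lookup_suspend_port(store, domid):
    """Return the port number stored at the suspend path, or -1 if the key is absent.

    `store` is a dict standing in for xenstore (hypothetical, for illustration).
    """
    value = store.get(suspend_evtchn_path(domid))
    if value is None:
        return -1
    return int(value)


# A guest that never wrote the key, as in the xenstore-ls listing above.
print(lookup_suspend_port({}, 3))
# A guest that did publish a port (17 is an invented value).
print(lookup_suspend_port({suspend_evtchn_path(3): "17"}, 3))
```

In this model a missing key is simply reported to the caller, which is consistent with xc_save printing a warning rather than aborting.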
On 21/06/2010 14:37, "Michal Novotny" <minovotn@redhat.com> wrote:
> My guest is RHEL-5 i386 guest but this seems that the suspend port is
> missing. AFAIK, you started using the SUSPEND_CANCEL some time ago which
> requires the modified kernel.
>
> Isn't it possible that's the issue or how is it with the SUSPEND_CANCEL
> functionality?

SUSPEND_CANCEL is a different thing. The suspend port is simply a quicker way for suspend notifications to be passed back and forth between the guest and the dom0 toolstack. We fall back okay if the guest kernel does not support the new faster method.

I'm not sure why the domain restore operation fails. Unfortunately some error messages are now expected in the logs, since Remus functionality went into the tree. So it's hard to work out what the first error is.

 -- Keir
On 06/21/2010 03:45 PM, Keir Fraser wrote:> On 21/06/2010 14:37, "Michal Novotny"<minovotn@redhat.com> wrote: > > >> My guest is RHEL-5 i386 guest but this seems that the suspend port is >> missing. AFAIK, you started using the SUSPEND_CANCEL some time ago which >> requires the modified kernel. >> >> Isn''t it possible that''s the issue or how is it with the SUSPEND_CANCEL >> functionality? >> > SUSPEND_CANCEL is a different thing. The suspend port is simply a quicker > way for suspend notifications to be passed back and forth between the guest > and the dom0 toolstack. We fall back okay if the guest kernel does not > support the new faster method. > > I''m not sure why the domain restore operation fails. Unfortunately some > error messages are now expected in the logs, since Remus functionality went > into the tree. So it''s hard to work out what the first error is. > > -- Keir > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > lists.xensource.com/xen-devel >Ok Keir, but what I don''t understand is why there''s nothing in `/local/domain/%d/device/suspend/event-channel`. So this is OK? For the restore functionality: # ls -ahl rhel5-32fv.sav -rwxr-xr-x 1 root root 53M Jun 21 2010 rhel5-32fv.sav As you can see the save file is 53M big but the guest was having 1G of memory and I think this is why it''s failing. You can see it should be having 1G of memory here too: ... 
[2010-06-21 17:29:20 4305] DEBUG (XendDomainInfo:237) XendDomainInfo.restore([''domain'', [''domid'', ''1''], [''cpu_weight'', ''256''], [''cpu_cap'', ''0''], [''on_crash'', ''restart''], [''uuid'', ''c91ec802-2015-cb49-80e5-810c808bf725''], [''bootloader_args''], [''pool_name'', ''Pool-0''], [''vcpus'', ''1''], [''name'', ''rhel5-32fv-stubdom''], [''on_poweroff'', ''destroy''], [''on_reboot'', ''restart''], [''cpus'', [[]]], [''description''], [''bootloader''], [''maxmem'', ''1024''],* [''memory'', ''1024''],* [''shadow_memory'', ''9''], [''vcpu_avail'', ''1''], [''features''], [''on_xend_start'', ''ignore''], [''on_xend_stop'', ''ignore''], [''start_time'', ''1277134046.11''], [''cpu_time'', ''1.550284835''], [''online_vcpus'', ''1''], [''image'', [''hvm'', [''kernel''], [''superpages'', ''0''], [''tsc_mode'', ''0''], [''videoram'', ''4''], [''hpet'', ''0''], [''boot'', ''c''], [''loader'', ''/usr/lib/xen/boot/hvmloader''], [''serial'', ''pty''], [''vpt_align'', ''1''], [''xen_platform_pci'', ''1''], [''opengl'', ''1''], [''vncunused'', ''1''], [''rtc_timeoffset'', ''0''], [''pci'', []], [''pae'', ''1''], [''stdvga'', ''0''], [''hap'', ''1''], [''viridian'', ''0''], [''acpi'', ''1''], [''localtime'', ''0''], [''timer_mode'', ''1''], [''vnc'', ''1''], [''nographic'', ''0''], [''guest_os_type'', ''default''], [''vncdisplay'', ''1''], [''pci_msitranslate'', ''1''], [''oos'', ''1''], [''apic'', ''1''], [''sdl'', ''0''], [''nomigrate'', ''0''], [''device_model'', ''/usr/lib/xen/bin/qemu-dm''], [''pci_power_mgmt'', ''0''], [''usb'', ''0''], [''xauthority'', ''/root/.Xauthority''], [''isa'', ''0''], [''display'', ''localhost:10.0''], [''notes'', [''SUSPEND_CANCEL'', ''1'']]]], [''status'', ''2''], [''state'', ''r-----''], [''store_mfn'', ''1044476''], [''device'', [''vif'', [''bridge'', ''virbr0''], [''uuid'', ''dcd99a20-2e8f-2692-8e56-dc4051579923''], [''script'', ''/etc/xen/scripts/vif-bridge''], [''mac'', ''00:16:3e:5b:bd:9c''], [''type'', ''ioemu''], [''backend'', 
''0'']]], [''device'', [''vbd'', [''uuid'', ''e7e07da9-c104-800d-ee3f-5fe9757167fd''], [''bootable'', ''1''], [''dev'', ''hda:disk''], [''uname'', ''file:/var/lib/xen/images/colossus/rhel5-32fv.img''], [''mode'', ''w''], [''backend'', ''0''], [''VDI'']]], [''device'', [''vbd'', [''uuid'', ''0180089b-8394-cbfa-0da4-b8c1fc688617''], [''bootable'', ''0''], [''dev'', ''sda:disk''], [''uname'', ''file:/home2/test.img''], [''mode'', ''w''], [''backend'', ''0''], [''VDI'']]], [''device'', [''vfb'', [''vncunused'', ''1''], [''location'', ''127.0.0.1:5901''], [''vnc'', ''1''], [''vncdisplay'', ''1''], [''uuid'', ''7fa1bcc0-797d-66ac-eb88-6ef15f1209f0'']]], [''device'', [''console'', [''protocol'', ''vt100''], [''location'', ''3''], [''uuid'', ''d77b182b-4152-a4d2-f577-8b610b5cd6ff'']]]]) The first error (Error when reading batch size (0 = Success): Internal error) is coming from libxc/xc_domain_restore.c in pagebuf_get_one() function where it is there: ... if ( RDEXACT(fd, &count, sizeof(count)) ) { PERROR("Error when reading batch size"); return -1; } ... so I guess the data are not well-written for this guest (since the file is smaller than the original guest memory) and that''s why the error occurs. As you can see there''s nothing in xend.log except "failed to get the suspend evtchn port" message: [2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:126) [xc_save]: /usr/lib64/xen/bin/xc_save 56 5 0 0 4 [2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410) xc_save: failed to get the suspend evtchn port [2010-06-21 15:59:55 4305] INFO (XendCheckpoint:410) [2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:381) suspend [2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:129) In saveInputHandler suspend [2010-06-21 15:59:55 4305] DEBUG (XendCheckpoint:131) Suspending 5 ... 
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:521) XendDomainInfo.shutdown(suspend)
[2010-06-21 15:59:55 4305] DEBUG (XendDomainInfo:1877) XendDomainInfo.handleShutdownWatch
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:538) HVM save:remote shutdown dom 5!
[2010-06-21 15:59:55 4305] INFO (XendDomainInfo:2074) Domain has shutdown: name=migrating-rhel5-32fv-stubdom id=5 reason=suspend.
[2010-06-21 15:59:55 4305] INFO (XendCheckpoint:137) Domain 5 suspended.
[2010-06-21 15:59:56 4305] INFO (image:538) signalDeviceModel:restore dm state to running
[2010-06-21 15:59:56 4305] DEBUG (XendCheckpoint:146) Written done
[2010-06-21 16:00:02 4305] DEBUG (XendDomainInfo:3067) XendDomainInfo.destroy: domid=5
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2397) Destroying device model
[2010-06-21 16:00:03 4305] INFO (image:615) migrating-rhel5-32fv-stubdom device model terminated
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2404) Releasing devices
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/768
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/2048
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:2410) Removing console/0
[2010-06-21 16:00:03 4305] DEBUG (XendDomainInfo:1272) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0

Any ideas why the save file is that small? It should be at least 1024M,
right?

Thanks,
Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
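A minimal sketch, in Python rather than the C used by libxc, of one plausible reading of the "(0 = Success)" part of that error: when an exact-length read runs into end-of-file, the read itself does not fail at the OS level, so no error code is set even though the caller cannot continue. The names `read_exact`, `read_batch_size` and `demo_truncated`, and the 4-byte size of the batch-size field, are assumptions made for illustration and are not taken from libxc.

```python
import os
import struct


def read_exact(fd, size):
    """Read exactly `size` bytes from `fd`, or raise EOFError on a short read."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            # End of file: no OS error occurred, the data simply ran out.
            raise EOFError("wanted %d bytes, got %d" % (size, size - remaining))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_batch_size(fd):
    """Read one native-endian int (hypothetical stand-in for the batch size)."""
    (count,) = struct.unpack("=i", read_exact(fd, 4))
    return count


def demo_truncated():
    """Provide only 2 of the 4 expected bytes, then try to read the field."""
    r, w = os.pipe()
    os.write(w, b"\x01\x00")
    os.close(w)
    try:
        read_batch_size(r)
        return "ok"
    except EOFError as exc:
        return "truncated: %s" % exc
    finally:
        os.close(r)
```

Calling `demo_truncated()` exercises the short-read path; a truncated save image would hit the same path at whatever point the data ends.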
On 21/06/2010 15:02, "Michal Novotny" <minovotn@redhat.com> wrote:

> Ok Keir, but what I don't understand is why there's nothing in
> `/local/domain/%d/device/suspend/event-channel`. So this is OK?

Yes that's okay. The guest writes that location only if it supports the new
event-channel notification method for suspend.

> For the restore functionality:
>
> # ls -ahl rhel5-32fv.sav
> -rwxr-xr-x 1 root root 53M Jun 21 2010 rhel5-32fv.sav

Okay, yeah, if the guest really has 1G of memory allocated to it at the time
of the save, then something has gone wrong on the save side. But no errors
were apparent on the save side, apart from that event-channel port warning
which is benign? That's weird.

 -- Keir
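The size comparison above can be sketched as a quick sanity check. This is only an illustration: the function names and the `min_ratio` threshold are assumptions, not a Xen rule, and a save image can legitimately be smaller than the configured memory if the guest does not have all of it populated.

```python
MIB = 1024 * 1024


def save_image_ratio(save_bytes, guest_mem_mib):
    """Return the save file size as a fraction of the guest's memory size."""
    return save_bytes / float(guest_mem_mib * MIB)


def looks_truncated(save_bytes, guest_mem_mib, min_ratio=0.5):
    """Flag a save image that is far smaller than the guest's memory.

    min_ratio is an arbitrary illustrative threshold (an assumption).
    """
    return save_image_ratio(save_bytes, guest_mem_mib) < min_ratio
```

For the file in this thread, `looks_truncated(53 * MIB, 1024)` flags the image, since 53M is roughly 5% of 1024M. In practice the first argument could come from `os.path.getsize()` on the save file.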
On 06/21/2010 04:06 PM, Keir Fraser wrote:
> On 21/06/2010 15:02, "Michal Novotny"<minovotn@redhat.com> wrote:
>
>> Ok Keir, but what I don't understand is why there's nothing in
>> `/local/domain/%d/device/suspend/event-channel`. So this is OK?
>
> Yes that's okay. The guest writes that location only if it supports the new
> event-channel notification method for suspend.
>
>> For the restore functionality:
>>
>> # ls -ahl rhel5-32fv.sav
>> -rwxr-xr-x 1 root root 53M Jun 21 2010 rhel5-32fv.sav
>
> Okay, yeah, if the guest really has 1G of memory allocated to it at the time
> of the save, then something has gone wrong on the save side. But no errors
> were apparent on the save side, apart from that event-channel port warning
> which is benign? That's weird.
>
> -- Keir

Exactly, Keir. Nothing is wrong there except the warning about the missing
event-channel port, so this is strange, I guess. I'll try to investigate
this further then.

Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
On 06/21/2010 04:06 PM, Keir Fraser wrote:
> On 21/06/2010 15:02, "Michal Novotny"<minovotn@redhat.com> wrote:
>
>> Ok Keir, but what I don't understand is why there's nothing in
>> `/local/domain/%d/device/suspend/event-channel`. So this is OK?
>
> Yes that's okay. The guest writes that location only if it supports the new
> event-channel notification method for suspend.
>
>> For the restore functionality:
>>
>> # ls -ahl rhel5-32fv.sav
>> -rwxr-xr-x 1 root root 53M Jun 21 2010 rhel5-32fv.sav
>
> Okay, yeah, if the guest really has 1G of memory allocated to it at the time
> of the save, then something has gone wrong on the save side. But no errors
> were apparent on the save side, apart from that event-channel port warning
> which is benign? That's weird.
>
> -- Keir

Well, I see what's going on now, but I had to turn on the debug flag to see
it. According to the code, this is what's being hit for almost every page
and MFN:

    ...
    if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
    {
        printf("type fail: page %i mfn %08lx\n", j, gmfn);
        continue;
    }
    ...

xend.log has many occurrences of type failures:

[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:381) type fail: page 414 mfn 0000659e
[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:129) In saveInputHandler type fail: page 414 mfn 0000659e
[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:381) type fail: page 415 mfn 0000659f
[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:129) In saveInputHandler type fail: page 415 mfn 0000659f
[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:381) type fail: page 416 mfn 000065a0
[2010-06-21 16:48:53 10573] DEBUG (XendCheckpoint:129) In saveInputHandler type fail: page 416 mfn 000065a0

Has anybody hit these or similar save issues?

Thanks,
Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
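To get a feel for how many pages are affected, the "type fail" entries can be pulled out of xend.log with a short script. This is a sketch that assumes the log lines look exactly like the ones quoted above; the function name `collect_type_failures` is made up for illustration. Because xend logs each message twice (once directly and once via saveInputHandler), the results are kept in a set keyed on the (page, mfn) pair.

```python
import re

# Pattern based on the printf format "type fail: page %i mfn %08lx".
TYPE_FAIL = re.compile(r"type fail: page (\d+) mfn ([0-9a-fA-F]+)")


def collect_type_failures(log_text):
    """Return the set of distinct (page, mfn) pairs reported as type failures."""
    failures = set()
    for match in TYPE_FAIL.finditer(log_text):
        page = int(match.group(1))
        mfn = int(match.group(2), 16)  # the MFN is printed in hexadecimal
        failures.add((page, mfn))
    return failures
```

Comparing `len(collect_type_failures(text))` with the number of pages the guest should have gives a rough idea of how much of the guest memory was skipped during the save.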
On 21/06/2010 15:50, "Michal Novotny" <minovotn@redhat.com> wrote:

> Well, I see what's going on now but I had to turn on the debug flag now.
> According to the code, this is what's being hit per almost each page and
> MFN:
>
> ...
> if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
> {
>     printf("type fail: page %i mfn %08lx\n", j, gmfn);
>     continue;
> }
> ...
>
> Xend.log is having many occurrences of type failures:

Hm, well I just saved/restored a 750MB PV guest and a 750MB HVM guest okay
with xen-unstable tip. However, I think the above type-checking code may
only have been introduced for HVM guests as of Jan Beulich's recent
changeset 21615. If your guest is an HVM guest, perhaps this changeset is to
blame? I would suggest reverting it and re-testing. I've cc'ed Jan in case
he can comment on the info you've given.

 -- Keir
On 06/21/2010 07:57 PM, Keir Fraser wrote:
> On 21/06/2010 15:50, "Michal Novotny"<minovotn@redhat.com> wrote:
>
>> Well, I see what's going on now but I had to turn on the debug flag now.
>> According to the code, this is what's being hit per almost each page and
>> MFN:
>>
>> ...
>> if ( pfn_type[j] == XEN_DOMCTL_PFINFO_XTAB )
>> {
>>     printf("type fail: page %i mfn %08lx\n", j, gmfn);
>>     continue;
>> }
>> ...
>>
>> Xend.log is having many occurrences of type failures:
>
> Hm, well I just saved/restored a 750MB PV guest and a 750MB HVM guest okay
> with xen-unstable tip. However I think the above type-checking code may only
> have been introduced for HVM guests as of Jan Beulich's recent changeset
> 21615. If your guest is an HVM guest, perhaps this changeset is to blame? I
> would suggest reverting it and re-testing. I've cc'ed Jan in case he can
> comment on the info you've given.

Keir,
you were right about c/s 21615. I tried reverting it, and both save and
restore worked fine with the patch reverted.

Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
On 22/06/2010 06:38, "Michal Novotny" <minovotn@redhat.com> wrote:

>> Hm, well I just saved/restored a 750MB PV guest and a 750MB HVM guest okay
>> with xen-unstable tip. However I think the above type-checking code may only
>> have been introduced for HVM guests as of Jan Beulich's recent changeset
>> 21615. If your guest is an HVM guest, perhaps this changeset is to blame? I
>> would suggest reverting it and re-testing. I've cc'ed Jan in case he can
>> comment on the info you've given.
>
> you were right about the c/s 21615. I did try reverting it and it was
> working fine to both save and restore when I was having this patch reverted.

Okay, Jan should be able to help then. We don't really want to revert that
changeset as it is itself a bugfix.

 -- Keir
On 06/22/2010 07:58 AM, Keir Fraser wrote:
> On 22/06/2010 06:38, "Michal Novotny"<minovotn@redhat.com> wrote:
>
>>> Hm, well I just saved/restored a 750MB PV guest and a 750MB HVM guest okay
>>> with xen-unstable tip. However I think the above type-checking code may only
>>> have been introduced for HVM guests as of Jan Beulich's recent changeset
>>> 21615. If your guest is an HVM guest, perhaps this changeset is to blame? I
>>> would suggest reverting it and re-testing. I've cc'ed Jan in case he can
>>> comment on the info you've given.
>>
>> you were right about the c/s 21615. I did try reverting it and it was
>> working fine to both save and restore when I was having this patch reverted.
>
> Okay, Jan should be able to help then. We don't really want to revert that
> changeset as it is itself a bugfix.
>
> -- Keir

Jan,
as Keir and I have been working through on the list, I was unable to save
the HVM guest properly, and reverting your c/s 21615 helped: with your patch
reverted it worked fine. Any idea why there were so many type failures for
all the pages of the HVM guest?

Thanks,
Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
>>> On 22.06.10 at 08:23, Michal Novotny <minovotn@redhat.com> wrote:
> Jan, as we were solving this with Keir on the list I was unable to save
> the HVM guest properly and reverting your c/s 21615 helped and with your
> patch reverted it was working fine. Any idea why there were so many type
> failures for all the pages of HVM guest?

Not really, and I think you will need to find out by instrumenting the
hypercall implementation (XEN_DOMCTL_getpageframeinfo{2,3}) -
really, type failures should only happen for grant frames and I/O
memory pages (as you can see in the code, for the function to
report XTAB either the MFN must be invalid, must be a Xen frame,
or it must be impossible to obtain a reference to the page - all of
which shouldn't hold for the majority of a HVM guest's pages).

Btw., Keir, is it really correct for the type to be returned as zero
(normal page) when xsm_getpageframeinfo() returns non-zero?

Is there anything special about the guest you're trying to save? That
patch was tested quite extensively here before submission...

Jan
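The three conditions Jan lists can be written down as a toy model. This is not the hypervisor code: the `Frame` tuple, its field names and the `reports_xtab` function are hypothetical, and only the boolean logic follows the description above.

```python
from collections import namedtuple

# Hypothetical per-frame state; field names are assumptions for illustration.
Frame = namedtuple("Frame", ["mfn_valid", "is_xen_frame", "can_get_ref"])


def reports_xtab(frame):
    """True if the frame would be reported as XTAB under the stated conditions.

    XTAB is reported when the MFN is invalid, when the frame belongs to Xen,
    or when no reference to the page can be obtained.
    """
    return (not frame.mfn_valid) or frame.is_xen_frame or (not frame.can_get_ref)
```

An ordinary guest RAM page corresponds to `Frame(True, False, True)`, for which none of the conditions hold, so seeing XTAB for nearly every page of an HVM guest points at something other than the pages themselves.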
On 22/06/2010 07:48, "Jan Beulich" <JBeulich@novell.com> wrote:

> Btw., Keir, is it really correct for the type to be returned as zero
> (normal page) when xsm_getpageframeinfo() returns non-zero?

Well, maybe it should return XTAB... Really, who knows. I'm sure if you did
use XSM and hit that path, no good would come of it for the caller whatever!

 -- Keir
>>> On 22.06.10 at 08:51, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 22/06/2010 07:48, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> Btw., Keir, is it really correct for the type to be returned as zero
>> (normal page) when xsm_getpageframeinfo() returns non-zero?
>
> Well, maybe it should return XTAB... Really, who knows. I'm sure if you did
> use XSM and hit that path, no good would come of it for the caller whatever!

If you don't know, really who does? Or is the XSM stuff dead code?

Jan
On 22/06/2010 08:12, "Jan Beulich" <JBeulich@novell.com> wrote:

>>> Btw., Keir, is it really correct for the type to be returned as zero
>>> (normal page) when xsm_getpageframeinfo() returns non-zero?
>>
>> Well, maybe it should return XTAB... Really, who knows. I'm sure if you did
>> use XSM and hit that path, no good would come of it for the caller whatever!
>
> If you don't know, really who does? Or is the XSM stuff dead code?

The NSA guys (George Coker et al) maintain it.

 -- Keir
On 06/22/2010 08:48 AM, Jan Beulich wrote:
>>>> On 22.06.10 at 08:23, Michal Novotny<minovotn@redhat.com> wrote:
>> Jan, as we were solving this with Keir on the list I was unable to save
>> the HVM guest properly and reverting your c/s 21615 helped and with your
>> patch reverted it was working fine. Any idea why there were so many type
>> failures for all the pages of HVM guest?
>
> Not really, and I think you will need to find out by instrumenting the
> hypercall implementation (XEN_DOMCTL_getpageframeinfo{2,3}) -
> really, type failures should only happen for grant frames and I/O
> memory pages (as you can see in the code, for the function to
> report XTAB either the MFN must be invalid, must be a Xen frame,
> or it must be impossible to obtain a reference to the page - all of
> which shouldn't hold for the majority of a HVM guest's pages).
>
> Btw., Keir, is it really correct for the type to be returned as zero
> (normal page) when xsm_getpageframeinfo() returns non-zero?
>
> Is there anything special about the guest you're trying to save? That
> patch was tested quite extensively here before submission...
>
> Jan

Nothing special: the guest is a RHEL-5 i386 HVM guest with the serial
console enabled in /boot/grub/grub.conf so that it can be accessed using the
`xm console` command. No error is printed to xend.log (except when I added
the debug output), but the size reported by 'ls -alh /path/to/save/file'
after the save is 54M instead of > 1G (the guest memory size).

Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
On 22/06/2010 07:48, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 22.06.10 at 08:23, Michal Novotny <minovotn@redhat.com> wrote:
>> Jan, as we were solving this with Keir on the list I was unable to save
>> the HVM guest properly and reverting your c/s 21615 helped and with your
>> patch reverted it was working fine. Any idea why there were so many type
>> failures for all the pages of HVM guest?
>
> Not really, and I think you will need to find out by instrumenting the
> hypercall implementation (XEN_DOMCTL_getpageframeinfo{2,3}) -
> really, type failures should only happen for grant frames and I/O
> memory pages (as you can see in the code, for the function to
> report XTAB either the MFN must be invalid, must be a Xen frame,
> or it must be impossible to obtain a reference to the page - all of
> which shouldn't hold for the majority of a HVM guest's pages).

Michal, is it possible you are running new tools against a slightly older
hypervisor? The tools changes had a corresponding, non-version-checked,
modification to a hypercall. Without running the modified hypervisor, HVM
save would indeed behave very oddly!

 -- Keir
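One way to double-check for such a mismatch is to look at the changeset the running hypervisor reports, for example in the `xen_changeset` field of `xm info`. The sketch below only parses text in that "key : value" layout; the function names are made up, the value format (a date followed by "revision:shorthash") is an assumption, and Mercurial revision numbers are only comparable between builds from the same tree.

```python
import re

# Matches a Mercurial-style "revision:shorthash" pair, e.g. 21650:0123456789ab.
REVISION = re.compile(r"\b(\d+):[0-9a-f]{12}\b")


def parse_info(text):
    """Parse 'key : value' lines into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def hypervisor_revision(info_text):
    """Return the numeric revision from xen_changeset, or None if not found."""
    changeset = parse_info(info_text).get("xen_changeset", "")
    match = REVISION.search(changeset)
    return int(match.group(1)) if match else None


def hypervisor_includes(info_text, required_revision):
    """True if the reported hypervisor revision is at least required_revision."""
    revision = hypervisor_revision(info_text)
    return revision is not None and revision >= required_revision
```

With the output of `xm info` in `text`, `hypervisor_includes(text, 21615)` would indicate whether the running hypervisor is at least as new as the changeset under discussion, under the assumptions above.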
On 06/22/2010 12:01 PM, Keir Fraser wrote:
> On 22/06/2010 07:48, "Jan Beulich"<JBeulich@novell.com> wrote:
>
>>>>> On 22.06.10 at 08:23, Michal Novotny<minovotn@redhat.com> wrote:
>>> Jan, as we were solving this with Keir on the list I was unable to save
>>> the HVM guest properly and reverting your c/s 21615 helped and with your
>>> patch reverted it was working fine. Any idea why there were so many type
>>> failures for all the pages of HVM guest?
>>
>> Not really, and I think you will need to find out by instrumenting the
>> hypercall implementation (XEN_DOMCTL_getpageframeinfo{2,3}) -
>> really, type failures should only happen for grant frames and I/O
>> memory pages (as you can see in the code, for the function to
>> report XTAB either the MFN must be invalid, must be a Xen frame,
>> or it must be impossible to obtain a reference to the page - all of
>> which shouldn't hold for the majority of a HVM guest's pages).
>
> Michal, is it possible you are running new tools against a slightly older
> hypervisor? The tools changes had a corresponding, non-version-checked,
> modification to a hypercall. Without running the modified hypervisor, HVM
> save would indeed behave very oddly!
>
> -- Keir

Keir,
I don't think so, since the hypervisor timestamp in /boot is the same as
for the kernel and tools.

Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat
On 22/06/2010 11:05, "Michal Novotny" <minovotn@redhat.com> wrote:

>> Michal, is it possible you are running new tools against a slightly older
>> hypervisor? The tools changes had a corresponding, non-version-checked,
>> modification to a hypercall. Without running the modified hypervisor, HVM
>> save would indeed behave very oddly!
>
> Keir, I don't think so since the hypervisor datetime in /boot is the
> same like for kernel and tools.

Well, it remains something to double-check at least. There's no good reason
that HVM save should work okay for everyone else and only fail so miserably
for you.

 -- Keir
On 06/22/2010 12:23 PM, Keir Fraser wrote:
> On 22/06/2010 11:05, "Michal Novotny"<minovotn@redhat.com> wrote:
>
>>> Michal, is it possible you are running new tools against a slightly older
>>> hypervisor? The tools changes had a corresponding, non-version-checked,
>>> modification to a hypercall. Without running the modified hypervisor, HVM
>>> save would indeed behave very oddly!
>>
>> Keir, I don't think so since the hypervisor datetime in /boot is the
>> same like for kernel and tools.
>
> Well, it remains something to doublecheck at least. There's no good reason
> that HVM save should work okay for everyone else and only fail so miserably
> for you.
>
> -- Keir

I don't know, but I'll investigate it further. It does work with the 21615
patch reverted. Maybe something's wrong with my test setup, so some further
investigation is necessary.

Michal

--
Michal Novotny <minovotn@redhat.com>, RHCE
Virtualization Team (xen userspace), Red Hat