Stefan Berger
2007-Feb-28 04:46 UTC
[Xen-devel] Error in XendCheckpoint: failed to flush file
I have been getting these errors pretty often lately. This is on an x86-32 machine with changeset 14142. Does anyone else see this? Local migration and suspend/resume fail quite frequently.

[2007-02-27 23:39:56 20114] DEBUG (XendCheckpoint:236) [xc_restore]: /usr/lib/xen/bin/xc_restore 23 262 18432 1 2 0 0 0
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) xc_linux_restore start: max_pfn = 4800
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Reloading memory pages: 0%
[2007-02-27 23:39:56 20114] INFO (XendCheckpoint:343) Saving memory pages: iter 1 37%ERROR Internal error: Failed to flush file: Invalid argument (22 = Invalid argument)

Stefan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2007-Feb-28 07:04 UTC
Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
I'm not sure the two are related. Fsync, lseek(), fadvise() will all fail if the fd maps to a socket. The failure is harmless and the error return code is ignored. The error written to xend.log is overly noisy and needs cleaning up, but unfortunately the suspend/resume problems probably lie elsewhere. What failure symptoms do you see?

-- Keir
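Keir's claim that these calls fail harmlessly on a socket can be checked with a small C sketch. It assumes a POSIX system and uses an AF_UNIX socketpair as a stand-in for the migration socket; the function name `socket_sync_errors` is hypothetical and not part of libxc. On Linux, fsync() on a socket typically fails with EINVAL, which is consistent with the errno 22 in the log above, and POSIX specifies ESPIPE for lseek() on a socket.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: calls fsync() and lseek() on one end of a socketpair,
 * records the errno each one produced (0 if it unexpectedly succeeded),
 * then checks that the stream still carries data. Returns 0 if a byte
 * written after the failed calls arrives intact at the peer, -1 otherwise. */
int socket_sync_errors(int *fsync_err, int *lseek_err)
{
    int sv[2];
    char out = 'x', in = 0;
    int ok;

    assert(fsync_err != NULL && lseek_err != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Both calls are expected to fail on a socket; capture errno right away. */
    *fsync_err = (fsync(sv[0]) == 0) ? 0 : errno;
    *lseek_err = (lseek(sv[0], 0, SEEK_CUR) == (off_t)-1) ? errno : 0;

    /* The failures leave the connection untouched: data still flows. */
    ok = write(sv[0], &out, 1) == 1 && read(sv[1], &in, 1) == 1 && in == out;

    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

Calling `socket_sync_errors(&fe, &le)` should return 0 with `le == ESPIPE` and a non-zero `fe`, which is the "fails without trashing things" behaviour the thread relies on.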
Stefan Berger
2007-Feb-28 15:48 UTC
Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
Hi Keir,

here are some of the symptoms I get.

----------------
On x86-32 with changeset 14142 (this is on a blade), after a fresh 'hg clone' and build:

In the xm-test suite, for example, the 'restore' test cases fail:

make -C tests/restore check-TESTS
REASON: Domain still running after save!
FAIL: 01_restore_basic_pos.test
PASS: 02_restore_badparm_neg.test
PASS: 03_restore_badfilename_neg.test
REASON: Failed to create domain
FAIL: 04_restore_withdevices_pos.test

Similar errors occur in the save test cases:

REASON: Domain still running after save!
FAIL: 01_save_basic_pos.test
PASS: 02_save_badparm_neg.test
PASS: 03_save_bogusfile_neg.test

I also see this in 'xm dmesg':

(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen).
(XEN) platform_hypercall.c:142: Domain 0 says that IO-APIC REGSEL is good
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)
(XEN) grant_table.c:251:d0 Bad ref (2097664).
(XEN) grant_table.c:286:d0 Bad flags (0) or dom (0). (expected dom 0)

When I reboot with the 'reboot' command, that blade does not actually reboot but hangs after completely shutting down domain-0. I do not see this problem on other machines, though.

------------
On x86-64 (this is also a blade), after a fresh 'hg clone' and build:

Intel Xeon 3.2GHz, 2 physical processors with hyperthreading each -> 4 logical processors
domain-0 has dom0_mem=10240000

The 'save' tests just crashed that machine (twice). :-/

I'll post a migration test that exposes the following error on x86-64 (only!) inside the guest when running the test 02_migrate_localhost_loop. To see these messages I set the 'debugMe' variable in xm-test/lib/XmTestLib/Console.py (line 68) to 'True'.

@%@%> XENBUS error -12 while reading message
XENBUS error -12 while reading message
XENBUS unexpected type [1325400064], expected [4]
XENBUS error -12 while reading message
XENBUS error -12 while reading message
[...]
XENBUS error -12 while reading message
XENBUS: Unable to read cpu state
XENBUS: Unable to read cpu state

When building the sources with 'make -j 16', that blade's VNC output freezes at some point. Pinging it still works, but ssh'ing into it does not respond within a reasonable time. Building the sources with a non-parallel 'make' works fine.

Stefan
Graham, Simon
2007-Feb-28 16:15 UTC
RE: [Xen-devel] Error in XendCheckpoint: failed to flush file
> I'm not sure the two are related. Fsync, lseek(), fadvise() will all fail if
> the fd maps to a socket. The failure is harmless and the error return code
> is ignored. The error to xend.log is overly noisy and needs cleaning up

Argh! Can't believe I missed these errors in my testing of the change! I agree with Keir that they are harmless but noisy; a patch to quieten things down will follow shortly...

Note that I thought about plumbing the live flag through to xc_linux_restore, as is done with xc_linux_save, but decided I didn't want to change the API... therefore I changed xc_linux_restore to figure out whether the fd is a socket or not... hopefully this works on Solaris??? (just testing now).

/simgr
Keir Fraser
2007-Feb-28 17:17 UTC
Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
On 28/2/07 16:15, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> Note that I thought about plumbing the live flag through to
> xc_linux_restore as is done with xc_linux_save but decided I didn't want
> to change the API... therefore I changed xc_linux_restore to figure out
> if the fd is a socket or not... hopefully this works on Solaris??? (just
> testing now).

Use of the live flag to gate the flush/sync calls is not a good idea. We can 'live save' to disc (checkpointing) and we can 'non-live migrate' via a socket. So the live flag is not really an indicator of what the file descriptor maps to (file vs. socket). Best to unconditionally try the flush/sync and ignore errors.

-- Keir
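Keir's recommendation (always attempt the flush/sync, never treat failure as fatal) could be sketched as the helper below. This is a minimal illustration under stated assumptions, not the code from the eventual changeset: the name `flush_best_effort` and the `FLUSH_*` flags are hypothetical, and using `POSIX_FADV_DONTNEED` assumes the fadvise call is meant to drop the written pages from the page cache.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical flags reporting which step failed (informational only). */
#define FLUSH_FSYNC_FAILED   0x1
#define FLUSH_FADVISE_FAILED 0x2

/* Best-effort flush: always try fsync() and posix_fadvise() regardless of
 * what kind of object fd refers to. On a socket or pipe both calls are
 * expected to fail; the caller is meant to ignore the result (or log it at
 * debug level) rather than report it as an error. */
int flush_best_effort(int fd)
{
    int failed = 0;

    assert(fd >= 0);
    if (fsync(fd) != 0)
        failed |= FLUSH_FSYNC_FAILED;
    /* posix_fadvise() returns an error number instead of setting errno;
     * len == 0 means "from offset to end of file". */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        failed |= FLUSH_FADVISE_FAILED;
    return failed;
}
```

The point of this shape is that the decision does not depend on the live flag or on the fd type at all: a checkpoint to disc gets a real flush, while a migration socket simply produces two ignored failures.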
Graham, Simon
2007-Feb-28 18:07 UTC
RE: [Xen-devel] Error in XendCheckpoint: failed to flush file
> Use of the live flag to gate the flush/sync calls is not a good idea. We can
> 'live save' to disc (checkpointing) and we can 'non-live migrate' via a
> socket. So the live flag is not really an indicator of what the file
> descriptor maps to (file vs. socket). Best to unconditionally try the
> flush/sync and ignore errors.

OK. Do you think it's worth checking the fd type with stat and only doing the flush/fadvise if it's not a socket?

/simgr
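For comparison, the stat-based check Simon proposes might look roughly like the sketch below. The names `fd_is_socket` and `flush_unless_socket` are hypothetical; the sketch only shows the shape of the idea using the standard fstat() and S_ISSOCK() interfaces.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd refers to a socket, 0 if it does not, -1 if fstat() fails. */
int fd_is_socket(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return S_ISSOCK(st.st_mode) ? 1 : 0;
}

/* Hypothetical guarded flush: skip fsync() for sockets, attempt it otherwise.
 * Returns 0 if the flush was skipped or succeeded, -1 if fsync() failed. */
int flush_unless_socket(int fd)
{
    assert(fd >= 0);
    if (fd_is_socket(fd) == 1)
        return 0;               /* socket: nothing to flush */
    return fsync(fd) == 0 ? 0 : -1;
}
```

This costs one extra fstat() system call per flush, and S_ISSOCK() only recognises sockets: a pipe or other non-seekable fd passes the check and its fsync() still fails, so an error-ignoring path would be needed anyway.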
Keir Fraser
2007-Feb-28 18:20 UTC
Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
On 28/2/07 18:07, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> OK. Do you think it's worth checking the fd type with stat and only
> doing the flush/fadvise if it's not a socket?

My guess is probably not. I think fsync/fadvise/lseek are all well-defined to fail without trashing things if passed a socket. There's no reason to suspect that doing a stat() will be any quicker than just letting the fsync() or fadvise() fail.

I've checked in some cleanups in this area as c/s 14176:d66dff0933. Hopefully it'll be in the public tree rsn (real soon now), assuming I've fixed save/restore sufficiently well!

-- Keir
Stefan Berger
2007-Feb-28 21:18 UTC
Re: [Xen-devel] Error in XendCheckpoint: failed to flush file
xen-devel-bounces@lists.xensource.com wrote on 02/28/2007 01:20:15 PM:

> I've checked in some cleanups in this area as c/s 14176:d66dff0933.
> Hopefully it'll be in the public tree rsn, assuming I've fixed save/restore
> sufficiently well!

All the xm-tests that I reported as failing before are working now. Thanks.

Stefan