Sorry for another possibly stupid question:

I've observed that for a pv domain that's been updated to a 2.6.31 kernel (straight from kernel.org), "xm save" never completes. When the older kernel (2.6.18) is booted, "xm save" works fine. Is this a known problem... or perhaps xm save has never worked with an upstream pv kernel and I've never noticed?

I'd assume migrate and live migrate would fail also but haven't tried them.

Thanks,
Dan

P.S. This is with very recent xen-unstable, c/s 20399.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2009-Nov-06 20:37 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
On Fri, Nov 06, 2009 at 10:37:49AM -0800, Dan Magenheimer wrote:
> I've observed that for a pv domain that's been updated
> to a 2.6.31 kernel (straight from kernel.org), "xm save"
> never completes. When the older kernel (2.6.18)
> is booted, "xm save" works fine. Is this a known problem...
> or perhaps xm save has never worked with an upstream pv
> kernel and I've never noticed?
>
> I'd assume migrate and live migrate would fail also but
> haven't tried them.

Just checking: are you running the latest 2.6.31.5? I think there have been multiple Xen-related bugfixes in the 2.6.31.x releases.

-- Pasi
Dan Magenheimer
2009-Nov-06 22:27 UTC
RE: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
> Just checking: are you running the latest 2.6.31.5? I think
> there have been multiple Xen-related bugfixes in the 2.6.31.x
> releases.
>
> -- Pasi

No, it was plain 2.6.31. But I downloaded/built 2.6.31.5 and can't even get it to boot (and no console or VNC output at all). Are CONFIG changes required between 2.6.31 and 2.6.31.5 for Xen? (I checked and I am using the same .config.)

Trying to reproduce on a different machine, just to verify.

Dan
Pasi Kärkkäinen
2009-Nov-06 22:30 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
On Fri, Nov 06, 2009 at 02:27:27PM -0800, Dan Magenheimer wrote:
> No, it was plain 2.6.31. But I downloaded/built 2.6.31.5 and
> can't even get it to boot (and no console or VNC output at
> all). Are CONFIG changes required between 2.6.31 and 2.6.31.5
> for Xen? (I checked and I am using the same .config.)
>
> Trying to reproduce on a different machine, just to verify.

There shouldn't be any .config changes needed.

Can you paste the full domU console output? Does it crash?

-- Pasi
Dan Magenheimer
2009-Nov-07 00:08 UTC
RE: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
> There shouldn't be any .config changes needed.
>
> Can you paste the full domU console output? Does it crash?
>
> -- Pasi

Well, first, I got 2.6.31.5 to boot in a PV guest on another machine and it fails to save also. Are you able to save 2.6.31{,.5} successfully? On latest xen-unstable? (NOTE: Yes, I do have CONFIG_XEN_SAVE_RESTORE=y... don't know if that is important.)

(On the machine where I couldn't boot 2.6.31.5 as a PV guest, there was absolutely no console output. However, I think the tools are out-of-date on that machine so ignore that.)
Jeremy Fitzhardinge
2009-Nov-07 00:19 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
On 11/06/09 12:37, Pasi Kärkkäinen wrote:
> Just checking: are you running the latest 2.6.31.5? I think there have
> been multiple Xen-related bugfixes in the 2.6.31.x releases.

Nothing relating to save/restore. Does it work for you?

J
Pasi Kärkkäinen
2009-Nov-07 11:09 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
On Fri, Nov 06, 2009 at 04:08:26PM -0800, Dan Magenheimer wrote:
> Well, first, I got 2.6.31.5 to boot in a PV guest in another
> machine and it fails to save also. Are you able to save
> 2.6.31{,.5} successfully? On latest xen-unstable?
> (NOTE: Yes, I do have CONFIG_XEN_SAVE_RESTORE=y... don't
> know if that is important.)

I'll have to try it later today.

> (On the machine I couldn't boot 2.6.31.5 as a PV guest, there
> was absolutely no console output. However, I think tools
> are out-of-date on that machine so ignore that.)

Did you have "console=hvc0 earlyprintk=xen" in the domU kernel parameters?

You might also change the Xen guest config file so that it has on_crash=preserve, and then, once the PV guest has crashed, run this to get a stack trace:

/usr/lib/xen/bin/xenctx -s System.map-domUkernelversion <domid>

(On a 64-bit host the xenctx binary might be under /usr/lib64/.)

-- Pasi
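For reference, the two settings suggested above could look like this in an xm guest config file, which uses Python syntax. This is only a minimal sketch: it shows just the two debugging-related lines, and a real config also needs the usual name, kernel, memory and disk settings, which are left out here.

```python
# Fragment of an xm guest config file (these files are evaluated as Python).
# Only the two debugging-related settings are shown; everything else
# (name, kernel, memory, disk, ...) is intentionally omitted.

# Extra arguments appended to the domU kernel command line:
# send the console to hvc0 and enable early printk through Xen.
extra = "console=hvc0 earlyprintk=xen"

# Keep the domain around after a crash so that xenctx can inspect it.
on_crash = "preserve"
```

With on_crash set to preserve, the crashed domain stays listed in "xm list" until it is destroyed by hand, which is what makes the xenctx call possible.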
Dan Magenheimer
2009-Nov-07 15:32 UTC
RE: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate
> > Well, first, I got 2.6.31.5 to boot in a PV guest in another
> > machine and it fails to save also. Are you able to save
> > 2.6.31{,.5} successfully? On latest xen-unstable?
>
> I'll have to try it later today.

Let me know.

> Did you have "console=hvc0 earlyprintk=xen" in the domU kernel
> parameters?

No, but adding that didn't work either.

> You might also change the Xen guest config file so that it has
> on_crash=preserve, and then, once the PV guest has crashed, run this:
>
> /usr/lib/xen/bin/xenctx -s System.map-domUkernelversion <domid>
>
> (On a 64-bit host the xenctx binary might be under /usr/lib64/.)
>
> to get a stack trace.

Very interesting and useful! I was completely unaware of xenctx and could have used it many times in tmem development!

The results explain why I can get it to run on one machine (an older laptop) and not on another (a Nehalem system)... looks like this may be related to the cpuid-extended-topology-leaf bug that Jeremy sent a fix for upstream recently.

cs:eip: e019:c040342d xen_cpuid+0x46
flags: 00001206 i nz p
ss:esp: e021:c0779ee4
eax: 00000001 ebx: 00000002 ecx: 00000100 edx: 00000001
esi: c0779f1c edi: c0779f18 ebp: c0779f24
ds: e021 es: e021 fs: 00d8 gs: 0000
Code (instr addr c040342d)
24 04 8b 15 a4 02 7c c0 89 54 24 08 8b 0e 0f 0b 78 65 6e 0f a2 <89> 45 00 8b 04 24 89 18 89 0e 89

Stack:
c0779f20 ffffffff ffffffff c07c0360 c0779f18 c0779f1c c0779f20 c066fd0f
c0779f18 c0779f24 00000002 16aee301 00000001 00000001 16aee301 00000002
0000000b c07c03cc c07c0360 c07c0360 c07c03d8 c0670ed8 c0779f58 00000001
c07c0360 c0779f60 c066fe6a c0779f60 c0779f60 00000003 00000001 00000000

Call Trace:
[<c040342d>] xen_cpuid+0x46 <--
[<c066fd0f>] detect_extended_topology+0xae
[<c0670ed8>] init_intel+0x140
[<c066fe6a>] init_scattered_cpuid_features+0x82
[<c06705e2>] identify_cpu+0x22d
[<c040584c>] xen_force_evtchn_callback+0xc
[<c0405e78>] check_events+0x8
[<c07c9dec>] identify_boot_cpu+0xa
[<c07c9e9a>] check_bugs+0x8
[<c07c27bd>] start_kernel+0x2a0
[<c07c5206>] xen_start_kernel+0x340
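One detail in the dump above is worth spelling out. The bytes just before the marked one in the Code line are 0f 0b 78 65 6e 0f a2. In the Linux Xen headers, 0f 0b 78 65 6e (ud2a followed by the ASCII string "xen") is the forced-emulation prefix, and 0f a2 is cpuid, so the saved eip appears to sit immediately after a Xen-prefixed cpuid. That fits the xen_cpuid frame at the top of the trace. The following Python sketch picks that sequence out of a xenctx Code line; the function names are made up for illustration, and it assumes the <..> marker denotes the byte at eip.

```python
# Xen forced-emulation prefix: ud2a (0f 0b) followed by ASCII "xen".
XEN_EMULATE_PREFIX = bytes([0x0F, 0x0B]) + b"xen"
CPUID_OPCODE = bytes([0x0F, 0xA2])


def parse_code_line(line):
    """Turn a xenctx 'Code' hex line into (bytes, index of the <xx> byte)."""
    data = bytearray()
    marked = None
    for token in line.split():
        if token.startswith("<") and token.endswith(">"):
            marked = len(data)      # assumed to be the byte at the saved eip
            token = token[1:-1]
        data.append(int(token, 16))
    return bytes(data), marked


def find_xen_cpuid(line):
    """Return the byte offset of a Xen-prefixed cpuid in the line, or -1."""
    data, _ = parse_code_line(line)
    return data.find(XEN_EMULATE_PREFIX + CPUID_OPCODE)


code = ("24 04 8b 15 a4 02 7c c0 89 54 24 08 8b 0e 0f 0b 78 65 6e 0f a2 "
        "<89> 45 00 8b 04 24 89 18 89 0e 89")
data, eip_index = parse_code_line(code)
offset = find_xen_cpuid(code)
print(offset, eip_index)  # prints "14 21"
```

The prefixed cpuid is seven bytes long and starts at offset 14, so it ends exactly at the marked byte at offset 21. This only locates the instruction; it says nothing about why the guest stopped there.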
Pasi Kärkkäinen
2009-Nov-08 14:17 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On Sat, Nov 07, 2009 at 07:32:49AM -0800, Dan Magenheimer wrote:
> > I'll have to try it later today.
>
> Let me know.

OK. I just tried with a Fedora 12 (rawhide) PV guest. I was able to "xm save" and "xm restore" it without problems.

But I noticed there was a BUG printed on the guest console:
http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt

BUG: sleeping function called from invalid context at kernel/mutex.c:94
in_atomic(): 0, irqs_disabled(): 1, pid: 1052, name: kstop/0
Pid: 1052, comm: kstop/0 Not tainted 2.6.31.5-122.fc12.x86_64 #1
Call Trace:
[<ffffffff8104021f>] __might_sleep+0xe6/0xe8
[<ffffffff81419c84>] mutex_lock+0x22/0x4e
[<ffffffff812afdce>] dpm_resume_noirq+0x21/0x11f
[<ffffffff81272b05>] xen_suspend+0xca/0xd1
[<ffffffff8108c172>] stop_cpu+0x8c/0xd2
[<ffffffff8106350c>] worker_thread+0x18a/0x224
[<ffffffff81067ae7>] ? autoremove_wake_function+0x0/0x39
[<ffffffff8141ab29>] ? _spin_unlock_irqrestore+0x19/0x1b
[<ffffffff81063382>] ? worker_thread+0x0/0x224
[<ffffffff81067765>] kthread+0x91/0x99
[<ffffffff81012daa>] child_rip+0xa/0x20
[<ffffffff81011f97>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8101271d>] ? retint_restore_args+0x5/0x6
[<ffffffff81012da0>] ? child_rip+0x0/0x20

More information about my setup:

Host/dom0: Fedora 12 (latest rawhide) with included Xen 3.4.1-5 and custom 2.6.31.5 x86_64 pv_ops dom0 kernel (a couple of days old).

Guest/domU: Fedora 12 (latest rawhide) with the included/default 2.6.31.5-122.fc12.x86_64 kernel.

> > Did you have "console=hvc0 earlyprintk=xen" in the domU kernel
> > parameters?
>
> No, but that didn't work either.

OK, then it crashes really early.

> The results explain why I can get it to run on
> one machine (an older laptop) and not run on another
> machine (a Nehalem system)... looks like this is maybe
> related to the cpuid-extended-topology-leaf bug that Jeremy
> sent a fix for upstream recently.

Did you try with that patch applied?

-- Pasi
Pasi Kärkkäinen
2009-Nov-08 14:20 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG
On Sun, Nov 08, 2009 at 04:17:43PM +0200, Pasi Kärkkäinen wrote:
> But I noticed there was a BUG printed on the guest console:
> http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt
>
> BUG: sleeping function called from invalid context at kernel/mutex.c:94
> in_atomic(): 0, irqs_disabled(): 1, pid: 1052, name: kstop/0
> Pid: 1052, comm: kstop/0 Not tainted 2.6.31.5-122.fc12.x86_64 #1

Oh, I forgot to mention that this BUG is non-fatal. The guest still works after that.

-- Pasi
Dan Magenheimer
2009-Nov-08 15:29 UTC
RE: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
> Ok. I just tried with a Fedora 12 (rawhide) PV guest. I was able to
> "xm save" and "xm restore" it without problems.
>
> But I noticed there was a BUG printed on the guest console:
> http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt
> BUG: sleeping function called from invalid context at kernel/mutex.c:94
> in_atomic(): 0, irqs_disabled(): 1, pid: 1052, name: kstop/0
> Pid: 1052, comm: kstop/0 Not tainted 2.6.31.5-122.fc12.x86_64 #1

Ok, so it appears there is something problematic with saving an upstream kernel. It might be (partially) fixed in Fedora 12 or maybe there is some other environmental difference which makes save fail entirely on my system.

> > The results explain why I can get it to run on
> > one machine (an older laptop) and not run on another
> > machine (a Nehalem system)... looks like this is maybe
> > related to the cpuid-extended-topology-leaf bug that Jeremy
> > sent a fix for upstream recently.
>
> Did you try with that patch applied?

No, the patch wasn't posted, just a pull request to Linus, so I don't have the patch (and am not a git expert so am not sure how to get it).

http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00182.html

So I'll try it again when .6 or .7 is available.
Pasi Kärkkäinen
2009-Nov-08 15:41 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On Sun, Nov 08, 2009 at 07:29:58AM -0800, Dan Magenheimer wrote:
> Ok, so it appears there is something problematic with
> saving an upstream kernel. It might be (partially) fixed
> in Fedora 12 or maybe there is some other environmental
> difference which makes save fail entirely on my system.

Yeah, the Fedora kernel has some patches, but it should be pretty close to the upstream kernel.

BTW, was your guest UP or SMP? Mine was UP.

> No, the patch wasn't posted, just a pull request to Linus,
> so I don't have the patch (and am not a git expert so
> am not sure how to get it).
>
> http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00182.html
>
> So I'll try it again when .6 or .7 is available.

See here for the changelog:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=bugfix

You can get the diffs/patches from there using the links.

-- Pasi
Pasi Kärkkäinen
2009-Nov-08 16:48 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On Sun, Nov 08, 2009 at 05:41:53PM +0200, Pasi Kärkkäinen wrote:
> BTW, was your guest UP or SMP? Mine was UP.

OK, saving an SMP guest fails for me too:

[2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib64/xen/bin/xc_save 28 2 0 0 0
[2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: failed to get the suspend evtchn port

Jeremy: any ideas what's causing that? "xm save" for a UP 2.6.31.5 guest works OK, but for an SMP guest it fails with the error above.

-- Pasi
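When comparing several save attempts, it can help to pull just the xc_save entries out of the "xm log" output. Below is a minimal Python sketch for that. The line format is inferred only from the two log lines quoted above, and the function name xc_save_messages is hypothetical, so treat it as a starting point rather than a parser for every xend log line.

```python
import re

# Format inferred from the two lines above:
# [date time pid] LEVEL (Module:line) message
LOG_LINE = re.compile(
    r"^\[(?P<date>\S+) (?P<time>\S+) (?P<pid>\d+)\] "
    r"(?P<level>\w+) \((?P<module>\w+):(?P<lineno>\d+)\) (?P<message>.*)$"
)


def xc_save_messages(lines):
    """Yield (level, message) for XendCheckpoint entries mentioning xc_save."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if (match and match["module"] == "XendCheckpoint"
                and "xc_save" in match["message"]):
            yield match["level"], match["message"]


sample = [
    "[2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: "
    "/usr/lib64/xen/bin/xc_save 28 2 0 0 0",
    "[2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: "
    "failed to get the suspend evtchn port",
]
results = list(xc_save_messages(sample))
for level, message in results:
    print(level, message)
```

Lines that do not match the assumed format are silently skipped, so unrelated log entries or Python tracebacks in the log will not break the loop.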
Dan Magenheimer
2009-Nov-08 16:54 UTC
RE: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
> Yeah, the Fedora kernel has some patches, but it should be pretty
> close to the upstream kernel.
>
> BTW, was your guest UP or SMP? Mine was UP.

Mine was SMP... switching to UP I can now save. BUT... restore doesn't seem to quite work. The restore completes but I get no response from the VNC console. When I use a tty console, after restore, I am getting an infinite dump of

WARNING: at arch/x86/time.c:180 xen_sched_clock+0x2b

(see attached).

Did you try restore on Fedora 12?

> See here for the changelog:
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=shortlog;h=bugfix
>
> You can get the diffs/patches from there using the links.

Thanks. Yes, Jeremy's patch allows 2.6.31.5 (in a PV domain) to completely boot on my Nehalem box.
Pasi Kärkkäinen
2009-Nov-08 17:27 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG
On Sun, Nov 08, 2009 at 08:54:23AM -0800, Dan Magenheimer wrote:
> Mine was SMP... switching to UP I can now save. BUT...
> restore doesn't seem to quite work. The restore completes
> but I get no response from the VNC console. When I
> use a tty console, after restore, I am getting
> an infinite dump of
>
> WARNING: at arch/x86/time.c:180 xen_sched_clock+0x2b
>
> (see attached).
>
> Did you try restore on Fedora 12?

Yeah. save+restore for a UP F12 guest works for me (except I get that non-fatal BUG on the guest).

The SMP guest doesn't work: save crashes it.

> Thanks. Yes, Jeremy's patch allows 2.6.31.5 (in a PV domain)
> to completely boot on my Nehalem box.

OK. But I guess those fixes don't help with the save+restore problem.

-- Pasi
Pasi Kärkkäinen
2009-Nov-10 10:08 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
Hello,

Jeremy: here's a summary of these save/restore problems using an upstream Linux 2.6.31.5 PV guest.

For me:
- I can "xm save" + "xm restore" a UP guest, but I get a non-fatal BUG in the guest kernel, see [1].
- "xm save" fails for an SMP guest with "failed to get the suspend evtchn port", see [2].

For Dan:
- "xm save" works for a UP guest, but "xm restore" doesn't, giving infinite xen_sched_clock-related dumps in the guest kernel, see [3].
- "xm save" for an SMP guest fails; it never completes. I suspect this is the same problem I'm seeing.

[1] non-fatal BUG on the guest kernel after "xm restore":
http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt

[2] "xm log" contains:
[2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib64/xen/bin/xc_save 28 2 0 0 0
[2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: failed to get the suspend evtchn port

[3] See the attachment in this email:
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00391.html

Any tips on how to debug these?

-- Pasi
Jeremy Fitzhardinge
2009-Nov-12 23:16 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On 11/08/09 08:48, Pasi Kärkkäinen wrote:
> Ok.. saving SMP guest fails for me too:
>
> [2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib64/xen/bin/xc_save 28 2 0 0 0
> [2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: failed to get the suspend evtchn port
>
> Jeremy: Ideas what's causing that? "xm save" for UP 2.6.31.5 guest works
> OK, but for SMP guest it fails with the error above.

There's no "suspend evtchn port" in a pvops kernel. That looks like a
Remus thing. I think.

    J
Jeremy Fitzhardinge
2009-Nov-12 23:21 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On 11/08/09 08:54, Dan Magenheimer wrote:
> Mine was SMP... switching to UP I can now save. BUT...
> restore doesn't seem to quite work. The restore completes
> but I get no response from the VNC console. When I
> use a tty console, after restore, I am getting
> an infinite dump of
>
> WARNING: at arch/x86/time.c:180 xen_sched_clock+0x2b
>

That means the check found that the CPU it's currently running on is
not currently running, according to Xen... It's hard to imagine how it
got into that state...

    J
Brendan Cully
2009-Nov-12 23:22 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate, domU BUG()
On Thursday, 12 November 2009 at 15:16, Jeremy Fitzhardinge wrote:
> On 11/08/09 08:48, Pasi Kärkkäinen wrote:
> > Ok.. saving SMP guest fails for me too:
> >
> > [2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib64/xen/bin/xc_save 28 2 0 0 0
> > [2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: failed to get the suspend evtchn port
> >
> > Jeremy: Ideas what's causing that? "xm save" for UP 2.6.31.5 guest works
> > OK, but for SMP guest it fails with the error above.
>
> There's no "suspend evtchn port" in a pvops kernel. That looks like a
> Remus thing. I think.

This is only an INFO-level message, because xc_save falls back to the
old xenstore method if it can't find a suspend event channel. I don't
know the context here, but this particular message ought to be
harmless.

The event channel was made for Remus, but regular xc_save also uses it
to reduce the downtime at the end of live migration.
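The fallback described in the message above can be pictured roughly as follows. This is a minimal userspace sketch, not the actual xc_save code: `probe_suspend_evtchn`, `choose_suspend_method`, the enum, and the port number are hypothetical names and values invented for illustration, and the guest's capability is modelled as a plain flag.

```c
#include <stdio.h>

/* Hypothetical model of how the save tool asks the guest to suspend. */
enum suspend_method {
    SUSPEND_VIA_EVTCHN,   /* dedicated suspend event channel */
    SUSPEND_VIA_XENSTORE  /* older xenstore-based request */
};

/* Hypothetical probe: returns a made-up port number, or -1 when the
 * guest kernel does not offer a suspend event channel. */
static int probe_suspend_evtchn(int guest_has_evtchn)
{
    return guest_has_evtchn ? 42 : -1;
}

static enum suspend_method choose_suspend_method(int guest_has_evtchn)
{
    int port = probe_suspend_evtchn(guest_has_evtchn);

    if (port < 0) {
        /* Informational only: the save carries on via the older path. */
        fprintf(stderr, "INFO: failed to get the suspend evtchn port\n");
        return SUSPEND_VIA_XENSTORE;
    }
    return SUSPEND_VIA_EVTCHN;
}
```

In this model a missing event channel only changes which path is taken; it never aborts the save, which matches the point that the INFO line alone should not explain a hung "xm save".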
Jeremy Fitzhardinge
2009-Nov-12 23:36 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On 11/10/09 02:08, Pasi Kärkkäinen wrote:
> Hello,
>
> Jeremy: Here's summary about these save/restore problems
> using upstream Linux 2.6.31.5 PV guest.
>
> For me:
> - I can "xm save" + "xm restore" UP guest, but I get non-fatal
>   BUG in the guest kernel, see [1].
> - "xm save" fails for SMP guest with "failed to get the suspend evtchn port", see [2].
>
> For Dan:
> - "xm save" works for UP guest, but "xm restore" doesn't, giving
>   infinite xen_sched_clock related dumps in the guest kernel, see [3].
> - "xm save" for SMP guest fails, it never ends. I suspect this
>   is the same problem I'm seeing.
>
>
> [1] non-fatal BUG on the guest kernel after "xm restore":
> http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt
>

Does this help:

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 10d03d7..da57ea1 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -43,7 +43,6 @@ static int xen_suspend(void *data)
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
 			err);
-		dpm_resume_noirq(PMSG_RESUME);
 		return err;
 	}
 
@@ -69,7 +68,6 @@ static int xen_suspend(void *data)
 	}
 
 	sysdev_resume();
-	dpm_resume_noirq(PMSG_RESUME);
 
 	return 0;
 }
@@ -108,6 +106,9 @@ static void do_suspend(void)
 	}
 
 	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
+
+	dpm_resume_noirq(PMSG_RESUME);
+
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
 		goto out;

> [2] "xm log" contains:
> [2009-11-09 23:44:38 1353] DEBUG (XendCheckpoint:110) [xc_save]: /usr/lib64/xen/bin/xc_save 28 2 0 0 0
> [2009-11-09 23:44:38 1353] INFO (XendCheckpoint:417) xc_save: failed to get the suspend evtchn port
>

I think this may be a Remus side-effect.

> [3] See the attachment in this email:
> http://lists.xensource.com/archives/html/xen-devel/2009-11/msg00391.html
>

No idea about this one. Needs a closer look.

    J
Ian Campbell
2009-Nov-23 16:44 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On Tue, 2009-11-10 at 10:08 +0000, Pasi Kärkkäinen wrote:
> Hello,
>
> Jeremy: Here's summary about these save/restore problems
> using upstream Linux 2.6.31.5 PV guest.
>
> For me:
> - I can "xm save" + "xm restore" UP guest, but I get non-fatal
>   BUG in the guest kernel, see [1].
> - "xm save" fails for SMP guest with "failed to get the suspend evtchn port", see [2].
>
> For Dan:
> - "xm save" works for UP guest, but "xm restore" doesn't, giving
>   infinite xen_sched_clock related dumps in the guest kernel, see [3].

The runstate fix I sent to the list last week should help with this one.

> - "xm save" for SMP guest fails, it never ends. I suspect this
>   is the same problem I'm seeing.

I'm seeing this (or something very like it) too. At the moment it looks
as if drivers/xen/manage.c:do_suspend is getting as far as the
stop_machine() call but I am never seeing it reach the xen_suspend()
callback.

Ian.
Ian Campbell
2009-Nov-24 10:27 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On Mon, 2009-11-23 at 16:44 +0000, Ian Campbell wrote:
> On Tue, 2009-11-10 at 10:08 +0000, Pasi Kärkkäinen wrote:
> > Hello,
> >
> > Jeremy: Here's summary about these save/restore problems
> > using upstream Linux 2.6.31.5 PV guest.
> >
> > For me:
> > - I can "xm save" + "xm restore" UP guest, but I get non-fatal
> >   BUG in the guest kernel, see [1].
> > - "xm save" fails for SMP guest with "failed to get the suspend evtchn port", see [2].
> >
> > For Dan:
> > - "xm save" works for UP guest, but "xm restore" doesn't, giving
> >   infinite xen_sched_clock related dumps in the guest kernel, see [3].
>
> The runstate fix I sent to the list last week should help with this one.
>
> > - "xm save" for SMP guest fails, it never ends. I suspect this
> >   is the same problem I'm seeing.
>
> I'm seeing this (or something very like it) too. At the moment it looks
> as if drivers/xen/manage.c:do_suspend is getting as far as the
> stop_machine() call but I am never seeing it reach the xen_suspend()
> callback.

See "xen: register timer interrupt with IRQF_TIMER" that I just sent to
the list for the fix.

Ian.
Ian Campbell
2009-Nov-24 14:27 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On Thu, 2009-11-12 at 23:36 +0000, Jeremy Fitzhardinge wrote:
> On 11/10/09 02:08, Pasi Kärkkäinen wrote:
> > Hello,
> >
> > Jeremy: Here's summary about these save/restore problems
> > using upstream Linux 2.6.31.5 PV guest.
> >
> > For me:
> > - I can "xm save" + "xm restore" UP guest, but I get non-fatal
> >   BUG in the guest kernel, see [1].
> > - "xm save" fails for SMP guest with "failed to get the suspend evtchn port", see [2].
> >
> > For Dan:
> > - "xm save" works for UP guest, but "xm restore" doesn't, giving
> >   infinite xen_sched_clock related dumps in the guest kernel, see [3].
> > - "xm save" for SMP guest fails, it never ends. I suspect this
> >   is the same problem I'm seeing.
> >
> >
> > [1] non-fatal BUG on the guest kernel after "xm restore":
> > http://pasik.reaktio.net/xen/debug/dmesg-2.6.31.5-122.fc12.x86_64-saverestore.txt
> >
>
> Does this help:

It does for me. There's another dpm_resume_noirq(PMSG_RESUME) a little
later in do_suspend() which I think needs to be dropped as well.

I'm still seeing other problems with resume, the system is hung on
restore and the RCU stall detection logic is triggering, unfortunately
arch_trigger_all_cpu_backtrace is not Xen compatible (uses APIC
directly) so I don't get much useful info out of it. It's most likely
a symptom of the actual problem rather than a problem with RCU per-se
anyhow.

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 10d03d7..7b69a1a 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -43,7 +43,6 @@ static int xen_suspend(void *data)
 	if (err) {
 		printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
 			err);
-		dpm_resume_noirq(PMSG_RESUME);
 		return err;
 	}
 
@@ -69,7 +68,6 @@ static int xen_suspend(void *data)
 	}
 
 	sysdev_resume();
-	dpm_resume_noirq(PMSG_RESUME);
 
 	return 0;
 }
@@ -108,6 +106,9 @@ static void do_suspend(void)
 	}
 
 	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
+
+	dpm_resume_noirq(PMSG_RESUME);
+
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
 		goto out;
@@ -119,8 +120,6 @@ static void do_suspend(void)
 	} else
 		xs_suspend_cancel();
 
-	dpm_resume_noirq(PMSG_RESUME);
-
 resume_devices:
 	dpm_resume_end(PMSG_RESUME);
 
Ian Campbell
2009-Nov-25 14:12 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On Tue, 2009-11-24 at 14:27 +0000, Ian Campbell wrote:
>
> I'm still seeing other problems with resume, the system is hung on
> restore and the RCU stall detection logic is triggering, unfortunately
> arch_trigger_all_cpu_backtrace is not Xen compatible (uses APIC
> directly) so I don't get much useful info out of it. It's most likely
> a symptom of the actual problem rather than a problem with RCU per-se
> anyhow.

tick_resume() is never called on secondary processors. Presumably this
is because they are offlined for suspend on native and so this is
normally taken care of in the CPU onlining path. Under Xen we keep all
CPUs online over a suspend.

This patch papers over the issue for me but I will investigate a more
generic, less hacky, way of doing the same.

tick_suspend is also only called on the boot CPU which I presume should
be fixed too.

Ian.

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 6343a5d..cdfeed2 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -1,4 +1,5 @@
 #include <linux/types.h>
+#include <linux/clockchips.h>
 
 #include <xen/interface/xen.h>
 #include <xen/grant_table.h>
@@ -46,7 +50,19 @@ void xen_post_suspend(int suspend_cancelled)
 
 }
 
+static void xen_vcpu_notify_restore(void *data)
+{
+	unsigned long reason = (unsigned long)data;
+
+	/* Boot processor notified via generic timekeeping_resume() */
+	if ( smp_processor_id() == 0)
+		return;
+
+	clockevents_notify(reason, NULL);
+}
+
 void xen_arch_resume(void)
 {
-	/* nothing */
+	smp_call_function_many(cpu_online_mask, xen_vcpu_notify_restore,
+			    (void *)CLOCK_EVT_NOTIFY_RESUME, 1);
 }
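To make the reasoning in the message above concrete, here is a small userspace model of which vCPUs get their tick device resumed with and without the workaround. It is only a sketch under stated assumptions: every `model_*` function and the `resumed[]` array are hypothetical stand-ins, nothing here calls real kernel APIs, and the caller is assumed to be running on vCPU 0.

```c
#include <stdio.h>

#define MAX_CPUS 8

/* Hypothetical model state: resumed[i] is 1 once vCPU i has had its
 * tick device resumed. */
static int resumed[MAX_CPUS];

/* Models the generic resume path, which only covers the boot CPU. */
static void model_timekeeping_resume(void)
{
    resumed[0] = 1;
}

/* Models the per-CPU callback from the patch: CPU 0 returns early
 * because the generic path already handled it. */
static void model_vcpu_notify_restore(int cpu)
{
    if (cpu == 0)
        return;
    resumed[cpu] = 1;
}

/* Models a cross-CPU call that runs on every online CPU except the
 * caller itself. */
static void model_call_on_others(unsigned online_mask, int self)
{
    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        if ((online_mask & (1u << cpu)) && cpu != self)
            model_vcpu_notify_restore(cpu);
}

/* Returns how many online CPUs were left without a tick resume. */
static int model_resume(unsigned online_mask, int self, int with_patch)
{
    int missed = 0;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        resumed[cpu] = 0;

    model_timekeeping_resume();
    if (with_patch)
        model_call_on_others(online_mask, self);

    for (int cpu = 0; cpu < MAX_CPUS; cpu++)
        if ((online_mask & (1u << cpu)) && !resumed[cpu])
            missed++;
    return missed;
}
```

With four vCPUs online (mask 0xF) and the caller on vCPU 0, the model leaves three vCPUs without a resumed tick when the workaround is absent and none when it is present, which is the gap the patch is papering over.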
Jeremy Fitzhardinge
2009-Nov-25 19:28 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On 11/25/09 06:12, Ian Campbell wrote:
> tick_resume() is never called on secondary processors. Presumably this
> is because they are offlined for suspend on native and so this is
> normally taken care of in the CPU onlining path. Under Xen we keep all
> CPUs online over a suspend.
>
> This patch papers over the issue for me but I will investigate a more
> generic, less hacky, way of doing the same.
>
> tick_suspend is also only called on the boot CPU which I presume should
> be fixed too.
>

Yep. I wonder how it ever worked? There's been a fair amount of change
in the PM code, so that could have changed things. I don't know if
there's a deep reason for not calling tick_resume() on all processors.

Rafael, tglx: suspend/resume under Xen doesn't need to hot unplug all
the CPUs, so we don't; the hypervisor can manage the context
save/restore for all CPUs. Is there a deep reason why
timekeeping_resume() can't call the CLOCK_EVT_NOTIFY_RESUME notifier on
all online CPUs?

> void xen_arch_resume(void)
> {
> -	/* nothing */
> +	smp_call_function_many(cpu_online_mask, xen_vcpu_notify_restore,
> +			    (void *)CLOCK_EVT_NOTIFY_RESUME, 1);
> }
>

This is equivalent to smp_call_function().

    J
Ian Campbell
2009-Nov-25 20:03 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On Wed, 2009-11-25 at 19:28 +0000, Jeremy Fitzhardinge wrote:
> On 11/25/09 06:12, Ian Campbell wrote:
> > tick_resume() is never called on secondary processors. Presumably this
> > is because they are offlined for suspend on native and so this is
> > normally taken care of in the CPU onlining path. Under Xen we keep all
> > CPUs online over a suspend.
> >
> > This patch papers over the issue for me but I will investigate a more
> > generic, less hacky, way of doing the same.
> >
> > tick_suspend is also only called on the boot CPU which I presume should
> > be fixed too.
> >
>
> Yep. I wonder how it ever worked? There's been a fair amount of change
> in the PM code, so that could have changed things. I don't know if
> there's a deep reason for not calling tick_resume() on all processors.
>
> Rafael, tglx: suspend/resume under Xen doesn't need to hot unplug all
> the CPUs, so we don't; the hypervisor can manage the context
> save/restore for all CPUs. Is there a deep reason why
> timekeeping_resume() can't call the CLOCK_EVT_NOTIFY_RESUME notifier on
> all online CPUs?

Interrupts are disabled at the point where it currently calls the
notifier, so none of the SMP function call primitives work.

> > void xen_arch_resume(void)
> > {
> > -	/* nothing */
> > +	smp_call_function_many(cpu_online_mask, xen_vcpu_notify_restore,
> > +			    (void *)CLOCK_EVT_NOTIFY_RESUME, 1);
> > }
> >
>
> This is equivalent to smp_call_function().

Oh yeah.

Ian.
Jeremy Fitzhardinge
2009-Nov-25 20:32 UTC
Re: [Xen-devel] pv 2.6.31 (kernel.org) and save/migrate fails, domU BUG
On 11/25/09 12:03, Ian Campbell wrote:
>> Yep. I wonder how it ever worked? There's been a fair amount of change
>> in the PM code, so that could have changed things. I don't know if
>> there's a deep reason for not calling tick_resume() on all processors.
>>
>> Rafael, tglx: suspend/resume under Xen doesn't need to hot unplug all
>> the CPUs, so we don't; the hypervisor can manage the context
>> save/restore for all CPUs. Is there a deep reason why
>> timekeeping_resume() can't call the CLOCK_EVT_NOTIFY_RESUME notifier on
>> all online CPUs?
>>
> Interrupts are disabled at the point where it currently calls the
> notifier, so none of the SMP function call primitives work.
>

That does make it pretty awkward.

    J
Ian Campbell
2009-Dec-01 11:47 UTC
[Xen-devel] [PATCH] xen: improve error handling in do_suspend.
The existing error handling has a few issues:
- If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
- If dpm_suspend_noirq() fails it exits without resuming xenbus.
- If stop_machine() fails it exits without resuming xenbus or calling
  dpm_resume_end().
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
  nested in the obvious way.

Fix by ensuring each failure case goto's the correct label. Treat a failure of
stop_machine() as a cancelled suspend in order to follow the correct resume
path.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
 drivers/xen/manage.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 7b69a1a..2fb7d39 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -86,32 +86,32 @@ static void do_suspend(void)
 	err = freeze_processes();
 	if (err) {
 		printk(KERN_ERR "xen suspend: freeze failed %d\n", err);
-		return;
+		goto out;
 	}
 #endif
 
 	err = dpm_suspend_start(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "xen suspend: dpm_suspend_start %d\n", err);
-		goto out;
+		goto out_thaw;
 	}
 
-	printk(KERN_DEBUG "suspending xenstore...\n");
-	xs_suspend();
-
 	err = dpm_suspend_noirq(PMSG_SUSPEND);
 	if (err) {
 		printk(KERN_ERR "dpm_suspend_noirq failed: %d\n", err);
-		goto resume_devices;
+		goto out_resume;
 	}
 
+	printk(KERN_DEBUG "suspending xenstore...\n");
+	xs_suspend();
+
 	err = stop_machine(xen_suspend, &cancelled, cpumask_of(0));
 
 	dpm_resume_noirq(PMSG_RESUME);
 
 	if (err) {
 		printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
-		goto out;
+		cancelled = 1;
 	}
 
 	if (!cancelled) {
@@ -120,15 +120,17 @@ static void do_suspend(void)
 	} else
 		xs_suspend_cancel();
 
-resume_devices:
+out_resume:
 	dpm_resume_end(PMSG_RESUME);
 
 	/* Make sure timer events get retriggered on all CPUs */
 	clock_was_set();
-out:
+
+out_thaw:
 #ifdef CONFIG_PREEMPT
 	thaw_processes();
 #endif
+out:
 	shutting_down = SHUTDOWN_INVALID;
 }
 #endif	/* CONFIG_PM_SLEEP */
-- 
1.5.6.5
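The label ordering that the patch above introduces can be checked with a small userspace model. This is a sketch only, assuming the control flow shown in the diff: `model_do_suspend`, `enum fail_at`, and the step names in the trace are hypothetical, the kernel calls are replaced by strings, and the thaw step is shown unconditionally although the real code only does it under CONFIG_PREEMPT.

```c
#include <string.h>

/* Hypothetical: which stage of the suspend sequence is made to fail. */
enum fail_at {
    FAIL_NONE,
    FAIL_FREEZE,
    FAIL_SUSPEND_START,
    FAIL_SUSPEND_NOIRQ,
    FAIL_STOP_MACHINE
};

static char trace[128];

/* Appends one step name to the space-separated trace. */
static void step(const char *name)
{
    if (trace[0])
        strcat(trace, " ");
    strcat(trace, name);
}

/* Mirrors the goto structure of the patched do_suspend() and returns
 * the steps that ran after the point of failure (or success). */
static const char *model_do_suspend(enum fail_at f)
{
    int cancelled = 1;

    trace[0] = '\0';

    if (f == FAIL_FREEZE)
        goto out;
    if (f == FAIL_SUSPEND_START)
        goto out_thaw;
    if (f == FAIL_SUSPEND_NOIRQ)
        goto out_resume;

    step("xs_suspend");
    /* stop_machine(xen_suspend, ...) would run here */
    cancelled = 0;
    step("resume_noirq");
    if (f == FAIL_STOP_MACHINE)
        cancelled = 1;  /* failure is treated as a cancelled suspend */

    if (!cancelled)
        step("xs_resume");
    else
        step("xs_suspend_cancel");

out_resume:
    step("resume_end");
out_thaw:
    step("thaw");
out:
    step("reset_state");
    return trace;
}
```

Each later failure point unwinds through strictly more labels than the one before it, so every step that was taken gets undone in reverse order and `shutting_down` is reset on every path.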
Ian Campbell
2009-Dec-01 11:47 UTC
[Xen-devel] [PATCH] xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
I have observed cases where the implicit stop_machine_destroy() done by
stop_machine() hangs while destroying the workqueues, specifically in
kthread_stop(). This seems to be because timer ticks are not restarted
until after stop_machine() returns.

Fortunately stop_machine provides a facility to pre-create/post-destroy
the workqueues so use this to ensure that workqueues are only destroyed
after everything is really up and running again.

I only actually observed this failure with 2.6.30. It seems that newer
kernels are somehow more robust against doing kthread_stop() without timer
interrupts (I tried some backports of some likely looking candidates but
did not track down the commit which added this robustness). However this
change seems like a reasonable belt&braces thing to do.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
---
 drivers/xen/manage.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 2fb7d39..c499793 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -79,6 +79,12 @@ static void do_suspend(void)
 
 	shutting_down = SHUTDOWN_SUSPEND;
 
+	err = stop_machine_create();
+	if (err) {
+		printk(KERN_ERR "xen suspend: failed to setup stop_machine %d\n", err);
+		goto out;
+	}
+
 #ifdef CONFIG_PREEMPT
 	/* If the kernel is preemptible, we need to freeze all the processes
 	   to prevent them from being in the middle of a pagetable update
@@ -86,7 +92,7 @@ static void do_suspend(void)
 	err = freeze_processes();
 	if (err) {
 		printk(KERN_ERR "xen suspend: freeze failed %d\n", err);
-		goto out;
+		goto out_destroy_sm;
 	}
 #endif
 
@@ -129,7 +135,11 @@ out_resume:
 out_thaw:
 #ifdef CONFIG_PREEMPT
 	thaw_processes();
+
+out_destroy_sm:
 #endif
+	stop_machine_destroy();
+
 out:
 	shutting_down = SHUTDOWN_INVALID;
 }
-- 
1.5.6.5
Jeremy Fitzhardinge
2009-Dec-01 22:50 UTC
Re: [Xen-devel] [PATCH] xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
On 12/01/09 03:47, Ian Campbell wrote:
> I have observed cases where the implicit stop_machine_destroy() done by
> stop_machine() hangs while destroying the workqueues, specifically in
> kthread_stop(). This seems to be because timer ticks are not restarted
> until after stop_machine() returns.
>

Thanks for these - applied.

    J