Eelco Dolstra
2010-Oct-12 14:11 UTC
[Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
Hi,

I'm running into a strange problem with DomU clocks after saving/restoring the domain across a reboot of Dom0. After saving DomU, rebooting Dom0, and restoring DomU, DomU's clock jumps into the future by an amount equal to the previous uptime of Dom0, then freezes until the same amount of time has passed, after which it starts running normally again.

This is on Xen 4.0.1, with Dom0 running Linux 2.6.32.24-xen-179eca5 (the pvops stable-2.6.32.x tree from a few days ago), and a DomU running a vanilla paravirtualised 2.6.32.24 kernel.

Here is an example:

[root@mrhankey:~]# xm create drdoctor
Using config file "/etc/xen/drdoctor".
Started domain drdoctor (id=4)

[root@mrhankey:~]# uptime
18:47pm up 1:41, 1 user, load average: 1.04, 1.01, 1.00

[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 18:47:59 CEST 2010

Now we reboot Dom0 (which saves and restores "drdoctor"). After this the clock in "drdoctor" is stuck in the future:

[root@mrhankey:~]# uptime
18:53pm up 0:01, 1 user, load average: 0.40, 0.15, 0.05

[root@mrhankey:~]# date
Mon Oct 11 18:53:49 CEST 2010

[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 20:33:21 CEST 2010

(wait a while...)

[root@mrhankey:~]# ssh drdoctor date
Mon Oct 11 20:33:21 CEST 2010

Note that DomU's clock has jumped roughly 1:40 into the future, which was Dom0's uptime prior to its reboot. The clock in DomU stays stuck at 20:33:21 until Dom0's clock reaches 20:33:21, after which it starts ticking again. During this time, the machine is basically unusable because any time-dependent function (such as sleep()) remains stuck.

The problem does not occur when DomU is saved and restored without a Dom0 reboot in between. Whether NTP is running on Dom0 or DomU doesn't matter. I tried "tsc_mode=1" (force RDTSC emulation) but it had no effect. Neither did changing the clocksource in DomU from "xen" to "tsc", or changing the date with "date -s" on Dom0 or DomU.
The following messages in /var/log/xen/xend.log might be relevant:

(during save...)
[2010-10-11 16:48:10 2000] INFO (XendCheckpoint:423) xc_save: failed to get the suspend evtchn port
...
(during restore...)
[2010-10-11 16:53:29 2066] INFO (XendCheckpoint:423) Reloading memory pages: 0%
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: Error when reading batch size
[2010-10-11 16:53:34 2066] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing
...
[2010-10-11 16:53:35 2066] INFO (XendCheckpoint:423) Restore exit with rc=0

And another time:

[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: Max batch size exceeded (1970103633). Giving up.
[2010-10-11 14:20:03 2044] INFO (XendCheckpoint:423) ERROR Internal error: error when buffering batch, finishing

These seem to suggest that the save is incomplete or corrupt. However, in all cases the restore completes successfully, apart from the clock issue.

Does anybody have an idea what might be the cause?

BTW, I'm packaging Xen for NixOS (http://nixos.org/nixos/), which stores packages under non-standard prefixes (i.e. not /usr), but I don't think this is an issue here.

--
Eelco Dolstra | http://www.st.ewi.tudelft.nl/~dolstra/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
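The size of the jump can be checked from the two "date" readings in the report. The following is a minimal sketch in C, assuming both readings fall on the same day and were taken within a moment of each other; the helper names seconds_of_day and clock_offset are made up for this illustration.

```c
#include <assert.h>

/* Convert a wall-clock reading (hh:mm:ss) to seconds since midnight. */
long seconds_of_day(int h, int m, int s)
{
	assert(h >= 0 && h < 24 && m >= 0 && m < 60 && s >= 0 && s < 60);
	return h * 3600L + m * 60L + s;
}

/*
 * How far the DomU clock is ahead of the Dom0 clock, in seconds.
 * Both arguments are seconds since midnight on the same day.
 */
long clock_offset(long domu, long dom0)
{
	return domu - dom0;
}
```

With the readings from the report, clock_offset(seconds_of_day(20, 33, 21), seconds_of_day(18, 53, 49)) gives 5972 seconds, i.e. 1:39:32. That is in line with the "roughly 1:40" estimate, and it is also about how long the DomU clock stays frozen before Dom0's clock catches up.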
Olivier Hanesse
2010-Oct-19 10:07 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
Hello,

I think I have exactly the same bug.

After a reboot, all my DomUs are stuck for X minutes, where X was the uptime of the Dom0 before the reboot.

Save/restore without a reboot works perfectly.

I am running Debian Lenny with backports:

ii  xen-hypervisor-4.0-amd64            4.0.1-1            The Xen Hypervisor on AMD64
ii  linux-image-2.6.32-bpo.5-xen-amd64  2.6.32-23~bpo50+1  Linux 2.6.32 for 64-bit PCs, Xen dom0 suppor

Any ideas?
Cédric Schieli
2010-Oct-24 12:31 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
Hello,

I can confirm that the problem I reported here
http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
is the same. DomU kernels affected by the migration hang are also affected by the save/restore hang. Reverting "x86, paravirt: Add a global synchronization point for pvclock" also fixes the save/restore hang.

After doing save/reboot/restore (which led to a hang), migrating the domain to a host with a longer uptime will unblock it, but the wallclock will be several hours ahead. Migrating back will block it again.

Regards,
Cédric Schieli

2010/10/19 Olivier Hanesse <olivier.hanesse@gmail.com>:
> I think I got exactly the same bug.
>
> After a reboot, all my DomU are stuck during X min , where X was the
> "uptime" of the dom0 before reboot.
> [...]
Jeremy Fitzhardinge
2010-Oct-25 23:25 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
On 10/24/2010 05:31 AM, Cédric Schieli wrote:
> I can confirm my problem reported here
> http://lists.xensource.com/archives/html/xen-devel/2010-10/msg00057.html
> is the same.
> DomU kernels affected by the migration hang are also affected by the
> save/restore hang. Reverting "x86, paravirt: Add a global
> synchronization point for pvclock" also fix the save/restore hang.
> After doing save/reboot/restore (which led to a hang), migrating it to
> a host with a longer uptime will unblock the domain, but the wallclock
> will be several hours forward. Migrating back will block again.

Ah, thanks for that. I was trying to think of what changes could have broken that, since it certainly used to work. I'll sort out a fix.

Thanks,
J
Jeremy Fitzhardinge
2010-Oct-26 00:21 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
On 10/24/2010 05:31 AM, Cédric Schieli wrote:
> [...]
> DomU kernels affected by the migration hang are also affected by the
> save/restore hang. Reverting "x86, paravirt: Add a global
> synchronization point for pvclock" also fix the save/restore hang.
> After doing save/reboot/restore (which led to a hang), migrating it to
> a host with a longer uptime will unblock the domain, but the wallclock
> will be several hours forward. Migrating back will block again.

Does this help?

From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Mon, 25 Oct 2010 16:53:46 -0700
Subject: [PATCH] x86/pvclock: zero last_value on resume

If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index cd02f32..6226870 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -11,5 +11,6 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
 			    struct pvclock_vcpu_time_info *vcpu,
 			    struct timespec *ts);
+void pvclock_resume(void);
 
 #endif /* _ASM_X86_PVCLOCK_H */
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 239427c..a4f07c1 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -120,6 +120,11 @@ unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src)
 
 static atomic64_t last_value = ATOMIC64_INIT(0);
 
+void pvclock_resume(void)
+{
+	atomic64_set(&last_value, 0);
+}
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
 	struct pvclock_shadow_time shadow;
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index b2bb5aa..5da5e53 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -426,6 +426,8 @@ void xen_timer_resume(void)
 {
 	int cpu;
 
+	pvclock_resume();
+
 	if (xen_clockevent != &xen_vcpuop_clockevent)
 		return;
 
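The freeze and the effect of the patch above can be replayed in a few lines of user-space C. This is only a sketch of the behaviour the commit message describes, not the kernel code: it assumes the read path never returns a value lower than the highest one it has already handed out, which is what the "global synchronization point" change named earlier in the thread suggests. The names model_clock_read, model_clock_resume and replay_save_reboot_restore are hypothetical, and times are in seconds of host uptime to keep the numbers readable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Highest clock value handed out so far (stands in for last_value). */
static _Atomic uint64_t last_value = 0;

/*
 * Hypothetical stand-in for the clocksource read. 'raw' plays the role
 * of the system time supplied by the host. The clamp below is an
 * assumption about how the monotonicity guard behaves.
 */
uint64_t model_clock_read(uint64_t raw)
{
	uint64_t last = atomic_load(&last_value);

	do {
		if (raw < last)
			return last;	/* never go backwards: clock looks frozen */
	} while (!atomic_compare_exchange_weak(&last_value, &last, raw));

	return raw;
}

/* Mirrors what pvclock_resume() does in the patch: drop the old maximum. */
void model_clock_resume(void)
{
	atomic_store(&last_value, 0);
}

/* Replays the save / Dom0 reboot / restore sequence from the report. */
void replay_save_reboot_restore(void)
{
	model_clock_resume();				/* clean starting state */
	assert(model_clock_read(6000) == 6000);		/* before save: host up 100 min */
	assert(model_clock_read(60) == 6000);		/* after reboot: stuck at old value */
	assert(model_clock_read(6001) == 6001);		/* host catches up: ticks again */

	model_clock_resume();				/* what the patch adds on resume */
	assert(model_clock_read(60) == 60);		/* clock follows the new host */
}
```

Without the reset, every read after the restore returns the stale maximum until the new host's time passes it, which matches the reported freeze lasting as long as the old Dom0 uptime. It also matches the observation that migrating to a host with a longer uptime unblocks the domain: that host's time is already above the stored maximum.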
Cédric Schieli
2010-Oct-26 13:08 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
2010/10/26 Jeremy Fitzhardinge <jeremy@goop.org>:
> Does this help?

Yes. With this patch applied I can migrate and migrate back without problem. Save/restore with a reboot in between also works.

Thanks!

> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Date: Mon, 25 Oct 2010 16:53:46 -0700
> Subject: [PATCH] x86/pvclock: zero last_value on resume
> [...]
Jeremy Fitzhardinge
2010-Oct-26 16:52 UTC
Re: [Xen-devel] DomU clock jumps forward then freezes after Dom0 reboot
On 10/26/2010 06:08 AM, Cédric Schieli wrote:
> Yes. With this patch applied I can migrate and migrate back without
> problem. Save/restore with a reboot in between also works.

OK, thanks very much.

J