Frank Pan
2011-Mar-21 18:26 UTC
[Xen-devel] [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
In recent pv-ops kernel, pvclock is not guaranteed as monotone. After a migration, pvclock can produce smaller cycle count. [The test is performed on next-2.6.32 tree] The issue occured when uptime(sender) > uptime(target), and with CONFIG_GENERIC_TIME. The guest get stucked after the migration, doing a huge loop inside update_wall_time, until the overflow of 64-bit unsigned offset. The following patch fixed this issue by introducing a global sign. Xen pvclock will update the cycle_last with the newest cycle count on the first read after migration. Signed-off-by: Frank Pan <frankpzh@gmail.com> --- linux-2.6-xen/arch/x86/xen/time.c | 12 +++++++++++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/linux-2.6-xen/arch/x86/xen/time.c b/linux-2.6-xen/arch/x86/xen/time.c index ab35140..ac6fe2a 100644 --- a/linux-2.6-xen/arch/x86/xen/time.c +++ b/linux-2.6-xen/arch/x86/xen/time.c @@ -41,6 +41,9 @@ static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate_snapshot); static DEFINE_PER_CPU(u64, residual_stolen); static DEFINE_PER_CPU(u64, residual_blocked); +/* xen_clocksource_get_cycles should update cycle_last after resume */ +static int just_resume = 0; + /* return an consistent snapshot of 64-bit time/counter value */ static u64 get64(const u64 *p) { @@ -177,7 +180,13 @@ cycle_t xen_clocksource_read(void) static cycle_t xen_clocksource_get_cycles(struct clocksource *cs) { - return xen_clocksource_read(); + cycle_t c = xen_clocksource_read(); + + if (unlikely(just_resume)) { + cs->cycle_last = c++; + just_resume = 0; + } + return c; } static void xen_read_wallclock(struct timespec *ts) @@ -441,6 +450,7 @@ void xen_timer_resume(void) { int cpu; + just_resume = 1; pvclock_resume(); if (xen_clockevent != &xen_vcpuop_clockevent) -- 1.7.0.4 -- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-21 19:34 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
On Mon, 2011-03-21 at 18:26 +0000, Frank Pan wrote:> In recent pv-ops kernel, pvclock is not guaranteed as monotone. > After a migration, pvclock can produce smaller cycle count. > > [The test is performed on next-2.6.32 tree] > The issue occured when uptime(sender) > uptime(target), and with > CONFIG_GENERIC_TIME. The guest get stucked after the migration, > doing a huge loop inside update_wall_time, until the overflow of > 64-bit unsigned offset. > > The following patch fixed this issue by introducing a global sign. > Xen pvclock will update the cycle_last with the newest cycle count > on the first read after migration.Does the tree you are running have e7a3481c0246 (backported to the 2.6.32 longterm tree in 2.6.32.30 as 595b62a8acfb) in it? That commit seems to deal with a very similar problem to this. Ian.> > Signed-off-by: Frank Pan <frankpzh@gmail.com> > --- > linux-2.6-xen/arch/x86/xen/time.c | 12 +++++++++++- > 1 files changed, 11 insertions(+), 1 deletions(-) > > diff --git a/linux-2.6-xen/arch/x86/xen/time.c > b/linux-2.6-xen/arch/x86/xen/time.c > index ab35140..ac6fe2a 100644 > --- a/linux-2.6-xen/arch/x86/xen/time.c > +++ b/linux-2.6-xen/arch/x86/xen/time.c > @@ -41,6 +41,9 @@ static DEFINE_PER_CPU(struct vcpu_runstate_info, > runstate_snapshot); > static DEFINE_PER_CPU(u64, residual_stolen); > static DEFINE_PER_CPU(u64, residual_blocked); > > +/* xen_clocksource_get_cycles should update cycle_last after resume */ > +static int just_resume = 0; > + > /* return an consistent snapshot of 64-bit time/counter value */ > static u64 get64(const u64 *p) > { > @@ -177,7 +180,13 @@ cycle_t xen_clocksource_read(void) > > static cycle_t xen_clocksource_get_cycles(struct clocksource *cs) > { > - return xen_clocksource_read(); > + cycle_t c = xen_clocksource_read(); > + > + if (unlikely(just_resume)) { > + cs->cycle_last = c++; > + just_resume = 0; > + } > + return c; > } > > static void xen_read_wallclock(struct timespec *ts) > @@ -441,6 +450,7 @@ void xen_timer_resume(void) > { > int cpu; > > + just_resume = 1; > pvclock_resume(); > > if (xen_clockevent != &xen_vcpuop_clockevent) > -- > 1.7.0.4 > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-22 02:42 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
> Does the tree you are running have e7a3481c0246 (backported to the > 2.6.32 longterm tree in 2.6.32.30 as 595b62a8acfb) in it? That commit > seems to deal with a very similar problem to this.Yes, I have that commit, and It''s related. The commit you mentioned deals with the issue that un-monotone xen uptime can stuck the pvclock code. On the other hand, the issue I met is happened in the xen clocksource layer. The cycle count get from pvclock can be un-monotone and make the clocksource code stucked. -- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-22 04:19 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
On Tue, Mar 22, 2011 at 2:26 AM, Frank Pan <frankpzh@gmail.com> wrote:> In recent pv-ops kernel, pvclock is not guaranteed as monotone. > After a migration, pvclock can produce smaller cycle count. > > [The test is performed on next-2.6.32 tree] > The issue occured when uptime(sender) > uptime(target), and with > CONFIG_GENERIC_TIME. The guest get stucked after the migration, > doing a huge loop inside update_wall_time, until the overflow of > 64-bit unsigned offset. > > The following patch fixed this issue by introducing a global sign. > Xen pvclock will update the cycle_last with the newest cycle count > on the first read after migration.It seems the issue is not that simple. This issue can be well solved by the suspend/resume code of timekeeper sysdev. However, the recent code may think this is not necessary on hvm domain. (How can I find the reason here?) I don''t know if it''s ok to add a pair of sysdev_suspend/sysdev_resume inside xen_hvm_suspend. If not, this patch can be a quickfix/solution. Any ideas on this issue? -- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-22 04:35 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
Add Stefano Stabellini, who made the commit of xen_hvm_suspend in the git tree. Stefano, do you have any idea? On Tue, Mar 22, 2011 at 12:19 PM, Frank Pan <frankpzh@gmail.com> wrote:> On Tue, Mar 22, 2011 at 2:26 AM, Frank Pan <frankpzh@gmail.com> wrote: >> In recent pv-ops kernel, pvclock is not guaranteed as monotone. >> After a migration, pvclock can produce smaller cycle count. >> >> [The test is performed on next-2.6.32 tree] >> The issue occured when uptime(sender) > uptime(target), and with >> CONFIG_GENERIC_TIME. The guest get stucked after the migration, >> doing a huge loop inside update_wall_time, until the overflow of >> 64-bit unsigned offset. >> >> The following patch fixed this issue by introducing a global sign. >> Xen pvclock will update the cycle_last with the newest cycle count >> on the first read after migration. > > It seems the issue is not that simple. This issue can be well solved by > the suspend/resume code of timekeeper sysdev. However, the recent > code may think this is not necessary on hvm domain. (How can I find the > reason here?) > > I don''t know if it''s ok to add a pair of sysdev_suspend/sysdev_resume inside > xen_hvm_suspend. If not, this patch can be a quickfix/solution. > > Any ideas on this issue? > > -- > 潘震皓, Frank Pan > > Computer Science and Technology > Tsinghua University >-- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-22 14:28 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
On Tue, 2011-03-22 at 04:19 +0000, Frank Pan wrote:> On Tue, Mar 22, 2011 at 2:26 AM, Frank Pan <frankpzh@gmail.com> wrote: > > In recent pv-ops kernel, pvclock is not guaranteed as monotone. > > After a migration, pvclock can produce smaller cycle count. > > > > [The test is performed on next-2.6.32 tree] > > The issue occured when uptime(sender) > uptime(target), and with > > CONFIG_GENERIC_TIME. The guest get stucked after the migration, > > doing a huge loop inside update_wall_time, until the overflow of > > 64-bit unsigned offset. > > > > The following patch fixed this issue by introducing a global sign. > > Xen pvclock will update the cycle_last with the newest cycle count > > on the first read after migration. > > It seems the issue is not that simple. This issue can be well solved by > the suspend/resume code of timekeeper sysdev. However, the recent > code may think this is not necessary on hvm domain. (How can I find the > reason here?) > > I don''t know if it''s ok to add a pair of sysdev_suspend/sysdev_resume inside > xen_hvm_suspend. If not, this patch can be a quickfix/solution. > > Any ideas on this issue?In Linus'' tree there is 8dd38383a51d "xen: suspend and resume system devices when running PVHVM" which adds these. I think it is in xen.git#xen/next-2.6.32 as well now, as of dfd083bd89b2. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Mar-22 15:14 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
On Tue, 22 Mar 2011, Frank Pan wrote:> Add Stefano Stabellini, who made the commit of xen_hvm_suspend in the git tree. > > Stefano, do you have any idea?As Ian wrote, 8dd38383a51d0fb6b025dc330aaa3470281da3b2 introduced sysdev_suspend/sysdev_resume in xen_hvm_suspend. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-23 00:30 UTC
[Xen-devel] Re: [PATCH] [Bugfix][pv-ops] Guest get stucked after migration
Well, that''s good. Thanks. Grrr, that takes me days on debugging. 在 2011-3-22 下午11:15,"Stefano Stabellini" <stefano.stabellini@eu.citrix.com>写道: > On Tue, 22 Mar 2011, Frank Pan wrote: >> Add Stefano Stabellini, who made the commit of xen_hvm_suspend in the gittree.>> >> Stefano, do you have any idea? > > As Ian wrote, 8dd38383a51d0fb6b025dc330aaa3470281da3b2 introduced > sysdev_suspend/sysdev_resume in xen_hvm_suspend. >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel