Gonglei (Arei)
2013-Aug-08 14:23 UTC
pvops: Does PVOPS guest os support online "suspend/resume"
Hi all,

While suspending and resuming a PVOPS guest OS while it is running, we found that its block/net I/O gets stuck. A non-PVOPS guest OS has no such problem.

How reproducible:
-------------------
1/1

Steps to reproduce:
------------------
1) suspend the guest OS
   Note: do not migrate or shut down the guest OS.
2) resume the guest OS

(Think about rolling back (resume) while core-dumping (suspend) a guest; this problem would leave the guest OS inoperable.)

===================================================================
We found warning messages in the guest OS:
--------------------------------------------------------------------
Aug 2 10:17:34 localhost kernel: [38592.985159] platform pcspkr: resume
Aug 2 10:17:34 localhost kernel: [38592.989890] platform vesafb.0: resume
Aug 2 10:17:34 localhost kernel: [38592.996075] input input0: type resume
Aug 2 10:17:34 localhost kernel: [38593.001330] input input1: type resume
Aug 2 10:17:34 localhost kernel: [38593.005496] vbd vbd-51712: legacy resume
Aug 2 10:17:34 localhost kernel: [38593.011506] WARNING: g.e. still in use!
Aug 2 10:17:34 localhost kernel: [38593.016909] WARNING: leaking g.e. and page still in use!
Aug 2 10:17:34 localhost kernel: [38593.026204] xen vbd-51760: legacy resume
Aug 2 10:17:34 localhost kernel: [38593.033070] vif vif-0: legacy resume
Aug 2 10:17:34 localhost kernel: [38593.039327] WARNING: g.e. still in use!
Aug 2 10:17:34 localhost kernel: [38593.045304] WARNING: leaking g.e. and page still in use!
Aug 2 10:17:34 localhost kernel: [38593.052101] WARNING: g.e. still in use!
Aug 2 10:17:34 localhost kernel: [38593.057965] WARNING: leaking g.e. and page still in use!
Aug 2 10:17:34 localhost kernel: [38593.066795] serial8250 serial8250: resume
Aug 2 10:17:34 localhost kernel: [38593.073556] input input2: type resume
Aug 2 10:17:34 localhost kernel: [38593.079385] platform Fixed MDIO bus.0: resume
Aug 2 10:17:34 localhost kernel: [38593.086285] usb usb1: type resume
------------------------------------------------------

This means we revoke grant-table entries while they are still in use.

The reason lies in the suspend/resume code:
--------------------------------------------------------
//drivers/xen/manage.c
static void do_suspend(void)
{
	int err;
	struct suspend_info si;

	shutting_down = SHUTDOWN_SUSPEND;
	...
	err = dpm_suspend_start(PMSG_FREEZE);
	...
	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);

	if (err) {
		pr_err("failed to start xen_suspend: %d\n", err);
		si.cancelled = 1;
	}
	//NOTE: si.cancelled = 1

out_resume:
	if (!si.cancelled) {
		xen_arch_resume();
		xs_resume();
	} else
		xs_suspend_cancel();

	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE); //blkfront device gets resumed here.

out_thaw:
#ifdef CONFIG_PREEMPT
	thaw_processes();
out:
#endif
	shutting_down = SHUTDOWN_INVALID;
}
------------------------------------

The function dpm_suspend_start() suspends devices, and dpm_resume_end() resumes them. However, we found that the blkfront device has a RESUME method but no SUSPEND method:

-------------------------------------
//drivers/block/xen-blkfront.c
static DEFINE_XENBUS_DRIVER(blkfront, ,
	.probe = blkfront_probe,
	.remove = blkfront_remove,
	.resume = blkfront_resume,		// only a RESUME method found here
	.otherend_changed = blkback_changed,
	.is_ready = blkfront_is_ready,
);
--------------------------------------

So blkfront gets resumed even though it was never suspended, which causes the problem above.

========================================
To check whether the problem lies in PVOPS or in the hypervisor (Xen)/dom0, we suspended/resumed other, non-PVOPS guest OSes; no such problem occurred.

Non-PVOPS guests use their own Xen drivers, as shown in https://github.com/jpaton/xen-4.1-LJX1/blob/master/unmodified_drivers/linux-2.6/platform-pci/machine_reboot.c :

int __xen_suspend(int fast_suspend, void (*resume_notifier)(int))
{
	int err, suspend_cancelled, nr_cpus;
	struct ap_suspend_info info;

	xenbus_suspend();
	...
	preempt_enable();

	if (!suspend_cancelled)
		xenbus_resume();	// when the guest OS is resumed after a cancelled suspend, suspend_cancelled == 1, so this branch is not entered
	else
		xenbus_suspend_cancel();	// it gets here, so blkfront is not resumed

	return 0;
}

In a non-PVOPS guest OS, although there is no blkfront SUSPEND method either, the Xen driver does not resume the blkfront device; thus there is no problem after suspend/resume.

I am wondering why the two kinds of drivers (PVOPS and non-PVOPS) differ here. Is it because:
1) the PVOPS kernel does not take this situation into account and has a bug here? or
2) PVOPS avoids the problem some other way?

Thank you in advance.

-Gonglei
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Konrad Rzeszutek Wilk
2013-Aug-08 19:16 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
On Thu, Aug 08, 2013 at 02:23:06PM +0000, Gonglei (Arei) wrote:
> Hi all,
>
> While suspending and resuming a PVOPS guest OS while it is running, we found that its block/net I/O gets stuck. A non-PVOPS guest OS has no such problem.

With what version of Linux is this? Have you tried with v3.10?

Thanks.

> [rest of the original report snipped]
Gonglei (Arei)
2013-Aug-10 08:29 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Friday, August 09, 2013 3:17 AM
> To: Gonglei (Arei)
> Cc: xen-devel@lists.xen.org; Zhangbo (Oscar); Luonengjun; Hanweidong
> Subject: Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"
>
> On Thu, Aug 08, 2013 at 02:23:06PM +0000, Gonglei (Arei) wrote:
> > Hi all,
> >
> > While suspending and resuming a PVOPS guest OS while it is running, we found that its block/net I/O gets stuck. A non-PVOPS guest OS has no such problem.
>
> With what version of Linux is this? Have you tried with v3.10?

Thanks for responding. We have tried kernel 3.5.0-17-generic (Ubuntu 12.10); the problem still exists. We have not yet checked kernel 3.10, but we suspect it has the same problem.

Xen version: 4.3.0

Another way to reproduce:
1) xl create dom1.cfg
2) xl save -c dom1 /path/to/save/file
   (-c: leave the domain running after creating the snapshot.)

As I mentioned before, the problem occurs because a PVOPS guest OS RESUMEs blkfront when the guest resumes. The blkfront_resume method seems unnecessary here. A non-PVOPS guest OS does not RESUME blkfront, so it works fine.

So, here come the two questions again. Is the problem caused because:
1) the PVOPS kernel does not take this situation into account and has a bug here? or
2) PVOPS avoids the problem some other way?

-Gonglei

> [rest of the quoted report snipped]
Konrad Rzeszutek Wilk
2013-Aug-12 12:49 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
On Sat, Aug 10, 2013 at 08:29:43AM +0000, Gonglei (Arei) wrote:
> Thanks for responding. We have tried kernel 3.5.0-17-generic (Ubuntu 12.10); the problem still exists.

So you have not tried v3.10. v3.5 is ancient from the upstream perspective.

> We have not yet checked kernel 3.10, but we suspect it has the same problem.

Potentially. There were fixes added in 3.5:

commit 569ca5b3f94cd0b3295ec5943aa457cf4a4f6a3a
Author: Jan Beulich <JBeulich@suse.com>
Date:   Thu Apr 5 16:10:07 2012 +0100

    xen/gnttab: add deferred freeing logic

    Rather than just leaking pages that can't be freed at the point where
    access permission for the backend domain gets revoked, put them on a
    list and run a timer to (infrequently) retry freeing them. (This can
    particularly happen when unloading a frontend driver when devices are
    still present, and the backend still has them in non-closed state or
    hasn't finished closing them yet.)

and that seems to be triggered.

> Xen version: 4.3.0
>
> Another way to reproduce:
> 1) xl create dom1.cfg
> 2) xl save -c dom1 /path/to/save/file
>    (-c: leave the domain running after creating the snapshot.)
>
> As I mentioned before, the problem occurs because a PVOPS guest OS RESUMEs blkfront when the guest resumes.
> The blkfront_resume method seems unnecessary here.

It has to do that, otherwise it can't replay the I/Os that might not have hit the platter when it migrated from the original host.

But you are exercising the case where it does a checkpoint, not a full save/restore cycle. In that case you might indeed be hitting a bug.

> A non-PVOPS guest OS does not RESUME blkfront, so it works fine.

Potentially. The non-PVOPS guests are based on ancient kernels, and the upstream logic in the generic suspend/resume machinery has also changed.

> So, here come the two questions again. Is the problem caused because:
> 1) the PVOPS kernel does not take this situation into account and has a bug here? or
> 2) PVOPS avoids the problem some other way?

Just to make sure I am not confused here: the problem does not appear if you do NOT use -c, correct?

> [rest of the quoted report snipped]
Gonglei (Arei)
2013-Aug-12 14:19 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
Hi,

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Monday, August 12, 2013 8:50 PM
> To: Gonglei (Arei)
> Cc: xen-devel@lists.xen.org; Zhangbo (Oscar); Luonengjun; ian.campbell@citrix.com; stefano.stabellini@eu.citrix.com; rjw@sisk.pl; rshriram@cs.ubc.ca; Yanqiangjun; Jinjian (Ken)
> Subject: Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"
>
> So you have not tried v3.10. v3.5 is ancient from the upstream perspective.

Thank you, I didn't notice that; I will try 3.10 later.

> Potentially. There were fixes added in 3.5:
>
> commit 569ca5b3f94cd0b3295ec5943aa457cf4a4f6a3a
> Author: Jan Beulich <JBeulich@suse.com>
> Date:   Thu Apr 5 16:10:07 2012 +0100
>
>     xen/gnttab: add deferred freeing logic
>
> and that seems to be triggered.

I have tried applying this patch, but it did not fix the problem: it retries endlessly to free the leaking pages, and there seems to be no end. Messages keep coming out every second: "WARNING: leaking g.e. and page still in use!"

> It has to do that, otherwise it can't replay the I/Os that might not have hit the platter when it migrated from the original host.
>
> But you are exercising the case where it does a checkpoint, not a full save/restore cycle.
>
> In which case you might be indeed hitting a bug.

If we add a suspend method to blkfront that, when the guest OS is suspended, moves the frontend/backend block device states from {XenbusStateConnected, XenbusStateConnected} to {XenbusStateInitialising, XenbusStateInitWait}, would that cause any problem? We found that the Windows Xen PV driver does exactly this. We are hoping that such an attempt would solve the problem.

> Just to make sure I am not confused here. The problem does not appear if you do NOT use -c, correct?

Yes. The purpose of using "-c" here is to do an ONLINE suspend/resume. The problem occurs only with ONLINE suspend/resume, not with OFFLINE suspend/resume. To be precise, here are two examples:

<1>
1) xl create dom1.cfg
2) xl save -c dom1 /opt/dom1.save
   After this, guest dom1 has its I/O stuck, which means ONLINE suspend/resume has something wrong.
3) xl destroy dom1
4) xl restore /opt/dom1.save
   The restored dom1 works fine, which means OFFLINE suspend/resume is OK.

<2>
1) xl create dom1.cfg
2) xl save dom1 /opt/dom1.save
   No "-c" here; this destroys guest dom1 automatically.
3) xl restore /opt/dom1.save
   The restored dom1 works fine, which means OFFLINE suspend/resume is OK.

-Gonglei
Shriram Rajagopalan
2013-Aug-12 18:04 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
On Mon, Aug 12, 2013 at 10:19 AM, Gonglei (Arei) <arei.gonglei@huawei.com> wrote:

> > So you have not tried v3.10. v3.5 is ancient from the upstream perspective.
>
> Thank you, I didn't notice that; I will try 3.10 later.

3.5 may be ancient compared to 3.10, but from the suspend/resume support perspective, I think things were fixed way back in the 3.0 series.

> Yes. The purpose of using "-c" here is to do an ONLINE suspend/resume. The problem occurs only with ONLINE suspend/resume, not with OFFLINE suspend/resume. To be precise, here are two examples:
> <1>
> 1) xl create dom1.cfg
> 2) xl save -c dom1 /opt/dom1.save
>    After this, guest dom1 has its I/O stuck, which means ONLINE suspend/resume has something wrong.
> 3) xl destroy dom1
> 4) xl restore /opt/dom1.save
>    The restored dom1 works fine, which means OFFLINE suspend/resume is OK.

I am a bit lost here. Didn't we fix the suspend/resume issues in the 3.0 release window? I tested it with both xm and xl save (with/without the -c option). That was also when I fixed some bugs in the "xl save -c" code and introduced a minimal xl remus implementation (which is a continuous "xl save -c"). And we had blkfront et al. at that time too.

Did the distros miss some kernel config (IIRC it was HIBERNATE_CALLBACKS)?

So, did something fundamental change between 3.0 and 3.5, causing the "regression" that Gonglei is seeing?

> <2>
> 1) xl create dom1.cfg
> 2) xl save dom1 /opt/dom1.save
>    No "-c" here; this destroys guest dom1 automatically.
> 3) xl restore /opt/dom1.save
>    The restored dom1 works fine, which means OFFLINE suspend/resume is OK.

This one always worked, even with stock 2.6 kernels.

shriram
Gonglei (Arei)
2013-Aug-13 14:38 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
Hi,

I rechecked the different kernels today, and found that I made a mistake before. Sorry for misleading you all :)

All in all, the problems can be summarized in the 2 items below:

1. Kernel 2.6.32 PVOPS guest oses (I tested RHEL6.1 and RHEL6.3) do have bugs in ONLINE suspend/resume (checkpoint), which was, as Shriram mentioned, fixed in:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/xen/manage.c?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b

2. Kernels above 3.0 (I tested Ubuntu 12.10 with kernel 3.5 and Ubuntu 13.04 with kernel 3.8) seem to have another "bug":

   1) If we set MULTIPLE VCPUS for the guest os, it has problems in resuming (to be precise, it's thaw). In detail:
      <1> set the guest os with 4 vcpus
          in dom1.cfg: vcpus=4
      <2> xl create dom1.cfg
          execute the command "top -d 1" in guest dom1's vnc window
      <3> xl save -c dom1 /opt/dom1.save
      <4> after step <3>, we check the guest dom1's vnc window, and find that:
          - kernel threads migration/1, migration/2, migration/3 get their cpu usage up to 100%
          - the guest os can't respond to any request such as mouse movement or keyboard input
          - no "thaw" messages are printed in dom1's serial output

   2) If we set only 1 vcpu for the guest os, it thaws back and works fine.

   3) Another odd thing: if we use the saved file generated in 2-1) to restore the guest, and then do an online suspend/resume (xl save -c, checkpoint), it is fine; no problems occur.

Such problems occur on guest oses with kernel 3.5/3.8 (maybe other kernels as well, not tested). I hope that the steps I did were correct.

Have you ever encountered such a "suspend/resume checkpoint on multi-vcpu guest os" problem?

-------
PS: BTW, I'm wondering why using freeze/thaw instead of suspend/resume would solve the problem with kernels below 3.0? It seems that blkfront_resume is still called if we use the thaw method here, because blkfront has no available pm_op.
static int device_resume(struct device *dev, pm_message_t state, bool async)
{
	…………
	if (dev->bus) {
		if (dev->bus->pm) {
			info = "bus ";
			callback = pm_op(dev->bus->pm, state);
		} else if (dev->bus->resume) {
			info = "legacy bus ";
			callback = dev->bus->resume;	/* blkfront_resume is called here? */
			goto End;
		}
	}
	…………
}

Best Regards!
-Gonglei

From: Shriram Rajagopalan [mailto:rshriram@cs.ubc.ca]
Sent: Tuesday, August 13, 2013 2:05 AM
To: Gonglei (Arei)
Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Zhangbo (Oscar); Luonengjun; ian.campbell@citrix.com; stefano.stabellini@eu.citrix.com; rjw@sisk.pl; Yanqiangjun; Jinjian (Ken)
Subject: Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"

On Mon, Aug 12, 2013 at 10:19 AM, Gonglei (Arei) <arei.gonglei@huawei.com> wrote:
> > Thanks for responding. We've tried kernel "3.5.0-17 generic" (ubuntu 12.10),
> > the problem still exists.
>
> So you have not tried v3.10. v3.5 is ancient from the upstream perspective.
>
> thank you, I didn't notice that, I would try 3.10 later.

3.5 may be ancient compared to 3.10, but from the suspend/resume support perspective, I think things were fixed way back in the 3.0 series.

> yes, the purpose of using "-c" here is to do an ONLINE suspend/resume.
> such a problem just occurs with ONLINE suspend/resume, rather than OFFLINE
> suspend/resume. To be precise, 2 examples are listed here below:
> <1>
> 1) xl create dom1.cfg
> 2) xl save -c dom1 /opt/dom1.save
>    after this, the dom1 guest os has its I/O stuck, which means ONLINE
>    suspend/resume has something wrong.
> 3) xl destroy dom1
> 4) xl restore /opt/dom1.save
>    the restored dom1 works fine, which means OFFLINE suspend/resume is OK.

I am a bit lost here. Didn't we fix suspend/resume issues in the 3.0 release window? I tested it with both xm and xl save (with/without the -c option). That was also when I fixed some bugs in the "xl save -c" code and introduced a minimal xl remus implementation (which is a continuous "xl save -c").
And we had blkfront et al. at that time too. Did the distros miss some kernel config (IIRC it was HIBERNATE_CALLBACKS)? So, did something fundamental change between 3.0 and 3.5, causing the "regression" that Gonglei is seeing?

> <2>
> 1) xl create dom1.cfg
> 2) xl save dom1 /opt/dom1.save
>    no "-c" here, so it destroys the guest dom1 automatically.
> 3) xl restore /opt/dom1.save
>    the restored dom1 works fine, which means OFFLINE suspend/resume is OK.

This one always worked, even with stock 2.6 kernels.

shriram
Konrad Rzeszutek Wilk
2013-Aug-13 16:34 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
On Tue, Aug 13, 2013 at 02:38:18PM +0000, Gonglei (Arei) wrote:
> Hi,
> I rechecked the different kernels today, and found that I made a mistake before. Sorry for misleading you all :)
>
> All in all, the problems can be summarized in the 2 items below:
> 1 kernel 2.6.32 PVOPS guest oses (I tested RHEL6.1 and RHEL6.3) do have bugs in ONLINE suspend/resume (checkpoint), which was,
> as Shriram mentioned, fixed in:
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/xen/manage.c?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> 2 kernels above 3.0 (I tested Ubuntu 12.10 with kernel 3.5 and Ubuntu 13.04 with kernel 3.8) seem to have another "bug":
> 1) if we set MULTIPLE VCPUS for the guest os, it has problems in resuming (to be precise, it's thaw).
> In detail:
> <1> set the guest os with 4 vcpus
>     in dom1.cfg: vcpus=4
> <2> xl create dom1.cfg
>     execute the command "top -d 1" in guest dom1's vnc window
> <3> xl save -c dom1 /opt/dom1.save
> <4> after step <3>, we check the guest dom1's vnc window, and find that:
>     kernel threads migration/1, migration/2, migration/3 get their cpu usage up to 100%
>     the guest os can't respond to any request such as mouse movement or keyboard input
>     no "thaw" messages are printed in dom1's serial output
>
> 2) if we set only 1 vcpu for the guest os, it thaws back and works fine.
> 3) another odd thing: if we use the saved file generated in 2-1) to restore the guest, and then do online suspend/resume (xl save -c, checkpoint),
> it is fine; no problems occur.
>
> Such problems occur on guest oses with kernel 3.5/3.8 (maybe other kernels as well, not tested). I hope that the steps I did were correct.

Please do check with the upstream kernel. There were some CPU hotplug issues in older kernels, and just to make sure that this is not one of them it would be good to eliminate that possibility.
Please do test with v3.11-rc5.

> Have you ever encountered such a "suspend/resume checkpoint on multi-vcpu guest os" problem?
>
> -------
> PS: BTW, I'm wondering why using freeze/thaw instead of suspend/resume would solve the problem with kernels below 3.0?
> It seems that blkfront_resume is still called if we use the thaw method here, because blkfront has no available pm_op.
>
> static int device_resume(struct device *dev, pm_message_t state, bool async)
> {
> 	…………
> 	if (dev->bus) {
> 		if (dev->bus->pm) {
> 			info = "bus ";
> 			callback = pm_op(dev->bus->pm, state);
> 		} else if (dev->bus->resume) {
> 			info = "legacy bus ";
> 			callback = dev->bus->resume;	/* blkfront_resume is called here? */
> 			goto End;

One easy way to figure this out is to stick printks in here to see if that blkfront code is indeed called. You can also use 'dump_stack()' to get a nice stack-trace.
Gonglei (Arei)
2013-Aug-14 10:52 UTC
Re: pvops: Does PVOPS guest os support online "suspend/resume"
> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@oracle.com]
> Sent: Wednesday, August 14, 2013 12:35 AM
> To: Gonglei (Arei)
> Cc: rshriram@cs.ubc.ca; xen-devel@lists.xen.org; Zhangbo (Oscar); Luonengjun; ian.campbell@citrix.com; stefano.stabellini@eu.citrix.com; rjw@sisk.pl; Yanqiangjun; Jinjian (Ken)
> Subject: Re: [Xen-devel] pvops: Does PVOPS guest os support online "suspend/resume"
>
> On Tue, Aug 13, 2013 at 02:38:18PM +0000, Gonglei (Arei) wrote:
> > Hi,
> > I rechecked the different kernels today, and found that I made a mistake before. Sorry for misleading you all :)
> >
> > All in all, the problems can be summarized in the 2 items below:
> > 1 kernel 2.6.32 PVOPS guest oses (I tested RHEL6.1 and RHEL6.3) do have bugs in ONLINE suspend/resume (checkpoint), which was,
> > as Shriram mentioned, fixed in:
> > http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/drivers/xen/manage.c?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> > 2 kernels above 3.0 (I tested Ubuntu 12.10 with kernel 3.5 and Ubuntu 13.04 with kernel 3.8) seem to have another "bug":
> > 1) if we set MULTIPLE VCPUS for the guest os, it has problems in resuming (to be precise, it's thaw).
> > In detail:
> > <1> set the guest os with 4 vcpus
> >     in dom1.cfg: vcpus=4
> > <2> xl create dom1.cfg
> >     execute the command "top -d 1" in guest dom1's vnc window
> > <3> xl save -c dom1 /opt/dom1.save
> > <4> after step <3>, we check the guest dom1's vnc window, and find that:
> >     kernel threads migration/1, migration/2, migration/3 get their cpu usage up to 100%
> >     the guest os can't respond to any request such as mouse movement or keyboard input
> >     no "thaw" messages are printed in dom1's serial output
> >
> > 2) if we set only 1 vcpu for the guest os, it thaws back and works fine.
> > 3) another odd thing: if we use the saved file generated in 2-1) to restore the guest, and then do online suspend/resume (xl save -c, checkpoint),
> > it is fine; no problems occur.
> >
> > Such problems occur on guest oses with kernel 3.5/3.8 (maybe other kernels as well, not tested). I hope that the steps I did were correct.
>
> Please do check with the upstream kernel. There were some CPU hotplug issues in older kernels,
> and just to make sure that this is not one of them it would be good to eliminate that possibility.
>
> Please do test with v3.11-rc5.
>
> > Have you ever encountered such a "suspend/resume checkpoint on multi-vcpu guest os" problem?
> >
> > -------
> > PS: BTW, I'm wondering why using freeze/thaw instead of suspend/resume would solve the problem with kernels below 3.0?
> > It seems that blkfront_resume is still called if we use the thaw method here, because blkfront has no available pm_op.
> >
> > static int device_resume(struct device *dev, pm_message_t state, bool async)
> > {
> > 	…………
> > 	if (dev->bus) {
> > 		if (dev->bus->pm) {
> > 			info = "bus ";
> > 			callback = pm_op(dev->bus->pm, state);
> > 		} else if (dev->bus->resume) {
> > 			info = "legacy bus ";
> > 			callback = dev->bus->resume;	/* blkfront_resume is called here? */
> > 			goto End;
>
> One easy way to figure this out is to stick printks in here to see if that blkfront code is indeed called.
> You can also use 'dump_stack()' to get a nice stack-trace.

Hi,

1. I tried kernel 3.11-rc6; it has the same problem: after doing the checkpoint, a multi-vcpu guest os can't respond to anything, because its kernel threads migration/1, migration/2, etc., get their cpu usage up to 100%.
2. Kernel 3.0 doesn't have this problem.

So, it seems that some bugs were introduced between v3.0 and v3.5, something concerning vcpu freeze/thaw?

Thanks!
-Gonglei
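For anyone trying to reproduce the multi-vcpu thaw hang discussed in this thread, the only setting the reports actually depend on is vcpus=4 in the guest config. A minimal dom1.cfg sketch follows; everything except vcpus is a placeholder assumption, so adjust kernel, disk, and network lines to the local setup:

```
# dom1.cfg sketch: only vcpus=4 is taken from the reports above;
# all paths and sizes below are placeholder assumptions.
name    = "dom1"
kernel  = "/boot/vmlinuz-guest"            # PV guest kernel (placeholder)
ramdisk = "/boot/initrd-guest.img"         # placeholder
memory  = 1024
vcpus   = 4                                # multiple vcpus trigger the thaw hang
disk    = [ 'phy:/dev/vg0/dom1,xvda,w' ]   # placeholder
vif     = [ 'bridge=xenbr0' ]              # placeholder
```

With this config the reproduction is: xl create dom1.cfg, run "top -d 1" in the guest, then xl save -c dom1 /opt/dom1.save and watch whether the guest thaws; changing vcpus to 1 should make the hang disappear, per Gonglei's tests.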