SUZUKI, Kazuhiro
2011-Feb-07 09:07 UTC
[Xen-devel] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
Hi, The following patch series fixes hangup after creating checkpoint on Xen. The Linux Xen guest can be saved the state to restore later, and also created snapshot like checkpoint via the hypervisor. But, when the snapshot is created for the PV guest, it will hangup. We added ''PMSG_CANCEL'' message and ''cancel'' handler in dev_pm_ops struct in the pm-linux part. In creating checkpoint mode, the resume handler of xenbus should not be called. In this case, it is recognized that the suspend was canceled in drivers/xen/manage.c and call dpm_resume_end() with PMSG_CANCEL. If the ''cancel'' handler is defined, it is called instead of resume(). [1/2] - Fix hangup after creating checkpoint on Xen -- pm-linux part. [2/2] - Fix hangup after creating checkpoint on Xen -- Xen part. Please review. Thanks, KAZ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
SUZUKI, Kazuhiro
2011-Feb-07 09:08 UTC
[Xen-devel] [PATCH 1/2] Fix hangup after creating checkpoint on Xen.
Hi, This is a pm-linux part patch. Thanks, KAZ Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com> Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com> --- drivers/base/power/main.c | 9 +++++++++ include/linux/pm.h | 3 +++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 8aa2443..e348b5d 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -181,6 +181,13 @@ static int pm_op(struct device *dev, suspend_report_result(ops->suspend, error); } break; + case PM_EVENT_CANCEL: + if (ops->cancel) { + error = ops->cancel(dev); + suspend_report_result(ops->cancel, error); + break; + } + /* Fall through */ case PM_EVENT_RESUME: if (ops->resume) { error = ops->resume(dev); @@ -293,6 +300,8 @@ static char *pm_verb(int event) return "suspend"; case PM_EVENT_RESUME: return "resume"; + case PM_EVENT_CANCEL: + return "cancel"; case PM_EVENT_FREEZE: return "freeze"; case PM_EVENT_QUIESCE: diff --git a/include/linux/pm.h b/include/linux/pm.h index 3b7e04b..d118781 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -198,6 +198,7 @@ struct dev_pm_ops { void (*complete)(struct device *dev); int (*suspend)(struct device *dev); int (*resume)(struct device *dev); + int (*cancel)(struct device *dev); int (*freeze)(struct device *dev); int (*thaw)(struct device *dev); int (*poweroff)(struct device *dev); @@ -291,6 +292,7 @@ struct dev_pm_ops name = { \ #define PM_EVENT_USER 0x0100 #define PM_EVENT_REMOTE 0x0200 #define PM_EVENT_AUTO 0x0400 +#define PM_EVENT_CANCEL 0x0800 #define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE) #define PM_EVENT_USER_SUSPEND (PM_EVENT_USER | PM_EVENT_SUSPEND) @@ -308,6 +310,7 @@ struct dev_pm_ops name = { \ #define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, }) #define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, }) #define PMSG_RECOVER ((struct pm_message){ .event = PM_EVENT_RECOVER, }) +#define PMSG_CANCEL ((struct pm_message){ .event = PM_EVENT_CANCEL, }) #define PMSG_USER_SUSPEND ((struct pm_message) \ { .event = PM_EVENT_USER_SUSPEND, }) #define PMSG_USER_RESUME ((struct pm_message) \ -- 1.7.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
SUZUKI, Kazuhiro
2011-Feb-07 09:08 UTC
[Xen-devel] [PATCH 2/2] Fix hangup after creating checkpoint on Xen.
Hi, This is a Xen part patch. Thanks, KAZ Signed-off-by: Kenji Wakamiya <wkenji@jp.fujitsu.com> Signed-off-by: Kazuhiro Suzuki <kaz@jp.fujitsu.com> --- drivers/net/xen-netfront.c | 2 +- drivers/xen/manage.c | 7 ++++--- drivers/xen/xenbus/xenbus_probe.c | 12 ++++++++++-- drivers/xen/xenbus/xenbus_probe.h | 3 ++- drivers/xen/xenbus/xenbus_probe_frontend.c | 8 ++++++-- include/xen/xenbus.h | 2 +- 6 files changed, 24 insertions(+), 10 deletions(-) diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index 3f71199..22c6288 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -1293,7 +1293,7 @@ static void xennet_disconnect_backend(struct netfront_info *info) info->rx.sring = NULL; } -static int netfront_suspend(struct xenbus_device *dev, pm_message_t state) +static int netfront_suspend(struct xenbus_device *dev) { struct netfront_info *info = dev_get_drvdata(&dev->dev); struct hrtimer *timer = &info->smart_poll.timer; diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c index 0b50906..845afb8 100644 --- a/drivers/xen/manage.c +++ b/drivers/xen/manage.c @@ -148,10 +148,11 @@ out_resume: if (!cancelled) { xen_arch_resume(); xs_resume(); - } else + dpm_resume_end(PMSG_RESUME); + } else { xs_suspend_cancel(); - - dpm_resume_end(PMSG_RESUME); + dpm_resume_end(PMSG_CANCEL); + } /* Make sure timer events get retriggered on all CPUs */ clock_was_set(); diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 3a83ba2..02dbb8b 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -575,7 +575,7 @@ void xenbus_dev_changed(const char *node, struct xen_bus_type *bus) } EXPORT_SYMBOL_GPL(xenbus_dev_changed); -int xenbus_dev_suspend(struct device *dev, pm_message_t state) +int xenbus_dev_suspend(struct device *dev) { int err = 0; struct xenbus_driver *drv; @@ -587,7 +587,7 @@ int xenbus_dev_suspend(struct device *dev, pm_message_t state) return 0; drv = to_xenbus_driver(dev->driver); if (drv->suspend) - err = drv->suspend(xdev, state); + err = drv->suspend(xdev); if (err) printk(KERN_WARNING "xenbus: suspend %s failed: %i\n", dev_name(dev), err); @@ -638,6 +638,14 @@ int xenbus_dev_resume(struct device *dev) } EXPORT_SYMBOL_GPL(xenbus_dev_resume); +int xenbus_dev_cancel(struct device *dev) +{ + /* Do nothing */ + DPRINTK("cancel"); + return 0; +} +EXPORT_SYMBOL_GPL(xenbus_dev_cancel); + /* A flag to determine if xenstored is ''ready'' (i.e. has started) */ int xenstored_ready = 0; diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index 0e5fc4c..4019187 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -62,8 +62,9 @@ extern void xenbus_dev_changed(const char *node, struct xen_bus_type *bus); extern void xenbus_dev_shutdown(struct device *_dev); -extern int xenbus_dev_suspend(struct device *dev, pm_message_t state); +extern int xenbus_dev_suspend(struct device *dev); extern int xenbus_dev_resume(struct device *dev); +extern int xenbus_dev_cancel(struct device *dev); extern void xenbus_otherend_changed(struct xenbus_watch *watch, const char **vec, unsigned int len, diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c index 5413248..6dfc588 100644 --- a/drivers/xen/xenbus/xenbus_probe_frontend.c +++ b/drivers/xen/xenbus/xenbus_probe_frontend.c @@ -82,6 +82,11 @@ static struct device_attribute xenbus_frontend_dev_attrs[] = { __ATTR_NULL }; +static struct dev_pm_ops xenbus_pm_ops = { + .suspend = xenbus_dev_suspend, + .resume = xenbus_dev_resume, + .cancel = xenbus_dev_cancel, +}; static struct xen_bus_type xenbus_frontend = { .root = "device", @@ -98,8 +103,7 @@ static struct xen_bus_type xenbus_frontend = { .shutdown = xenbus_dev_shutdown, .dev_attrs= xenbus_frontend_dev_attrs, - .suspend = xenbus_dev_suspend, - .resume = xenbus_dev_resume, + .pm = &xenbus_pm_ops, }, }; diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 542ca7c..23e7f25 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -91,7 +91,7 @@ struct xenbus_driver { void (*otherend_changed)(struct xenbus_device *dev, enum xenbus_state backend_state); int (*remove)(struct xenbus_device *dev); - int (*suspend)(struct xenbus_device *dev, pm_message_t state); + int (*suspend)(struct xenbus_device *dev); int (*resume)(struct xenbus_device *dev); int (*uevent)(struct xenbus_device *, struct kobj_uevent_env *); struct device_driver driver; -- 1.7.3.1 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Rafael J. Wysocki
2011-Feb-07 09:35 UTC
[Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote:> Hi,Hi,> The following patch series fixes hangup after creating checkpoint on > Xen. The Linux Xen guest can be saved the state to restore later, and > also created snapshot like checkpoint via the hypervisor. > But, when the snapshot is created for the PV guest, it will hangup. > > We added ''PMSG_CANCEL'' message and ''cancel'' handler in dev_pm_ops > struct in the pm-linux part.Please don''t do that, unless you can convince me there''s no other way to fix the problem you''re trying to address. In my opinion it''s highly unrealistic to assume that device drivers (or even subsystems) will implement the ->cancel() callback just for the benefit of Xen. And if the only subsystem that needs to implement ->cancel() is Xen, then the issue should be addressed without modifying the device core code, in a different way.> In creating checkpoint mode, the resume handler of xenbus should not > be called. In this case, it is recognized that the suspend was canceled > in drivers/xen/manage.c and call dpm_resume_end() with PMSG_CANCEL. > If the ''cancel'' handler is defined, it is called instead of resume(). > > [1/2] - Fix hangup after creating checkpoint on Xen -- pm-linux part. > [2/2] - Fix hangup after creating checkpoint on Xen -- Xen part.Thanks, Rafael _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-08 11:22 UTC
[Xen-devel] Re: [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Mon, 2011-02-07 at 10:35 +0100, Rafael J. Wysocki wrote:> On Monday, February 07, 2011, SUZUKI, Kazuhiro wrote: > > Hi, > > Hi, > > > The following patch series fixes hangup after creating checkpoint on > > Xen. The Linux Xen guest can be saved the state to restore later, and > > also created snapshot like checkpoint via the hypervisor. > > But, when the snapshot is created for the PV guest, it will hangup. > > > > We added ''PMSG_CANCEL'' message and ''cancel'' handler in dev_pm_ops > > struct in the pm-linux part. > > Please don''t do that, unless you can convince me there''s no other way to fix > the problem you''re trying to address.Sorry, it was my advise to Kazuhiro that solving the underlying issue by extending the core would be preferable to making Xen specific hacks. The problem is that currently we have: dpm_suspend_start(PMSG_SUSPEND); dpm_suspend_noirq(PMSG_SUSPEND); sysdev_suspend(PMSG_SUSPEND); /* suspend hypercall */ sysdev_resume(); dpm_resume_noirq(PMSG_RESUME); dpm_resume_end(PMSG_RESUME); However the suspend hypercall can return a value indicating that the suspend didn''t actually happen (e.g. was cancelled). This is used e.g. when checkpointing guests, because in that case you want the original guest to continue. When the suspend didn''t happen the drivers need to recover differently from if it did. The originally proposed solution was to only call dpm_resume_end if the suspend was not cancelled. My concern with this was that unbalancing the dpm_suspend_* and dpm_resume_* did not seem like a correct interaction with the core. For example dpm_suspend_* adds stuff to dpm_suspended_list and if dpm_resume_* is not called presumably this all gets out of sync for next time. Hence my suggestion to add a cancel message type.> In my opinion it''s highly unrealistic to assume that device drivers > (or even subsystems) will implement the ->cancel() callback just for the > benefit of Xen.I did vague wonder if a similar message might be of interest to e.g. cancelling hibernations or similar.> And if the only subsystem that needs to implement ->cancel() is Xen, then the > issue should be addressed without modifying the device core code, in a different > way.I thought it would be preferable to make use of/extend core functionality where possible but if that''s not the case we can find another way. Do you have any suggestions for how to correctly interact with the core functions? Is adding a suspend_cancel operation to just at the struct xenbus_driver level and introducing a xen specific function to walk to the bus the sort of thing you were thinking of? (it seems reasonable). Should we be doing anything with dpm_*_list in that case? (FWIW original thread is on xen-devel at http://thread.gmane.org/gmane.comp.emulators.xen.devel/95265/) Ian. -- Ian Campbell Current Noise: Crowbar - Dead Sun Why on earth do people buy old bottles of wine when they can get a fresh one for a quarter of the price? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Alan Stern
2011-Feb-08 16:46 UTC
[Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Tue, 8 Feb 2011, Ian Campbell wrote:> The problem is that currently we have: > > dpm_suspend_start(PMSG_SUSPEND); > > dpm_suspend_noirq(PMSG_SUSPEND); > > sysdev_suspend(PMSG_SUSPEND); > /* suspend hypercall */ > sysdev_resume(); > > dpm_resume_noirq(PMSG_RESUME); > > dpm_resume_end(PMSG_RESUME); > > However the suspend hypercall can return a value indicating that the > suspend didn''t actually happen (e.g. was cancelled). This is used e.g. > when checkpointing guests, because in that case you want the original > guest to continue. When the suspend didn''t happen the drivers need to > recover differently from if it did.That is odd, and it is quite different from the intended design of the PM core. Drivers are supposed to put their devices into a known suspended state; then afterwards they put the devices back into an operational state. What happens while the devices are in the suspended state isn''t supposed to matter -- the system transition can fail, but devices get treated exactly the same way as if it succeeded. Why do your drivers need to recover differently based on the success of the hypercall? Alan Stern _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-08 17:35 UTC
[Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote:> On Tue, 8 Feb 2011, Ian Campbell wrote: > > > The problem is that currently we have: > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > sysdev_suspend(PMSG_SUSPEND); > > /* suspend hypercall */ > > sysdev_resume(); > > > > dpm_resume_noirq(PMSG_RESUME); > > > > dpm_resume_end(PMSG_RESUME); > > > > However the suspend hypercall can return a value indicating that the > > suspend didn''t actually happen (e.g. was cancelled). This is used e.g. > > when checkpointing guests, because in that case you want the original > > guest to continue. When the suspend didn''t happen the drivers need to > > recover differently from if it did. > > That is odd, and it is quite different from the intended design of the > PM core. Drivers are supposed to put their devices into a known > suspended state; then afterwards they put the devices back into an > operational state. What happens while the devices are in the suspended > state isn''t supposed to matter -- the system transition can fail, but > devices get treated exactly the same way as if it succeeded. > > Why do your drivers need to recover differently based on the success of > the hypercall?checkpointing isn''t really my area but AIUI you don''t want to do a full device teardown and reconnect like you would with a proper suspend because of the time that takes which prevents you from doing continuous rolling checkpoints at granularity which people want to implement various disaster recovery schemes. Hopefully one of the Xen checkpointing folks will chime in and explain why this is not possible to achieve at the individual driver level (or, even better, with a patch which does it that way ;-)). Ian. -- Ian Campbell Current Noise: Buckcherry - King Of Kings Beauty? What''s that? -- Larry Wall in <199710221937.MAA25131@wall.org> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Brendan Cully
2011-Feb-09 23:16 UTC
[Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Tuesday, 08 February 2011 at 17:35, Ian Campbell wrote:> On Tue, 2011-02-08 at 11:46 -0500, Alan Stern wrote: > > On Tue, 8 Feb 2011, Ian Campbell wrote: > > > > > The problem is that currently we have: > > > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > /* suspend hypercall */ > > > sysdev_resume(); > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > dpm_resume_end(PMSG_RESUME); > > > > > > However the suspend hypercall can return a value indicating that the > > > suspend didn''t actually happen (e.g. was cancelled). This is used e.g. > > > when checkpointing guests, because in that case you want the original > > > guest to continue. When the suspend didn''t happen the drivers need to > > > recover differently from if it did. > > > > That is odd, and it is quite different from the intended design of the > > PM core. Drivers are supposed to put their devices into a known > > suspended state; then afterwards they put the devices back into an > > operational state. What happens while the devices are in the suspended > > state isn''t supposed to matter -- the system transition can fail, but > > devices get treated exactly the same way as if it succeeded. > > > > Why do your drivers need to recover differently based on the success of > > the hypercall? > > checkpointing isn''t really my area but AIUI you don''t want to do a full > device teardown and reconnect like you would with a proper suspend > because of the time that takes which prevents you from doing continuous > rolling checkpoints at granularity which people want to implement > various disaster recovery schemes. > > Hopefully one of the Xen checkpointing folks will chime in and explain > why this is not possible to achieve at the individual driver level (or, > even better, with a patch which does it that way ;-)).As Ian says, Xen has suspend_cancel because while the normal full suspend/resume path works, it is much slower, and the work done during resume is redundant. I don''t remember the numbers offhand, but when we added suspend_cancel I think we could do full suspend/resume in under 100us, which was maybe a couple of orders of magnitude faster than full resume (largely due to slow xenstore handshaking on resume, IIRC). It made a big difference for our Remus HA project, which was checkpointing tens of times per second. I''d like to keep the fast resume option, and expect that it can be contained entirely in Xen-specific code. I''ll try to get someone to look into it here. I think fast resume is somewhat orthogonal to the problem of hanging on resume, which just sounds like a xen-specific bug in the slow path. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Alan Stern
2011-Feb-09 23:42 UTC
[Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Wed, 9 Feb 2011, Brendan Cully wrote:> > > That is odd, and it is quite different from the intended design of the > > > PM core. Drivers are supposed to put their devices into a known > > > suspended state; then afterwards they put the devices back into an > > > operational state. What happens while the devices are in the suspended > > > state isn''t supposed to matter -- the system transition can fail, but > > > devices get treated exactly the same way as if it succeeded. > > > > > > Why do your drivers need to recover differently based on the success of > > > the hypercall? > > > > checkpointing isn''t really my area but AIUI you don''t want to do a full > > device teardown and reconnect like you would with a proper suspend > > because of the time that takes which prevents you from doing continuous > > rolling checkpoints at granularity which people want to implement > > various disaster recovery schemes. > > > > Hopefully one of the Xen checkpointing folks will chime in and explain > > why this is not possible to achieve at the individual driver level (or, > > even better, with a patch which does it that way ;-)). > > As Ian says, Xen has suspend_cancel because while the normal full > suspend/resume path works, it is much slower, and the work done during > resume is redundant. I don''t remember the numbers offhand, but when we > added suspend_cancel I think we could do full suspend/resume in under > 100us, which was maybe a couple of orders of magnitude faster than > full resume (largely due to slow xenstore handshaking on resume, > IIRC). It made a big difference for our Remus HA project, which was > checkpointing tens of times per second. > > I''d like to keep the fast resume option, and expect that it can be > contained entirely in Xen-specific code. I''ll try to get someone to > look into it here. > > I think fast resume is somewhat orthogonal to the problem of hanging > on resume, which just sounds like a xen-specific bug in the slow > path.In fact there already is a "fast suspend & resume" path in the PM core. It''s the freeze/thaw procedure used when starting to hibernate. The documentation specifically says that drivers'' freeze methods are supposed to quiesce their devices but not change power levels. In addition, the thaw method is invoked as part of recovery from a failed hibernation attempt, so it already has the "cancel" semantics that xen seems to want. Alan Stern _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 11:31 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote:> I''d like to keep the fast resume option, and expect that it can be > contained entirely in Xen-specific code. I''ll try to get someone to > look into it here.Thanks, please put them in contact with Kazuhiro who has already been looking into this.> I think fast resume is somewhat orthogonal to the problem of hanging > on resume, which just sounds like a xen-specific bug in the slow > path.The bug on the Xen side is taking the slow path when the suspend was actually cancelled, which is what the original patch tries to fix. Or should/could the slow path be able cope either way? Ian. -- Ian Campbell Current Noise: Neil Young - Love And War For every complex problem, there is a solution that is simple, neat, and wrong. -- H. L. Mencken _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 11:40 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote:> In fact there already is a "fast suspend & resume" path in the PM core. > It''s the freeze/thaw procedure used when starting to hibernate. The > documentation specifically says that drivers'' freeze methods are > supposed to quiesce their devices but not change power levels. In > addition, the thaw method is invoked as part of recovery from a failed > hibernation attempt, so it already has the "cancel" semantics that xen > seems to want.Sounds like that would work and I would much prefer to simply make correct use of the core functionality. So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending on whether the suspend was cancelled or not? So the sequence of events is something like: dpm_suspend_start(PMSG_FREEZE); dpm_suspend_noirq(PMSG_FREEZE); sysdev_suspend(PMSG_QUIESCE); cancelled = suspend_hypercall() sysdev_resume(); dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW); dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW); ? (For comparison we currently have:> > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > /* suspend hypercall */ > > > sysdev_resume(); > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > dpm_resume_end(PMSG_RESUME);) Ian. -- Ian Campbell Current Noise: Neil Young - Angry World If only one could get that wonderful feeling of accomplishment without having to accomplish anything. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 12:40 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
(cutting CC to just xen-devel) On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote:> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > I''d like to keep the fast resume option, and expect that it can be > > contained entirely in Xen-specific code. I''ll try to get someone to > > look into it here.On that note: Is there someone around who is willing to more actively and visibly maintain Remus? i.e. make it work with the new toolstack, upstream it to the mainline kernel, work to get it into a state where it can be tested as a matter of course, and generally be seen to be improving it (and its integration/interaction with the rest of the ecosystem) with time? At the moment the impression is that Remus is mostly being left to rot apart from when noise gets made about specific issues. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Alan Stern
2011-Feb-10 16:00 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thu, 10 Feb 2011, Ian Campbell wrote:> On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote: > > In fact there already is a "fast suspend & resume" path in the PM core. > > It''s the freeze/thaw procedure used when starting to hibernate. The > > documentation specifically says that drivers'' freeze methods are > > supposed to quiesce their devices but not change power levels. In > > addition, the thaw method is invoked as part of recovery from a failed > > hibernation attempt, so it already has the "cancel" semantics that xen > > seems to want. > > Sounds like that would work and I would much prefer to simply make > correct use of the core functionality.It seems like a reasonable approach. Whether it will actually _work_ is a harder question... :-)> So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending > on whether the suspend was cancelled or not?Basically yes. It is also "balanced" by PMSG_RESTORE, which is used after a memory image has been restored (although this isn''t relevant to your snapshotting). See the comments in include/linux/pm.h.> So the sequence of events > is something like: > dpm_suspend_start(PMSG_FREEZE); > > dpm_suspend_noirq(PMSG_FREEZE); > > sysdev_suspend(PMSG_QUIESCE);This should say sysdev_suspend(PMSG_FREEZE).> cancelled = suspend_hypercall()At this point swsusp_arch_suspend() is called. If that translates to suspend_hypercall() in your setting, then yes.> sysdev_resume(); > > dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW); > > dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW); > ?Yes.> (For comparison we currently have: > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > > /* suspend hypercall */ > > > > sysdev_resume(); > > > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > > > dpm_resume_end(PMSG_RESUME); > )Right. The sequence of calls is the same, but the PMSG_ argument is different so drivers are expected to act differently in response. Alan Stern _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Rafael J. Wysocki
2011-Feb-10 16:26 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thursday, February 10, 2011, Alan Stern wrote:> On Thu, 10 Feb 2011, Ian Campbell wrote: > > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote: > > > In fact there already is a "fast suspend & resume" path in the PM core. > > > It''s the freeze/thaw procedure used when starting to hibernate. The > > > documentation specifically says that drivers'' freeze methods are > > > supposed to quiesce their devices but not change power levels. In > > > addition, the thaw method is invoked as part of recovery from a failed > > > hibernation attempt, so it already has the "cancel" semantics that xen > > > seems to want. > > > > Sounds like that would work and I would much prefer to simply make > > correct use of the core functionality. > > It seems like a reasonable approach. Whether it will actually _work_ > is a harder question... :-) > > > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending > > on whether the suspend was cancelled or not?That''s correct, but from drivers'' point of view PMSG_THAW is equivalent to PMSG_RECOVER, because the both of them cause ->thaw() callbacks to be executed.> Basically yes. It is also "balanced" by PMSG_RESTORE, which is used > after a memory image has been restored (although this isn''t relevant to > your snapshotting). See the comments in include/linux/pm.h.Yup.> > So the sequence of events > > is something like: > > dpm_suspend_start(PMSG_FREEZE); > > > > dpm_suspend_noirq(PMSG_FREEZE); > > > > sysdev_suspend(PMSG_QUIESCE); > > This should say sysdev_suspend(PMSG_FREEZE).Yes, PMSG_QUIESCE is restore-specific.> > cancelled = suspend_hypercall() > > At this point swsusp_arch_suspend() is called. If that translates to > suspend_hypercall() in your setting, then yes. > > > sysdev_resume(); > > > > dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW); > > > > dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW); > > ? > > Yes.Actually, I think PMSG_THAW can be used in both cases. The resume-side routines only use the ''state'' argument for diagnostics.> > (For comparison we currently have: > > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > > > /* suspend hypercall */ > > > > > sysdev_resume(); > > > > > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > > > > > dpm_resume_end(PMSG_RESUME); > > ) > > Right. The sequence of calls is the same, but the PMSG_ argument is > different so drivers are expected to act differently in response.That''s correct. Thanks, Rafael _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-10 16:34 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote:> On Thu, 10 Feb 2011, Ian Campbell wrote: > > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote: > > > In fact there already is a "fast suspend & resume" path in the PM core. > > > It''s the freeze/thaw procedure used when starting to hibernate. The > > > documentation specifically says that drivers'' freeze methods are > > > supposed to quiesce their devices but not change power levels. In > > > addition, the thaw method is invoked as part of recovery from a failed > > > hibernation attempt, so it already has the "cancel" semantics that xen > > > seems to want. > > > > Sounds like that would work and I would much prefer to simply make > > correct use of the core functionality. > > It seems like a reasonable approach. Whether it will actually _work_ > is a harder question... :-)Heh.> > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending > > on whether the suspend was cancelled or not? > > Basically yes. It is also "balanced" by PMSG_RESTORE, which is used > after a memory image has been restored (although this isn''t relevant to > your snapshotting). See the comments in include/linux/pm.h.The documentation of the individual events in pm.h is good. Is there a reference for the sequence of events for the different types of suspend/hibernate/etc?> > So the sequence of events > > is something like: > > dpm_suspend_start(PMSG_FREEZE); > > > > dpm_suspend_noirq(PMSG_FREEZE); > > > > sysdev_suspend(PMSG_QUIESCE); > > This should say sysdev_suspend(PMSG_FREEZE). > > > cancelled = suspend_hypercall() > > At this point swsusp_arch_suspend() is called. If that translates to > suspend_hypercall() in your setting, then yes. > > > sysdev_resume(); > > > > dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW); > > > > dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW); > > ? > > Yes.Both of those call ->thaw ->complete. Did I mean "cancelled ? PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?) If the suspend was cancelled then we want the devices to simply pickup where they were before the freeze, wereas if we really did suspend (or migrate or whatever) then they need to do a more complete reset and reconnect operation so we want some sort of indication to the driver which happened.> > (For comparison we currently have: > > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > > > /* suspend hypercall */ > > > > > sysdev_resume(); > > > > > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > > > > > dpm_resume_end(PMSG_RESUME); > > ) > > Right. The sequence of calls is the same, but the PMSG_ argument is > different so drivers are expected to act differently in response.The drivers don''t actually see the PMSG_* though right? They only see a differing sequence of hooks from dev_pm_ops called. Thanks, Ian.> > Alan Stern > >-- Ian Campbell Current Noise: Sworn Amongst - Useless Accident, n.: A condition in which presence of mind is good, but absence of body is better. -- Foolish Dictionary _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Rafael J. Wysocki
2011-Feb-10 17:01 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thursday, February 10, 2011, Ian Campbell wrote:> On Thu, 2011-02-10 at 11:00 -0500, Alan Stern wrote: > > On Thu, 10 Feb 2011, Ian Campbell wrote: > > > > > On Wed, 2011-02-09 at 23:42 +0000, Alan Stern wrote: > > > > In fact there already is a "fast suspend & resume" path in the PM core. > > > > It''s the freeze/thaw procedure used when starting to hibernate. The > > > > documentation specifically says that drivers'' freeze methods are > > > > supposed to quiesce their devices but not change power levels. In > > > > addition, the thaw method is invoked as part of recovery from a failed > > > > hibernation attempt, so it already has the "cancel" semantics that xen > > > > seems to want. > > > > > > Sounds like that would work and I would much prefer to simply make > > > correct use of the core functionality. > > > > It seems like a reasonable approach. Whether it will actually _work_ > > is a harder question... :-) > > Heh. > > > > So PMSG_FREEZE is balanced by either PMSG_RECOVER or PMSG_THAW depending > > > on whether the suspend was cancelled or not? > > > > Basically yes. It is also "balanced" by PMSG_RESTORE, which is used > > after a memory image has been restored (although this isn''t relevant to > > your snapshotting). See the comments in include/linux/pm.h. > > The documentation of the individual events in pm.h is good. Is there a > reference for the sequence of events for the different types of > suspend/hibernate/etc? > > > > So the sequence of events > > > is something like: > > > dpm_suspend_start(PMSG_FREEZE); > > > > > > dpm_suspend_noirq(PMSG_FREEZE); > > > > > > sysdev_suspend(PMSG_QUIESCE); > > > > This should say sysdev_suspend(PMSG_FREEZE). > > > > > cancelled = suspend_hypercall() > > > > At this point swsusp_arch_suspend() is called. If that translates to > > suspend_hypercall() in your setting, then yes. > > > > > sysdev_resume(); > > > > > > dpm_resume_noirq(cancelled ? PMSG_RECOVER : PMSG_THAW); > > > > > > dpm_resume_end(cancelled ? PMSG_RECOVER : PMSG_THAW); > > > ? > > > > Yes. > > Both of those call ->thaw ->complete. Did I mean "cancelled ? > PMSG_THAW : PMSG_RESTORE"? (or s/THAW/RECOVER?) > > If the suspend was cancelled then we want the devices to simply pickup > where they were before the freeze, wereas if we really did suspend (or > migrate or whatever) then they need to do a more complete reset and > reconnect operation so we want some sort of indication to the driver > which happened.In that case you should probably use PMSG_THAW (or PMSG_RECOVER) for the "cancel" case and PMSG_RESTORE for the "success" case (pretty much what hibernation does). And please don''t forget to update the comments in pm.h to cover your usage case. :-)> > > (For comparison we currently have: > > > > > > dpm_suspend_start(PMSG_SUSPEND); > > > > > > > > > > > > dpm_suspend_noirq(PMSG_SUSPEND); > > > > > > > > > > > > sysdev_suspend(PMSG_SUSPEND); > > > > > > /* suspend hypercall */ > > > > > > sysdev_resume(); > > > > > > > > > > > > dpm_resume_noirq(PMSG_RESUME); > > > > > > > > > > > > dpm_resume_end(PMSG_RESUME); > > > ) > > > > Right. The sequence of calls is the same, but the PMSG_ argument is > > different so drivers are expected to act differently in response. > > The drivers don''t actually see the PMSG_* though right? They only see a > differing sequence of hooks from dev_pm_ops called.That''s correct. Thanks, Rafael _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Brendan Cully
2011-Feb-10 17:53 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thursday, 10 February 2011 at 11:31, Ian Campbell wrote:> On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > I''d like to keep the fast resume option, and expect that it can be > > contained entirely in Xen-specific code. I''ll try to get someone to > > look into it here. > > Thanks, please put them in contact with Kazuhiro who has already been > looking into this. > > > I think fast resume is somewhat orthogonal to the problem of hanging > > on resume, which just sounds like a xen-specific bug in the slow > > path. > > The bug on the Xen side is taking the slow path when the suspend was > actually cancelled, which is what the original patch tries to fix. > > Or should/could the slow path be able cope either way?I haven''t looked at what''s actually happening yet, but what should be happening is that the pvops kernel should not advertise SUSPEND_CANCEL since it doesn''t yet support it. So xm save -c should call xc_domain_resume with the fast argument set to 0, to do slow path resumption, and that slow path should be able to handle resuming on the same host where it suspended. Slow path resumption from checkpoint was supported in the pre-pvops kernel. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Alan Stern
2011-Feb-10 18:56 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thu, 10 Feb 2011, Ian Campbell wrote:> The documentation of the individual events in pm.h is good. Is there a > reference for the sequence of events for the different types of > suspend/hibernate/etc?There''s Documentation/power/devices.txt. Alan Stern _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Brendan Cully
2011-Feb-10 19:31 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote:> (cutting CC to just xen-devel) > > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > > I''d like to keep the fast resume option, and expect that it can be > > > contained entirely in Xen-specific code. I''ll try to get someone to > > > look into it here. > > On that note: Is there someone around who is willing to more actively > and visibly maintain Remus? i.e. make it work with the new toolstack, > upstream it to the mainline kernel, work to get it into a state where it > can be tested as a matter of course, and generally be seen to be > improving it (and its integration/interaction with the rest of the > ecosystem) with time? > > At the moment the impression is that Remus is mostly being left to rot > apart from when noise gets made about specific issues.I''d like to introduce Shriram Rajagopalan, another PhD student here at UBC who has been doing a lot of work with Remus internally. Up to this point he''s concentrated on new features that aren''t quite ready for upstream yet: checkpoint compression, DRBD integration, and failback (live resync back to the primary after it has recovered from failure). He''s also started doing maintenance work for his own purposes that would be beneficial for Xen: building a regression test suite, working on pvops support, and planning xl integration. He''s excited to take on a maintenance role, and ready to put together a plan for upstreaming his work. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-11 09:14 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote: > > (cutting CC to just xen-devel) > > > > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: > > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > > > I''d like to keep the fast resume option, and expect that it can be > > > > contained entirely in Xen-specific code. I''ll try to get someone to > > > > look into it here. > > > > On that note: Is there someone around who is willing to more actively > > and visibly maintain Remus? i.e. make it work with the new toolstack, > > upstream it to the mainline kernel, work to get it into a state where it > > can be tested as a matter of course, and generally be seen to be > > improving it (and its integration/interaction with the rest of the > > ecosystem) with time? > > > > At the moment the impression is that Remus is mostly being left to rot > > apart from when noise gets made about specific issues. > > I''d like to introduce Shriram Rajagopalan, another PhD student here at > UBC who has been doing a lot of work with Remus internally. Up to this > point he''s concentrated on new features that aren''t quite ready for > upstream yet: checkpoint compression, DRBD integration, and failback > (live resync back to the primary after it has recovered from > failure). > > He''s also started doing maintenance work for his own purposes that > would be beneficial for Xen: building a regression test suite, working > on pvops support, and planning xl integration. He''s excited to take on > a maintenance role, and ready to put together a plan for upstreaming > his work.Awesome! Welcome Shriram! Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2011-Feb-11 09:37 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Fri, Feb 11, 2011 at 09:14:09AM +0000, Ian Campbell wrote:> On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote: > > On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote: > > > (cutting CC to just xen-devel) > > > > > > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: > > > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > > > > I''d like to keep the fast resume option, and expect that it can be > > > > > contained entirely in Xen-specific code. I''ll try to get someone to > > > > > look into it here. > > > > > > On that note: Is there someone around who is willing to more actively > > > and visibly maintain Remus? i.e. make it work with the new toolstack, > > > upstream it to the mainline kernel, work to get it into a state where it > > > can be tested as a matter of course, and generally be seen to be > > > improving it (and its integration/interaction with the rest of the > > > ecosystem) with time? > > > > > > At the moment the impression is that Remus is mostly being left to rot > > > apart from when noise gets made about specific issues. > > > > I''d like to introduce Shriram Rajagopalan, another PhD student here at > > UBC who has been doing a lot of work with Remus internally. Up to this > > point he''s concentrated on new features that aren''t quite ready for > > upstream yet: checkpoint compression, DRBD integration, and failback > > (live resync back to the primary after it has recovered from > > failure). > > > > He''s also started doing maintenance work for his own purposes that > > would be beneficial for Xen: building a regression test suite, working > > on pvops support, and planning xl integration. He''s excited to take on > > a maintenance role, and ready to put together a plan for upstreaming > > his work. > > Awesome! Welcome Shriram! >Indeed, very much welcome! Remus is discussed almost every day on ##xen on irc, many people seem to want to use it! -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-11 09:51 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote:> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote: > > (cutting CC to just xen-devel) > > > > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: > > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > > > > I''d like to keep the fast resume option, and expect that it can be > > > > contained entirely in Xen-specific code. I''ll try to get someone to > > > > look into it here. > > > > On that note: Is there someone around who is willing to more actively > > and visibly maintain Remus? i.e. make it work with the new toolstack, > > upstream it to the mainline kernel, work to get it into a state where it > > can be tested as a matter of course, and generally be seen to be > > improving it (and its integration/interaction with the rest of the > > ecosystem) with time? > > > > At the moment the impression is that Remus is mostly being left to rot > > apart from when noise gets made about specific issues. > > I''d like to introduce Shriram Rajagopalan,[...] He''s excited to take on > a maintenance role, and ready to put together a plan for upstreaming > his work.Shriram, are you happy to add your acked-by to the following? I mostly guessed the most obvious path for the F: tag, feel free to extend (I presume there are bits under tools/python too?). If you have a Remus tree somewhere you can add a T: label too if you like. We can get you a tree on xenbis.xen.org if you want. Ian. 8<------------------- MAINTAINERS: Add Remus maintainer. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> diff -r 9e463cb15658 MAINTAINERS --- a/MAINTAINERS Mon Feb 07 17:02:46 2011 +0000 +++ b/MAINTAINERS Fri Feb 11 09:46:38 2011 +0000 @@ -158,6 +158,11 @@ S: Supported S: Supported T: git git://xenbits.xen.org/qemu-xen-*.git +REMUS +M: Shriram Rajagopalan <rshriram@cs.ubc.ca> +S: Maintained +F: tools/remus/ + SCHEDULING M: George Dunlap <george.dunlap@eu.citrix.com> S: Supported _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shriram Rajagopalan
2011-Feb-11 18:13 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Fri, Feb 11, 2011 at 1:51 AM, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote:> On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote: >> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote: >> > (cutting CC to just xen-devel) >> > >> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: >> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: >> > > > I''d like to keep the fast resume option, and expect that it can be >> > > > contained entirely in Xen-specific code. I''ll try to get someone to >> > > > look into it here. >> > >> > On that note: Is there someone around who is willing to more actively >> > and visibly maintain Remus? i.e. make it work with the new toolstack, >> > upstream it to the mainline kernel, work to get it into a state where it >> > can be tested as a matter of course, and generally be seen to be >> > improving it (and its integration/interaction with the rest of the >> > ecosystem) with time? >> > >> > At the moment the impression is that Remus is mostly being left to rot >> > apart from when noise gets made about specific issues. >> >> I''d like to introduce Shriram Rajagopalan,[...] He''s excited to take on >> a maintenance role, and ready to put together a plan for upstreaming >> his work. > > Shriram, are you happy to add your acked-by to the following? > > I mostly guessed the most obvious path for the F: tag, feel free to > extend (I presume there are bits under tools/python too?). If you have a > Remus tree somewhere you can add a T: label too if you like. We can get > you a tree on xenbis.xen.org if you want. > > Ian. > > 8<------------------- > > MAINTAINERS: Add Remus maintainer. > > Signed-off-by: Ian Campbell <ian.campbell@citrix.com> > > diff -r 9e463cb15658 MAINTAINERS > --- a/MAINTAINERS Mon Feb 07 17:02:46 2011 +0000 > +++ b/MAINTAINERS Fri Feb 11 09:46:38 2011 +0000 > @@ -158,6 +158,11 @@ S: Supported > S: Supported > T: git git://xenbits.xen.org/qemu-xen-*.git > > +REMUS > +M: Shriram Rajagopalan <rshriram@cs.ubc.ca> > +S: Maintained > +F: tools/remus/ > + > SCHEDULING > M: George Dunlap <george.dunlap@eu.citrix.com> > S: Supported > > >Yep. You can add the acked-by. About the F: tag, the following paths contain remus code tools/remus/ tools/python/xen/remus/ tools/python/xen/lowlevel/checkpoint/ tools/blktap2/drivers/ The blktap2/drivers path has only few files (block-remus.c, hashtable* files) About the tree, it would be great if I could get a tree. :) shriram _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-14 09:15 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Fri, 2011-02-11 at 18:13 +0000, Shriram Rajagopalan wrote:> On Fri, Feb 11, 2011 at 1:51 AM, Ian Campbell > <Ian.Campbell@eu.citrix.com> wrote: > > On Thu, 2011-02-10 at 19:31 +0000, Brendan Cully wrote: > >> On Thursday, 10 February 2011 at 12:40, Ian Campbell wrote: > >> > (cutting CC to just xen-devel) > >> > > >> > On Thu, 2011-02-10 at 11:31 +0000, Ian Campbell wrote: > >> > > On Wed, 2011-02-09 at 23:16 +0000, Brendan Cully wrote: > >> > > > I''d like to keep the fast resume option, and expect that it can be > >> > > > contained entirely in Xen-specific code. I''ll try to get someone to > >> > > > look into it here. > >> > > >> > On that note: Is there someone around who is willing to more actively > >> > and visibly maintain Remus? i.e. make it work with the new toolstack, > >> > upstream it to the mainline kernel, work to get it into a state where it > >> > can be tested as a matter of course, and generally be seen to be > >> > improving it (and its integration/interaction with the rest of the > >> > ecosystem) with time? > >> > > >> > At the moment the impression is that Remus is mostly being left to rot > >> > apart from when noise gets made about specific issues. > >> > >> I''d like to introduce Shriram Rajagopalan,[...] He''s excited to take on > >> a maintenance role, and ready to put together a plan for upstreaming > >> his work. > > > > Shriram, are you happy to add your acked-by to the following? > >[...] > Yep. You can add the acked-by. > About the F: tag, the following paths contain remus code > tools/remus/ > tools/python/xen/remus/ > tools/python/xen/lowlevel/checkpoint/ > tools/blktap2/drivers/ > > The blktap2/drivers path has only few files (block-remus.c, hashtable* files)Thanks, I''ll resend the patch with this update and your ack.> About the tree, it would be great if I could get a tree. :)I''ve dropped you and the xenbits admins an email offline about this. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-14 09:27 UTC
Re: [Xen-devel] Re: [linux-pm] [PATCH 0/2] Fix hangup after creating checkpoint on Xen.
On Mon, 2011-02-14 at 09:15 +0000, Ian Campbell wrote:> On Fri, 2011-02-11 at 18:13 +0000, Shriram Rajagopalan wrote: > > About the F: tag, the following paths contain remus code > > tools/remus/ > > tools/python/xen/remus/ > > tools/python/xen/lowlevel/checkpoint/ > > tools/blktap2/drivers/ > > > > The blktap2/drivers path has only few files (block-remus.c, hashtable* files) > > Thanks, I''ll resend the patch with this update and your ack.Keir beat me to the punch with the original patch. I''ve sent an update with the extra paths. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shriram Rajagopalan
2011-Feb-17 07:56 UTC
[Xen-devel] [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
Add documentation to pm.h on how xen uses PM events (freeze, thaw, restore) to implement Guest VM save/checkpoint/restore functionality. Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> --- include/linux/pm.h | 19 +++++++++++++++++++ 1 files changed, 19 insertions(+), 0 deletions(-) diff --git a/include/linux/pm.h b/include/linux/pm.h index dd9c7ab..2ddd9d3 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev); * well as during system sleep states like PM_SUSPEND_STANDBY. They may * be able to use wakeup events to exit from runtime low-power states, * or from system low-power states such as standby or suspend-to-RAM. + * + * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement + * VM save/restore/checkpoint functionality. Save and Restore are somewhat + * similar to hibernate functionality. The sequence of events is shown below: + * dpm_suspend_start(PMSG_FREEZE); + * + * dpm_suspend_noirq(PMSG_FREEZE); + * + * sysdev_suspend(PMSG_FREEZE); + * cancelled = suspend_hypercall() + * sysdev_resume(); + * + * dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE); + * + * dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE); + * + * If the syspend_hypercall returns 1, it means that the VM was merely + * checkpointed (akin to THAW). If it returns 0, it means the system has been + * fully restored from its on-disk snapshot (akin to RESTORE). */ #ifdef CONFIG_PM_SLEEP -- 1.7.0.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Feb-17 10:56 UTC
Re: [Xen-devel] [PATCH] update comments in pm.h describing Xen Guest save/restore/checkpoint use case
On Thu, 2011-02-17 at 07:56 +0000, Shriram Rajagopalan wrote:> Add documentation to pm.h on how xen uses PM events (freeze, > thaw, restore) to implement Guest VM save/checkpoint/restore > functionality.The change to freeze/thaw/restore instead of suspend/resume raises the question of what the correct .config option for Xen to key off is. Currently the Xen suspend functionality in drivers/xen/manage.c is keyed off CONFIG_PM_SLEEP (which depends on SUSPEND || HIBERNATION || XEN_SAVE_RESTORE). Since PMSG_{SUSPEND,RESUME} are covered by CONFIG_SUSPEND in pm_op() this already seems incorrect before this change, since PM_SLEEP can be enabled without SUSPEND, and is equally incorrect after changing to PMSG_{FREEZE,THAW,RESUME}, which are covered by CONFIG_HIBERNATION. Is the correct fix to update pm_op to include || defined(CONFIG_XEN_SAVE_RESTORE) where appropriate in pm_op()? Or should we be looking to adjust the Kconfig and/or code on the Xen side? Having Xen depend on HIBERNATION ="Hibernation (aka ''suspend to disk'')" seems semantically incorrect. Ian.> > Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca> > --- > include/linux/pm.h | 19 +++++++++++++++++++ > 1 files changed, 19 insertions(+), 0 deletions(-) > > diff --git a/include/linux/pm.h b/include/linux/pm.h > index dd9c7ab..2ddd9d3 100644 > --- a/include/linux/pm.h > +++ b/include/linux/pm.h > @@ -516,6 +516,25 @@ extern void update_pm_runtime_accounting(struct device *dev); > * well as during system sleep states like PM_SUSPEND_STANDBY. They may > * be able to use wakeup events to exit from runtime low-power states, > * or from system low-power states such as standby or suspend-to-RAM. > + * > + * Xen Guest Kernels use PM_FREEZE, PM_RESTORE and PM_THAW to implement > + * VM save/restore/checkpoint functionality. Save and Restore are somewhat > + * similar to hibernate functionality. The sequence of events is shown below: > + * dpm_suspend_start(PMSG_FREEZE); > + * > + * dpm_suspend_noirq(PMSG_FREEZE); > + * > + * sysdev_suspend(PMSG_FREEZE); > + * cancelled = suspend_hypercall() > + * sysdev_resume(); > + * > + * dpm_resume_noirq(cancelled ? PMSG_THAW : PMSG_RESTORE); > + * > + * dpm_resume_end(cancelled ? PMSG_THAW : PMSG_RESTORE); > + * > + * If the syspend_hypercall returns 1, it means that the VM was merely > + * checkpointed (akin to THAW). If it returns 0, it means the system has been > + * fully restored from its on-disk snapshot (akin to RESTORE). > */ > > #ifdef CONFIG_PM_SLEEP_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Possibly Parallel Threads
- [PATCH] xen: use freeze/restore/thaw PM events for suspend/resume/chkpt
- Debian/squeeze: domU live migraton hangs
- [040/111] xen: Do not suspend IPI IRQs.
- [PATCH] xen: wrap load_cr3() in manage.c for ia64 support.
- [PATCH] xen: remove load_cr3() from manage.c for ia64 support.