Frank Pan
2011-Mar-04 04:31 UTC
[Xen-devel] [PATCH] pvops: Resume devices when suspend is failed
Recent pvops kernel does not call dpm_resume_end when dpm_suspend_start is failed. This makes some device remain suspended after the unsuccessful call of do_suspend from xenbus. In my test, a PV-on-HVM guest printed the following message after received a suspend request through xenbus, and then stucked due to disk access. [41577.764748] sd 0:0:0:0: [sda] Stopping disk [41577.765273] PM: Device input2 failed to suspend: error -22 [41577.765275] xen suspend: dpm_suspend_start -22 The following patch fixes this by calling dpm_suspend_start after the failure of dpm_resume_end. --- linux-2.6-xen/drivers/xen/manage.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/linux-2.6-xen/drivers/xen/manage.c b/linux-2.6-xen/drivers/xen/manage.c index 5078704..3eec337 100644 --- a/linux-2.6-xen/drivers/xen/manage.c +++ b/linux-2.6-xen/drivers/xen/manage.c @@ -120,6 +120,7 @@ static void do_suspend(void) err = dpm_suspend_start(PMSG_SUSPEND); if (err) { + dpm_resume_end(PMSG_RESUME); printk(KERN_ERR "xen suspend: dpm_suspend_start %d\n", err); goto out_thaw; } -- 1.7.0.4 -- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Mar-07 17:22 UTC
[Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Fri, 4 Mar 2011, Frank Pan wrote:> Recent pvops kernel does not call dpm_resume_end when > dpm_suspend_start is failed. This makes some device remain suspended > after the unsuccessful call of do_suspend from xenbus. > > In my test, a PV-on-HVM guest printed the following message after > received a suspend request through xenbus, and then stucked due to > disk access. > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > [41577.765273] PM: Device input2 failed to suspend: error -22 > [41577.765275] xen suspend: dpm_suspend_start -22 > > The following patch fixes this by calling dpm_suspend_start after the > failure of dpm_resume_end.Thanks for spotting this issue and for the patch! However I think it would be better to move out_thaw before dpm_resume_end instead. Would you be OK to do that and send a patch? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-08 10:32 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote:> On Fri, 4 Mar 2011, Frank Pan wrote: > > Recent pvops kernel does not call dpm_resume_end when > > dpm_suspend_start is failed. This makes some device remain suspended > > after the unsuccessful call of do_suspend from xenbus. > > > > In my test, a PV-on-HVM guest printed the following message after > > received a suspend request through xenbus, and then stucked due to > > disk access. > > > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > > [41577.765273] PM: Device input2 failed to suspend: error -22 > > [41577.765275] xen suspend: dpm_suspend_start -22 > > > > The following patch fixes this by calling dpm_suspend_start after the > > failure of dpm_resume_end.> Thanks for spotting this issue and for the patch! > However I think it would be better to move out_thaw before > dpm_resume_end instead.ACK. This will likely interact with Shriram''s recent changes to the PMSG_* types used by Xen save/restore. Ideally this patch would be against those. Also the choice of PMSG_* passed to dpm_resume_end will likely depend on whether dpm_suspend_start succeeded or failed. Based on looking at hibernate.c I think, after Shriram''s patches, it needs to be PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and PMSG_RESTORE in the normal resume case but you should check the descriptions in pm.h to be sure. Ian.> Would you be OK to do that and send a patch? > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shriram Rajagopalan
2011-Mar-08 21:09 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Tue, Mar 8, 2011 at 2:32 AM, Ian Campbell <Ian.Campbell@citrix.com>wrote:> On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote: > > On Fri, 4 Mar 2011, Frank Pan wrote: > > > Recent pvops kernel does not call dpm_resume_end when > > > dpm_suspend_start is failed. This makes some device remain suspended > > > after the unsuccessful call of do_suspend from xenbus. > > > > > > In my test, a PV-on-HVM guest printed the following message after > > > received a suspend request through xenbus, and then stucked due to > > > disk access. > > > > > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > > > [41577.765273] PM: Device input2 failed to suspend: error -22 > > > [41577.765275] xen suspend: dpm_suspend_start -22 > > > > > > The following patch fixes this by calling dpm_suspend_start after the > > > failure of dpm_resume_end. > > > Thanks for spotting this issue and for the patch! > > However I think it would be better to move out_thaw before > > dpm_resume_end instead. > > FYI, the code flow after Frank''s patchif (dpm_suspend(..)) { dpm_resume_end() goto out_thaw; } xs_suspend() if (dpm_suspend_noirq()) goto out_resume; xen_suspend() dpm_resume_noirq() out_resume: xen_arch_resume() xs_resume() dpm_resume_end() clock_was_set() out_thaw: return I am not sure if the clock_was_set() call is harmless (retriggering the timers), even if no dpm_suspend_noirq() calls were issued.> ACK. > > This will likely interact with Shriram''s recent changes to the PMSG_* > types used by Xen save/restore. Ideally this patch would be against > those. > > Also the choice of PMSG_* passed to dpm_resume_end will likely depend on > whether dpm_suspend_start succeeded or failed. Based on looking at > hibernate.c I think, after Shriram''s patches, it needs to be > PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and > PMSG_RESTORE in the normal resume case but you should check the > descriptions in pm.h to be sure. > > Ian. > > > Would you be OK to do that and send a patch? > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-10 10:42 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
Yes, that''s what I doubt. I write dpm_resume_end() inside the if clause just because this reason. What if we call clock_was_set() before dpm_resume_end() ? Is there any device need timer intr during resume? IOW: if (dpm_suspend(..)) goto out_thaw; xs_suspend() if (dpm_suspend_noirq()) goto out_resume; xen_suspend() dpm_resume_noirq() out_resume: xen_arch_resume() xs_resume() clock_was_set() out_thaw: dpm_resume_end() return On Wed, Mar 9, 2011 at 5:09 AM, Shriram Rajagopalan <rshriram@cs.ubc.ca> wrote:> On Tue, Mar 8, 2011 at 2:32 AM, Ian Campbell <Ian.Campbell@citrix.com> > wrote: >> >> On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote: >> > On Fri, 4 Mar 2011, Frank Pan wrote: >> > > Recent pvops kernel does not call dpm_resume_end when >> > > dpm_suspend_start is failed. This makes some device remain suspended >> > > after the unsuccessful call of do_suspend from xenbus. >> > > >> > > In my test, a PV-on-HVM guest printed the following message after >> > > received a suspend request through xenbus, and then stucked due to >> > > disk access. >> > > >> > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk >> > > [41577.765273] PM: Device input2 failed to suspend: error -22 >> > > [41577.765275] xen suspend: dpm_suspend_start -22 >> > > >> > > The following patch fixes this by calling dpm_suspend_start after the >> > > failure of dpm_resume_end. >> >> > Thanks for spotting this issue and for the patch! >> > However I think it would be better to move out_thaw before >> > dpm_resume_end instead. >> > FYI, the code flow after Frank''s patch > if (dpm_suspend(..)) { > dpm_resume_end() > goto out_thaw; > } > xs_suspend() > if (dpm_suspend_noirq()) > goto out_resume; > xen_suspend() > dpm_resume_noirq() > > out_resume: > xen_arch_resume() > xs_resume() > dpm_resume_end() > clock_was_set() > > out_thaw: > return > > I am not sure if the clock_was_set() call is harmless (retriggering the > timers), > even if no dpm_suspend_noirq() calls were issued. >> >> ACK. >> >> This will likely interact with Shriram''s recent changes to the PMSG_* >> types used by Xen save/restore. Ideally this patch would be against >> those. >> >> Also the choice of PMSG_* passed to dpm_resume_end will likely depend on >> whether dpm_suspend_start succeeded or failed. Based on looking at >> hibernate.c I think, after Shriram''s patches, it needs to be >> PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and >> PMSG_RESTORE in the normal resume case but you should check the >> descriptions in pm.h to be sure. >> >> Ian. >> >> > Would you be OK to do that and send a patch? >> > >> > _______________________________________________ >> > Xen-devel mailing list >> > Xen-devel@lists.xensource.com >> > http://lists.xensource.com/xen-devel >> >> > >-- 潘震皓, Frank Pan Computer Science and Technology Tsinghua University _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel