Frank Pan
2011-Mar-04 04:31 UTC
[Xen-devel] [PATCH] pvops: Resume devices when suspend is failed
Recent pvops kernel does not call dpm_resume_end when
dpm_suspend_start is failed. This makes some device remain suspended
after the unsuccessful call of do_suspend from xenbus.
In my test, a PV-on-HVM guest printed the following message after
received a suspend request through xenbus, and then stucked due to
disk access.
[41577.764748] sd 0:0:0:0: [sda] Stopping disk
[41577.765273] PM: Device input2 failed to suspend: error -22
[41577.765275] xen suspend: dpm_suspend_start -22
The following patch fixes this by calling dpm_suspend_start after the
failure of dpm_resume_end.
---
linux-2.6-xen/drivers/xen/manage.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/linux-2.6-xen/drivers/xen/manage.c
b/linux-2.6-xen/drivers/xen/manage.c
index 5078704..3eec337 100644
--- a/linux-2.6-xen/drivers/xen/manage.c
+++ b/linux-2.6-xen/drivers/xen/manage.c
@@ -120,6 +120,7 @@ static void do_suspend(void)
err = dpm_suspend_start(PMSG_SUSPEND);
if (err) {
+ dpm_resume_end(PMSG_RESUME);
printk(KERN_ERR "xen suspend: dpm_suspend_start %d\n", err);
goto out_thaw;
}
--
1.7.0.4
--
潘震皓, Frank Pan
Computer Science and Technology
Tsinghua University
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Stefano Stabellini
2011-Mar-07 17:22 UTC
[Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Fri, 4 Mar 2011, Frank Pan wrote:> Recent pvops kernel does not call dpm_resume_end when > dpm_suspend_start is failed. This makes some device remain suspended > after the unsuccessful call of do_suspend from xenbus. > > In my test, a PV-on-HVM guest printed the following message after > received a suspend request through xenbus, and then stucked due to > disk access. > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > [41577.765273] PM: Device input2 failed to suspend: error -22 > [41577.765275] xen suspend: dpm_suspend_start -22 > > The following patch fixes this by calling dpm_suspend_start after the > failure of dpm_resume_end.Thanks for spotting this issue and for the patch! However I think it would be better to move out_thaw before dpm_resume_end instead. Would you be OK to do that and send a patch? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-08 10:32 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote:> On Fri, 4 Mar 2011, Frank Pan wrote: > > Recent pvops kernel does not call dpm_resume_end when > > dpm_suspend_start is failed. This makes some device remain suspended > > after the unsuccessful call of do_suspend from xenbus. > > > > In my test, a PV-on-HVM guest printed the following message after > > received a suspend request through xenbus, and then stucked due to > > disk access. > > > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > > [41577.765273] PM: Device input2 failed to suspend: error -22 > > [41577.765275] xen suspend: dpm_suspend_start -22 > > > > The following patch fixes this by calling dpm_suspend_start after the > > failure of dpm_resume_end.> Thanks for spotting this issue and for the patch! > However I think it would be better to move out_thaw before > dpm_resume_end instead.ACK. This will likely interact with Shriram''s recent changes to the PMSG_* types used by Xen save/restore. Ideally this patch would be against those. Also the choice of PMSG_* passed to dpm_resume_end will likely depend on whether dpm_suspend_start succeeded or failed. Based on looking at hibernate.c I think, after Shriram''s patches, it needs to be PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and PMSG_RESTORE in the normal resume case but you should check the descriptions in pm.h to be sure. Ian.> Would you be OK to do that and send a patch? > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shriram Rajagopalan
2011-Mar-08 21:09 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
On Tue, Mar 8, 2011 at 2:32 AM, Ian Campbell <Ian.Campbell@citrix.com>wrote:> On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote: > > On Fri, 4 Mar 2011, Frank Pan wrote: > > > Recent pvops kernel does not call dpm_resume_end when > > > dpm_suspend_start is failed. This makes some device remain suspended > > > after the unsuccessful call of do_suspend from xenbus. > > > > > > In my test, a PV-on-HVM guest printed the following message after > > > received a suspend request through xenbus, and then stucked due to > > > disk access. > > > > > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk > > > [41577.765273] PM: Device input2 failed to suspend: error -22 > > > [41577.765275] xen suspend: dpm_suspend_start -22 > > > > > > The following patch fixes this by calling dpm_suspend_start after the > > > failure of dpm_resume_end. > > > Thanks for spotting this issue and for the patch! > > However I think it would be better to move out_thaw before > > dpm_resume_end instead. > > FYI, the code flow after Frank''s patchif (dpm_suspend(..)) { dpm_resume_end() goto out_thaw; } xs_suspend() if (dpm_suspend_noirq()) goto out_resume; xen_suspend() dpm_resume_noirq() out_resume: xen_arch_resume() xs_resume() dpm_resume_end() clock_was_set() out_thaw: return I am not sure if the clock_was_set() call is harmless (retriggering the timers), even if no dpm_suspend_noirq() calls were issued.> ACK. > > This will likely interact with Shriram''s recent changes to the PMSG_* > types used by Xen save/restore. Ideally this patch would be against > those. > > Also the choice of PMSG_* passed to dpm_resume_end will likely depend on > whether dpm_suspend_start succeeded or failed. Based on looking at > hibernate.c I think, after Shriram''s patches, it needs to be > PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and > PMSG_RESTORE in the normal resume case but you should check the > descriptions in pm.h to be sure. > > Ian. > > > Would you be OK to do that and send a patch? > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank Pan
2011-Mar-10 10:42 UTC
Re: [Xen-devel] Re: [PATCH] pvops: Resume devices when suspend is failed
Yes, that''s what I doubt.
I write dpm_resume_end() inside the if clause just because this reason.
What if we call clock_was_set() before dpm_resume_end() ?
Is there any device need timer intr during resume?
IOW:
if (dpm_suspend(..))
goto out_thaw;
xs_suspend()
if (dpm_suspend_noirq())
goto out_resume;
xen_suspend()
dpm_resume_noirq()
out_resume:
xen_arch_resume()
xs_resume()
clock_was_set()
out_thaw:
dpm_resume_end()
return
On Wed, Mar 9, 2011 at 5:09 AM, Shriram Rajagopalan <rshriram@cs.ubc.ca>
wrote:> On Tue, Mar 8, 2011 at 2:32 AM, Ian Campbell
<Ian.Campbell@citrix.com>
> wrote:
>>
>> On Mon, 2011-03-07 at 17:22 +0000, Stefano Stabellini wrote:
>> > On Fri, 4 Mar 2011, Frank Pan wrote:
>> > > Recent pvops kernel does not call dpm_resume_end when
>> > > dpm_suspend_start is failed. This makes some device remain
suspended
>> > > after the unsuccessful call of do_suspend from xenbus.
>> > >
>> > > In my test, a PV-on-HVM guest printed the following message
after
>> > > received a suspend request through xenbus, and then stucked
due to
>> > > disk access.
>> > >
>> > > [41577.764748] sd 0:0:0:0: [sda] Stopping disk
>> > > [41577.765273] PM: Device input2 failed to suspend: error -22
>> > > [41577.765275] xen suspend: dpm_suspend_start -22
>> > >
>> > > The following patch fixes this by calling dpm_suspend_start
after the
>> > > failure of dpm_resume_end.
>>
>> > Thanks for spotting this issue and for the patch!
>> > However I think it would be better to move out_thaw before
>> > dpm_resume_end instead.
>>
> FYI, the code flow after Frank''s patch
> if (dpm_suspend(..)) {
> dpm_resume_end()
> goto out_thaw;
> }
> xs_suspend()
> if (dpm_suspend_noirq())
> goto out_resume;
> xen_suspend()
> dpm_resume_noirq()
>
> out_resume:
> xen_arch_resume()
> xs_resume()
> dpm_resume_end()
> clock_was_set()
>
> out_thaw:
> return
>
> I am not sure if the clock_was_set() call is harmless (retriggering the
> timers),
> even if no dpm_suspend_noirq() calls were issued.
>>
>> ACK.
>>
>> This will likely interact with Shriram''s recent changes to the
PMSG_*
>> types used by Xen save/restore. Ideally this patch would be against
>> those.
>>
>> Also the choice of PMSG_* passed to dpm_resume_end will likely depend
on
>> whether dpm_suspend_start succeeded or failed. Based on looking at
>> hibernate.c I think, after Shriram''s patches, it needs to be
>> PMSG_RECOVER in the error case, PMSG_THAW in the checkpoint case and
>> PMSG_RESTORE in the normal resume case but you should check the
>> descriptions in pm.h to be sure.
>>
>> Ian.
>>
>> > Would you be OK to do that and send a patch?
>> >
>> > _______________________________________________
>> > Xen-devel mailing list
>> > Xen-devel@lists.xensource.com
>> > http://lists.xensource.com/xen-devel
>>
>>
>
>
--
潘震皓, Frank Pan
Computer Science and Technology
Tsinghua University
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel