Hi all,

While adding PM support to Xen, we realized that some enhancements are required to suspend/resume a domU. Below is some background and our thoughts; comments are welcome. :-)

Currently we use a simple approach (pause/unpause) for domU when we are ready to put the whole platform into a power-saving state such as S3. Because pause/unpause happens without domU's knowledge, domU observes a soft lockup when it is unpaused after resuming from S3. This approach also cannot handle the driver domain case. We tried "xm save/restore" and the equivalent "xm suspend/resume". It works, but the overhead is rather high, since the whole domU memory is saved to disk and the domain itself is destroyed after suspend. For S3 support a quick suspend/resume is preferable, because memory contents are retained throughout the process.

Basically the change has two aspects:
 - Lightweight "xm suspend/resume"
 - Extend suspend support to driver domU

[Lightweight "xm suspend/resume"]
It is reasonable for the current implementation to save and release the whole memory of domU after it is suspended, since that makes more memory available to other domains. For platform-level S3, however, this is redundant, because the box is physically put into a suspend state. All we need is to send a suspend notification to domU and let domU enter the __xen_suspend path. domU then exits the scheduler by issuing HYPERVISOR_suspend. Nothing else is required after this. After resume, domU simply continues to run from the suspend point.

Even the __xen_suspend path is a bit heavy, and in this case resources do not change for domU across the resume. Maybe we can benefit from the recent checkpoint patch, which makes appropriate changes to this path.

I'm not familiar with the control panel side, though, and hope someone can advise. My gut feeling is to add an option (like -L) to the existing "xm suspend/resume" instead of a new command. The changes may well look like what the checkpoint patch does, except that there is no immediate resume and the memory save logic needs to be disabled.

[Driver domU suspend]
This has to be added if a device is assigned to a domU and we want the system to keep working correctly after resume. When a driver domU receives a suspend request, it should invoke the driver suspend method of each physical device it owns. Before that, one other necessary step is to freeze all processes, since some may still hold critical resources. For this we need to borrow some Linux PM code into the xen suspend path, something like:

    smp_suspend();
    freeze_processes();
    device_suspend();
    device_power_down();
    xenbus_suspend();
    ...
    HYPERVISOR_suspend(virt_to_mfn(xen_start_info));
    ...
    xenbus_resume();
    device_power_up();
    device_resume();
    thaw_processes();
    smp_resume();

It may be more difficult if we want to support wake-on-LAN when the NIC is assigned to a domU, since that is more tightly tied to ACPI. So we will only consider normal device suspend here.

One interesting question: why doesn't the current __xen_suspend freeze running processes? My rough feeling is that virtual device state is preserved (in xenstore, the BE, etc.) across a single domU suspend/resume, and thus almost all front-end drivers (except TPM) have no suspend method. If no driver suspend methods are invoked, there is no need to freeze processes, since there is no contention. Please correct me if the real reason is different. :-)

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
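The ordering proposed in the [Driver domU suspend] listing can be checked with a small stand-alone sketch. It is only a model under stated assumptions: the kernel functions are replaced by name strings, and the table `phases`, the helper `record()` and the function `simulate_suspend_cycle()` are hypothetical names invented here. The point it illustrates is that the resume steps run in exactly the reverse order of the suspend steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Trace of the steps "executed" so far, as "name;name;...". */
static char trace[256];

/* Hypothetical stand-in for calling a kernel step: only logs its name. */
static void record(const char *name)
{
    size_t used = strlen(trace);

    /* Room for the name, the ';' separator and the terminating NUL. */
    assert(used + strlen(name) + 2 <= sizeof(trace));
    snprintf(trace + used, sizeof(trace) - used, "%s;", name);
}

struct pm_phase {
    const char *down; /* step run on the way into suspend */
    const char *up;   /* matching step run on the way out */
};

/* Same top-to-bottom order as the listing in the message. */
static const struct pm_phase phases[] = {
    { "smp_suspend",       "smp_resume"      },
    { "freeze_processes",  "thaw_processes"  },
    { "device_suspend",    "device_resume"   },
    { "device_power_down", "device_power_up" },
    { "xenbus_suspend",    "xenbus_resume"   },
};

#define NR_PHASES (sizeof(phases) / sizeof(phases[0]))

/* Walk the phases forward, "suspend", then walk them backward. */
const char *simulate_suspend_cycle(void)
{
    size_t i;

    trace[0] = '\0';
    for (i = 0; i < NR_PHASES; i++)
        record(phases[i].down);

    record("HYPERVISOR_suspend"); /* the domain would stop here */

    for (i = NR_PHASES; i-- > 0; )
        record(phases[i].up);

    return trace;
}
```

Calling `simulate_suspend_cycle()` returns the full trace, which can be compared against the listing. Keeping each suspend step paired with its resume step in one table makes it hard to add a phase on one side and forget the other.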
OK, there seem to be no comments or objections so far, so I'm starting to code this enhancement now. Of course, comments are still welcome. :-)

Thanks,
Kevin

>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Tian, Kevin
>Sent: 17 January 2007 17:10
>To: xen-devel@lists.xensource.com
>Subject: [Xen-devel] Ehancement to domU suspend/resume

On 18/1/07 2:41 am, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> freeze_processes();
>> device_suspend();
>> device_power_down();

>> device_power_up();
>> device_resume();
>> thaw_processes();

Any idea what cost this adds for a non-driver domain? In particular, what does freeze/thaw do?

 -- Keir

>From: Keir Fraser [mailto:keir@xensource.com]
>Sent: 18 January 2007 15:04
>
>On 18/1/07 2:41 am, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> freeze_processes();
>>> device_suspend();
>>> device_power_down();
>
>>> device_power_up();
>>> device_resume();
>>> thaw_processes();
>
>Any idea what cost this adds for a non driver domain? In particular, what
>does freeze/thaw do?
>
> -- Keir

Freeze basically puts all other processes into a frozen state with no locks held. Each kernel thread is required to check the freezing flag (via try_to_freeze) on each loop iteration, outside any critical section, and to yield or sleep if the flag is set. For a user-space process, a dummy signal notification is sent to the process, which then checks the freezing flag in do_signal before returning to user space. This ensures that all processes reach a safe point before the drivers are asked to suspend. Thaw simply does the reverse, unfreezing them on resume.

Yes, it may take some time for all processes to be frozen, and I have no estimate. But it's a necessary step to put the whole box into a stable state. Is there any flag to check whether the current domU is a driver domain? Then we could differentiate the two paths.

Thanks,
Kevin

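The freeze handshake described above is cooperative: the suspend path only raises a flag, and each task parks itself when it next reaches a point where it holds nothing. A minimal single-threaded model of that idea follows. It is a sketch, not the kernel's implementation, and every name in it (`struct task`, `request_freeze`, `task_safe_point`, `all_frozen`, `thaw`) is a hypothetical one chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one process or kernel thread. */
struct task {
    bool freeze_requested; /* raised by the suspend path */
    bool frozen;           /* set by the task itself at a safe point */
    bool in_critical;      /* task currently holds a critical resource */
};

/* Suspend side: ask every task to freeze; nothing is forced. */
void request_freeze(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].freeze_requested = true;
}

/*
 * Task side: called where the task holds no locks, playing the role
 * that try_to_freeze plays in the description above.
 * Returns true if the task is now parked.
 */
bool task_safe_point(struct task *t)
{
    assert(!t->in_critical); /* safe points lie outside critical sections */
    if (t->freeze_requested)
        t->frozen = true;
    return t->frozen;
}

/* Suspend side: drivers may only be suspended once this returns true. */
bool all_frozen(const struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!tasks[i].frozen)
            return false;
    return true;
}

/* Resume side: the reverse operation, releasing every task. */
void thaw(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        tasks[i].freeze_requested = false;
        tasks[i].frozen = false;
    }
}
```

In this model the wait Kevin mentions is the time between `request_freeze()` and `all_frozen()` first returning true: a task that is still inside a critical section has not reached `task_safe_point()` yet, so the suspend path has to keep waiting for it.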
On 18/1/07 7:27 am, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> Freeze basically puts all other processes into a frozen point with no
> lock held. For kernel thread, each one is required to check freezing
> flag (by try_to_freeze) at each loop out of critical chunk, and then yield
> or sleep if set. For user space process, a dummy signal notification is
> sent to that process which will then check freezing flag when do_signal
> before returning to user space. This can assure all processes falling
> into a save point before drivers are ready to suspend. Thaw just does
> the reverse to unfreeze them when resume.
>
> Yes, it may have to wait some time for all processes to be frozen, and I
> have no estimation. But it's a necessary step to put whole box into a
> stable state. Is there any flag to check whether current domU is a driver
> domain? Then we can differentiate two paths.

Well, let's see what latency it adds in practice. I believe the kernel guys are going to use the process refrigerator for CPU hotplug, so we may have to go this route anyway in the long term.

One fear I have is that user processes doing xenbus transactions may be unable to enter the fridge if they are waiting for the transaction mutex (which is held across save/restore, from xenbus_suspend() to xenbus_resume()).

 -- Keir

>From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
>Sent: 18 January 2007 15:51
>
>Well, let's see what latency it adds in practise. I believe the kernel guys
>are going to use the process refrigerator for CPU hotplug so we may have to
>go this route anyway long term.
>
>One fear I have is that user processes doing xenbus transactions may be
>unable to enter the fridge if they are waiting for the transaction mutex
>(which is locked out across save/restore xenbus_suspend()/xenbus_resume()).
>
> -- Keir

I think that should be OK, since the process freeze happens before the device-related suspend steps, including xenbus_suspend. The system is in a fully working state throughout the freeze, except that the other CPUs have already been hot-removed, leaving it UP. :-)

BTW, I have a small question about the front-end driver. If there are pending requests in the shared ring buffer at suspend, what happens when the FE detects the broken connection after resuming? Will those requests be abandoned, or will they continue to be serviced after reconnecting? I'm afraid that the previous grant entries may be lost after gnttab_resume...

Thanks,
Kevin

On 18/1/07 08:16, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> BTW, I have a small question for the front-end driver. If there's pending
> requests within shared ring buffer at suspend, what will happen when
> FE detects connection broken after resuming back? Will those requests
> be abandoned, or continue to service after re-connected? I'm afraid that
> previous grant entries may lose after gnttab_resume...

They get requeued. The grant entries are filled in again.

 -- Keir
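One way to picture "requeued, with the grant entries filled in again" is a front end that keeps a private copy of every in-flight request, so that nothing is lost when the shared ring and the old grant references go away. The sketch below models only that idea. The structures, the toy grant allocator and the functions `fe_submit()` and `fe_recover()` are assumptions made for illustration, not the actual front-end driver code.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 4

struct request {
    unsigned long id;   /* front end's own identifier for the I/O */
    unsigned int  gref; /* grant reference; stale after a resume */
};

struct frontend {
    struct request shadow[RING_SIZE]; /* private copy of in-flight requests */
    size_t         nr_inflight;
    struct request ring[RING_SIZE];   /* stand-in for the shared ring */
    size_t         ring_prod;
    unsigned int   next_gref;         /* toy grant allocator */
};

/* Submit: grant a page, remember the request, place it on the ring.
 * Returns 0 on success, -1 if no slot is free. */
int fe_submit(struct frontend *fe, unsigned long id)
{
    if (fe->nr_inflight == RING_SIZE)
        return -1;
    assert(fe->ring_prod < RING_SIZE);

    struct request req = { id, fe->next_gref++ };

    fe->shadow[fe->nr_inflight++] = req;
    fe->ring[fe->ring_prod++] = req;
    return 0;
}

/* Resume: start from an empty ring and re-issue every shadowed request
 * with a newly allocated grant reference.
 * Returns the number of requests that were requeued. */
size_t fe_recover(struct frontend *fe, unsigned int first_new_gref)
{
    fe->ring_prod = 0;
    fe->next_gref = first_new_gref;

    for (size_t i = 0; i < fe->nr_inflight; i++) {
        fe->shadow[i].gref = fe->next_gref++; /* fill the grant in again */
        fe->ring[fe->ring_prod++] = fe->shadow[i];
    }
    return fe->ring_prod;
}
```

The design point is that the shared ring is never the only record of a request: the request identifiers survive in the shadow copy, while the grant references are treated as disposable and regenerated on reconnect.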