The hypervisor appears to make the assumption that all but the vCPU XENPF_enter_acpi_sleep is being called on are down (in 3.2 because the sender of the event check IPI assumes the remote CPU is idle, in 3.3 by an explicit check in __cpu_disable() - here we also have an incorrect comment stating that this path can only be used when entering S3).

I can't, however, see how this would be guaranteed on the kernel side (and apart from that I don't think the hypervisor should be dependent on kernel behavior here, even if it's dom0). Shouldn't therefore freeze_domains() not only freeze all DomU-s, but also all non-current vCPU-s of Dom0?

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 19/11/08 13:18, "Jan Beulich" <jbeulich@novell.com> wrote:

> The hypervisor appears to make the assumption that all but the vCPU
> XENPF_enter_acpi_sleep is being called on are down (in 3.2 because the
> sender of the event check IPI assumes the remote CPU is idle, in 3.3 by
> an explicit check in __cpu_disable() - here we also have an incorrect
> comment stating that this path can only be used when entering S3).
>
> I can't, however, see how this would be guaranteed on the kernel side
> (and apart from that I don't think the hypervisor should be dependent on
> kernel behavior here, even if it's dom0). Shouldn't therefore
> freeze_domains() not only freeze all DomU-s, but also all non-current
> vCPU-s of Dom0?

Kevin Tian is probably best placed to answer this. I'm happy to see this added if he agrees.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Wednesday, November 19, 2008 9:22 PM
>
>On 19/11/08 13:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> The hypervisor appears to make the assumption that all but the vCPU
>> XENPF_enter_acpi_sleep is being called on are down (in 3.2 because the
>> sender of the event check IPI assumes the remote CPU is idle, in 3.3 by
>> an explicit check in __cpu_disable() - here we also have an incorrect
>> comment stating that this path can only be used when entering S3).

Comment says "Only s3 is using this path", instead of "this path can only be used by s3". :-) At that time cpu online/offline is not supported and thus only s3 is the user on that path. If you look at latest xen upstream with cpu offline support, that comment went away.

>> I can't, however, see how this would be guaranteed on the kernel side
>> (and apart from that I don't think the hypervisor should be dependent on
>> kernel behavior here, even if it's dom0). Shouldn't therefore
>> freeze_domains() not only freeze all DomU-s, but also all non-current
>> vCPU-s of Dom0?
>
>Kevin Tian is probably best placed to answer this. I'm happy to see this
>added if he agrees.

Yes, the original design depends on cooperation from the dom0 kernel (offline other vcpus) and control panels (send virtual S3 or equivalent suspend command to all domains except dom0). It's expected that the administrator should request system S3 on top of the control panel, instead of accessing the raw sysfs interface. Current code gives a final brute-force action to freeze domains. I agree that such a guard should also be added to dom0's vcpus if following this policy.

However I'm considering whether Xen can simply reject the s3 request when observing non-current vcpus still alive. A domain can be in trouble if unaware of the underlying sleep phase, such as time keeping and softlockup warnings. More seriously, a domain with passthrough devices can't recover device state since it's not even notified to save context. Opinions?

Thanks, Kevin
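A minimal sketch of the guard discussed above, in which the freeze step pauses every DomU vCPU and every dom0 vCPU except the one issuing the sleep request. This is a toy model, not Xen code: the structs, MAX_VCPUS and toy_freeze_domains() are hypothetical names invented for illustration, and real Xen would go through its own pause primitives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical structs, far simpler than Xen's real
 * struct domain / struct vcpu. */
#define MAX_VCPUS 8

struct toy_vcpu {
    bool paused;
};

struct toy_domain {
    int domid;                       /* 0 denotes dom0 in this model */
    size_t nr_vcpus;
    struct toy_vcpu vcpu[MAX_VCPUS];
};

/* Pause every vCPU of every DomU, plus every dom0 vCPU except the one
 * issuing the sleep request (current_vcpu). Returns how many vCPUs were
 * still unpaused and had to be paused by this call. */
size_t toy_freeze_domains(struct toy_domain *doms, size_t nr_doms,
                          size_t current_vcpu)
{
    size_t newly_paused = 0;

    assert(doms != NULL);
    for (size_t i = 0; i < nr_doms; i++) {
        for (size_t v = 0; v < doms[i].nr_vcpus; v++) {
            /* The calling dom0 vCPU must keep running to finish the request. */
            if (doms[i].domid == 0 && v == current_vcpu)
                continue;
            if (!doms[i].vcpu[v].paused) {
                doms[i].vcpu[v].paused = true;
                newly_paused++;
            }
        }
    }
    return newly_paused;
}
```

A non-zero return value is the situation Kevin describes: something was still running that the kernel or the control panel was expected to have stopped already.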
On 20/11/08 02:39, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> However I'm considering whether Xen can simply reject the s3 request
> when observing non-current vcpus still alive. A domain can be in
> trouble if unaware of the underlying sleep phase, such as time keeping
> and softlockup warnings. More seriously, a domain with passthrough
> devices can't recover device state since it's not even notified to save
> context. Opinions?

What would you warn on?
 - VCPUs still exist?
 - VCPUs still online?
 - VCPUs not paused?
 - VCPUs not 'paused_by_system_controller'?

I'm not sure what the WARN_ON() condition would be. A forceful domain_pause()/vcpu_pause() is a good idea anyway.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, November 20, 2008 4:01 PM
>
>On 20/11/08 02:39, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>> However I'm considering whether Xen can simply reject the s3 request
>> when observing non-current vcpus still alive. A domain can be in
>> trouble if unaware of the underlying sleep phase, such as time keeping
>> and softlockup warnings. More seriously, a domain with passthrough
>> devices can't recover device state since it's not even notified to save
>> context. Opinions?
>
>What would you warn on?
> - VCPUs still exist?
> - VCPUs still online?
> - VCPUs not paused?
> - VCPUs not 'paused_by_system_controller'?

Warn on unpaused domains and online dom0 vcpus.

>I'm not sure what the WARN_ON() condition would be. A forceful
>domain_pause()/vcpu_pause() is a good idea anyway.
>
> -- Keir

I'm pretty sure that domains will be busy catching up on missed ticks and will throw warnings after the system is woken up. Why should Xen continue when we're aware that something will be hurt by doing so? Returning an error with a warning at least lets the user know that the current condition is inappropriate for S3 (e.g. some incautious action), so they can turn back to the normal flow. This is like the normal OS suspend flow, which simply exits if some checks fail.

Thanks, Kevin
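The warning condition Kevin names ("unpaused domains and online dom0 vcpus") could be written as a single predicate. The sketch below assumes a simplified per-domain summary; struct sleep_check_domain and sleep_check_should_warn() are hypothetical and do not correspond to any existing Xen interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-domain summary used only for this check. */
struct sleep_check_domain {
    int domid;              /* 0 denotes dom0 */
    bool paused;            /* whole domain paused (relevant for DomUs) */
    size_t online_vcpus;    /* number of vCPUs still online */
};

/* Returns true when the system is not ready for S3: some DomU is still
 * unpaused, or dom0 has more than the single requesting vCPU online. */
bool sleep_check_should_warn(const struct sleep_check_domain *doms,
                             size_t nr_doms)
{
    assert(doms != NULL || nr_doms == 0);
    for (size_t i = 0; i < nr_doms; i++) {
        if (doms[i].domid == 0) {
            if (doms[i].online_vcpus > 1)
                return true;
        } else if (!doms[i].paused) {
            return true;
        }
    }
    return false;
}
```

Whether the caller then only warns, or also returns an error, is exactly the open question in this part of the thread.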
On 20/11/08 08:11, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> I'm not sure what the WARN_ON() condition would be. A forceful
>> domain_pause()/vcpu_pause() is a good idea anyway.
>>
>> -- Keir
>
> I'm pretty sure that domains will be busy catching up on missed ticks
> and will throw warnings after the system is woken up. Why should Xen
> continue when we're aware that something will be hurt by doing so?
> Returning an error with a warning at least lets the user know that the
> current condition is inappropriate for S3 (e.g. some incautious
> action), so they can turn back to the normal flow. This is like the
> normal OS suspend flow, which simply exits if some checks fail.

If Xen itself is now robust to VCPUs still being runnable/running then I'm fine with warnings only. If Xen isn't, then forceful pausing is still needed (perhaps with some warnings in addition).

 -- Keir
>>> "Tian, Kevin" <kevin.tian@intel.com> 20.11.08 03:39 >>>
>>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>Sent: Wednesday, November 19, 2008 9:22 PM
>>
>>On 19/11/08 13:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>>
>>> The hypervisor appears to make the assumption that all but the vCPU
>>> XENPF_enter_acpi_sleep is being called on are down (in 3.2 because the
>>> sender of the event check IPI assumes the remote CPU is idle, in 3.3 by
>>> an explicit check in __cpu_disable() - here we also have an incorrect
>>> comment stating that this path can only be used when entering S3).
>
>Comment says "Only s3 is using this path", instead of "this path
>can only be used by s3". :-) At that time cpu online/offline is not
>supported and thus only s3 is the user on that path. If you look at
>latest xen upstream with cpu offline support, that comment went
>away.

But my point is that this is wrong (no matter how it's worded): entering S5 also uses this path, and in that case there's nothing that stops non-current vCPU-s of dom0.

>>> I can't, however, see how this would be guaranteed on the kernel side
>>> (and apart from that I don't think the hypervisor should be dependent on
>>> kernel behavior here, even if it's dom0). Shouldn't therefore
>>> freeze_domains() not only freeze all DomU-s, but also all non-current
>>> vCPU-s of Dom0?
>>
>>Kevin Tian is probably best placed to answer this. I'm happy to see this
>>added if he agrees.
>
>Yes, the original design depends on cooperation from the dom0 kernel
>(offline other vcpus) and control panels (send virtual S3 or equivalent
>suspend command to all domains except dom0). It's expected that the
>administrator should request system S3 on top of the control panel,
>instead of accessing the raw sysfs interface. Current code gives a final
>brute-force action to freeze domains. I agree that such a guard should
>also be added to dom0's vcpus if following this policy.

I'll prepare a patch for this then.

>However I'm considering whether Xen can simply reject the s3 request
>when observing non-current vcpus still alive. A domain can be in
>trouble if unaware of the underlying sleep phase, such as time keeping
>and softlockup warnings. More seriously, a domain with passthrough
>devices can't recover device state since it's not even notified to save
>context. Opinions?

I agree to this. But as per above, S3 (and S4 if ever supported) must be distinguished from S5 in this regard.

Jan
On 20/11/08 08:18, "Jan Beulich" <jbeulich@novell.com> wrote:

>> Comment says "Only s3 is using this path", instead of "this path
>> can only be used by s3". :-) At that time cpu online/offline is not
>> supported and thus only s3 is the user on that path. If you look at
>> latest xen upstream with cpu offline support, that comment went
>> away.
>
> But my point is that this is wrong (no matter how it's worded): entering
> S5 also uses this path, and in that case there's nothing that stops
> non-current vCPU-s of dom0.

If you're powering off, why does it matter if VCPUs are paused or not?

 -- Keir
>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: Thursday, November 20, 2008 4:18 PM
>
>>Comment says "Only s3 is using this path", instead of "this path
>>can only be used by s3". :-) At that time cpu online/offline is not
>>supported and thus only s3 is the user on that path. If you look at
>>latest xen upstream with cpu offline support, that comment went
>>away.
>
>But my point is that this is wrong (no matter how it's worded): entering
>S5 also uses this path, and in that case there's nothing that stops
>non-current vCPU-s of dom0.

I see. You're right. But that doesn't seem to matter as you're doing S5. :-)

Thanks, Kevin
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, November 20, 2008 4:17 PM
>
>On 20/11/08 08:11, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> I'm not sure what the WARN_ON() condition would be. A forceful
>>> domain_pause()/vcpu_pause() is a good idea anyway.
>>>
>>> -- Keir
>>
>> I'm pretty sure that domains will be busy catching up on missed ticks
>> and will throw warnings after the system is woken up. Why should Xen
>> continue when we're aware that something will be hurt by doing so?
>> Returning an error with a warning at least lets the user know that the
>> current condition is inappropriate for S3 (e.g. some incautious
>> action), so they can turn back to the normal flow. This is like the
>> normal OS suspend flow, which simply exits if some checks fail.
>
>If Xen itself is now robust to VCPUs still being runnable/running
>then I'm fine with warnings only. If Xen isn't, then forceful pausing is
>still needed (perhaps with some warnings in addition).

What do you mean by "xen itself is robust to..."?

Thanks, Kevin
>From: Jan Beulich [mailto:jbeulich@novell.com]
>Sent: Thursday, November 20, 2008 4:18 PM
>
>>However I'm considering whether Xen can simply reject the s3 request
>>when observing non-current vcpus still alive. A domain can be in
>>trouble if unaware of the underlying sleep phase, such as time keeping
>>and softlockup warnings. More seriously, a domain with passthrough
>>devices can't recover device state since it's not even notified to save
>>context. Opinions?
>
>I agree to this. But as per above, S3 (and S4 if ever supported) must be
>distinguished from S5 in this regard.

Why is a distinction required here? You just want a machine poweroff with S5, and it doesn't matter if it reuses the same main path as S3.

Thanks, Kevin
>>> Keir Fraser <keir.fraser@eu.citrix.com> 20.11.08 09:18 >>>
>On 20/11/08 08:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>> Comment says "Only s3 is using this path", instead of "this path
>>> can only be used by s3". :-) At that time cpu online/offline is not
>>> supported and thus only s3 is the user on that path. If you look at
>>> latest xen upstream with cpu offline support, that comment went
>>> away.
>>
>> But my point is that this is wrong (no matter how it's worded): entering
>> S5 also uses this path, and in that case there's nothing that stops
>> non-current vCPU-s of dom0.
>
>If you're powering off, why does it matter if VCPUs are paused or not?

Because they can prevent the idle vCPU-s from being entered as needed in order to fully bring down pCPU-s. This is what a customer is reporting (on 3.2), prompting the start of this mail thread. And I too have seen occasional problems with pCPU-s going down prior to power off, just not as bad as they do (where things end up in a panic).

Jan
>>> "Tian, Kevin" <kevin.tian@intel.com> 20.11.08 09:19 >>>
>>But my point is that this is wrong (no matter how it's worded): entering
>>S5 also uses this path, and in that case there's nothing that stops
>>non-current vCPU-s of dom0.
>
>I see. You're right. But that doesn't seem to matter as you're doing S5. :-)

See my other reply just sent to Keir's similar question.

Jan
>From: Jan Beulich
>Sent: Thursday, November 20, 2008 4:31 PM
>
>>>> Keir Fraser <keir.fraser@eu.citrix.com> 20.11.08 09:18 >>>
>>On 20/11/08 08:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>>
>>>> Comment says "Only s3 is using this path", instead of "this path
>>>> can only be used by s3". :-) At that time cpu online/offline is not
>>>> supported and thus only s3 is the user on that path. If you look at
>>>> latest xen upstream with cpu offline support, that comment went
>>>> away.
>>>
>>> But my point is that this is wrong (no matter how it's worded): entering
>>> S5 also uses this path, and in that case there's nothing that stops
>>> non-current vCPU-s of dom0.
>>
>>If you're powering off, why does it matter if VCPUs are paused or not?
>
>Because they can prevent the idle vCPU-s from being entered as needed
>in order to fully bring down pCPU-s. This is what a customer is reporting
>(on 3.2), prompting the start of this mail thread. And I too have seen
>occasional problems with pCPU-s going down prior to power off, just not
>as bad as they do (where things end up in a panic).

Well, then we probably need to check domains/dom0 vcpus for both, and then forcefully freeze them only for S5.

Thanks, Kevin
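The policy suggested above (check in both cases, reject for S3, force a freeze only for S5) reduces to a small decision function. This is only a sketch of the idea as discussed in the thread; the enum names and sleep_policy() are hypothetical and not part of Xen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the two states discussed in the thread. */
enum sleep_state { SLEEP_S3 = 3, SLEEP_S5 = 5 };

enum sleep_action {
    ACTION_PROCEED,       /* nothing else is running; carry on */
    ACTION_REJECT,        /* S3: return an error and warn the user */
    ACTION_FORCE_FREEZE   /* S5: pause the remaining vCPUs and carry on */
};

/* others_running: true if any DomU is unpaused or any non-current dom0
 * vCPU is still online when the sleep request arrives. */
enum sleep_action sleep_policy(enum sleep_state state, bool others_running)
{
    assert(state == SLEEP_S3 || state == SLEEP_S5);
    if (!others_running)
        return ACTION_PROCEED;
    return state == SLEEP_S5 ? ACTION_FORCE_FREEZE : ACTION_REJECT;
}
```

The asymmetry reflects the two arguments made so far: a guest that is not told about S3 may misbehave after resume, while for S5 the only concern is that running vCPUs keep pCPUs from being brought down.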
On 20/11/08 08:21, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> If Xen itself is now robust to VCPUs still being runnable/running
>> then I'm fine with warnings only. If Xen isn't, then forceful pausing is
>> still needed (perhaps with some warnings in addition).
>
> What do you mean by "xen itself is robust to..."?

It used to be the case that Xen CPU hotplug depended on CPUs already being idle. It would just go horribly wrong if non-idle VCPUs were still running.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, November 20, 2008 4:52 PM
>On 20/11/08 08:21, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> If Xen itself is now robust to VCPUs still being runnable/running
>>> then I'm fine with warnings only. If Xen isn't, then forceful pausing is
>>> still needed (perhaps with some warnings in addition).
>>
>> What do you mean by "xen itself is robust to..."?
>
>It used to be the case that Xen CPU hotplug depended on CPUs already being
>idle. It would just go horribly wrong if non-idle VCPUs were still running.

That should still be the current case, and running vcpus will be migrated first before pulling down the CPUs (added by Haitao Shan). So I think we can just return an error for the S3 case if there are running vcpus other than dom0's vcpu0.

Thanks, Kevin