Hi, Jeremy Since now the pv_ops domain0 is approaching, we are considering to port the dom0 power management related code, or more specifically, the Cx/Px ACPI info parsing code, to pv_ops domain0 tree, so that people can utilize Xen power management capability (Cx/PX) under pv_ops domain0. This RFC is a draft version for comments. === Overview == Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states power management. This info is provided by BIOS ACPI table. Since hypervisor has no ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-system, and then passed to hypervisor by hypercall. To make this happen, the key point is to add hook in the kernel ACPI sub-system. Fortunately, kernel already has good abstraction, and only several places need to add hook. To be more detail, there is an acpi_processor_driver (in drivers/acpi/processor_core.c) , which all the Cx/Px parsing event will go to. This driver will call its acpi processor event handler, e.g. add/remove, start/stop, notify to handle these events. These event handlers in turn will call some library functions (in drivers/acpi/processor_perflib.c), e.g. acpi_processor_ppc_has_changed, acpi_processor_ppc_has_changed, acpi_processor_cst_has_changed, to finish the acpi info parsing. So the conclusion is: adding hooks in acpi_processor_driver and those related library functions will satisfy our requirement. To make the added hook cleaner, we introduce an interface called external control operation (struct processor_extcntl_ops). All the hooks are encapsulated in this interface processor_extcntl_ops . Here the "external" means the acpi processor is controlled by external entity, e.g. VMM. Every kind of external entity can has its implementation of this interface. In this patch, the interface for Xen is implemented. == Patch Set Description = This patch set is based the xen/dom0/hackery branch. It has three patches: 1. acpi_parser_framework.patch This patch introduces the interface of external control operation, and adds hooks to the acpi sub-system, including the acpi_processor_driver, and the related library functions. 2. acpi_parser_xen_driver.patch This patch implements the Xen version of the external control operation interface. 3. acpi_parser_platform_hypercall.patch This patches implements the xen_platform_op hypercall, to pass the parsed ACPI info to hypervisor. == Misc = To make this patch works correctly, there is a corner case need to consider: processor in domain0 is virtual CPU, while BIOS ACPI table provides the physical CPU info, since the vCPU and pCPU is not 1:1 mapped, this may caused unexpected result. E.g. in UP domain0, Xen hypervisor may only get one physical processor info. Current linux-2.6.18-xen.hg tree solve this issue by using acpi_id instead of processor_id. However, this code is a bit tricky and hacky. We are still trying to find a cleaner way to solve this issue. Comments are welcome. Best Regards Yu Ke and Tian Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-May-11 18:48 UTC
[Xen-devel] Re: [RFC] pvops domain0 Cx/Px info parser patch
Yu, Ke wrote:> Hi, Jeremy > > Since now the pv_ops domain0 is approaching, we are considering to port the dom0 power management related code, or more specifically, the Cx/Px ACPI info parsing code, to pv_ops domain0 tree, so that people can utilize Xen power management capability (Cx/PX) under pv_ops domain0. This RFC is a draft version for comments. >Hi. I commented on the last patch, but saw no response. Did you get those comments? Oh, perhaps they never got sent. I''ve attached them now. I have not looked at these patches yet. Do they differ much? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Hi, Jeremy Glad to see your the comments. I did not get the previous replying email, so just resend this RFC. There is no difference. I will reply your last email in a separate email. Best Regards Ke>-----Original Message----- >From: Jeremy Fitzhardinge [mailto:jeremy@goop.org] >Sent: Tuesday, May 12, 2009 2:48 AM >To: Yu, Ke >Cc: xen-devel@lists.xensource.com; Tian, Kevin >Subject: Re: [RFC] pvops domain0 Cx/Px info parser patch > >Yu, Ke wrote: >> Hi, Jeremy >> >> Since now the pv_ops domain0 is approaching, we are considering to port the >dom0 power management related code, or more specifically, the Cx/Px ACPI info >parsing code, to pv_ops domain0 tree, so that people can utilize Xen power >management capability (Cx/PX) under pv_ops domain0. This RFC is a draft version >for comments. >> > >Hi. I commented on the last patch, but saw no response. Did you get >those comments? > >Oh, perhaps they never got sent. I''ve attached them now. I have not >looked at these patches yet. Do they differ much? > > J_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy, thanks for the comments. It is helpful. Please see below. Jeremy Fitzhardinge wrote:>Yu, Ke wrote: >Hi Jeremy, > > >Thanks for doing this. I looked at the changes in 2.6.18-xen and put them into the >too-hard basket. > >Some general comments about patch submission: >* please send patches one-per-mail, with [PATCH N/M] in the subject line to >indicate ordering >* in each mail, include the full changelog description like you have below >* each person signing off the mail should have a Signed-off-by: line >This makes it easier to do line-by-line commentary on the patches and also process >it with the tools I have on hand.Yes, I will follow this way next time.> >I just committed these patches to my tree, rebased onto xen-tip/dom0/core as >xen-tip/dom0/acpi-parser, but I haven''t even compile-tested to see if the rebase >worked. But I did cut-and-paste the overview into the first patch comment so its >recorded properly. > > >=== Overview ==> >Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states power >management. This info is provided by BIOS ACPI table. Since hypervisor has no >ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-system, and >then passed to hypervisor by hypercall. > > >Couldn''t this be done with a variant of the cpufreq_acpi driver? It needs to parse the >tables, but rather than acting on them directly it passes the information to Xen?Oh, you raise a good point. This is a more clean design. I go through the related code again. And find this method seems feasible. Basically, the xen cpufreq driver need to do two things: 1. pass the initial cpufreq frequency info to xen. This can be done in the xen cpufreq driver init phase. 2. if hardware cpufreq frequency info changed (see acpi_processor_ppc_has_changed()), notify xen the changed info. This can be done by registering one notify callback in cpufreq (see cpufreq_register_notifier) . This is just initial thought, I will follow up this with more detail. And another thing is that cpuidle (Cx info) has no such arch specific place for xen, we probably still need to use current logic for cpuidle.> > >To make this happen, the key point is to add hook in the kernel ACPI sub-system. >Fortunately, kernel already has good abstraction, and only several places need to >add hook. To be more detail, there is an acpi_processor_driver (in >drivers/acpi/processor_core.c) , which all the Cx/Px parsing event will go to. This >driver will call its acpi processor event handler, e.g. add/remove, start/stop, notify to >handle these events. These event handlers in turn will call some library functions (in >drivers/acpi/processor_perflib.c), e.g. acpi_processor_ppc_has_changed, >acpi_processor_ppc_has_changed, acpi_processor_cst_has_changed, to finish the >acpi info parsing. > >So the conclusion is: adding hooks in acpi_processor_driver and those related >library functions will satisfy our requirement. > >To make the added hook cleaner, we introduce an interface called external control >operation (struct processor_extcntl_ops). All the hooks are encapsulated in this >interface processor_extcntl_ops . Here the "external" means the acpi processor is >controlled by external entity, e.g. VMM. Every kind of external entity can has its >implementation of this interface. In this patch, the interface for Xen is implemented. > > >Is the issue that the hypervisor will change the CPUs states asynchronously under >the kernel, and the kernel wants to be informed about those state changes, along >with some kind of mapping from pcpu to vcpu?In pv_ops dom0, kernel do not need to be informed on the cpu state changes. Kernel only need to pass the acpi processor info to hypervisor, and let hypervisor control the CPU states.> >What use does the kernel make of that information? > >I guess I don''t really understand what the larger problem domain is.Let me explain more detail. In native linux, the Cx/Px processing flow is like this: firstly, acpi subsystem will parse the ACPI table, and get all Cx/Px related acpi info (stored as struct acpi_processor, struct acpi_processor_power, and struct acpi_processor_performance). and then acpi subsystem will notify cpuidle driver and cpufreq driver to use the parsed acpi info to control physical processor Cx/Px transition. In pv_ops dom0, kernel can see physical ACPI table, but it only see vCPUs, so we want kernel to parse ACPI table, but don''t want kernel to control physical processor Cx/Px transition. So the ideal flow control is: firstly, kernel acpi subsystem will parse the ACPI table, and get all Cx/Px related acpi info. Then, kernel pass this info to hypervisor. Kernel cpuidle and cpufreq driver is turned off. You may ask why not moving all stuff to hypervisor, i.e. parsing acpi table in hypervisor too. the major reason is that acpi parser is unfortunately very complex, and at meantime kernel already have pretty good acpi parser, so it is a better choice to leverage kernel''s acpi parser. So the larger problem domain is: how to cleanly pass the acpi info to hypervisor, and meanwhile, keep kernel cpuidle and cpufreq turned off.> >Do you have any plans for other hypervisor backends? Does a kvm one make >sense?In KVM case, since kernel see the physical processor, the Cx/Px transition is controlled by kernel. So KVM don''t need this logic. This logic only applies to hypervisor who want to control Cx/Px itself. By far, I did not find other hypervisor need this logic.> > >== Patch Set Description => >This patch set is based the xen/dom0/hackery branch. It has three patches: > >1. acpi_parser_framework.patch >This patch introduces the interface of external control operation, and adds hooks to >the acpi sub-system, including the acpi_processor_driver, and the related library >functions. > >2. acpi_parser_xen_driver.patch >This patch implements the Xen version of the external control operation interface. > >3. acpi_parser_platform_hypercall.patch >This patches implements the xen_platform_op hypercall, to pass the parsed ACPI >info to hypervisor. > >== Misc => >To make this patch works correctly, there is a corner case need to consider: >processor in domain0 is virtual CPU, while BIOS ACPI table provides the physical >CPU info, since the vCPU and pCPU is not 1:1 mapped, this may caused unexpected >result. E.g. in UP domain0, Xen hypervisor may only get one physical processor >info. Current linux-2.6.18-xen.hg tree solve this issue by using acpi_id instead of >processor_id. However, this code is a bit tricky and hacky. We are still trying to find >a cleaner way to solve this issue. > > >Is this a corner case? I would think it''s the common case.This is common case, but currently only affects Cx/Px logic. If any code need to use the physical acpi processor info, it need to care about this.> > > >OK, some more specific comments: > >Cut down on the number of #ifdefs. If >CONFIG_PROCESSOR_EXTERNAL_CONTROL is not set, define the functions to >appropriate no-op definitions. For example, rather than: >+#ifdef CONFIG_PROCESSOR_EXTERNAL_CONTROL >+ if (!processor_cntl_external()) { >+ return -ENODEV; >+ } >+#else > return -ENODEV; >+#endif /* CONFIG_PROCESSOR_EXTERNAL_CONTROL */ >just define processor_cntl_external() to always return 0. (Oh, it looks like you >already do this. Why all the #ifdefs then?)Oh, we just want to make sure all changes are embraced by CONFIG_PROCESSOR_EXTERNAL_CONTROL, for better tracking purpose. Will be remove this in the final version.> >Also, rather than: >+ if (processor_cntl_external()) >+ processor_notify_external(pr, PROCESSOR_HOTPLUG, >+ HOTPLUG_TYPE_REMOVE); >can''t processor_notify_external() do the test itself? (In addition to the #ifdef >comments.)Good point. will revise it.> >This seems pointless: >-static int acpi_processor_get_performance_info(struct acpi_processor *pr) >+#ifndef CONFIG_PROCESSOR_EXTERNAL_CONTROL >+static >+#endif >+int acpi_processor_get_performance_info(struct acpi_processor *pr)For tracking purpose only. Probably better to simply remove the "static"> > > config CPU_FREQ > bool "CPU Frequency scaling" >+ depends on !PROCESSOR_EXTERNAL_CONTROL > >This isn''t acceptible. A single kernel image could be running on bare hardware, or >be booted as a Xen dom0 kernel. It needs to be able to handle both CPUFREQ and >external processor control. Unless external processor control somehow subsumes >all the cpufreq driver functions? Anyway, all decisions need to be made at runtime.I see. When use the xen cpufreq driver approach as you suggested, this change is not necessary any more.> >Put all the Xen-specific code into a Xen-specific header. I''ve already added >include/xen/acpi.h for the S3 changes, so perhaps that would be suitable.Ok.> >+ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),) >+obj-$(CONFIG_XEN) += processor_extcntl_xen.o >+endif >No, do this in Kconfig, with something like: >config XEN_PROCESSOR_EXTERNAL_CONTROL > def_bool y > depends on XEN && PROCESSOR_EXTERNAL_CONTROL >And maybe XEN_ACPI and/or XEN_DOM0? If this is useful for non-x86 (ia64), then >put it in drivers/xen.Ok. and yes, it will be extended to ia64 in the future.> > JThanks for the above comments. I will investigate the xen cpufreq driver approach as you suggested, and sent the revised patch when it is ready. Best Regards Ke _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel