thr3ads.net - Xen devel - [Xen-devel] [RFC] pvops domain0 Cx/Px info parser patch [May 2009]

If this information is useful, please help other people find it:
Share via:

Yu, Ke

2009-May-11 07:58 UTC

[Xen-devel] [RFC] pvops domain0 Cx/Px info parser patch

Hi, Jeremy

Since now the pv_ops domain0 is approaching, we are considering to port the dom0
power management related code, or more specifically, the Cx/Px ACPI info parsing
code, to pv_ops domain0 tree, so that people can utilize Xen power management
capability (Cx/PX) under pv_ops domain0. This RFC is a draft version for
comments.

=== Overview ==
Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states power
management. This info is provided by BIOS ACPI table. Since hypervisor has no
ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-system, and
then passed to hypervisor by hypercall.

To make this happen, the key point is to add hook in the kernel ACPI sub-system.
Fortunately, kernel already has good abstraction, and only several places need
to add hook. To be more detail, there is an acpi_processor_driver (in
drivers/acpi/processor_core.c) , which all the Cx/Px parsing event will go to.
This driver will call its acpi processor event handler, e.g. add/remove,
start/stop, notify to handle these events. These event handlers in turn will
call some library functions (in drivers/acpi/processor_perflib.c), e.g.
acpi_processor_ppc_has_changed, acpi_processor_ppc_has_changed,
acpi_processor_cst_has_changed, to finish the acpi info parsing.

So the conclusion is: adding hooks in acpi_processor_driver and those related
library functions will satisfy our requirement.

To make the added hook cleaner, we introduce an interface called external
control operation (struct processor_extcntl_ops). All the hooks are encapsulated
in this interface processor_extcntl_ops . Here the "external" means
the acpi processor is controlled by external entity, e.g. VMM. Every kind of
external entity can has its implementation of this interface. In this patch, the
interface for Xen is implemented.

== Patch Set Description =
This patch set is based the xen/dom0/hackery branch. It has three patches:

1.	acpi_parser_framework.patch
This patch introduces the interface of external control operation, and adds
hooks to the acpi sub-system, including the acpi_processor_driver, and the
related library functions.

2.	acpi_parser_xen_driver.patch
This patch implements the Xen version of the external control operation
interface.

3.	acpi_parser_platform_hypercall.patch
This patches implements the xen_platform_op hypercall, to pass the parsed ACPI
info to hypervisor.

== Misc =
To make this patch works correctly, there is a corner case need to consider:
processor in domain0 is virtual CPU, while BIOS ACPI table provides the physical
CPU info, since the vCPU and pCPU is not 1:1 mapped, this may caused unexpected
result.  E.g. in UP domain0, Xen hypervisor may only get one physical processor
info.  Current linux-2.6.18-xen.hg tree solve this issue by using acpi_id
instead of processor_id. However, this code is a bit tricky and hacky. We are
still trying to find a cleaner way to solve this issue.

Comments are welcome.

Best Regards
Yu Ke and Tian Kevin




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jeremy Fitzhardinge

2009-May-11 18:48 UTC

head link

[Xen-devel] Re: [RFC] pvops domain0 Cx/Px info parser patch

Yu, Ke wrote:> Hi, Jeremy
>
> Since now the pv_ops domain0 is approaching, we are considering to port the
dom0 power management related code, or more specifically, the Cx/Px ACPI info
parsing code, to pv_ops domain0 tree, so that people can utilize Xen power
management capability (Cx/PX) under pv_ops domain0. This RFC is a draft version
for comments.
>   
Hi.  I commented on the last patch, but saw no response.  Did you get 
those comments?

Oh, perhaps they never got sent.  I''ve attached them now.  I have not 
looked at these patches yet.  Do they differ much?

    J


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Yu, Ke

2009-May-12 01:15 UTC

head link

[Xen-devel] RE: [RFC] pvops domain0 Cx/Px info parser patch

Hi, Jeremy

Glad to see your the comments. I did not get the previous replying email, so
just resend this RFC. There is no difference. I will reply your last email in a
separate email.

Best Regards
Ke
>-----Original Message-----
>From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>Sent: Tuesday, May 12, 2009 2:48 AM
>To: Yu, Ke
>Cc: xen-devel@lists.xensource.com; Tian, Kevin
>Subject: Re: [RFC] pvops domain0 Cx/Px info parser patch
>
>Yu, Ke wrote:
>> Hi, Jeremy
>>
>> Since now the pv_ops domain0 is approaching, we are considering to port
the
>dom0 power management related code, or more specifically, the Cx/Px ACPI
info
>parsing code, to pv_ops domain0 tree, so that people can utilize Xen power
>management capability (Cx/PX) under pv_ops domain0. This RFC is a draft
version
>for comments.
>>
>
>Hi.  I commented on the last patch, but saw no response.  Did you get
>those comments?
>
>Oh, perhaps they never got sent.  I''ve attached them now.  I have
not
>looked at these patches yet.  Do they differ much?
>
>    J
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Yu, Ke

2009-May-13 07:46 UTC

head link

[Xen-devel] RE:[RFC] pvops domain0 Cx/Px info parser patch

Jeremy, thanks for the comments. It is helpful. Please see below.

Jeremy Fitzhardinge wrote:>Yu, Ke wrote:
>Hi Jeremy,
>
>
>Thanks for doing this.  I looked at the changes in 2.6.18-xen and put them
into the
>too-hard basket.
>
>Some general comments about patch submission:
>*	please send patches one-per-mail, with [PATCH N/M] in the subject line to
>indicate ordering
>*	in each mail, include the full changelog description like you have below
>*	each person signing off the mail should have a Signed-off-by: line
>This makes it easier to do line-by-line commentary on the patches and also
process
>it with the tools I have on hand.
Yes, I will follow this way next time.
>
>I just committed these patches to my tree, rebased onto xen-tip/dom0/core as
>xen-tip/dom0/acpi-parser, but I haven''t even compile-tested to see
if the rebase
>worked.  But I did cut-and-paste the overview into the first patch comment
so its
>recorded properly.
>
>
>=== Overview ==>
>Requirement: Xen hypervisor need Cx/Px ACPI info to do the Cx/Px states
power
>management. This info is provided by BIOS ACPI table. Since hypervisor has
no
>ACPI parser, this info has to be parsed by domain0 kernel ACPI sub-system,
and
>then passed to hypervisor by hypercall.
>
>
>Couldn''t this be done with a variant of the cpufreq_acpi driver? 
It needs to parse the
>tables, but rather than acting on them directly it passes the information to
Xen?
Oh, you raise a good point. This is a more clean design. I go through the
related code again. And find this method seems feasible. Basically, the xen
cpufreq driver need to do two things:
1. pass the initial cpufreq frequency info to xen. This can be done in the xen
cpufreq driver init phase.
2. if hardware cpufreq frequency info changed (see
acpi_processor_ppc_has_changed()), notify xen the changed info. This can be done
by registering one notify callback in cpufreq (see cpufreq_register_notifier) .

This is just initial thought, I will follow up this with more detail.

And another thing is that cpuidle (Cx info) has no such arch specific place for
xen, we probably still need to use current logic for cpuidle.
>
>
>To make this happen, the key point is to add hook in the kernel ACPI
sub-system.
>Fortunately, kernel already has good abstraction, and only several places
need to
>add hook. To be more detail, there is an acpi_processor_driver (in
>drivers/acpi/processor_core.c) , which all the Cx/Px parsing event will go
to. This
>driver will call its acpi processor event handler, e.g. add/remove,
start/stop, notify to
>handle these events. These event handlers in turn will call some library
functions (in
>drivers/acpi/processor_perflib.c), e.g. acpi_processor_ppc_has_changed,
>acpi_processor_ppc_has_changed, acpi_processor_cst_has_changed, to finish
the
>acpi info parsing.
>
>So the conclusion is: adding hooks in acpi_processor_driver and those
related
>library functions will satisfy our requirement.
>
>To make the added hook cleaner, we introduce an interface called external
control
>operation (struct processor_extcntl_ops). All the hooks are encapsulated in
this
>interface processor_extcntl_ops . Here the "external" means the
acpi processor is
>controlled by external entity, e.g. VMM. Every kind of external entity can
has its
>implementation of this interface. In this patch, the interface for Xen is
implemented.
>
>
>Is the issue that the hypervisor will change the CPUs states asynchronously
under
>the kernel, and the kernel wants to be informed about those state changes,
along
>with some kind of mapping from pcpu to vcpu?
In pv_ops dom0, kernel do not need to be informed on the cpu state changes.
Kernel only need to pass the acpi processor info to hypervisor, and let
hypervisor control the CPU states.
>
>What use does the kernel make of that information?
>
>I guess I don''t really understand what the larger problem domain
is.
Let me explain more detail. In native linux, the Cx/Px processing flow is like
this: firstly, acpi subsystem will parse the ACPI table, and get all Cx/Px
related acpi info (stored as struct acpi_processor, struct acpi_processor_power,
and struct acpi_processor_performance). and then acpi subsystem will notify
cpuidle driver and cpufreq driver to use the parsed acpi info to control
physical processor Cx/Px transition.

In pv_ops dom0, kernel can see physical ACPI table, but it only see vCPUs, so we
want kernel to parse ACPI table, but don''t want kernel to control
physical processor Cx/Px transition. So the ideal flow control is: firstly,
kernel acpi subsystem will parse the ACPI table, and get all Cx/Px related acpi
info. Then, kernel pass this info to hypervisor. Kernel cpuidle and cpufreq
driver is turned off.

You may ask why not moving all stuff to hypervisor, i.e. parsing acpi table in
hypervisor too. the major reason is that acpi parser is unfortunately very
complex, and at meantime kernel already have pretty good acpi parser, so it is a
better choice to leverage kernel''s acpi parser.

So the larger problem domain is: how to cleanly pass the acpi info to
hypervisor, and meanwhile, keep kernel cpuidle and cpufreq turned off.
>
>Do you have any plans for other hypervisor backends?  Does a kvm one make
>sense?
In KVM case, since kernel see the physical processor, the Cx/Px transition is
controlled by kernel. So KVM don''t need this logic. This logic only
applies to hypervisor who want to control Cx/Px itself. By far, I did not find
other hypervisor need this logic.
>
>
>== Patch Set Description =>
>This patch set is based the xen/dom0/hackery branch. It has three patches:
>
>1.	acpi_parser_framework.patch
>This patch introduces the interface of external control operation, and adds
hooks to
>the acpi sub-system, including the acpi_processor_driver, and the related
library
>functions.
>
>2.	acpi_parser_xen_driver.patch
>This patch implements the Xen version of the external control operation
interface.
>
>3.	acpi_parser_platform_hypercall.patch
>This patches implements the xen_platform_op hypercall, to pass the parsed
ACPI
>info to hypervisor.
>
>== Misc =>
>To make this patch works correctly, there is a corner case need to consider:
>processor in domain0 is virtual CPU, while BIOS ACPI table provides the
physical
>CPU info, since the vCPU and pCPU is not 1:1 mapped, this may caused
unexpected
>result.  E.g. in UP domain0, Xen hypervisor may only get one physical
processor
>info.  Current linux-2.6.18-xen.hg tree solve this issue by using acpi_id
instead of
>processor_id. However, this code is a bit tricky and hacky. We are still
trying to find
>a cleaner way to solve this issue.
>
>
>Is this a corner case?  I would think it''s the common case.
This is common case, but currently only affects Cx/Px logic. If any code need to
use the physical acpi processor info, it need to care about this.
>
>
>
>OK, some more specific comments:
>
>Cut down on the number of #ifdefs.  If
>CONFIG_PROCESSOR_EXTERNAL_CONTROL is not set, define the functions to
>appropriate no-op definitions.  For example, rather than:
>+#ifdef CONFIG_PROCESSOR_EXTERNAL_CONTROL
>+			if (!processor_cntl_external()) {
>+				return -ENODEV;
>+			}
>+#else
> 			return -ENODEV;
>+#endif /* CONFIG_PROCESSOR_EXTERNAL_CONTROL */
>just define processor_cntl_external() to always return 0.  (Oh, it looks
like you
>already do this.  Why all the #ifdefs then?)
Oh, we just want to make sure all changes are embraced by
CONFIG_PROCESSOR_EXTERNAL_CONTROL, for better tracking purpose. Will be remove
this in the final version.
>
>Also, rather than:
>+		if (processor_cntl_external())
>+			processor_notify_external(pr, PROCESSOR_HOTPLUG,
>+							HOTPLUG_TYPE_REMOVE);
>can''t processor_notify_external() do the test itself?  (In addition
to the #ifdef
>comments.)
Good point. will revise it.
>
>This seems pointless:
>-static int acpi_processor_get_performance_info(struct acpi_processor *pr)
>+#ifndef CONFIG_PROCESSOR_EXTERNAL_CONTROL
>+static
>+#endif
>+int acpi_processor_get_performance_info(struct acpi_processor *pr)
For tracking purpose only. Probably better to simply remove the
"static"
>
>
> config CPU_FREQ
> 	bool "CPU Frequency scaling"
>+	depends on !PROCESSOR_EXTERNAL_CONTROL
>
>This isn''t acceptible.  A single kernel image could be running on
bare hardware, or
>be booted as a Xen dom0 kernel.  It needs to be able to handle both CPUFREQ
and
>external processor control.  Unless external processor control somehow
subsumes
>all the cpufreq driver functions?  Anyway, all decisions need to be made at
runtime.
I see. When use the xen cpufreq driver approach as you suggested, this change is
not necessary any more.
>
>Put all the Xen-specific code into a Xen-specific header.  I''ve
already added
>include/xen/acpi.h for the S3 changes, so perhaps that would be suitable.
Ok.
>
>+ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),)
>+obj-$(CONFIG_XEN)		+= processor_extcntl_xen.o
>+endif
>No, do this in Kconfig, with something like:
>config XEN_PROCESSOR_EXTERNAL_CONTROL
>	def_bool y
>	depends on XEN && PROCESSOR_EXTERNAL_CONTROL
>And maybe XEN_ACPI and/or XEN_DOM0?  If this is useful for non-x86 (ia64),
then
>put it in drivers/xen.
Ok. and yes, it will be extended to ia64 in the future.
>
>    J
Thanks for the above comments. I will investigate the xen cpufreq driver
approach as you suggested, and sent the revised patch when it is ready.

Best Regards
Ke

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Reasonably Related Threads

Search for more seemingly similar threads

Xen devel - May 2009 - [RFC] pvops domain0 Cx/Px info parser patch

[Xen-devel] [RFC] pvops domain0 Cx/Px info parser patch

[Xen-devel] Re: [RFC] pvops domain0 Cx/Px info parser patch

[Xen-devel] RE: [RFC] pvops domain0 Cx/Px info parser patch

[Xen-devel] RE:[RFC] pvops domain0 Cx/Px info parser patch

Reasonably Related Threads