# HG changeset patch
# User Boris Ostrovsky <boris.ostrovsky@amd.com>
# Date 1330466573 -3600
# Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
# Parent a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
x86: Use deep C states for off-lined CPUs

Currently when a core is taken off-line it is placed in C1 state (unless
MONITOR/MWAIT is used). This patch allows a core to go to deeper C states
resulting in significantly higher power savings.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>

diff -r a7bacdc5449a -r 9e5991ad9c85 xen/arch/x86/acpi/cpu_idle.c
--- a/xen/arch/x86/acpi/cpu_idle.c Mon Feb 27 17:05:18 2012 +0000
+++ b/xen/arch/x86/acpi/cpu_idle.c Tue Feb 28 23:02:53 2012 +0100
@@ -573,10 +573,10 @@ static void acpi_dead_idle(void)
     if ( (cx = &power->states[power->count-1]) == NULL )
         goto default_halt;
 
-    mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
-
     if ( cx->entry_method == ACPI_CSTATE_EM_FFH )
     {
+        mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
+
         /*
          * Cache must be flushed as the last operation before sleeping.
          * Otherwise, CPU may still hold dirty data, breaking cache coherency,
@@ -601,6 +601,20 @@ static void acpi_dead_idle(void)
             mb();
             __mwait(cx->address, 0);
         }
+    }
+    else if ( cx->entry_method == ACPI_CSTATE_EM_SYSIO )
+    {
+        /* Avoid references to shared data after the cache flush */
+        u32 address = cx->address;
+        u32 pmtmr_ioport_local = pmtmr_ioport;
+
+        wbinvd();
+
+        while ( 1 )
+        {
+            inb(address);
+            inl(pmtmr_ioport_local);
+        }
     }
 
 default_halt:
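The ordering that the new SYSIO branch relies on (copy shared values into locals, flush once, then only issue port reads) can be sketched in user space. This is a minimal illustration, not the Xen code: `wbinvd()`, `inb()` and `inl()` are privileged, so they are replaced here by hypothetical stubs that merely log calls, the port values are made up, and the loop is bounded so the sketch terminates.

```c
#include <stdint.h>

/* Hypothetical event log standing in for real hardware side effects. */
enum event { EV_WBINVD, EV_INB, EV_INL };

struct trace {
    enum event ev[16];
    uint32_t   port[16];
    int        n;
};

static struct trace trace_log;

static void record(enum event ev, uint32_t port)
{
    if (trace_log.n < 16) {
        trace_log.ev[trace_log.n] = ev;
        trace_log.port[trace_log.n] = port;
        trace_log.n++;
    }
}

/* Stubs: the real wbinvd()/inb()/inl() cannot run in user space. */
static void stub_wbinvd(void)       { record(EV_WBINVD, 0); }
static void stub_inb(uint32_t port) { record(EV_INB, port); }
static void stub_inl(uint32_t port) { record(EV_INL, port); }

/* Stand-ins for the shared data the patch reads (cx->address, pmtmr_ioport). */
struct cstate { uint32_t address; };
static uint32_t pmtmr_ioport_shared = 0x408; /* made-up port number */

static void dead_idle_sysio_sketch(const struct cstate *cx, int iterations)
{
    /* Read shared data before the flush, as the patch comment asks. */
    uint32_t address = cx->address;
    uint32_t pmtmr_local = pmtmr_ioport_shared;

    stub_wbinvd();

    /* The patch loops forever; bounded here so the sketch returns. */
    for (int i = 0; i < iterations; i++) {
        stub_inb(address);     /* read of the C-state port */
        stub_inl(pmtmr_local); /* follow-up PM timer read, as in the patch */
    }
}
```

Calling `dead_idle_sysio_sketch(&cx, 2)` on a fresh log records one flush followed by two `inb`/`inl` pairs, which is the sequence the hunk above adds.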
I noticed the following comments when using mwait based idle:
-------------------------------------------------------------------------
        while ( 1 )
        {
            /*
             * 1. The CLFLUSH is a workaround for erratum AAI65 for
             * the Xeon 7400 series.
             * 2. The WBINVD is insufficient due to the spurious-wakeup
             * case where we return around the loop.
             * 3. Unlike wbinvd, clflush is a light weight but not serializing
             * instruction, hence memory fence is necessary to make sure all
             * load/store visible before flush cache line.
             */
            mb();
            clflush(mwait_ptr);
            __monitor(mwait_ptr, 0, 0);
            mb();
            __mwait(cx->address, 0);
        }
    }
-------------------------------------------------------------------------
Your patch should follow it too.

best regards
yang

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Boris Ostrovsky
> Sent: Wednesday, February 29, 2012 6:09 AM
> To: xen-devel@lists.xensource.com
> Cc: boris.ostrovsky@amd.com
> Subject: [Xen-devel] [PATCH] x86: Use deep C states for off-lined CPUs
>
> # HG changeset patch
> # User Boris Ostrovsky <boris.ostrovsky@amd.com>
> # Date 1330466573 -3600
> # Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
> # Parent a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
> x86: Use deep C states for off-lined CPUs
>
> Currently when a core is taken off-line it is placed in C1 state (unless
> MONITOR/MWAIT is used). This patch allows a core to go to deeper C states
> resulting in significantly higher power savings.
Ostrovsky, Boris
2012-Feb-29 04:03 UTC
Re: [PATCH] x86: Use deep C states for off-lined CPUs
The patch is adding IO-based C-states. My understanding is that CLFLUSH
was to work around a MONITOR-related erratum.

Or are you referring to something else?

-boris

________________________________________
From: Zhang, Yang Z [yang.z.zhang@intel.com]
Sent: Tuesday, February 28, 2012 8:37 PM
To: Ostrovsky, Boris; xen-devel@lists.xensource.com
Subject: RE: [Xen-devel] [PATCH] x86: Use deep C states for off-lined CPUs

I noticed the following comments when using mwait based idle:
-------------------------------------------------------------------------
        while ( 1 )
        {
            /*
             * 1. The CLFLUSH is a workaround for erratum AAI65 for
             * the Xeon 7400 series.
             * 2. The WBINVD is insufficient due to the spurious-wakeup
             * case where we return around the loop.
             * 3. Unlike wbinvd, clflush is a light weight but not serializing
             * instruction, hence memory fence is necessary to make sure all
             * load/store visible before flush cache line.
             */
            mb();
            clflush(mwait_ptr);
            __monitor(mwait_ptr, 0, 0);
            mb();
            __mwait(cx->address, 0);
        }
    }
-------------------------------------------------------------------------
Your patch should follow it too.

best regards
yang

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Boris Ostrovsky
> Sent: Wednesday, February 29, 2012 6:09 AM
> To: xen-devel@lists.xensource.com
> Cc: boris.ostrovsky@amd.com
> Subject: [Xen-devel] [PATCH] x86: Use deep C states for off-lined CPUs
>
> # HG changeset patch
> # User Boris Ostrovsky <boris.ostrovsky@amd.com>
> # Date 1330466573 -3600
> # Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
> # Parent a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
> x86: Use deep C states for off-lined CPUs
>
> Currently when a core is taken off-line it is placed in C1 state (unless
> MONITOR/MWAIT is used). This patch allows a core to go to deeper C states
> resulting in significantly higher power savings.
I don't think we should go back to the old SYSIO method. The history here is:

Xen originally had a SYSIO method for offlined CPUs, but at c/s 23022 we cancelled it for the reason below:
=====================
x86: Fix cpu offline bug: cancel SYSIO method when play dead

Play dead is a fragile and tricky point of cpu offline logic. For how
to play cpu dead, linux kernel changed several times: Very old kernel
support 3 ways to play cpu dead: mwait, SYSIO, and halt, just like
what cpuidle did when enter C3; Later, it cancel mwait and SYSIO
support, only use halt to play dead; Latest linux 2.6.38 add mwait
support when cpu dead.

This patch cancel SYSIO method when cpu dead, keep same with latest
kernel.

SYSIO is an obsoleted method to enter deep C, with some tricky
hardware behavior, and seldom supported in new platform. Xen
experiment indicate that when cpu dead, SYSIO method would trigger
unknown issue which would bring strange error. We now cancel SYSIO
method when cpu dead, after all, correctness is more important than
power save, and btw new platform use mwait.
=====================

Thanks,
Jinsong

Boris Ostrovsky wrote:
> # HG changeset patch
> # User Boris Ostrovsky <boris.ostrovsky@amd.com>
> # Date 1330466573 -3600
> # Node ID 9e5991ad9c85b5176ce269001e7957e8805dd93c
> # Parent a7bacdc5449a2f7bb9c35b2a1334b463fe9f29a9
> x86: Use deep C states for off-lined CPUs
>
> Currently when a core is taken off-line it is placed in C1 state
> (unless MONITOR/MWAIT is used). This patch allows a core to go to
> deeper C states resulting in significantly higher power savings.
Hmm, no.

It needs to flush the cache, as long as a *deep Cx* state can be spuriously woken up.
The reason for using clflush here is that it's a light-weight flush; in fact wbinvd could also be used if performance were not a concern.

For halt, there is no need to do so, since the CPU keeps snooping while asleep.

Thanks,
Jinsong

Ostrovsky, Boris wrote:
> The patch is adding IO-based C-states. My understanding is that CLFLUSH
> was to work around a MONITOR-related erratum.
>
> Or are you referring to something else?
>
> -boris
>
>
> ________________________________________
> From: Zhang, Yang Z [yang.z.zhang@intel.com]
> Sent: Tuesday, February 28, 2012 8:37 PM
> To: Ostrovsky, Boris; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] [PATCH] x86: Use deep C states for off-lined
> CPUs
>
> I noticed the following comments when using mwait based idle:
> -------------------------------------------------------------------------
>         while ( 1 )
>         {
>             /*
>              * 1. The CLFLUSH is a workaround for erratum AAI65 for
>              * the Xeon 7400 series.
>              * 2. The WBINVD is insufficient due to the spurious-wakeup
>              * case where we return around the loop.
>              * 3. Unlike wbinvd, clflush is a light weight but not serializing
>              * instruction, hence memory fence is necessary to make sure all
>              * load/store visible before flush cache line.
>              */
>             mb();
>             clflush(mwait_ptr);
>             __monitor(mwait_ptr, 0, 0);
>             mb();
>             __mwait(cx->address, 0);
>         }
>     }
> -------------------------------------------------------------------------
> Your patch should follow it too.
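The argument in the quoted comment (a single WBINVD before the loop is not enough once a spurious wakeup sends the CPU around the loop again) can be illustrated with a toy model. Everything below is hypothetical: it tracks one boolean "line is cached" flag and assumes that waking up and looping touches the monitored line again. It is a sketch of the reasoning, not a description of real cache behaviour.

```c
#include <stdbool.h>

/* Toy model of one CPU and the single cache line it monitors. */
struct model_cpu {
    bool line_cached;      /* does the CPU currently hold the line? */
    int  sleeps_with_line; /* sleeps entered while still holding it */
};

static void model_flush(struct model_cpu *c)
{
    c->line_cached = false;
}

static void model_sleep_and_wake(struct model_cpu *c)
{
    if (c->line_cached)
        c->sleeps_with_line++;
    /* Assumption: after a wakeup the loop touches the line again. */
    c->line_cached = true;
}

/* Returns how many times the modelled CPU slept while holding the line. */
static int run_loop(bool flush_each_iteration, int wakeups)
{
    struct model_cpu c = { .line_cached = true, .sleeps_with_line = 0 };

    model_flush(&c); /* stands in for the one-off wbinvd() before the loop */

    for (int i = 0; i < wakeups; i++) {
        if (flush_each_iteration)
            model_flush(&c); /* stands in for clflush(mwait_ptr) */
        model_sleep_and_wake(&c);
    }
    return c.sleeps_with_line;
}
```

In this model `run_loop(false, 3)` reports two sleeps with the line still held (every sleep after the first wakeup), while `run_loop(true, 3)` reports none, which mirrors point 2 of the comment.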
>>> On 28.02.12 at 23:08, Boris Ostrovsky <boris.ostrovsky@amd.com> wrote:
> --- a/xen/arch/x86/acpi/cpu_idle.c Mon Feb 27 17:05:18 2012 +0000
> +++ b/xen/arch/x86/acpi/cpu_idle.c Tue Feb 28 23:02:53 2012 +0100
> @@ -573,10 +573,10 @@ static void acpi_dead_idle(void)
>      if ( (cx = &power->states[power->count-1]) == NULL )
>          goto default_halt;
>
> -    mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
> -
>      if ( cx->entry_method == ACPI_CSTATE_EM_FFH )
>      {
> +        mwait_ptr = (void *)&mwait_wakeup(smp_processor_id());
> +

If you're concerned about the placement of this (the change being
unrelated to what your patch is aiming at anyway), then you should
- explain why
- move the declaration of mwait_ptr also into the if() scope

>          /*
>           * Cache must be flushed as the last operation before sleeping.
>           * Otherwise, CPU may still hold dirty data, breaking cache coherency,
> @@ -601,6 +601,20 @@ static void acpi_dead_idle(void)
>              mb();
>              __mwait(cx->address, 0);
>          }
> +    }
> +    else if ( cx->entry_method == ACPI_CSTATE_EM_SYSIO )
> +    {
> +        /* Avoid references to shared data after the cache flush */
> +        u32 address = cx->address;
> +        u32 pmtmr_ioport_local = pmtmr_ioport;
> +
> +        wbinvd();
> +
> +        while ( 1 )
> +        {
> +            inb(address);
> +            inl(pmtmr_ioport_local);
> +        }

You will need to eliminate the reservations of the Intel folks for
this to be accepted, I'm afraid, or make this AMD specific (provided
the issues pointed out by them don't affect AMD systems).

Jan

>      }
>
>  default_halt:
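One way to read the "make this AMD specific" suggestion is as an extra condition on the SYSIO branch, with everything else falling through to the halt path. The sketch below only models that decision; the enums and the function name are hypothetical illustrations, not Xen identifiers, and how the vendor would actually be detected is left out.

```c
/* Hypothetical stand-ins for the C-state entry method and the CPU vendor. */
enum entry_method   { EM_FFH, EM_SYSIO, EM_HALT };
enum cpu_vendor     { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };
enum dead_idle_path { PATH_MWAIT, PATH_SYSIO, PATH_HALT };

/* Pick how an off-lined CPU should idle under the AMD-only restriction. */
static enum dead_idle_path choose_dead_idle_path(enum entry_method em,
                                                 enum cpu_vendor vendor)
{
    if (em == EM_FFH)
        return PATH_MWAIT;  /* existing MONITOR/MWAIT loop */

    if (em == EM_SYSIO && vendor == VENDOR_AMD)
        return PATH_SYSIO;  /* new I/O-port based loop, AMD only */

    return PATH_HALT;       /* default_halt for everything else */
}
```

Under this reading, a SYSIO C-state on a non-AMD system keeps today's behaviour and simply halts.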
Boris Ostrovsky
2012-Feb-29 13:48 UTC
Re: [PATCH] x86: Use deep C states for off-lined CPUs
As far as I can tell the most relevant change in Linux was this:
git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ea53069231f9317062910d6e772cca4ce93de8c8
and it sounds like it was made mostly because MWAIT-based idle is more
efficient on Intel processors. That's not the case on AMD, where IO-based
idle is preferred (and I am not aware of any issues, at least so far).

I can make the patch AMD-specific, but since for the most part the logic
is the same as in acpi_idle_do_entry(), won't we have to modify that
function as well?

-boris

On 02/28/12 23:58, Liu, Jinsong wrote:
> I don't think we should go back to the old SYSIO method. The history here is:
>
> Xen originally had a SYSIO method for offlined CPUs, but at c/s 23022 we cancelled it for the reason below:
> =====================
> x86: Fix cpu offline bug: cancel SYSIO method when play dead
>
> Play dead is a fragile and tricky point of cpu offline logic. For how
> to play cpu dead, linux kernel changed several times: Very old kernel
> support 3 ways to play cpu dead: mwait, SYSIO, and halt, just like
> what cpuidle did when enter C3; Later, it cancel mwait and SYSIO
> support, only use halt to play dead; Latest linux 2.6.38 add mwait
> support when cpu dead.
>
> This patch cancel SYSIO method when cpu dead, keep same with latest
> kernel.
>
> SYSIO is an obsoleted method to enter deep C, with some tricky
> hardware behavior, and seldom supported in new platform. Xen
> experiment indicate that when cpu dead, SYSIO method would trigger
> unknown issue which would bring strange error. We now cancel SYSIO
> method when cpu dead, after all, correctness is more important than
> power save, and btw new platform use mwait.
Boris Ostrovsky
2012-Feb-29 14:55 UTC
Re: [PATCH] x86: Use deep C states for off-lined CPUs
On 02/29/12 00:21, Liu, Jinsong wrote:
> Hmm, no.
>
> It needs to flush the cache, as long as a *deep Cx* state can be spuriously woken up.
> The reason for using clflush here is that it's a light-weight flush; in fact wbinvd could also be used if performance were not a concern.

What address would need to be CLFLUSH'd? Both "address" and
"pmtmr_ioport_local"?

>
> For halt, there is no need to do so, since the CPU keeps snooping while asleep.

If the CPU does not snoop in deeper C-states, wouldn't we have a problem
with acpi_idle_do_entry()? There is a code path (at least for C2)
where the cache is not flushed.

Incidentally, if CLFLUSH is required for MONITOR then perhaps
mwait_idle_with_hints() needs to have it as well?

-boris

>
> Thanks,
> Jinsong
>
> Ostrovsky, Boris wrote:
>> The patch is adding IO-based C-states. My understanding is that CLFLUSH
>> was to work around a MONITOR-related erratum.
>>
>> Or are you referring to something else?
>>
>> -boris
>>
>>
>> ________________________________________
>> From: Zhang, Yang Z [yang.z.zhang@intel.com]
>> Sent: Tuesday, February 28, 2012 8:37 PM
>> To: Ostrovsky, Boris; xen-devel@lists.xensource.com
>> Subject: RE: [Xen-devel] [PATCH] x86: Use deep C states for off-lined
>> CPUs
>>
>> I noticed the following comments when using mwait based idle:
>> -------------------------------------------------------------------------
>>         while ( 1 )
>>         {
>>             /*
>>              * 1. The CLFLUSH is a workaround for erratum AAI65 for
>>              * the Xeon 7400 series.
>>              * 2. The WBINVD is insufficient due to the spurious-wakeup
>>              * case where we return around the loop.
>>              * 3. Unlike wbinvd, clflush is a light weight but not serializing
>>              * instruction, hence memory fence is necessary to make sure all
>>              * load/store visible before flush cache line.
>>              */
>>             mb();
>>             clflush(mwait_ptr);
>>             __monitor(mwait_ptr, 0, 0);
>>             mb();
>>             __mwait(cx->address, 0);
>>         }
>>     }
>> -------------------------------------------------------------------------
>> Your patch should follow it too.
>>> On 29.02.12 at 15:55, Boris Ostrovsky <boris.ostrovsky@amd.com> wrote:
> On 02/29/12 00:21, Liu, Jinsong wrote:
>> Hmm, no.
>>
>> It needs to flush the cache, as long as a *deep Cx* state can be spuriously woken up.
>> The reason for using clflush here is that it's a light-weight flush; in fact wbinvd could also be used if performance were not a concern.
>
> What address would need to be CLFLUSH'd? Both "address" and
> "pmtmr_ioport_local"?

Hardly - these are both I/O ports.

Jan
Boris Ostrovsky wrote:
> As far as I can tell the most relevant change in Linux was this:
> git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=ea53069231f9317062910d6e772cca4ce93de8c8
> and it sounds like it was made mostly because MWAIT-based idle is more
> efficient on Intel processors. That's not the case on AMD, where
> IO-based idle is preferred (and I am not aware of any issues, at
> least so far).
>
> I can make the patch AMD-specific, but since for the most part
> the logic is the same as in acpi_idle_do_entry(), won't we have to
> modify that function as well?
>

An AMD-specific approach is OK with me.

Thanks,
Jinsong

>
> On 02/28/12 23:58, Liu, Jinsong wrote:
>> I don't think we should go back to the old SYSIO method. The history
>> here is:
>>
>> Xen originally had a SYSIO method for offlined CPUs, but at c/s 23022
>> we cancelled it for the reason below:
>> =====================
>> x86: Fix cpu offline bug: cancel SYSIO method when play dead
>>
>> Play dead is a fragile and tricky point of cpu offline logic. For how
>> to play cpu dead, linux kernel changed several times: Very old kernel
>> support 3 ways to play cpu dead: mwait, SYSIO, and halt, just like
>> what cpuidle did when enter C3; Later, it cancel mwait and SYSIO
>> support, only use halt to play dead; Latest linux 2.6.38 add mwait
>> support when cpu dead.
>>
>> This patch cancel SYSIO method when cpu dead, keep same with latest
>> kernel.
>>
>> SYSIO is an obsoleted method to enter deep C, with some tricky
>> hardware behavior, and seldom supported in new platform. Xen
>> experiment indicate that when cpu dead, SYSIO method would trigger
>> unknown issue which would bring strange error. We now cancel SYSIO
>> method when cpu dead, after all, correctness is more important than
>> power save, and btw new platform use mwait.
Boris Ostrovsky wrote:
> On 02/29/12 00:21, Liu, Jinsong wrote:
>> Hmm, no.
>>
>> It needs to flush the cache, as long as a *deep Cx* state can be spuriously woken up.
>> The reason for using clflush here is that it's a light-weight flush; in fact
>> wbinvd could also be used if performance were not a concern.
>
> What address would need to be CLFLUSH'd? Both "address" and
> "pmtmr_ioport_local"?

If the while loop only does inb/inl on I/O ports, there is no need to flush.

>
>>
>> For halt, there is no need to do so, since the CPU keeps snooping while
>> asleep.
>
> If the CPU does not snoop in deeper C-states, wouldn't we have a problem
> with acpi_idle_do_entry()? There is a code path (at least for C2)
> where the cache is not flushed.

No problem for C1/C2; only C3 and deeper stop snooping.

>
> Incidentally, if CLFLUSH is required for MONITOR then perhaps
> mwait_idle_with_hints() needs to have it as well?
>

No need to do so: wbinvd has already been done before mwait_idle_with_hints() enters C3, and that is a different scenario from acpi_dead_idle(), which is a while(1) loop.

Thanks,
Jinsong
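The three answers above can be condensed into a small decision table. The sketch below is only a summary of this reply as stated, under the assumption that the C-state type number and whether the idle loop touches cacheable memory are the only inputs; the enum and function names are hypothetical and do not exist in Xen.

```c
#include <stdbool.h>

/* Hypothetical summary of when a cache flush is wanted before sleeping. */
enum flush_policy {
    FLUSH_NONE,           /* C1/C2: the CPU keeps snooping */
    FLUSH_ONCE,           /* flush once before entering C3 or deeper */
    FLUSH_EVERY_ITERATION /* re-flush on each pass of a dead-idle loop */
};

static enum flush_policy flush_policy_for(int cx_type,
                                          bool loop_touches_memory)
{
    if (cx_type < 3)
        return FLUSH_NONE;

    /*
     * A loop that re-arms MONITOR on a memory location touches that
     * line again after each wakeup; a loop of pure port reads does not.
     */
    return loop_touches_memory ? FLUSH_EVERY_ITERATION : FLUSH_ONCE;
}
```

Read this way, the MWAIT loop in acpi_dead_idle() corresponds to `flush_policy_for(3, true)`, while a loop of inb/inl port reads corresponds to `flush_policy_for(3, false)`.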