Cihula, Joseph
2007-Sep-10 21:55 UTC
[Xen-devel] [PATCH] Prevent incorrect optimization of read_crx()
This patch prevents the compiler from optimizing multiple read_crx() calls when there might be an intervening write_crx(). E.g.: write_cr0(read_cr0() | X86_CR0_PE); ... if ( read_cr0() & X86_CR0_PE ) <- could get old value (w/o PE set) ... This was actually seen to happen in the sboot code that copied this header from Xen. Without the volatile attribute, the compiler simply cached the result of the first read and returned it for the second. This was made even more problematic, in this particular case, because the two read''s were in separate static fns that the compiler also coalesced into a single function before it optimized it. Joe # HG changeset patch # User Joseph Cihula <joseph.cihula@intel.com> # Date 1189460837 25200 # Node ID 0e8e662becc240b424682a78ec9efbeaf6e84745 # Parent 9071521d48646d247efafcf230ea0a4a2b6f0efa Prevents incorrect optimization of read_crx() calls with an intervening write_crx(). Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> diff -r 9071521d4864 -r 0e8e662becc2 xen/include/asm-x86/processor.h --- a/xen/include/asm-x86/processor.h Fri Sep 07 11:39:10 2007 +0100 +++ b/xen/include/asm-x86/processor.h Mon Sep 10 14:47:17 2007 -0700 @@ -281,7 +281,7 @@ static inline unsigned long read_cr0(voi static inline unsigned long read_cr0(void) { unsigned long __cr0; - __asm__("mov %%cr0,%0\n\t" :"=r" (__cr0)); + __asm__ __volatile__("mov %%cr0,%0\n\t" :"=r" (__cr0)); return __cr0; } @@ -293,14 +293,14 @@ static inline unsigned long read_cr2(voi static inline unsigned long read_cr2(void) { unsigned long __cr2; - __asm__("mov %%cr2,%0\n\t" :"=r" (__cr2)); + __asm__ __volatile__("mov %%cr2,%0\n\t" :"=r" (__cr2)); return __cr2; } static inline unsigned long read_cr4(void) { unsigned long __cr4; - __asm__("mov %%cr4,%0\n\t" :"=r" (__cr4)); + __asm__ __volatile__("mov %%cr4,%0\n\t" :"=r" (__cr4)); return __cr4; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2007-Sep-11 06:52 UTC
Re: [Xen-devel] [PATCH] Prevent incorrect optimization of read_crx()
Wouldn''t you need to put __volatile__ in write_crX(), too? And does it really make sense to put it in read_cr2()? Jan>>> "Cihula, Joseph" <joseph.cihula@intel.com> 10.09.07 23:55 >>>This patch prevents the compiler from optimizing multiple read_crx() calls when there might be an intervening write_crx(). E.g.: write_cr0(read_cr0() | X86_CR0_PE); ... if ( read_cr0() & X86_CR0_PE ) <- could get old value (w/o PE set) ... This was actually seen to happen in the sboot code that copied this header from Xen. Without the volatile attribute, the compiler simply cached the result of the first read and returned it for the second. This was made even more problematic, in this particular case, because the two read''s were in separate static fns that the compiler also coalesced into a single function before it optimized it. Joe # HG changeset patch # User Joseph Cihula <joseph.cihula@intel.com> # Date 1189460837 25200 # Node ID 0e8e662becc240b424682a78ec9efbeaf6e84745 # Parent 9071521d48646d247efafcf230ea0a4a2b6f0efa Prevents incorrect optimization of read_crx() calls with an intervening write_crx(). Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> diff -r 9071521d4864 -r 0e8e662becc2 xen/include/asm-x86/processor.h --- a/xen/include/asm-x86/processor.h Fri Sep 07 11:39:10 2007 +0100 +++ b/xen/include/asm-x86/processor.h Mon Sep 10 14:47:17 2007 -0700 @@ -281,7 +281,7 @@ static inline unsigned long read_cr0(voi static inline unsigned long read_cr0(void) { unsigned long __cr0; - __asm__("mov %%cr0,%0\n\t" :"=r" (__cr0)); + __asm__ __volatile__("mov %%cr0,%0\n\t" :"=r" (__cr0)); return __cr0; } @@ -293,14 +293,14 @@ static inline unsigned long read_cr2(voi static inline unsigned long read_cr2(void) { unsigned long __cr2; - __asm__("mov %%cr2,%0\n\t" :"=r" (__cr2)); + __asm__ __volatile__("mov %%cr2,%0\n\t" :"=r" (__cr2)); return __cr2; } static inline unsigned long read_cr4(void) { unsigned long __cr4; - __asm__("mov %%cr4,%0\n\t" :"=r" (__cr4)); + __asm__ __volatile__("mov %%cr4,%0\n\t" :"=r" (__cr4)); return __cr4; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Sep-11 06:59 UTC
Re: [Xen-devel] [PATCH] Prevent incorrect optimization of read_crx()
It doesn''t hurt to do so, for consistency''s sake. It''s only used on one path we care about the performance of, and in that case it can''t be optimised away even without volatile. -- Keir On 11/9/07 07:52, "Jan Beulich" <jbeulich@novell.com> wrote:> Wouldn''t you need to put __volatile__ in write_crX(), too? And does it really > make sense to put it in read_cr2()? Jan > >>>> "Cihula, Joseph" <joseph.cihula@intel.com> 10.09.07 23:55 >>> > This patch prevents the compiler from optimizing multiple read_crx() > calls when there might be an intervening write_crx(). E.g.: > write_cr0(read_cr0() | X86_CR0_PE); > ... > if ( read_cr0() & X86_CR0_PE ) <- could get old value (w/o PE > set) > ... > > This was actually seen to happen in the sboot code that copied this > header from Xen. Without the volatile attribute, the compiler simply > cached the result of the first read and returned it for the second. > This was made even more problematic, in this particular case, because > the two read''s were in separate static fns that the compiler also > coalesced into a single function before it optimized it. > > Joe > > > # HG changeset patch > # User Joseph Cihula <joseph.cihula@intel.com> > # Date 1189460837 25200 > # Node ID 0e8e662becc240b424682a78ec9efbeaf6e84745 > # Parent 9071521d48646d247efafcf230ea0a4a2b6f0efa > Prevents incorrect optimization of read_crx() calls with an intervening > write_crx(). > > Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> > > diff -r 9071521d4864 -r 0e8e662becc2 xen/include/asm-x86/processor.h > --- a/xen/include/asm-x86/processor.h Fri Sep 07 11:39:10 2007 +0100 > +++ b/xen/include/asm-x86/processor.h Mon Sep 10 14:47:17 2007 -0700 > @@ -281,7 +281,7 @@ static inline unsigned long read_cr0(voi > static inline unsigned long read_cr0(void) > { > unsigned long __cr0; > - __asm__("mov %%cr0,%0\n\t" :"=r" (__cr0)); > + __asm__ __volatile__("mov %%cr0,%0\n\t" :"=r" (__cr0)); > return __cr0; > } > > @@ -293,14 +293,14 @@ static inline unsigned long read_cr2(voi > static inline unsigned long read_cr2(void) > { > unsigned long __cr2; > - __asm__("mov %%cr2,%0\n\t" :"=r" (__cr2)); > + __asm__ __volatile__("mov %%cr2,%0\n\t" :"=r" (__cr2)); > return __cr2; > } > > static inline unsigned long read_cr4(void) > { > unsigned long __cr4; > - __asm__("mov %%cr4,%0\n\t" :"=r" (__cr4)); > + __asm__ __volatile__("mov %%cr4,%0\n\t" :"=r" (__cr4)); > return __cr4; > } > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Sep-11 07:12 UTC
Re: [Xen-devel] [PATCH] Prevent incorrect optimization of read_crx()
On 11/9/07 07:52, "Jan Beulich" <jbeulich@novell.com> wrote:> Wouldn''t you need to put __volatile__ in write_crX(), too?Actually it is surprising that the compiler hasn''t been optimising away our write_cr0() entirely. Perhaps it special cases asm blocks with no outputs. Adding the volatile keyword makes a lot of sense anyway. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel