Besides the .text space savings of over 2.5k on x86-64 (1.5k for
x86-32) this removes a load (plus a lea on x86-64) from various
frequently executed code paths, and finally provides a reason (other
than legibility) to prefer this_cpu() over per_cpu() in all places
where smp_processor_id() isn't being called anyway.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- 2010-06-15.orig/xen/include/asm-x86/current.h 2010-07-13 14:38:21.000000000 +0200
+++ 2010-06-15/xen/include/asm-x86/current.h 2010-07-13 15:12:37.000000000 +0200
@@ -17,6 +17,10 @@ struct vcpu;
 struct cpu_info {
     struct cpu_user_regs guest_cpu_user_regs;
     unsigned int processor_id;
+    unsigned long per_cpu_offset;
+#ifdef __x86_64__
+    unsigned long __pad_for_stack_bottom;
+#endif
     struct vcpu *current_vcpu;
 };
@@ -35,7 +39,12 @@ static inline struct cpu_info *get_cpu_i
 #define current (get_current())
 #define get_processor_id() (get_cpu_info()->processor_id)
-#define set_processor_id(id) (get_cpu_info()->processor_id = (id))
+#define set_processor_id(id) do { \
+    struct cpu_info *ci__ = get_cpu_info(); \
+    ci__->per_cpu_offset = __per_cpu_offset[ci__->processor_id = (id)]; \
+} while (0)
+
+#define get_per_cpu_offset() (get_cpu_info()->per_cpu_offset)
 #define guest_cpu_user_regs() (&get_cpu_info()->guest_cpu_user_regs)
--- 2010-06-15.orig/xen/include/asm-x86/percpu.h 2010-07-13 14:38:21.000000000 +0200
+++ 2010-06-15/xen/include/asm-x86/percpu.h 2010-07-13 13:56:38.000000000 +0200
@@ -16,7 +16,7 @@ void percpu_init_areas(void);
 #define per_cpu(var, cpu) \
     (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
 #define __get_cpu_var(var) \
-    (per_cpu(var, smp_processor_id()))
+    (*RELOC_HIDE(&per_cpu__##var, get_per_cpu_offset()))
 #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
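
To illustrate the idea behind the patch, here is a minimal user-space sketch in C. It is not Xen code: `areas`, `per_cpu_offset_tbl`, `this_cpu_info` and the `*_sketch` names are hypothetical stand-ins for the per-CPU data area, `__per_cpu_offset[]`, the stack-resident `struct cpu_info` and `set_processor_id()`. It only shows why caching the offset next to the processor ID lets a this_cpu()-style access skip the table lookup that a per_cpu()-style access needs.

```c
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical CPU count for this sketch */

/* Stand-in for the per-CPU data area: one copy of each variable per CPU. */
struct percpu_area { int counter; };
static struct percpu_area areas[NR_CPUS];

/* Analogue of __per_cpu_offset[]: byte offset of each CPU's copy. */
static unsigned long per_cpu_offset_tbl[NR_CPUS];

/* Analogue of struct cpu_info: holds the ID and the cached offset. */
struct cpu_info_sketch {
    unsigned int processor_id;
    unsigned long per_cpu_offset;
};
static struct cpu_info_sketch this_cpu_info; /* in Xen this sits at the stack top */

static void init_offsets(void)
{
    for (unsigned int i = 0; i < NR_CPUS; i++)
        per_cpu_offset_tbl[i] = i * sizeof(struct percpu_area);
}

/* Like the patched set_processor_id(): store the ID and cache its offset. */
static void set_processor_id_sketch(unsigned int id)
{
    this_cpu_info.processor_id = id;
    this_cpu_info.per_cpu_offset = per_cpu_offset_tbl[id];
}

/* per_cpu() analogue: needs a table load indexed by the CPU number. */
static int *per_cpu_counter(unsigned int cpu)
{
    return (int *)((char *)&areas[0].counter + per_cpu_offset_tbl[cpu]);
}

/* this_cpu() analogue after the patch: uses the cached offset directly. */
static int *this_cpu_counter(void)
{
    return (int *)((char *)&areas[0].counter + this_cpu_info.per_cpu_offset);
}
```

After `init_offsets()` and `set_processor_id_sketch(2)`, `this_cpu_counter()` and `per_cpu_counter(2)` resolve to the same address; the former simply avoids the indexed load.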
On 13/07/2010 14:35, "Jan Beulich" <JBeulich@novell.com> wrote:

> @@ -17,6 +17,10 @@ struct vcpu;
>  struct cpu_info {
>      struct cpu_user_regs guest_cpu_user_regs;
>      unsigned int processor_id;
> +    unsigned long per_cpu_offset;
> +#ifdef __x86_64__
> +    unsigned long __pad_for_stack_bottom;
> +#endif

That's just nasty. If we need the structure to be 16-byte aligned then we
should achieve it via __attribute__((__aligned__(16))). And if we add that
we may as well not ifdef it; I'm sure the up to 12 bytes of padding on i386
won't cause stack overflow.

 -- Keir

On 13/07/2010 15:26, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> That's just nasty. If we need the structure to be 16-byte aligned then we
> should achieve it via __attribute__((__aligned__(16))). And if we add that
> we may as well not ifdef it, I'm sure the up to 12 bytes padding on i386
> won't cause stack overflow.

I should add: if you agree, please re-do the patch. Also, a code comment on
what the alignment attribute is for would be a good idea, of course.

 -- Keir

On 13/07/2010 15:27, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> On 13/07/2010 15:26, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> That's just nasty. If we need the structure to be 16-byte aligned then we
>> should achieve it via __attribute__((__aligned__(16))). And if we add that
>> we may as well not ifdef it, I'm sure the up to 12 bytes padding on i386
>> won't cause stack overflow.
>
> I should add: if you agree please re-do the patch. Also a code comment on
> what the alignment attribute is for would be a good idea of course.

Ah, hang on... We actually specifically want the structure to be not a
multiple of 16, don't we. :-) We need it to be an odd multiple of 8... Fair
enough then, I guess. I'll add a code comment.

 -- Keir

>>> On 13.07.10 at 16:26, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 13/07/2010 14:35, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> @@ -17,6 +17,10 @@ struct vcpu;
>>  struct cpu_info {
>>      struct cpu_user_regs guest_cpu_user_regs;
>>      unsigned int processor_id;
>> +    unsigned long per_cpu_offset;
>> +#ifdef __x86_64__
>> +    unsigned long __pad_for_stack_bottom;
>> +#endif
>
> That's just nasty. If we need the structure to be 16-byte aligned then we
> should achieve it via __attribute__((__aligned__(16))). And if we add that
> we may as well not ifdef it, I'm sure the up to 12 bytes padding on i386
> won't cause stack overflow.

I did indeed consider this apparently cleaner alternative first, but no,
__attribute__((__aligned__())) isn't the right solution here. For one, the
attribute has no effect due to the way get_cpu_info() calculates its result.
Second, sizeof(struct cpu_info) would change (to a value divisible by 16),
and thus the address of guest_cpu_user_regs.es within struct cpu_info would
no longer be divisible by 16 (triggering the
BUG_ON((get_stack_bottom() & 15) != 0) in cpu_init()), with no way of
making it divisible again.

I found it quite odd that, without any special comment to that effect, I
couldn't just add a single field to struct cpu_info without causing
breakage. The apparently odd extra padding field at least provides a slight
hint towards issues here. A similar issue is that there is a silent
requirement of "current_vcpu" being the last field...

Jan

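
The alignment argument above can be worked through with a small C sketch. Everything numeric here is an assumption made for illustration: `STACK_SIZE`, `ES_OFFSET` and the helper names are hypothetical, and the address computation only mimics the general shape implied by the thread (struct cpu_info placed at the very top of a power-of-two-sized stack), not the actual inline assembly in get_cpu_info().

```c
#include <stdbool.h>

#define STACK_SIZE 0x2000UL  /* assumed power-of-two stack size */
#define ES_OFFSET  168UL     /* assumed offset of guest_cpu_user_regs.es */

/* Address of struct cpu_info for any stack pointer inside the stack:
 * round sp down to the stack base, then place the structure so that it
 * ends exactly at the top of the stack. */
static unsigned long cpu_info_addr(unsigned long sp, unsigned long info_size)
{
    return (sp & ~(STACK_SIZE - 1)) | (STACK_SIZE - info_size);
}

/* Analogue of get_stack_bottom(): address of the es slot. */
static unsigned long stack_bottom(unsigned long sp, unsigned long info_size)
{
    return cpu_info_addr(sp, info_size) + ES_OFFSET;
}

/* The condition checked by the BUG_ON() quoted above. */
static bool stack_bottom_ok(unsigned long sp, unsigned long info_size)
{
    return (stack_bottom(sp, info_size) & 15) == 0;
}
```

Because the structure's address is derived from the stack pointer and sizeof, a declared alignment never enters the calculation; only the size does. With the assumed offset of 168, a size of 216 passes the check, adding one 8-byte field (224) fails it, adding the extra 8-byte pad (232, an odd multiple of 8) passes again, and rounding the size up to a multiple of 16 (240) fails.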
On 13/07/2010 15:43, "Jan Beulich" <JBeulich@novell.com> wrote:

> I found it quite odd that, without any special comment to that
> effect, I couldn't just add a single field to struct cpu_info without
> causing breakage. The apparently odd extra padding field at
> least provides a slight hint towards issues here. A similar issue
> is that there is a silent requirement of "current_vcpu" being the
> last field...

Where does that come from?

 -- Keir

>>> On 13.07.10 at 16:45, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 13/07/2010 15:43, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> I found it quite odd that, without any special comment to that
>> effect, I couldn't just add a single field to struct cpu_info without
>> causing breakage. The apparently odd extra padding field at
>> least provides a slight hint towards issues here. A similar issue
>> is that there is a silent requirement of "current_vcpu" being the
>> last field...
>
> Where does that come from?

From various assembly files that define GET_CURRENT() or the like on their
own. Grepping for STACK_SIZE helped me notice the issue. Likewise,
GET_GUEST_REGS() implies that "guest_cpu_user_regs" is the first field of
struct cpu_info (but this I would consider expected).

Jan

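
One way such silent layout requirements could be made explicit is with compile-time checks. The following is only a sketch under assumptions: `struct cpu_user_regs_sketch` and `struct cpu_info_layout` are simplified placeholders rather than the Xen definitions, and `_Static_assert` requires a C11 compiler, which the tree discussed here may not have targeted.

```c
#include <stddef.h>

/* Placeholder register frame; the real structure is much larger. */
struct cpu_user_regs_sketch { unsigned long regs[4]; };
struct vcpu;  /* opaque, as in the patch context */

struct cpu_info_layout {
    struct cpu_user_regs_sketch guest_cpu_user_regs;
    unsigned int processor_id;
    unsigned long per_cpu_offset;
    struct vcpu *current_vcpu;
};

/* GET_GUEST_REGS()-style assembly assumes the register frame comes first. */
_Static_assert(offsetof(struct cpu_info_layout, guest_cpu_user_regs) == 0,
               "guest_cpu_user_regs must be the first field");

/* GET_CURRENT()-style assembly assumes current_vcpu is the last
 * pointer-sized slot, i.e. that it ends exactly at the end of the struct. */
_Static_assert(offsetof(struct cpu_info_layout, current_vcpu)
               + sizeof(struct vcpu *) == sizeof(struct cpu_info_layout),
               "current_vcpu must be the last field");

/* Distance of current_vcpu from the end of the structure, in bytes. */
static size_t current_vcpu_offset_from_top(void)
{
    return sizeof(struct cpu_info_layout)
           - offsetof(struct cpu_info_layout, current_vcpu);
}
```

With checks like these, appending a field after `current_vcpu` or inserting one before `guest_cpu_user_regs` would fail at build time instead of silently breaking the hand-written assembly.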
On 13/07/2010 15:57, "Jan Beulich" <JBeulich@novell.com> wrote:

>> Where does that come from?
>
> From various assembly files, defining GET_CURRENT() or alike
> on their own. grep-ing for STACK_SIZE had helped me noticing
> the issue. Likewise GET_GUEST_REGS() implies that
> "guest_cpu_user_regs" is the first field of struct cpu_info (but
> this I would consider expected).

Hmm, I think I will do a cleanup patch, thanks!

On this theme, just look at how the code in arch/x86/acpi/wakeup_prot.S
(lines 162--167) does multiple hardcoded indirections to get at the current
domain_id. That is absolutely shocking!

 -- Keir

>>> On 13.07.10 at 17:03, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Hmm, I think I will do a cleanup patch, thanks!
>
> On this theme, just look at how the code in arch/x86/acpi/wakeup_prot.S
> (lines 162--167) does multiple hardcoded indirections to get at the current
> domain_id. That is absolutely shocking!

We can consider ourselves really lucky that this never caused any problems.
I'll be glad to see you clean this up.

Jan