Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 0/5] jump-label: allow early jump_label_enable()
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> [ This is a new version of this series, which drops the need for any special _early() function in favour of making sure stop_machine() can operate correctly and efficiently in a pre-SMP envronment. ] While trying to use the jump-label stuff for my PV ticketlock changes, I had some problems using jump labels early in the kernel''s lifetime (pre-SMP). The basic problem is that even if I enable a jump_label_key, the jump_label_init() initializer ends up nopping out all the code sites. This series enables early use of jump labels by making jump_label_init() respect already-enabled keys. To do this, I''ve dropped arch_jump_label_poke_text_early() and replaced it with arch_jump_label_transform(), allowing it to either insert an optimal nop, or leave the jump in place. Part of this series makes sure that stop_machine() is safe to call in an early pre-SMP environment, by making it just call the function with interrupts disabled. git://github.com/jsgf/linux-xen upstream/jump-label-noearly Jeremy Fitzhardinge (5): jump_label: use proper atomic_t initializer stop_machine: make stop_machine safe and efficient to call early jump_label: if a key has already been initialized, don''t nop it out x86/jump_label: drop arch_jump_label_text_poke_early() sparc/jump_label: drop arch_jump_label_text_poke_early() arch/sparc/kernel/jump_label.c | 8 -------- arch/x86/kernel/jump_label.c | 6 ------ include/linux/jump_label.h | 7 ++++--- kernel/jump_label.c | 20 ++++++++------------ kernel/stop_machine.c | 21 +++++++++++++++++++++ 5 files changed, 33 insertions(+), 29 deletions(-) -- 1.7.6.2 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 1/5] jump_label: use proper atomic_t initializer
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
ATOMIC_INIT() is the proper thing to use.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/jump_label.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 66f23dc..1213e9d 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -28,9 +28,9 @@ struct module;
 #ifdef HAVE_JUMP_LABEL
 
 #ifdef CONFIG_MODULES
-#define JUMP_LABEL_INIT {{ 0 }, NULL, NULL}
+#define JUMP_LABEL_INIT {ATOMIC_INIT(0), NULL, NULL}
 #else
-#define JUMP_LABEL_INIT {{ 0 }, NULL}
+#define JUMP_LABEL_INIT {ATOMIC_INIT(0), NULL}
 #endif
 
 static __always_inline bool static_branch(struct jump_label_key *key)
-- 
1.7.6.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Make stop_machine() safe to call early in boot, before SMP has been
set up, by simply calling the callback function directly if there''s
only one CPU online.
[ Fixes from AKPM:
   - add comment
   - local_irq_flags, not save_flags
   - also call hard_irq_disable() for systems which need it
  Tejun suggested using an explicit flag rather than just looking at
  the online cpu count. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 kernel/stop_machine.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index ba5070c..9c59d9e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -41,6 +41,7 @@ struct cpu_stopper {
 };
 
 static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
+static bool stop_machine_initialized = false;
 
 static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int
nr_todo)
 {
@@ -386,6 +387,8 @@ static int __init cpu_stop_init(void)
 	cpu_stop_cpu_callback(&cpu_stop_cpu_notifier, CPU_ONLINE, bcpu);
 	register_cpu_notifier(&cpu_stop_cpu_notifier);
 
+	stop_machine_initialized = true;
+
 	return 0;
 }
 early_initcall(cpu_stop_init);
@@ -485,6 +488,24 @@ int __stop_machine(int (*fn)(void *), void *data, const
struct cpumask *cpus)
 					    .num_threads = num_online_cpus(),
 					    .active_cpus = cpus };
 
+	if (!stop_machine_initialized) {
+		/*
+		 * Handle the case where stop_machine() is called early in boot
+		 * before SMP startup.
+		 */
+ 		unsigned long flags;
+		int ret;
+
+		WARN_ON_ONCE(smdata.num_threads != 1);
+
+		local_irq_save(flags);
+		hard_irq_disable();
+		ret = (*fn)(data);
+		local_irq_restore(flags);
+
+		return ret;
+	}
+
 	/* Set the initial state and stop all online cpus. */
 	set_state(&smdata, STOPMACHINE_PREPARE);
 	return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
-- 
1.7.6.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
If a key has been enabled before jump_label_init() is called, don''t
nop it out.
This removes arch_jump_label_text_poke_early() (which can only nop
out a site) and uses arch_jump_label_transform() instead.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
---
 include/linux/jump_label.h |    3 ++-
 kernel/jump_label.c        |   20 ++++++++------------
 2 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 1213e9d..c8fb1b3 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -45,7 +45,8 @@ extern void jump_label_lock(void);
 extern void jump_label_unlock(void);
 extern void arch_jump_label_transform(struct jump_entry *entry,
 				 enum jump_label_type type);
-extern void arch_jump_label_text_poke_early(jump_label_t addr);
+extern void arch_jump_label_transform_early(struct jump_entry *entry,
+				 enum jump_label_type type);
 extern int jump_label_text_reserved(void *start, void *end);
 extern void jump_label_inc(struct jump_label_key *key);
 extern void jump_label_dec(struct jump_label_key *key);
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index a8ce450..059202d5 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key,
 	}
 }
 
-/*
- * Not all archs need this.
- */
-void __weak arch_jump_label_text_poke_early(jump_label_t addr)
-{
-}
-
 static __init int jump_label_init(void)
 {
 	struct jump_entry *iter_start = __start___jump_table;
@@ -139,12 +132,15 @@ static __init int jump_label_init(void)
 	jump_label_sort_entries(iter_start, iter_stop);
 
 	for (iter = iter_start; iter < iter_stop; iter++) {
-		arch_jump_label_text_poke_early(iter->code);
-		if (iter->key == (jump_label_t)(unsigned long)key)
+		struct jump_label_key *iterk;
+
+		iterk = (struct jump_label_key *)(unsigned long)iter->key;
+		arch_jump_label_transform(iter, jump_label_enabled(iterk) ?
+					  JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);
+		if (iterk == key)
 			continue;
 
-		key = (struct jump_label_key *)(unsigned long)iter->key;
-		atomic_set(&key->enabled, 0);
+		key = iterk;
 		key->entries = iter;
 #ifdef CONFIG_MODULES
 		key->next = NULL;
@@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod)
 		return;
 
 	for (iter = iter_start; iter < iter_stop; iter++)
-		arch_jump_label_text_poke_early(iter->code);
+		arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);
 }
 
 static int jump_label_add_module(struct module *mod)
-- 
1.7.6.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 4/5] x86/jump_label: drop arch_jump_label_text_poke_early()
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
It is no longer used.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jason Baron <jbaron@redhat.com>
---
 arch/x86/kernel/jump_label.c |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 3fee346..2ad0298 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -42,10 +42,4 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	put_online_cpus();
 }
 
-void arch_jump_label_text_poke_early(jump_label_t addr)
-{
-	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
-			JUMP_LABEL_NOP_SIZE);
-}
-
 #endif
-- 
1.7.6.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-01  21:55 UTC
[Xen-devel] [PATCH RFC V2 5/5] sparc/jump_label: drop arch_jump_label_text_poke_early()
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
It is no longer used.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: David Miller <davem@davemloft.net>
---
 arch/sparc/kernel/jump_label.c |    8 --------
 1 files changed, 0 insertions(+), 8 deletions(-)
diff --git a/arch/sparc/kernel/jump_label.c b/arch/sparc/kernel/jump_label.c
index ea2dafc..971fd43 100644
--- a/arch/sparc/kernel/jump_label.c
+++ b/arch/sparc/kernel/jump_label.c
@@ -36,12 +36,4 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	put_online_cpus();
 }
 
-void arch_jump_label_text_poke_early(jump_label_t addr)
-{
-	u32 *insn_p = (u32 *) (unsigned long) addr;
-
-	*insn_p = 0x01000000;
-	flushi(insn_p);
-}
-
 #endif
-- 
1.7.6.2
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Tejun Heo
2011-Oct-02  00:36 UTC
[Xen-devel] Re: [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
Hello, On Sat, Oct 01, 2011 at 02:55:34PM -0700, Jeremy Fitzhardinge wrote:> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > Make stop_machine() safe to call early in boot, before SMP has been > set up, by simply calling the callback function directly if there''s > only one CPU online.Maybe "before stop_machine is initialized" is better wording now both here and in the comment?> [ Fixes from AKPM: > - add comment > - local_irq_flags, not save_flags > - also call hard_irq_disable() for systems which need it > > Tejun suggested using an explicit flag rather than just looking at > the online cpu count. ] > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > Cc: Tejun Heo <tj@kernel.org> > Cc: Rusty Russell <rusty@rustcorp.com.au> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: H. Peter Anvin <hpa@linux.intel.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Steven Rostedt <rostedt@goodmis.org>Other than that, Acked-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-03  15:02 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
Hi, (Sorry for the late reply - I was away for a few days). The early enable is really nice - it means there are not restrictions on when jump_label_inc()/dec() can be called which is nice. comments below. On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > If a key has been enabled before jump_label_init() is called, don''t > nop it out. > > This removes arch_jump_label_text_poke_early() (which can only nop > out a site) and uses arch_jump_label_transform() instead. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > --- > include/linux/jump_label.h | 3 ++- > kernel/jump_label.c | 20 ++++++++------------ > 2 files changed, 10 insertions(+), 13 deletions(-) > > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > index 1213e9d..c8fb1b3 100644 > --- a/include/linux/jump_label.h > +++ b/include/linux/jump_label.h > @@ -45,7 +45,8 @@ extern void jump_label_lock(void); > extern void jump_label_unlock(void); > extern void arch_jump_label_transform(struct jump_entry *entry, > enum jump_label_type type); > -extern void arch_jump_label_text_poke_early(jump_label_t addr); > +extern void arch_jump_label_transform_early(struct jump_entry *entry, > + enum jump_label_type type); > extern int jump_label_text_reserved(void *start, void *end); > extern void jump_label_inc(struct jump_label_key *key); > extern void jump_label_dec(struct jump_label_key *key); > diff --git a/kernel/jump_label.c b/kernel/jump_label.c > index a8ce450..059202d5 100644 > --- a/kernel/jump_label.c > +++ b/kernel/jump_label.c > @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, > } > } > > -/* > - * Not all archs need this. > - */ > -void __weak arch_jump_label_text_poke_early(jump_label_t addr) > -{ > -} > - > static __init int jump_label_init(void) > { > struct jump_entry *iter_start = __start___jump_table; > @@ -139,12 +132,15 @@ static __init int jump_label_init(void) > jump_label_sort_entries(iter_start, iter_stop); > > for (iter = iter_start; iter < iter_stop; iter++) { > - arch_jump_label_text_poke_early(iter->code); > - if (iter->key == (jump_label_t)(unsigned long)key) > + struct jump_label_key *iterk; > + > + iterk = (struct jump_label_key *)(unsigned long)iter->key; > + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? > + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE);The only reason I called this at boot-time was that the ''ideal'' x86 no-op isn''t known until boot time. Thus, in the enabled case we could skip the the arch_jump_label_transform() call. ie: if (!enabled) arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);> + if (iterk == key) > continue; > > - key = (struct jump_label_key *)(unsigned long)iter->key; > - atomic_set(&key->enabled, 0); > + key = iterk; > key->entries = iter; > #ifdef CONFIG_MODULES > key->next = NULL; > @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) > return; > > for (iter = iter_start; iter < iter_stop; iter++) > - arch_jump_label_text_poke_early(iter->code); > + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > } > > static int jump_label_add_module(struct module *mod) > -- > 1.7.6.2 >hmmm...this is used on module load in smp - so this would introduce a number of calls to stop_machine() where we didn''t have them before. Yes, module load is a very slow path to begin with, but I think its at least worth pointing out... thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-03  15:47 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Mon, 2011-10-03 at 11:02 -0400, Jason Baron wrote:> if (!enabled) > arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > > > > + if (iterk == key) > > continue; > > > > - key = (struct jump_label_key *)(unsigned long)iter->key; > > - atomic_set(&key->enabled, 0); > > + key = iterk; > > key->entries = iter; > > #ifdef CONFIG_MODULES > > key->next = NULL; > > @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) > > return; > > > > for (iter = iter_start; iter < iter_stop; iter++) > > - arch_jump_label_text_poke_early(iter->code); > > + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > > } > > > > static int jump_label_add_module(struct module *mod) > > -- > > 1.7.6.2 > > > > hmmm...this is used on module load in smp - so this would introduce a number of > calls to stop_machine() where we didn''t have them before. Yes, module > load is a very slow path to begin with, but I think its at least worth > pointing out...And it is a good point to point out. As stop_machine becomes noticeable by users on large scale CPU boxes. Ideally, we want to avoid stopmachine when we do not need it. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-03  16:27 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/03/2011 08:02 AM, Jason Baron wrote:> Hi, > > (Sorry for the late reply - I was away for a few days). > > The early enable is really nice - it means there are not restrictions on > when jump_label_inc()/dec() can be called which is nice. > > comments below. > > > On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote: >> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> >> If a key has been enabled before jump_label_init() is called, don''t >> nop it out. >> >> This removes arch_jump_label_text_poke_early() (which can only nop >> out a site) and uses arch_jump_label_transform() instead. >> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> --- >> include/linux/jump_label.h | 3 ++- >> kernel/jump_label.c | 20 ++++++++------------ >> 2 files changed, 10 insertions(+), 13 deletions(-) >> >> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h >> index 1213e9d..c8fb1b3 100644 >> --- a/include/linux/jump_label.h >> +++ b/include/linux/jump_label.h >> @@ -45,7 +45,8 @@ extern void jump_label_lock(void); >> extern void jump_label_unlock(void); >> extern void arch_jump_label_transform(struct jump_entry *entry, >> enum jump_label_type type); >> -extern void arch_jump_label_text_poke_early(jump_label_t addr); >> +extern void arch_jump_label_transform_early(struct jump_entry *entry, >> + enum jump_label_type type); >> extern int jump_label_text_reserved(void *start, void *end); >> extern void jump_label_inc(struct jump_label_key *key); >> extern void jump_label_dec(struct jump_label_key *key); >> diff --git a/kernel/jump_label.c b/kernel/jump_label.c >> index a8ce450..059202d5 100644 >> --- a/kernel/jump_label.c >> +++ b/kernel/jump_label.c >> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, >> } >> } >> >> -/* >> - * Not all archs need this. >> - */ >> -void __weak arch_jump_label_text_poke_early(jump_label_t addr) >> -{ >> -} >> - >> static __init int jump_label_init(void) >> { >> struct jump_entry *iter_start = __start___jump_table; >> @@ -139,12 +132,15 @@ static __init int jump_label_init(void) >> jump_label_sort_entries(iter_start, iter_stop); >> >> for (iter = iter_start; iter < iter_stop; iter++) { >> - arch_jump_label_text_poke_early(iter->code); >> - if (iter->key == (jump_label_t)(unsigned long)key) >> + struct jump_label_key *iterk; >> + >> + iterk = (struct jump_label_key *)(unsigned long)iter->key; >> + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? >> + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE); > The only reason I called this at boot-time was that the ''ideal'' x86 > no-op isn''t known until boot time. Thus, in the enabled case we could > skip the the arch_jump_label_transform() call. ie: > > if (!enabled) > arch_jump_label_transform(iter, JUMP_LABEL_DISABLE);Yep, fair enough.> > >> + if (iterk == key) >> continue; >> >> - key = (struct jump_label_key *)(unsigned long)iter->key; >> - atomic_set(&key->enabled, 0); >> + key = iterk; >> key->entries = iter; >> #ifdef CONFIG_MODULES >> key->next = NULL; >> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) >> return; >> >> for (iter = iter_start; iter < iter_stop; iter++) >> - arch_jump_label_text_poke_early(iter->code); >> + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); >> } >> >> static int jump_label_add_module(struct module *mod) >> -- >> 1.7.6.2 >> > hmmm...this is used on module load in smp - so this would introduce a number of > calls to stop_machine() where we didn''t have them before. Yes, module > load is a very slow path to begin with, but I think its at least worth > pointing out...Ah, that explains it - the module stuff certainly isn''t "early" except - I guess - in the module''s lifetime. Well, I suppose I could introduce either second variant of the function, or add a "live" flag (ie, may be updating code that a processor is executing), which requires a stop_machine, or direct update if it doesn''t. But is there any reason why we couldn''t just generate a reasonably efficient 5-byte atomic nop in the first place, and get rid of all that fooling around? It looks like x86 is the only arch where it makes any difference at all, and how much difference does it really make? Or is there no one 5-byte atomic nop that works on all x86 variants, aside from jmp +0? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-03  19:24 UTC
Re: [Xen-devel] [PATCH RFC V2 2/5] stop_machine: make stop_machine safe and efficient to call early
On Sat, Oct 01, 2011 at 02:55:34PM -0700, Jeremy Fitzhardinge wrote:> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > Make stop_machine() safe to call early in boot, before SMP has been > set up, by simply calling the callback function directly if there''s > only one CPU online. > > [ Fixes from AKPM: > - add comment > - local_irq_flags, not save_flags > - also call hard_irq_disable() for systems which need it > > Tejun suggested using an explicit flag rather than just looking at > the online cpu count. ] > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > Cc: Tejun Heo <tj@kernel.org> > Cc: Rusty Russell <rusty@rustcorp.com.au> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: H. Peter Anvin <hpa@linux.intel.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Steven Rostedt <rostedt@goodmis.org> > --- > kernel/stop_machine.c | 21 +++++++++++++++++++++ > 1 files changed, 21 insertions(+), 0 deletions(-) > > diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c > index ba5070c..9c59d9e 100644 > --- a/kernel/stop_machine.c > +++ b/kernel/stop_machine.c > @@ -41,6 +41,7 @@ struct cpu_stopper { > }; > > static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper); > +static bool stop_machine_initialized = false;__read_mostly? Thought it probably does not really matter that much in what section it is put in. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-04  14:10 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Mon, Oct 03, 2011 at 09:27:56AM -0700, Jeremy Fitzhardinge wrote:> On 10/03/2011 08:02 AM, Jason Baron wrote: > > Hi, > > > > (Sorry for the late reply - I was away for a few days). > > > > The early enable is really nice - it means there are not restrictions on > > when jump_label_inc()/dec() can be called which is nice. > > > > comments below. > > > > > > On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote: > >> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > >> > >> If a key has been enabled before jump_label_init() is called, don''t > >> nop it out. > >> > >> This removes arch_jump_label_text_poke_early() (which can only nop > >> out a site) and uses arch_jump_label_transform() instead. > >> > >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > >> --- > >> include/linux/jump_label.h | 3 ++- > >> kernel/jump_label.c | 20 ++++++++------------ > >> 2 files changed, 10 insertions(+), 13 deletions(-) > >> > >> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > >> index 1213e9d..c8fb1b3 100644 > >> --- a/include/linux/jump_label.h > >> +++ b/include/linux/jump_label.h > >> @@ -45,7 +45,8 @@ extern void jump_label_lock(void); > >> extern void jump_label_unlock(void); > >> extern void arch_jump_label_transform(struct jump_entry *entry, > >> enum jump_label_type type); > >> -extern void arch_jump_label_text_poke_early(jump_label_t addr); > >> +extern void arch_jump_label_transform_early(struct jump_entry *entry, > >> + enum jump_label_type type); > >> extern int jump_label_text_reserved(void *start, void *end); > >> extern void jump_label_inc(struct jump_label_key *key); > >> extern void jump_label_dec(struct jump_label_key *key); > >> diff --git a/kernel/jump_label.c b/kernel/jump_label.c > >> index a8ce450..059202d5 100644 > >> --- a/kernel/jump_label.c > >> +++ b/kernel/jump_label.c > >> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, > >> } > >> } > >> > >> -/* > >> - * Not all archs need this. > >> - */ > >> -void __weak arch_jump_label_text_poke_early(jump_label_t addr) > >> -{ > >> -} > >> - > >> static __init int jump_label_init(void) > >> { > >> struct jump_entry *iter_start = __start___jump_table; > >> @@ -139,12 +132,15 @@ static __init int jump_label_init(void) > >> jump_label_sort_entries(iter_start, iter_stop); > >> > >> for (iter = iter_start; iter < iter_stop; iter++) { > >> - arch_jump_label_text_poke_early(iter->code); > >> - if (iter->key == (jump_label_t)(unsigned long)key) > >> + struct jump_label_key *iterk; > >> + > >> + iterk = (struct jump_label_key *)(unsigned long)iter->key; > >> + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? > >> + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE); > > The only reason I called this at boot-time was that the ''ideal'' x86 > > no-op isn''t known until boot time. Thus, in the enabled case we could > > skip the the arch_jump_label_transform() call. ie: > > > > if (!enabled) > > arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > > > Yep, fair enough. > > > > > > >> + if (iterk == key) > >> continue; > >> > >> - key = (struct jump_label_key *)(unsigned long)iter->key; > >> - atomic_set(&key->enabled, 0); > >> + key = iterk; > >> key->entries = iter; > >> #ifdef CONFIG_MODULES > >> key->next = NULL; > >> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) > >> return; > >> > >> for (iter = iter_start; iter < iter_stop; iter++) > >> - arch_jump_label_text_poke_early(iter->code); > >> + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > >> } > >> > >> static int jump_label_add_module(struct module *mod) > >> -- > >> 1.7.6.2 > >> > > hmmm...this is used on module load in smp - so this would introduce a number of > > calls to stop_machine() where we didn''t have them before. Yes, module > > load is a very slow path to begin with, but I think its at least worth > > pointing out... > > Ah, that explains it - the module stuff certainly isn''t "early" except - > I guess - in the module''s lifetime. > > Well, I suppose I could introduce either second variant of the function, > or add a "live" flag (ie, may be updating code that a processor is > executing), which requires a stop_machine, or direct update if it doesn''t. > > But is there any reason why we couldn''t just generate a reasonably > efficient 5-byte atomic nop in the first place, and get rid of all that > fooling around? It looks like x86 is the only arch where it makes any > difference at all, and how much difference does it really make? Or is > there no one 5-byte atomic nop that works on all x86 variants, aside > from jmp +0? > > JYes, there are really two reasons for the initial no-op patching pass: 1) The jmp +0, is a ''safe'' no-op that I know is going to initially boot for all x86. I''m not sure if there is a 5-byte nop that works on all x86 variants - but by using jmp +0, we make it much easier to debug cases where we may be using broken no-ops. 2) This optimization is about as close to a 0 cost off case as possible. I know there have been various no-op benchmarks posted on lkml in the past, so the choice of no-op does seem to make a difference. see: http://lkml.indiana.edu/hypermail/linux/kernel/0808.1/2416.html, for example. So at least to me, if we are not using the lowest cost no-op, we are at least in-part defeating the point of this optimization. I like the "live" flag suggestion mentioned above. Less functions is better, and non-x86 arches can simply ignore the flag. Thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-04  15:18 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/04/2011 07:10 AM, Jason Baron wrote:> On Mon, Oct 03, 2011 at 09:27:56AM -0700, Jeremy Fitzhardinge wrote: >> On 10/03/2011 08:02 AM, Jason Baron wrote: >>> Hi, >>> >>> (Sorry for the late reply - I was away for a few days). >>> >>> The early enable is really nice - it means there are not restrictions on >>> when jump_label_inc()/dec() can be called which is nice. >>> >>> comments below. >>> >>> >>> On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote: >>>> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >>>> >>>> If a key has been enabled before jump_label_init() is called, don''t >>>> nop it out. >>>> >>>> This removes arch_jump_label_text_poke_early() (which can only nop >>>> out a site) and uses arch_jump_label_transform() instead. >>>> >>>> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >>>> --- >>>> include/linux/jump_label.h | 3 ++- >>>> kernel/jump_label.c | 20 ++++++++------------ >>>> 2 files changed, 10 insertions(+), 13 deletions(-) >>>> >>>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h >>>> index 1213e9d..c8fb1b3 100644 >>>> --- a/include/linux/jump_label.h >>>> +++ b/include/linux/jump_label.h >>>> @@ -45,7 +45,8 @@ extern void jump_label_lock(void); >>>> extern void jump_label_unlock(void); >>>> extern void arch_jump_label_transform(struct jump_entry *entry, >>>> enum jump_label_type type); >>>> -extern void arch_jump_label_text_poke_early(jump_label_t addr); >>>> +extern void arch_jump_label_transform_early(struct jump_entry *entry, >>>> + enum jump_label_type type); >>>> extern int jump_label_text_reserved(void *start, void *end); >>>> extern void jump_label_inc(struct jump_label_key *key); >>>> extern void jump_label_dec(struct jump_label_key *key); >>>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c >>>> index a8ce450..059202d5 100644 >>>> --- a/kernel/jump_label.c >>>> +++ b/kernel/jump_label.c >>>> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, >>>> } >>>> } >>>> >>>> -/* >>>> - * Not all archs need this. >>>> - */ >>>> -void __weak arch_jump_label_text_poke_early(jump_label_t addr) >>>> -{ >>>> -} >>>> - >>>> static __init int jump_label_init(void) >>>> { >>>> struct jump_entry *iter_start = __start___jump_table; >>>> @@ -139,12 +132,15 @@ static __init int jump_label_init(void) >>>> jump_label_sort_entries(iter_start, iter_stop); >>>> >>>> for (iter = iter_start; iter < iter_stop; iter++) { >>>> - arch_jump_label_text_poke_early(iter->code); >>>> - if (iter->key == (jump_label_t)(unsigned long)key) >>>> + struct jump_label_key *iterk; >>>> + >>>> + iterk = (struct jump_label_key *)(unsigned long)iter->key; >>>> + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? >>>> + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE); >>> The only reason I called this at boot-time was that the ''ideal'' x86 >>> no-op isn''t known until boot time. Thus, in the enabled case we could >>> skip the the arch_jump_label_transform() call. ie: >>> >>> if (!enabled) >>> arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); >> >> Yep, fair enough. >> >>> >>>> + if (iterk == key) >>>> continue; >>>> >>>> - key = (struct jump_label_key *)(unsigned long)iter->key; >>>> - atomic_set(&key->enabled, 0); >>>> + key = iterk; >>>> key->entries = iter; >>>> #ifdef CONFIG_MODULES >>>> key->next = NULL; >>>> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) >>>> return; >>>> >>>> for (iter = iter_start; iter < iter_stop; iter++) >>>> - arch_jump_label_text_poke_early(iter->code); >>>> + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); >>>> } >>>> >>>> static int jump_label_add_module(struct module *mod) >>>> -- >>>> 1.7.6.2 >>>> >>> hmmm...this is used on module load in smp - so this would introduce a number of >>> calls to stop_machine() where we didn''t have them before. Yes, module >>> load is a very slow path to begin with, but I think its at least worth >>> pointing out... >> Ah, that explains it - the module stuff certainly isn''t "early" except - >> I guess - in the module''s lifetime. >> >> Well, I suppose I could introduce either second variant of the function, >> or add a "live" flag (ie, may be updating code that a processor is >> executing), which requires a stop_machine, or direct update if it doesn''t. >> >> But is there any reason why we couldn''t just generate a reasonably >> efficient 5-byte atomic nop in the first place, and get rid of all that >> fooling around? It looks like x86 is the only arch where it makes any >> difference at all, and how much difference does it really make? Or is >> there no one 5-byte atomic nop that works on all x86 variants, aside >> from jmp +0? >> >> J > Yes, there are really two reasons for the initial no-op patching pass: > > 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > boot for all x86. I''m not sure if there is a 5-byte nop that works on > all x86 variants - but by using jmp +0, we make it much easier to debug > cases where we may be using broken no-ops. > > 2) This optimization is about as close to a 0 cost off case as possible. > I know there have been various no-op benchmarks posted on lkml in the > past, so the choice of no-op does seem to make a difference. see: > http://lkml.indiana.edu/hypermail/linux/kernel/0808.1/2416.html, for > example. So at least to me, if we are not using the lowest cost no-op, > we are at least in-part defeating the point of this optimization. > > I like the "live" flag suggestion mentioned above. Less functions is > better, and non-x86 arches can simply ignore the flag.I went the other way and added a second function, arch_jump_label_transform_static(), which has a weak default implementation which calls arch_jump_label_transform(). That way only the architectures which really care about it need implement a second variant. I did x86 and s390 by adapting the patches I had from the other series; it didn''t look like mips/sparc/power were very heavyweight at all. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-04  16:30 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/04/2011 07:10 AM, Jason Baron wrote:> > 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > boot for all x86. I''m not sure if there is a 5-byte nop that works on > all x86 variants - but by using jmp +0, we make it much easier to debug > cases where we may be using broken no-ops. >There are *plenty*. jmp+0 is about as pessimal as you can get. The current recommendation when you don''t know the CPU you''re running at is: 3E 8D 74 26 00 (GENERIC_NOP5_ATOMIC) ... on 32 bits and ... 0F 1F 44 00 00 (P6_NOP5_ATOMIC) ... on 64 bits. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-04  17:53 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Tue, Oct 04, 2011 at 09:30:01AM -0700, H. Peter Anvin wrote:> On 10/04/2011 07:10 AM, Jason Baron wrote: > > > > 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > > boot for all x86. I''m not sure if there is a 5-byte nop that works on > > all x86 variants - but by using jmp +0, we make it much easier to debug > > cases where we may be using broken no-ops. > > > > There are *plenty*. jmp+0 is about as pessimal as you can get. > > The current recommendation when you don''t know the CPU you''re running at is: > > 3E 8D 74 26 00 (GENERIC_NOP5_ATOMIC) > > ... on 32 bits and ... > > 0F 1F 44 00 00 (P6_NOP5_ATOMIC) > > ... on 64 bits. > > -hpa >We''re currently patching the code at run-time (boot and module load time), with the ''ideal'' no-op anyway, so the initial no-op doesn''t really matter much (other than to save patching if the initial and ideal match). -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-04  18:05 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Tue, 2011-10-04 at 13:53 -0400, Jason Baron wrote:> On Tue, Oct 04, 2011 at 09:30:01AM -0700, H. Peter Anvin wrote: > > On 10/04/2011 07:10 AM, Jason Baron wrote: > > > > > > 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > > > boot for all x86. I''m not sure if there is a 5-byte nop that works on > > > all x86 variants - but by using jmp +0, we make it much easier to debug > > > cases where we may be using broken no-ops. > > > > > > > There are *plenty*. jmp+0 is about as pessimal as you can get. > > > > The current recommendation when you don''t know the CPU you''re running at is: > > > > 3E 8D 74 26 00 (GENERIC_NOP5_ATOMIC) > > > > ... on 32 bits and ... > > > > 0F 1F 44 00 00 (P6_NOP5_ATOMIC) > > > > ... on 64 bits. > > > > -hpa > > > > We''re currently patching the code at run-time (boot and module load > time), with the ''ideal'' no-op anyway, so the initial no-op doesn''t > really matter much (other than to save patching if the initial and ideal > match).Out of correctness, we should still update this to use the proper "default" nops, as mcount already does. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  00:16 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/04/2011 09:30 AM, H. Peter Anvin wrote:> On 10/04/2011 07:10 AM, Jason Baron wrote: >> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially >> boot for all x86. I''m not sure if there is a 5-byte nop that works on >> all x86 variants - but by using jmp +0, we make it much easier to debug >> cases where we may be using broken no-ops. >> > There are *plenty*. jmp+0 is about as pessimal as you can get.As an aside, do you know if a 2-byte unconditional jmp is any more efficient than 5-byte, aside from just being a smaller instruction and taking less icache? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-06  00:17 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote:> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: >> On 10/04/2011 07:10 AM, Jason Baron wrote: >>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially >>> boot for all x86. I''m not sure if there is a 5-byte nop that works on >>> all x86 variants - but by using jmp +0, we make it much easier to debug >>> cases where we may be using broken no-ops. >>> >> There are *plenty*. jmp+0 is about as pessimal as you can get. > > As an aside, do you know if a 2-byte unconditional jmp is any more > efficient than 5-byte, aside from just being a smaller instruction and > taking less icache? >I don''t know for sure, no. I probably depends on the CPU. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  00:47 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/05/2011 05:17 PM, H. Peter Anvin wrote:> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote: >> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: >>> On 10/04/2011 07:10 AM, Jason Baron wrote: >>>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially >>>> boot for all x86. I''m not sure if there is a 5-byte nop that works on >>>> all x86 variants - but by using jmp +0, we make it much easier to debug >>>> cases where we may be using broken no-ops. >>>> >>> There are *plenty*. jmp+0 is about as pessimal as you can get. >> As an aside, do you know if a 2-byte unconditional jmp is any more >> efficient than 5-byte, aside from just being a smaller instruction and >> taking less icache? >> > I don''t know for sure, no. I probably depends on the CPU.I was thinking about making the jump-label stuff generate a small jmp if the offset is small (specifically "jmp; nop3", or perhaps "jmp; ud2a; nop" to make absolutely sure there''s no speculation beyond the jmp), on the grounds that, while it probably doesn''t matter for any modern Intel/AMD processor, it may help for others. But I couldn''t find any concrete evidence to support it, and there''s already enough questions about doing "live" code updates without adding more instruction patterns. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  17:53 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/05/2011 05:17 PM, H. Peter Anvin wrote:> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote: >> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: >>> On 10/04/2011 07:10 AM, Jason Baron wrote: >>>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially >>>> boot for all x86. I''m not sure if there is a 5-byte nop that works on >>>> all x86 variants - but by using jmp +0, we make it much easier to debug >>>> cases where we may be using broken no-ops. >>>> >>> There are *plenty*. jmp+0 is about as pessimal as you can get. >> As an aside, do you know if a 2-byte unconditional jmp is any more >> efficient than 5-byte, aside from just being a smaller instruction and >> taking less icache? >> > I don''t know for sure, no. I probably depends on the CPU.Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this benchmark. But insignificant difference on Nehalem. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-06  18:10 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote:> On 10/05/2011 05:17 PM, H. Peter Anvin wrote: > > On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote: > >> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: > >>> On 10/04/2011 07:10 AM, Jason Baron wrote: > >>>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > >>>> boot for all x86. I''m not sure if there is a 5-byte nop that works on > >>>> all x86 variants - but by using jmp +0, we make it much easier to debug > >>>> cases where we may be using broken no-ops. > >>>> > >>> There are *plenty*. jmp+0 is about as pessimal as you can get. > >> As an aside, do you know if a 2-byte unconditional jmp is any more > >> efficient than 5-byte, aside from just being a smaller instruction and > >> taking less icache? > >> > > I don''t know for sure, no. I probably depends on the CPU. > > Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this > benchmark. > > But insignificant difference on Nehalem. > > JIt would be cool if we could make the total width 2-bytes, when possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot time or link time - letting the compiler pick the width. In that way we could get the optimal width... thanks, -Jason> #include <stdio.h> > > struct { > unsigned char flag; > unsigned char val; > } l; > > #define JMP2 asm volatile ("jmp 1f; .byte 0x0f,0x1f,0x00; 1: "); > #define JMPJMP2 JMP2 JMP2 > #define JMPJMPJMP2 JMPJMP2 JMPJMP2 > #define JMPJMPJMP2 JMPJMP2 JMPJMP2 > #define JMPJMPJMPJMP2 JMPJMPJMP2 JMPJMPJMP2 > #define JMPJMPJMPJMPJMP2 JMPJMPJMPJMP2 JMPJMPJMPJMP2 > #define JMPJMPJMPJMPJMPJMP2 JMPJMPJMPJMPJMP2 JMPJMPJMPJMPJMP2 > > int main(int argc, char **argv) > { > int i; > > for (i = 0; i < 100000000; i++) { > JMPJMPJMPJMPJMPJMP2; > asm volatile("" : : : "memory"); > } > > return 0; > }> #include <stdio.h> > > struct { > unsigned char flag; > unsigned char val; > } l; > > #define JMP5 asm volatile (".byte 0xe9; .long 0"); > #define JMPJMP5 JMP5 JMP5 > #define JMPJMPJMP5 JMPJMP5 JMPJMP5 > #define JMPJMPJMP5 JMPJMP5 JMPJMP5 > #define JMPJMPJMPJMP5 JMPJMPJMP5 JMPJMPJMP5 > #define JMPJMPJMPJMPJMP5 JMPJMPJMPJMP5 JMPJMPJMPJMP5 > #define JMPJMPJMPJMPJMPJMP5 JMPJMPJMPJMPJMP5 JMPJMPJMPJMPJMP5 > > int main(int argc, char **argv) > { > int i; > > for (i = 0; i < 100000000; i++) { > JMPJMPJMPJMPJMPJMP5; > asm volatile("" : : : "memory"); > } > > return 0; > }_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-06  18:13 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:10 AM, Jason Baron wrote:> > It would be cool if we could make the total width 2-bytes, when > possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' > as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot > time or link time - letting the compiler pick the width. In that way we could > get the optimal width... >Yes, that would be a win just based on icache footprint alone. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  18:15 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:10 AM, Jason Baron wrote:> On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote: >> On 10/05/2011 05:17 PM, H. Peter Anvin wrote: >>> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote: >>>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: >>>>> On 10/04/2011 07:10 AM, Jason Baron wrote: >>>>>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially >>>>>> boot for all x86. I''m not sure if there is a 5-byte nop that works on >>>>>> all x86 variants - but by using jmp +0, we make it much easier to debug >>>>>> cases where we may be using broken no-ops. >>>>>> >>>>> There are *plenty*. jmp+0 is about as pessimal as you can get. >>>> As an aside, do you know if a 2-byte unconditional jmp is any more >>>> efficient than 5-byte, aside from just being a smaller instruction and >>>> taking less icache? >>>> >>> I don''t know for sure, no. I probably depends on the CPU. >> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this >> benchmark. >> >> But insignificant difference on Nehalem. >> >> J > It would be cool if we could make the total width 2-bytes, when > possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' > as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot > time or link time - letting the compiler pick the width. In that way we could > get the optimal width...I''ll have a look at it later today if I get a moment. Should be fairly straightforward. What about the rest of the series. Do you think it looks cooked enough for next mergewindow? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  18:26 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote:> > Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this > > benchmark. > > > > But insignificant difference on Nehalem. > > > > J > > It would be cool if we could make the total width 2-bytes, when > possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' > as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot > time or link time - letting the compiler pick the width. In that way we could > get the optimal width...Why not just do it? jump_label is encapsulated in arch_static_branch() which on x86 looks like: static __always_inline bool arch_static_branch(struct jump_label_key *key) { asm goto("1:" JUMP_LABEL_INITIAL_NOP ".pushsection __jump_table, \"aw\" \n\t" _ASM_ALIGN "\n\t" _ASM_PTR "1b, %l[l_yes], %c0 \n\t" ".popsection \n\t" : : "i" (key) : : l_yes); return false; l_yes: return true; } That jmp to l_yes should easily be a two byte jump. If not I''m sure it would be easy to catch it before modifying the code. And then complain real loudly about it. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-06  18:29 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:26 AM, Steven Rostedt wrote:> On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote: > >>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this >>> benchmark. >>> >>> But insignificant difference on Nehalem. >>> >>> J >> >> It would be cool if we could make the total width 2-bytes, when >> possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' >> as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot >> time or link time - letting the compiler pick the width. In that way we could >> get the optimal width... > > Why not just do it? > > jump_label is encapsulated in arch_static_branch() which on x86 looks > like: > > static __always_inline bool arch_static_branch(struct jump_label_key *key) > { > asm goto("1:" > JUMP_LABEL_INITIAL_NOP > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > ".popsection \n\t" > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > > That jmp to l_yes should easily be a two byte jump. > > If not I''m sure it would be easy to catch it before modifying the code. > And then complain real loudly about it. >The important thing is that it requires the build-time elimination of jumps. It''s just work. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-06  18:33 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, Oct 06, 2011 at 11:15:07AM -0700, Jeremy Fitzhardinge wrote:> On 10/06/2011 11:10 AM, Jason Baron wrote: > > On Thu, Oct 06, 2011 at 10:53:29AM -0700, Jeremy Fitzhardinge wrote: > >> On 10/05/2011 05:17 PM, H. Peter Anvin wrote: > >>> On 10/05/2011 05:16 PM, Jeremy Fitzhardinge wrote: > >>>> On 10/04/2011 09:30 AM, H. Peter Anvin wrote: > >>>>> On 10/04/2011 07:10 AM, Jason Baron wrote: > >>>>>> 1) The jmp +0, is a ''safe'' no-op that I know is going to initially > >>>>>> boot for all x86. I''m not sure if there is a 5-byte nop that works on > >>>>>> all x86 variants - but by using jmp +0, we make it much easier to debug > >>>>>> cases where we may be using broken no-ops. > >>>>>> > >>>>> There are *plenty*. jmp+0 is about as pessimal as you can get. > >>>> As an aside, do you know if a 2-byte unconditional jmp is any more > >>>> efficient than 5-byte, aside from just being a smaller instruction and > >>>> taking less icache? > >>>> > >>> I don''t know for sure, no. I probably depends on the CPU. > >> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this > >> benchmark. > >> > >> But insignificant difference on Nehalem. > >> > >> J > > It would be cool if we could make the total width 2-bytes, when > > possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' > > as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot > > time or link time - letting the compiler pick the width. In that way we could > > get the optimal width... > > I''ll have a look at it later today if I get a moment. Should be fairly > straightforward. >cool. It does add some complication, I think...detecting the 2-byte vs. 5-byte, and if done at boot time, possibly taking the undesired branch...> What about the rest of the series. Do you think it looks cooked enough > for next mergewindow? > > JYes, it looks good to me thanks! Feel free to add my ack to the series. thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-06  18:35 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:33 AM, Jason Baron wrote:> > cool. It does add some complication, I think...detecting the 2-byte vs. > 5-byte, and if done at boot time, possibly taking the undesired > branch... >It has to be done at build time to be sane, IMO. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don''t speak on their behalf. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-06  18:38 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, Oct 06, 2011 at 11:29:25AM -0700, H. Peter Anvin wrote:> On 10/06/2011 11:26 AM, Steven Rostedt wrote: > > On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote: > > > >>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this > >>> benchmark. > >>> > >>> But insignificant difference on Nehalem. > >>> > >>> J > >> > >> It would be cool if we could make the total width 2-bytes, when > >> possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' > >> as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot > >> time or link time - letting the compiler pick the width. In that way we could > >> get the optimal width... > > > > Why not just do it? > > > > jump_label is encapsulated in arch_static_branch() which on x86 looks > > like: > > > > static __always_inline bool arch_static_branch(struct jump_label_key *key) > > { > > asm goto("1:" > > JUMP_LABEL_INITIAL_NOP > > ".pushsection __jump_table, \"aw\" \n\t" > > _ASM_ALIGN "\n\t" > > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > > ".popsection \n\t" > > : : "i" (key) : : l_yes); > > return false; > > l_yes: > > return true; > > } > > > > > > That jmp to l_yes should easily be a two byte jump.remember the compiler is moving the l_yes out of line, so its not necessarily always a two byte jump. Also, I plan to look at a possible ''cold'' label for the ''l_yes'' branch, so that it can moved to a separate ''cold'' section, but we might only want that for some cases...> > > > If not I''m sure it would be easy to catch it before modifying the code. > > And then complain real loudly about it. > > > > The important thing is that it requires the build-time elimination of > jumps. It''s just work. > > -hpa >Right, its certainly doable, but I''m not sure its so simple, since we''ll need a pass to eliminate the jumps - which can be keyed off the ''__jump_table'' section. thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-06  18:43 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, Oct 06, 2011 at 11:35:49AM -0700, H. Peter Anvin wrote:> On 10/06/2011 11:33 AM, Jason Baron wrote: > > > > cool. It does add some complication, I think...detecting the 2-byte vs. > > 5-byte, and if done at boot time, possibly taking the undesired > > branch... > > > > It has to be done at build time to be sane, IMO. > > -hpa >agreed. -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Richard Henderson
2011-Oct-06  18:50 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:26 AM, Steven Rostedt wrote:> On Thu, 2011-10-06 at 14:10 -0400, Jason Baron wrote: > >>> Looks like jmp2 is about 5% faster than jmp5 on Sandybridge with this >>> benchmark. >>> >>> But insignificant difference on Nehalem. >>> >>> J >> >> It would be cool if we could make the total width 2-bytes, when >> possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' >> as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot >> time or link time - letting the compiler pick the width. In that way we could >> get the optimal width... > > Why not just do it? > > jump_label is encapsulated in arch_static_branch() which on x86 looks > like: > > static __always_inline bool arch_static_branch(struct jump_label_key *key) > { > asm goto("1:" > JUMP_LABEL_INITIAL_NOP > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > ".popsection \n\t" > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > > That jmp to l_yes should easily be a two byte jump.Until the compiler decides to re-order the code. That''s the problem -- in the general case you do not know how far away the destination is really going to be. There are a couple of possibilities for improvement: (1) Do as Jason suggests above and let the assembler figure out the size of the branch that is needed. Without adding more data to __jump_table, you''ll want to be extremely careful about checking the two pointers to see what size branch has been installed. (2) Always reserve 5 bytes of space, but if the distance is small enough patch in a 2-byte jump. That doesn''t help with the icache footprint. (3) There is no 3. I was going to say something clever about gas .ifne conditionals, but a quick test revealed they don''t work for forward declarations. r~ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  19:28 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 11:50 -0700, Richard Henderson wrote:> On 10/06/2011 11:26 AM, Steven Rostedt wrote: > > > > > That jmp to l_yes should easily be a two byte jump. > > Until the compiler decides to re-order the code. That''s the problem -- > in the general case you do not know how far away the destination is really > going to be.Yeah, I was discussing this with Peter Zijlstra on IRC.> > There are a couple of possibilities for improvement: > > (1) Do as Jason suggests above and let the assembler figure out the size > of the branch that is needed. Without adding more data to __jump_table, > you''ll want to be extremely careful about checking the two pointers to > see what size branch has been installed.Yeah, that could be done at patch time.> > (2) Always reserve 5 bytes of space, but if the distance is small enough > patch in a 2-byte jump. That doesn''t help with the icache footprint.I don''t think this one is worth it. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  19:34 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 14:38 -0400, Jason Baron wrote:> Right, its certainly doable, but I''m not sure its so simple, since we''ll > need a pass to eliminate the jumps - which can be keyed off the > ''__jump_table'' section.Look at the code of scripts/recordmcount.c and friends. It does two things. 1) find all the callers of mcount and make a section for it. 2) For those callers of mcount that is in sections that are not whitelisted, and therefor will not be patched, to replace the call to mcount with a nop. We can use this code, or a copy of it, to do the same with jump_label. Have the x86 jump_label be: static __always_inline bool arch_static_branch(struct jump_label_key *key) { asm goto("1:" "jmp l_yes\n" ".pushsection __jump_table, \"aw\" \n\t" _ASM_ALIGN "\n\t" _ASM_PTR "1b, %l[l_yes], %c0 \n\t" ".popsection \n\t" : : "i" (key) : : l_yes); return false; l_yes: return true; } Then have the record_jumplabel.c (or whatever it''s called) find all the jmps at run time, and convert them into the appropriate nop. Then at runtime patching, the jumplabel code could figure out what size jump it needs to replace it. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-06  20:33 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, Oct 06, 2011 at 03:34:13PM -0400, Steven Rostedt wrote:> On Thu, 2011-10-06 at 14:38 -0400, Jason Baron wrote: > > > Right, its certainly doable, but I''m not sure its so simple, since we''ll > > need a pass to eliminate the jumps - which can be keyed off the > > ''__jump_table'' section. > > Look at the code of scripts/recordmcount.c and friends. > > It does two things. > > 1) find all the callers of mcount and make a section for it. > > 2) For those callers of mcount that is in sections that are not > whitelisted, and therefor will not be patched, to replace the call to > mcount with a nop. > > > We can use this code, or a copy of it, to do the same with jump_label. > Have the x86 jump_label be: > > > static __always_inline bool arch_static_branch(struct jump_label_key > *key) > { > asm goto("1:" > "jmp l_yes\n" > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > ".popsection \n\t" > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > Then have the record_jumplabel.c (or whatever it''s called) find all the > jmps at run time, and convert them into the appropriate nop. >I''d prefer to do this at build-time as hpa said. We don''t want there to be some race b/w patching in the no-ops and taking an incorrect branch.> Then at runtime patching, the jumplabel code could figure out what size > jump it needs to replace it. > > -- Steve > >sounds like a good plan. thanks for the pointers! -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  20:45 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 16:33 -0400, Jason Baron wrote:> > Then have the record_jumplabel.c (or whatever it''s called) find all the > > jmps at run time, and convert them into the appropriate nop. > > > > I''d prefer to do this at build-time as hpa said. We don''t want there to > be some race b/w patching in the no-ops and taking an incorrect branch.Yep, this record_jumplabel.c would execute at build time. Just like the record_mcount.c does.> > > Then at runtime patching, the jumplabel code could figure out what size > > jump it needs to replace it.The runtime patching is when we can figure out what size the compiler used. That wont change. It should be pretty trivial to do. I can''t see any races here. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  21:39 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 11:13 AM, H. Peter Anvin wrote:> On 10/06/2011 11:10 AM, Jason Baron wrote: >> It would be cool if we could make the total width 2-bytes, when >> possible. It might be possible by making the initial ''JUMP_LABEL_INITIAL_NOP'' >> as a ''jmp'' to the ''l_yes'' label. And then patching that with a no-op at boot >> time or link time - letting the compiler pick the width. In that way we could >> get the optimal width... >> > Yes, that would be a win just based on icache footprint alone.I''m not sure it would be a win, necessarily. My test with back-to-back jmp2 was definitely slower than with the nop padding it out to 5 bytes; I suspect that''s a result of having too many jmps within one cacheline. Of course, there''s no reason why the CPU would optimise for jumps to jumps, so perhaps its just hitting a "stupid programmer" path. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  21:42 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 12:28 PM, Steven Rostedt wrote:>> (2) Always reserve 5 bytes of space, but if the distance is small enough >> patch in a 2-byte jump. That doesn''t help with the icache footprint. > I don''t think this one is worth it.I disagree. This is what I benchmarked as having a 5% improvement. If squashing out the padding helps, then that''s a separate optimisation. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  22:06 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 14:42 -0700, Jeremy Fitzhardinge wrote:> On 10/06/2011 12:28 PM, Steven Rostedt wrote: > >> (2) Always reserve 5 bytes of space, but if the distance is small enough > >> patch in a 2-byte jump. That doesn''t help with the icache footprint. > > I don''t think this one is worth it. > > I disagree. This is what I benchmarked as having a 5% improvement. If > squashing out the padding helps, then that''s a separate optimisation.But it only speeds up the tracing case. The non-tracing case is a nop and 5bytes is 5bytes regardless. Did you see a 5% speed up while tracing was happening? How did you do your test. I find a 5 byte compared to a 2 byte jump being negligible with the rest of the overhead of tracing, but I could be wrong. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  22:08 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 14:39 -0700, Jeremy Fitzhardinge wrote:> I''m not sure it would be a win, necessarily. My test with back-to-back > jmp2 was definitely slower than with the nop padding it out to 5 bytes; > I suspect that''s a result of having too many jmps within one cacheline. > Of course, there''s no reason why the CPU would optimise for jumps to > jumps, so perhaps its just hitting a "stupid programmer" path.Just a note. Micro-benchmarks are known to be as good as political polls and statistics. They show a much different view of the world than what is really there. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-06  22:10 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/06/2011 03:06 PM, Steven Rostedt wrote:> But it only speeds up the tracing case. The non-tracing case is a nop > and 5bytes is 5bytes regardless. > > Did you see a 5% speed up while tracing was happening? How did you do > your test. I find a 5 byte compared to a 2 byte jump being negligible > with the rest of the overhead of tracing, but I could be wrong.You''re right, this was a completely artificial microbenchmark. In practice the improvement would be a much smaller effect. But bear in mind, I''m not using jump-label for tracing. While its important for the "disabled" state to be quick, performance of the "enabled" state is also important. Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-06  22:20 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Thu, 2011-10-06 at 15:10 -0700, Jeremy Fitzhardinge wrote:> On 10/06/2011 03:06 PM, Steven Rostedt wrote:> But bear in mind, I''m not using jump-label for tracing. While its > important for the "disabled" state to be quick, performance of the > "enabled" state is also important.Sorry, I''m still thinking jump-label for tracing over. But that said, having the nop match is the best of both worlds. I think having a update_jumplabel.c that is just like the record_mcount.c which modifies the code right after it was compiled is the best thing to do. That is, have the assembler determine what size jumps to use and update them to nops right in the object file before linking. This should be rather trivial to do. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-07  17:09 UTC
[Xen-devel] [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
Note, this is just hacked together and needs to be cleaned up. Please do
not comment on formatting or other sloppiness of this patch. I know
it''s
sloppy and I left debug statements in. I want the comments to be on the
idea of the patch.
I created a new file called scripts/update_jump_label.[ch] based on some
of the work of recordmcount.c. This is executed at build time on all
object files just like recordmcount is. But it does not add any new
sections, it just modifies the code at build time to convert all jump
labels into nops.
The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but
instead to insert a jmp to the label. Depending on how gcc optimizes the
code, the jmp will be either end up being a 2 byte or 5 byte jump.
After an object is compiled, update_jump_label is executed on this file
and it reads the ELF relocation table to find the jump label locations
and examines what jump was used. It then converts the jump into either a
2 byte or 5 byte nop that is appropriate.
At boot time, the jump labels no longer need to be converted (although
we may do so in the future to use better nops depending on the machine
that is used). When jump labels are enabled, the code is examined to see
if a two byte or 5 byte version was used, and the appropriate update is
made.
I just booted this patch and it worked. I was able to enable and disable
trace points using jump labels. Benchmarks are welcomed :)
Comments and thoughts?
-- Steve
Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/Makefile b/Makefile
index 31f967c..8368f42 100644
--- a/Makefile
+++ b/Makefile
@@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then
echo $$BASH; \
 
 HOSTCC       = gcc
 HOSTCXX      = g++
-HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2
-fomit-frame-pointer
+HOSTCFLAGS   = -Wall -Wmissing-prototypes -Wstrict-prototypes -g
-fomit-frame-pointer
 HOSTCXXFLAGS = -O2
 
 # Decide whether to build built-in, modular, or both.
@@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE
 endif
 endif
 
+ifdef CONFIG_JUMP_LABEL
+	ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL
+		BUILD_UPDATE_JUMP_LABEL := y
+		export BUILD_UPDATE_JUMP_LABEL
+	endif
+endif
+
 # We trigger additional mismatches with less inlining
 ifdef CONFIG_DEBUG_SECTION_MISMATCH
 KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
diff --git a/arch/Kconfig b/arch/Kconfig
index 4b0669c..8fa6934 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI
 	  subsystem.  Also has support for calculating CPU cycle events
 	  to determine how many clock cycles in a given period.
 
+config HAVE_BUILD_TIME_JUMP_LABEL
+       bool
+       help
+	If an arch uses scripts/update_jump_label to patch in jump nops
+	at build time, then it must enable this option.
+
 config HAVE_ARCH_JUMP_LABEL
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6a47bb2..6de726a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_USER_RETURN_NOTIFIER
 	select HAVE_ARCH_JUMP_LABEL
+	select HAVE_BUILD_TIME_JUMP_LABEL
 	select HAVE_TEXT_POKE_SMP
 	select HAVE_GENERIC_HARDIRQS
 	select HAVE_SPARSE_IRQ
diff --git a/arch/x86/include/asm/jump_label.h
b/arch/x86/include/asm/jump_label.h
index a32b18c..872b3e1 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -14,7 +14,7 @@
 static __always_inline bool arch_static_branch(struct jump_label_key *key)
 {
 	asm goto("1:"
-		JUMP_LABEL_INITIAL_NOP
+		"jmp %l[l_yes]\n"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
 		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 3fee346..1f7f88f 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -16,34 +16,75 @@
 
 #ifdef HAVE_JUMP_LABEL
 
+static unsigned char nop_short[] = { P6_NOP2 };
+
 union jump_code_union {
 	char code[JUMP_LABEL_NOP_SIZE];
 	struct {
 		char jump;
 		int offset;
 	} __attribute__((packed));
+	struct {
+		char jump_short;
+		char offset_short;
+	} __attribute__((packed));
 };
 
 void arch_jump_label_transform(struct jump_entry *entry,
 			       enum jump_label_type type)
 {
 	union jump_code_union code;
+	unsigned char op;
+	unsigned size;
+	unsigned char nop;
+
+	/* Use probe_kernel_read()? */
+	op = *(unsigned char *)entry->code;
+	nop = ideal_nops[NOP_ATOMIC5][0];
 
 	if (type == JUMP_LABEL_ENABLE) {
-		code.jump = 0xe9;
-		code.offset = entry->target -
-				(entry->code + JUMP_LABEL_NOP_SIZE);
-	} else
-		memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
+		if (op == 0xe9 || op == 0xeb)
+			/* Already enabled. Warn? */
+			return;
+
+		/* FIXME for all archs */
+		if (op == nop_short[0]) {
+			size = 2;
+			code.jump_short = 0xeb;
+			code.offset = entry->target -
+				(entry->code + 2);
+			/* Check for overflow ? */
+		} else if (op == nop) {
+			size = JUMP_LABEL_NOP_SIZE;
+			code.jump = 0xe9;
+			code.offset = entry->target - (entry->code + size);
+		} else
+			return; /* WARN ? */
+
+	} else {
+		if (op == nop_short[0] || nop)
+			/* Already disabled, warn? */
+			return;
+
+		if (op == 0xe9) {
+			size = JUMP_LABEL_NOP_SIZE;
+			memcpy(&code, ideal_nops[NOP_ATOMIC5], size);
+		} else if (op == 0xeb) {
+			size = 2;
+			memcpy(&code, nop_short, size);
+		} else
+			return; /* WARN ? */
+	}
 	get_online_cpus();
 	mutex_lock(&text_mutex);
-	text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+	text_poke_smp((void *)entry->code, &code, size);
 	mutex_unlock(&text_mutex);
 	put_online_cpus();
 }
 
 void arch_jump_label_text_poke_early(jump_label_t addr)
 {
+	return;
 	text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5],
 			JUMP_LABEL_NOP_SIZE);
 }
diff --git a/scripts/Makefile b/scripts/Makefile
index df7678f..738b65c 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO)         += pnmtologo
 hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(CONFIG_IKCONFIG)     += bin2c
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
+hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label
 
 always		:= $(hostprogs-y) $(hostprogs-m)
 
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index a0fd502..bc0d89b 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -258,6 +258,15 @@ cmd_modversions =								\
 	fi;
 endif
 
+ifdef BUILD_UPDATE_JUMP_LABEL
+update_jump_label_source := $(srctree)/scripts/update_jump_label.c \
+			$(srctree)/scripts/update_jump_label.h
+cmd_update_jump_label =						\
+	if [ $(@) != "scripts/mod/empty.o" ]; then		\
+		$(objtree)/scripts/update_jump_label "$(@)";	\
+	fi;
+endif
+
 ifdef CONFIG_FTRACE_MCOUNT_RECORD
 ifdef BUILD_C_RECORDMCOUNT
 ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
@@ -294,6 +303,7 @@ define rule_cc_o_c
 	$(cmd_modversions)						  \
 	$(call echo-cmd,record_mcount)					  \
 	$(cmd_record_mcount)						  \
+	$(cmd_update_jump_label)					  \
 	scripts/basic/fixdep $(depfile) $@ ''$(call make-cmd,cc_o_c)''
>    \
 	                                              $(dot-target).tmp;  \
 	rm -f $(depfile);						  \
@@ -301,13 +311,14 @@ define rule_cc_o_c
 endef
 
 # Built-in and composite module parts
-$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 
 # Single-part modules are special since we need to mark them in $(MODVERDIR)
 
-$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE
+$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \
+		  $(update_jump_label_source) FORCE
 	$(call cmd,force_checksrc)
 	$(call if_changed_rule,cc_o_c)
 	@{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod)
diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c
new file mode 100644
index 0000000..86e17bc
--- /dev/null
+++ b/scripts/update_jump_label.c
@@ -0,0 +1,349 @@
+/*
+ * update_jump_label.c: replace jmps with nops at compile time.
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *  Parsing of the elf file was influenced by recordmcount.c
+ *  originally written by and copyright to John F. Reiser
<jreiser@BitWagon.com>.
+ */
+
+/*
+ * Note, this code is originally designed for x86, but may be used by
+ * other archs to do the nop updates at compile time instead of at boot time.
+ * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes.
+ * Inserting a 2 byte where possible helps with both CPU performance and
+ * icache strain.
+ */
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <getopt.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <unistd.h>
+
+static int fd_map;	/* File descriptor for file being modified. */
+static struct stat sb;	/* Remember .st_size, etc. */
+static int mmap_failed; /* Boolean flag. */
+
+static void die(const char *err, const char *fmt, ...)
+{
+	va_list ap;
+
+	if (err)
+		perror(err);
+
+	if (fmt) {
+		va_start(ap, fmt);
+		fprintf(stderr, "Fatal error:  ");
+		vfprintf(stderr, fmt, ap);
+		fprintf(stderr, "\n");
+		va_end(ap);
+	}
+
+	exit(1);
+}
+
+static void usage(char **argv)
+{
+	char *arg = argv[0];
+	char *p = arg+strlen(arg);
+
+	while (p >= arg && *p != ''/'')
+		p--;
+	p++;
+
+	printf("usage: %s file\n"
+	       "\n",p);
+	exit(-1);
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (7 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (6 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (5 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (4 * 8))
+	       | ((0xff & (x >> (4 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (5 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (6 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (3 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (2 * 8))
+	       | ((0xff & (x >> (2 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+	return   ((0xff & (x >> (0 * 8))) << (1 * 8))
+	       | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+	return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+	return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+	return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* ulseek, uread, ...:  Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+	off_t const w = lseek(fd, offset, whence);
+	if (w == (off_t)-1)
+		die("lseek", NULL);
+
+	return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+	size_t const n = read(fd, buf, count);
+	if (n != count)
+		die("read", NULL);
+
+	return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+	size_t const n = write(fd, buf, count);
+	if (n != count)
+		die("write", NULL);
+
+	return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+	void *const addr = malloc(size);
+	if (addr == 0)
+		die("malloc", "malloc failed: %zu bytes\n", size);
+
+	return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces.  If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad.  We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+	void *addr;
+
+	fd_map = open(fname, O_RDWR);
+	if (fd_map < 0 || fstat(fd_map, &sb) < 0)
+		die(fname, "failed to open file");
+
+	if (!S_ISREG(sb.st_mode))
+		die(NULL, "not a regular file: %s\n", fname);
+
+	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+		    fd_map, 0);
+
+	mmap_failed = 0;
+	if (addr == MAP_FAILED) {
+		mmap_failed = 1;
+		addr = umalloc(sb.st_size);
+		uread(fd_map, addr, sb.st_size);
+	}
+	return addr;
+}
+
+static void munmap_file(void *addr)
+{
+	if (!mmap_failed)
+		munmap(addr, sb.st_size);
+	else
+		free(addr);
+	close(fd_map);
+}
+
+static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 };
+static unsigned char ideal_nop2_x86[2] = { 0x66, 0x99 };
+static unsigned char *ideal_nop;
+
+static int (*make_nop)(void *map, size_t const offset);
+
+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is 2 byte or 5. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section\n");
+	}
+
+	/* convert to nop */
+	ulseek(fd_map, offset, SEEK_SET);
+	uwrite(fd_map, nop, size);
+	return 0;
+}
+
+/* 32 bit and 64 bit are very similar */
+#include "update_jump_label.h"
+#define UPDATE_JUMP_LABEL_64
+#include "update_jump_label.h"
+
+static int do_file(const char *fname)
+{
+	Elf32_Ehdr *const ehdr = mmap_file(fname);
+	unsigned int reltype = 0;
+
+	w = w4nat;
+	w2 = w2nat;
+	w8 = w8nat;
+	switch (ehdr->e_ident[EI_DATA]) {
+		static unsigned int const endian = 1;
+	default:
+		die(NULL, "unrecognized ELF data encoding %d: %s\n",
+			ehdr->e_ident[EI_DATA], fname);
+		break;
+	case ELFDATA2LSB:
+		if (*(unsigned char const *)&endian != 1) {
+			/* main() is big endian, file.o is little endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	case ELFDATA2MSB:
+		if (*(unsigned char const *)&endian != 0) {
+			/* main() is little endian, file.o is big endian. */
+			w = w4rev;
+			w2 = w2rev;
+			w8 = w8rev;
+		}
+		break;
+	}  /* end switch */
+
+	if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 ||
+	    w2(ehdr->e_type) != ET_REL ||
+	    ehdr->e_ident[EI_VERSION] != EV_CURRENT)
+		die(NULL, "unrecognized ET_REL file %s\n", fname);
+
+	switch (w2(ehdr->e_machine)) {
+	default:
+		die(NULL, "unrecognized e_machine %d %s\n",
+		    w2(ehdr->e_machine), fname);
+		break;
+	case EM_386:
+		reltype = R_386_32;
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_32;
+		break;
+	case EM_ARM:	 reltype = R_ARM_ABS32;
+			 break;
+	case EM_IA_64:	 reltype = R_IA64_IMM64; break;
+	case EM_MIPS:	 /* reltype: e_class    */ break;
+	case EM_PPC:	 reltype = R_PPC_ADDR32;   break;
+	case EM_PPC64:	 reltype = R_PPC64_ADDR64; break;
+	case EM_S390:    /* reltype: e_class    */ break;
+	case EM_SH:	 reltype = R_SH_DIR32;                 break;
+	case EM_SPARCV9: reltype = R_SPARC_64;     break;
+	case EM_X86_64:
+		make_nop = make_nop_x86;
+		ideal_nop = ideal_nop5_x86_64;
+		reltype = R_X86_64_64;
+		break;
+	}  /* end switch */
+
+	switch (ehdr->e_ident[EI_CLASS]) {
+	default:
+		die(NULL, "unrecognized ELF class %d %s\n",
+		    ehdr->e_ident[EI_CLASS], fname);
+		break;
+	case ELFCLASS32:
+		if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
+		||  w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		if (w2(ehdr->e_machine) == EM_S390) {
+			reltype = R_390_32;
+		}
+		if (w2(ehdr->e_machine) == EM_MIPS) {
+			reltype = R_MIPS_32;
+		}
+		do_func32(ehdr, fname, reltype);
+		break;
+	case ELFCLASS64: {
+		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
+		if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr)
+		||  w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr))
+			die(NULL, "unrecognized ET_REL file: %s\n", fname);
+
+		if (w2(ghdr->e_machine) == EM_S390)
+			reltype = R_390_64;
+
+#if 0
+		if (w2(ghdr->e_machine) == EM_MIPS) {
+			reltype = R_MIPS_64;
+			Elf64_r_sym = MIPS64_r_sym;
+		}
+#endif
+		do_func64(ghdr, fname, reltype);
+		break;
+	}
+	}  /* end switch */
+
+	munmap_file(ehdr);
+	return 0;
+}
+
+int main (int argc, char **argv)
+{
+	if (argc != 2)
+		usage(argv);
+	
+	return do_file(argv[1]);
+}
+
diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h
new file mode 100644
index 0000000..6ff9846
--- /dev/null
+++ b/scripts/update_jump_label.h
@@ -0,0 +1,322 @@
+/*
+ * recordmcount.h
+ *
+ * This code was taken out of recordmcount.c written by
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>.  All rights
reserved.
+ *
+ * The original code had the same algorithms for both 32bit
+ * and 64bit ELF files, but the code was duplicated to support
+ * the difference in structures that were used. This
+ * file creates a macro of everything that is different between
+ * the 64 and 32 bit code, such that by including this header
+ * twice we can create both sets of functions by including this
+ * header once with RECORD_MCOUNT_64 undefined, and again with
+ * it defined.
+ *
+ * This conversion to macros was done by:
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ */
+
+#undef EBITS
+#undef _w
+#undef _align
+#undef _size
+
+#ifdef UPDATE_JUMP_LABEL_64
+# define EBITS			64
+# define _w			w8
+# define _align			7u
+# define _size			8
+#else
+# define EBITS			32
+# define _w			w
+# define _align			3u
+# define _size			4
+#endif
+
+#define _FBITS(x, e)	x##e
+#define FBITS(x, e)	_FBITS(x,e)
+#define FUNC(x)		FBITS(x,EBITS)
+
+#undef Elf_Addr
+#undef Elf_Ehdr
+#undef Elf_Shdr
+#undef Elf_Rel
+#undef Elf_Rela
+#undef Elf_Sym
+#undef ELF_R_SYM
+#undef ELF_R_TYPE
+
+#define __ATTACH(x,y,z)	x##y##z
+#define ATTACH(x,y,z)	__ATTACH(x,y,z)
+
+#define Elf_Addr	ATTACH(Elf,EBITS,_Addr)
+#define Elf_Ehdr	ATTACH(Elf,EBITS,_Ehdr)
+#define Elf_Shdr	ATTACH(Elf,EBITS,_Shdr)
+#define Elf_Rel		ATTACH(Elf,EBITS,_Rel)
+#define Elf_Rela	ATTACH(Elf,EBITS,_Rela)
+#define Elf_Sym		ATTACH(Elf,EBITS,_Sym)
+#define uint_t		ATTACH(uint,EBITS,_t)
+#define ELF_R_SYM	ATTACH(ELF,EBITS,_R_SYM)
+#define ELF_R_TYPE	ATTACH(ELF,EBITS,_R_TYPE)
+
+#undef get_shdr
+#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr)))
+
+#undef get_section_loc
+#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr))
+
+/* Functions and pointers that do_file() may override for specific e_machine.
*/
+
+#if 0
+static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp)
+{
+	return ELF_R_SYM(_w(rp->r_info));
+}
+static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM);
+#endif
+
+static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr,
+				 Elf_Ehdr const *const ehdr,
+				 Elf_Sym const **sym0,
+				 char const **str0,
+				 Elf_Rel const **relp)
+{
+	Elf_Shdr *const shdr0 = get_shdr(ehdr);
+	unsigned const symsec_sh_link = w(relhdr->sh_link);
+	Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
+	Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
+	Elf_Rel const *const rel0 +		(Elf_Rel const *)get_section_loc(ehdr, relhdr);
+
+	*sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec);
+
+	*str0 = (char const *)get_section_loc(ehdr, strsec);
+
+	*relp = rel0;
+}
+
+/*
+ * Read the relocation table again, but this time its called on sections
+ * that are not going to be traced. The mcount calls here will be converted
+ * into nops.
+ */
+static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr,
+		       Elf_Ehdr const *const ehdr,
+		       const char *const txtname)
+{
+	Elf_Shdr *const shdr0 = get_shdr(ehdr);
+	Elf_Sym const *sym0;
+	char const *str0;
+	Elf_Rel const *relp;
+	Elf_Rela const *relap;
+	Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)];
+	unsigned rel_entsize = w(relhdr->sh_entsize);
+	unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
+	int t;
+
+	FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp);
+
+	for (t = nrel; t > 0; t -= 3) {
+		int ret = -1;
+
+		relap = (Elf_Rela const *)relp;
+		printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n",
+		       (long)relap->r_offset, (long)relap->r_info,
+		       (long)ELF_R_SYM(relap->r_info),
+		       (long)ELF_R_TYPE(relap->r_info),
+		       (long)relap->r_addend);
+
+		if (0 && make_nop)
+			ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset);
+
+		/* jump label sections are paired in threes */
+		relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp);
+	}
+}
+
+/* Evade ISO C restriction: no declaration after statement in has_rel_mcount.
*/
+static char const *
+FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr,  /* is SHT_REL or
SHT_RELA */
+		 Elf_Shdr const *const shdr0,
+		 char const *const shstrtab,
+		 char const *const fname)
+{
+	/* .sh_info depends on .sh_type == SHT_REL[,A] */
+	Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
+	char const *const txtname = &shstrtab[w(txthdr->sh_name)];
+
+	if (strcmp("__jump_table", txtname) == 0) {
+		fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
+			fname);
+//		succeed_file();
+	}
+	if (w(txthdr->sh_type) != SHT_PROGBITS ||
+	    !(w(txthdr->sh_flags) & SHF_EXECINSTR))
+		return NULL;
+	return txtname;
+}
+
+static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr,
+				      Elf_Shdr const *const shdr0,
+				      char const *const shstrtab,
+				      char const *const fname)
+{
+	if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) !=
SHT_RELA)
+		return NULL;
+	return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname);
+}
+
+/* Find relocation section hdr for a given section */
+static const Elf_Shdr *
+FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != SHT_REL &&
+		    w(hdr->sh_type) != SHT_RELA)
+			continue;
+
+		/*
+		 * The relocation section''s info field holds
+		 * the section index that it represents.
+		 */
+		if (shdr == &shdr0[w(hdr->sh_info)])
+			return hdr;
+	}
+	return NULL;
+}
+
+/* Find a section headr based on name and type */
+static const Elf_Shdr *
+FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)];
+	const char *shstrtab = (char *)get_section_loc(ehdr, shstr);
+	int nhdr = w2(ehdr->e_shnum);
+	const Elf_Shdr *hdr;
+	const char *hdrname;
+	int i;
+
+	for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) {
+		if (w(hdr->sh_type) != type)
+			continue;
+
+		/* If we are just looking for a section by type (ie. SYMTAB) */
+		if (!name)
+			return hdr;
+
+		hdrname = &shstrtab[w(hdr->sh_name)];
+		if (strcmp(hdrname, name) == 0)
+			return hdr;
+	}
+	return NULL;
+}
+
+static void
+FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr,
+		     unsigned shtype, const Elf_Rel *rel, void *data)
+{
+	const Elf_Shdr *shdr0 = get_shdr(ehdr);
+	const Elf_Shdr *targethdr;
+	const Elf_Rela *rela;
+	const Elf_Sym *syment;
+	uint_t offset = _w(rel->r_offset);
+	uint_t info = _w(rel->r_info);
+	uint_t sym = ELF_R_SYM(info);
+	uint_t type = ELF_R_TYPE(info);
+	uint_t addend;
+	uint_t targetloc;
+
+	if (shtype == SHT_RELA) {
+		rela = (const Elf_Rela *)rel;
+		addend = _w(rela->r_addend);
+	} else
+		addend = _w(*(unsigned short *)(data + offset));
+
+	syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr);
+	targethdr = &shdr0[w2(syment[sym].st_shndx)];
+	targetloc = _w(targethdr->sh_offset);
+
+	/* TODO, need a separate function for all archs */
+	if (type != R_386_32)
+		die(NULL, "Arch relocation type %d not supported", type);
+
+	targetloc += addend;
+
+#if 1
+	printf("offset=%x target=%x shoffset=%x add=%x\n",
+	       offset, targetloc, _w(targethdr->sh_offset), addend);
+#endif
+	*(uint_t *)(data + offset) = targetloc;
+}
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype)
+{
+	const Elf_Shdr *jlshdr;
+	const Elf_Shdr *jlrhdr;
+	const Elf_Shdr *symhdr;
+	const Elf_Rel *rel;
+	unsigned size;
+	unsigned cnt;
+	unsigned i;
+	uint_t type;
+	void *jdata;
+	void *data;
+
+	jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS);
+	if (!jlshdr)
+		return;
+
+	jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr);
+	if (!jlrhdr)
+		return;
+
+	/*
+	 * Create and fill in the __jump_table section and use it to
+	 * find the offsets into the text that we want to update.
+	 * We create it so that we do not depend on the order of the
+	 * relocations, and use the table directly, as it is broken
+	 * up into sections.
+	 */
+	size = _w(jlshdr->sh_size);
+	data = umalloc(size);
+
+	jdata = (void *)get_section_loc(ehdr, jlshdr);
+	memcpy(data, jdata, size);
+
+	cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize);
+
+	rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr);
+
+	/* Is this as Rel or Rela? */
+	type = w(jlrhdr->sh_type);
+
+	symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB);
+
+	for (i = 0; i < cnt; i++) {
+		FUNC(section_update)(ehdr, symhdr, type, rel, data);
+		rel = (void *)rel + w(jlrhdr->sh_entsize);
+	}
+
+	/*
+	 * This is specific to x86. The jump_table is stored in three
+	 * long words. The first is the location of the jmp target we
+	 * must update.
+	 */
+	cnt = size / sizeof(uint_t);
+
+	for (i = 0; i < cnt; i += 3)
+		if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));
+
+	free(data);
+}
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-07  18:52 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote:> Note, this is just hacked together and needs to be cleaned up. Please do > not comment on formatting or other sloppiness of this patch. I know it''s > sloppy and I left debug statements in. I want the comments to be on the > idea of the patch. > > I created a new file called scripts/update_jump_label.[ch] based on some > of the work of recordmcount.c. This is executed at build time on all > object files just like recordmcount is. But it does not add any new > sections, it just modifies the code at build time to convert all jump > labels into nops. > > The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but > instead to insert a jmp to the label. Depending on how gcc optimizes the > code, the jmp will be either end up being a 2 byte or 5 byte jump. > > After an object is compiled, update_jump_label is executed on this file > and it reads the ELF relocation table to find the jump label locations > and examines what jump was used. It then converts the jump into either a > 2 byte or 5 byte nop that is appropriate. > > At boot time, the jump labels no longer need to be converted (although > we may do so in the future to use better nops depending on the machine > that is used). When jump labels are enabled, the code is examined to see > if a two byte or 5 byte version was used, and the appropriate update is > made. > > I just booted this patch and it worked. I was able to enable and disable > trace points using jump labels. Benchmarks are welcomed :) > > Comments and thoughts? >Generally, I really like it, I guess b/c I suggested it :) I''ll try and run some workloads on it - A real simple one, I used recently was putting a single jump label in ''getppid()'' and then calling it in a loop - I wonder if the short nop vs long nop would show up there, as a baseline test. (fwiw, the jump label vs. no jump label for this test was anywhere b/w 1-5% improvement). Anyways, some comments below.> -- Steve > > Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org> > > diff --git a/Makefile b/Makefile > index 31f967c..8368f42 100644 > --- a/Makefile > +++ b/Makefile > @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ > > HOSTCC = gcc > HOSTCXX = g++ > -HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer > +HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer > HOSTCXXFLAGS = -O2 > > # Decide whether to build built-in, modular, or both. > @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE > endif > endif > > +ifdef CONFIG_JUMP_LABEL > + ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL > + BUILD_UPDATE_JUMP_LABEL := y > + export BUILD_UPDATE_JUMP_LABEL > + endif > +endif > + > # We trigger additional mismatches with less inlining > ifdef CONFIG_DEBUG_SECTION_MISMATCH > KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once) > diff --git a/arch/Kconfig b/arch/Kconfig > index 4b0669c..8fa6934 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI > subsystem. Also has support for calculating CPU cycle events > to determine how many clock cycles in a given period. > > +config HAVE_BUILD_TIME_JUMP_LABEL > + bool > + help > + If an arch uses scripts/update_jump_label to patch in jump nops > + at build time, then it must enable this option. > + > config HAVE_ARCH_JUMP_LABEL > bool > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 6a47bb2..6de726a 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -61,6 +61,7 @@ config X86 > select HAVE_ARCH_KMEMCHECK > select HAVE_USER_RETURN_NOTIFIER > select HAVE_ARCH_JUMP_LABEL > + select HAVE_BUILD_TIME_JUMP_LABEL > select HAVE_TEXT_POKE_SMP > select HAVE_GENERIC_HARDIRQS > select HAVE_SPARSE_IRQ > diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h > index a32b18c..872b3e1 100644 > --- a/arch/x86/include/asm/jump_label.h > +++ b/arch/x86/include/asm/jump_label.h > @@ -14,7 +14,7 @@ > static __always_inline bool arch_static_branch(struct jump_label_key *key) > { > asm goto("1:" > - JUMP_LABEL_INITIAL_NOP > + "jmp %l[l_yes]\n" > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > index 3fee346..1f7f88f 100644 > --- a/arch/x86/kernel/jump_label.c > +++ b/arch/x86/kernel/jump_label.c > @@ -16,34 +16,75 @@ > > #ifdef HAVE_JUMP_LABEL > > +static unsigned char nop_short[] = { P6_NOP2 }; > + > union jump_code_union { > char code[JUMP_LABEL_NOP_SIZE]; > struct { > char jump; > int offset; > } __attribute__((packed)); > + struct { > + char jump_short; > + char offset_short; > + } __attribute__((packed)); > }; > > void arch_jump_label_transform(struct jump_entry *entry, > enum jump_label_type type) > { > union jump_code_union code; > + unsigned char op; > + unsigned size; > + unsigned char nop; > + > + /* Use probe_kernel_read()? */ > + op = *(unsigned char *)entry->code; > + nop = ideal_nops[NOP_ATOMIC5][0]; > > if (type == JUMP_LABEL_ENABLE) { > - code.jump = 0xe9; > - code.offset = entry->target - > - (entry->code + JUMP_LABEL_NOP_SIZE); > - } else > - memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); > + if (op == 0xe9 || op == 0xeb) > + /* Already enabled. Warn? */ > + return; > +Using the jump_label_inc/dec interface this shouldn''t happen, I would have it be BUG> + /* FIXME for all archs */ > + if (op == nop_short[0]) { > + size = 2; > + code.jump_short = 0xeb; > + code.offset = entry->target - > + (entry->code + 2); > + /* Check for overflow ? */ > + } else if (op == nop) { > + size = JUMP_LABEL_NOP_SIZE; > + code.jump = 0xe9; > + code.offset = entry->target - (entry->code + size); > + } else > + return; /* WARN ? */same here, at least WARN, more likely BUG()> + > + } else { > + if (op == nop_short[0] || nop) > + /* Already disabled, warn? */ > + return; > +same here.> + if (op == 0xe9) { > + size = JUMP_LABEL_NOP_SIZE; > + memcpy(&code, ideal_nops[NOP_ATOMIC5], size); > + } else if (op == 0xeb) { > + size = 2; > + memcpy(&code, nop_short, size); > + } else > + return; /* WARN ? */same here> + } > get_online_cpus(); > mutex_lock(&text_mutex); > - text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE); > + text_poke_smp((void *)entry->code, &code, size); > mutex_unlock(&text_mutex); > put_online_cpus(); > } > > void arch_jump_label_text_poke_early(jump_label_t addr) > { > + return; > text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5], > JUMP_LABEL_NOP_SIZE); > }hmmm...we spent a bunch of time selecting the ''ideal'' run-time noops I wouldn''t want to drop that work.> diff --git a/scripts/Makefile b/scripts/Makefile > index df7678f..738b65c 100644 > --- a/scripts/Makefile > +++ b/scripts/Makefile > @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO) += pnmtologo > hostprogs-$(CONFIG_VT) += conmakehash > hostprogs-$(CONFIG_IKCONFIG) += bin2c > hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount > +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label > > always := $(hostprogs-y) $(hostprogs-m) > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index a0fd502..bc0d89b 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -258,6 +258,15 @@ cmd_modversions = \ > fi; > endif > > +ifdef BUILD_UPDATE_JUMP_LABEL > +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \ > + $(srctree)/scripts/update_jump_label.h > +cmd_update_jump_label = \ > + if [ $(@) != "scripts/mod/empty.o" ]; then \ > + $(objtree)/scripts/update_jump_label "$(@)"; \ > + fi; > +endif > + > ifdef CONFIG_FTRACE_MCOUNT_RECORD > ifdef BUILD_C_RECORDMCOUNT > ifeq ("$(origin RECORDMCOUNT_WARN)", "command line") > @@ -294,6 +303,7 @@ define rule_cc_o_c > $(cmd_modversions) \ > $(call echo-cmd,record_mcount) \ > $(cmd_record_mcount) \ > + $(cmd_update_jump_label) \ > scripts/basic/fixdep $(depfile) $@ ''$(call make-cmd,cc_o_c)'' > \ > $(dot-target).tmp; \ > rm -f $(depfile); \ > @@ -301,13 +311,14 @@ define rule_cc_o_c > endef > > # Built-in and composite module parts > -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE > $(call cmd,force_checksrc) > $(call if_changed_rule,cc_o_c) > > # Single-part modules are special since we need to mark them in $(MODVERDIR) > > -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \ > + $(update_jump_label_source) FORCE > $(call cmd,force_checksrc) > $(call if_changed_rule,cc_o_c) > @{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod) > diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c > new file mode 100644 > index 0000000..86e17bc > --- /dev/null > +++ b/scripts/update_jump_label.c > @@ -0,0 +1,349 @@ > +/* > + * update_jump_label.c: replace jmps with nops at compile time. > + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc. > + * Parsing of the elf file was influenced by recordmcount.c > + * originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>. > + */ > + > +/* > + * Note, this code is originally designed for x86, but may be used by > + * other archs to do the nop updates at compile time instead of at boot time. > + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes. > + * Inserting a 2 byte where possible helps with both CPU performance and > + * icache strain. > + */ > +#include <sys/types.h> > +#include <sys/mman.h> > +#include <sys/stat.h> > +#include <getopt.h> > +#include <elf.h> > +#include <fcntl.h> > +#include <setjmp.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <stdarg.h> > +#include <string.h> > +#include <unistd.h> > + > +static int fd_map; /* File descriptor for file being modified. */ > +static struct stat sb; /* Remember .st_size, etc. */ > +static int mmap_failed; /* Boolean flag. */ > + > +static void die(const char *err, const char *fmt, ...) > +{ > + va_list ap; > + > + if (err) > + perror(err); > + > + if (fmt) { > + va_start(ap, fmt); > + fprintf(stderr, "Fatal error: "); > + vfprintf(stderr, fmt, ap); > + fprintf(stderr, "\n"); > + va_end(ap); > + } > + > + exit(1); > +} > + > +static void usage(char **argv) > +{ > + char *arg = argv[0]; > + char *p = arg+strlen(arg); > + > + while (p >= arg && *p != ''/'') > + p--; > + p++; > + > + printf("usage: %s file\n" > + "\n",p); > + exit(-1); > +} > + > +/* w8rev, w8nat, ...: Handle endianness. */ > + > +static uint64_t w8rev(uint64_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (7 * 8)) > + | ((0xff & (x >> (1 * 8))) << (6 * 8)) > + | ((0xff & (x >> (2 * 8))) << (5 * 8)) > + | ((0xff & (x >> (3 * 8))) << (4 * 8)) > + | ((0xff & (x >> (4 * 8))) << (3 * 8)) > + | ((0xff & (x >> (5 * 8))) << (2 * 8)) > + | ((0xff & (x >> (6 * 8))) << (1 * 8)) > + | ((0xff & (x >> (7 * 8))) << (0 * 8)); > +} > + > +static uint32_t w4rev(uint32_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (3 * 8)) > + | ((0xff & (x >> (1 * 8))) << (2 * 8)) > + | ((0xff & (x >> (2 * 8))) << (1 * 8)) > + | ((0xff & (x >> (3 * 8))) << (0 * 8)); > +} > + > +static uint32_t w2rev(uint16_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (1 * 8)) > + | ((0xff & (x >> (1 * 8))) << (0 * 8)); > +} > + > +static uint64_t w8nat(uint64_t const x) > +{ > + return x; > +} > + > +static uint32_t w4nat(uint32_t const x) > +{ > + return x; > +} > + > +static uint32_t w2nat(uint16_t const x) > +{ > + return x; > +} > + > +static uint64_t (*w8)(uint64_t); > +static uint32_t (*w)(uint32_t); > +static uint32_t (*w2)(uint16_t); > + > +/* ulseek, uread, ...: Check return value for errors. */ > + > +static off_t > +ulseek(int const fd, off_t const offset, int const whence) > +{ > + off_t const w = lseek(fd, offset, whence); > + if (w == (off_t)-1) > + die("lseek", NULL); > + > + return w; > +} > + > +static size_t > +uread(int const fd, void *const buf, size_t const count) > +{ > + size_t const n = read(fd, buf, count); > + if (n != count) > + die("read", NULL); > + > + return n; > +} > + > +static size_t > +uwrite(int const fd, void const *const buf, size_t const count) > +{ > + size_t const n = write(fd, buf, count); > + if (n != count) > + die("write", NULL); > + > + return n; > +} > + > +static void * > +umalloc(size_t size) > +{ > + void *const addr = malloc(size); > + if (addr == 0) > + die("malloc", "malloc failed: %zu bytes\n", size); > + > + return addr; > +} > + > +/* > + * Get the whole file as a programming convenience in order to avoid > + * malloc+lseek+read+free of many pieces. If successful, then mmap > + * avoids copying unused pieces; else just read the whole file. > + * Open for both read and write; new info will be appended to the file. > + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr > + * do not propagate to the file until an explicit overwrite at the last. > + * This preserves most aspects of consistency (all except .st_size) > + * for simultaneous readers of the file while we are appending to it. > + * However, multiple writers still are bad. We choose not to use > + * locking because it is expensive and the use case of kernel build > + * makes multiple writers unlikely. > + */ > +static void *mmap_file(char const *fname) > +{ > + void *addr; > + > + fd_map = open(fname, O_RDWR); > + if (fd_map < 0 || fstat(fd_map, &sb) < 0) > + die(fname, "failed to open file"); > + > + if (!S_ISREG(sb.st_mode)) > + die(NULL, "not a regular file: %s\n", fname); > + > + addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, > + fd_map, 0); > + > + mmap_failed = 0; > + if (addr == MAP_FAILED) { > + mmap_failed = 1; > + addr = umalloc(sb.st_size); > + uread(fd_map, addr, sb.st_size); > + } > + return addr; > +} > + > +static void munmap_file(void *addr) > +{ > + if (!mmap_failed) > + munmap(addr, sb.st_size); > + else > + free(addr); > + close(fd_map); > +} > + > +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; > +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 }; > +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x99 }; > +static unsigned char *ideal_nop; > + > +static int (*make_nop)(void *map, size_t const offset); > + > +static int make_nop_x86(void *map, size_t const offset) > +{ > + unsigned char *op; > + unsigned char *nop; > + int size; > + > + /* Determine which type of jmp this is 2 byte or 5. */ > + op = map + offset; > + switch (*op) { > + case 0xeb: /* 2 byte */ > + size = 2; > + nop = ideal_nop2_x86; > + break; > + case 0xe9: /* 5 byte */ > + size = 5; > + nop = ideal_nop; > + break; > + default: > + die(NULL, "Bad jump label section\n"); > + } > + > + /* convert to nop */ > + ulseek(fd_map, offset, SEEK_SET); > + uwrite(fd_map, nop, size); > + return 0; > +} > + > +/* 32 bit and 64 bit are very similar */ > +#include "update_jump_label.h" > +#define UPDATE_JUMP_LABEL_64 > +#include "update_jump_label.h" > + > +static int do_file(const char *fname) > +{ > + Elf32_Ehdr *const ehdr = mmap_file(fname); > + unsigned int reltype = 0; > + > + w = w4nat; > + w2 = w2nat; > + w8 = w8nat; > + switch (ehdr->e_ident[EI_DATA]) { > + static unsigned int const endian = 1; > + default: > + die(NULL, "unrecognized ELF data encoding %d: %s\n", > + ehdr->e_ident[EI_DATA], fname); > + break; > + case ELFDATA2LSB: > + if (*(unsigned char const *)&endian != 1) { > + /* main() is big endian, file.o is little endian. */ > + w = w4rev; > + w2 = w2rev; > + w8 = w8rev; > + } > + break; > + case ELFDATA2MSB: > + if (*(unsigned char const *)&endian != 0) { > + /* main() is little endian, file.o is big endian. */ > + w = w4rev; > + w2 = w2rev; > + w8 = w8rev; > + } > + break; > + } /* end switch */ > + > + if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 || > + w2(ehdr->e_type) != ET_REL || > + ehdr->e_ident[EI_VERSION] != EV_CURRENT) > + die(NULL, "unrecognized ET_REL file %s\n", fname); > + > + switch (w2(ehdr->e_machine)) { > + default: > + die(NULL, "unrecognized e_machine %d %s\n", > + w2(ehdr->e_machine), fname); > + break; > + case EM_386: > + reltype = R_386_32; > + make_nop = make_nop_x86; > + ideal_nop = ideal_nop5_x86_32; > + break; > + case EM_ARM: reltype = R_ARM_ABS32; > + break; > + case EM_IA_64: reltype = R_IA64_IMM64; break; > + case EM_MIPS: /* reltype: e_class */ break; > + case EM_PPC: reltype = R_PPC_ADDR32; break; > + case EM_PPC64: reltype = R_PPC64_ADDR64; break; > + case EM_S390: /* reltype: e_class */ break; > + case EM_SH: reltype = R_SH_DIR32; break; > + case EM_SPARCV9: reltype = R_SPARC_64; break; > + case EM_X86_64: > + make_nop = make_nop_x86; > + ideal_nop = ideal_nop5_x86_64; > + reltype = R_X86_64_64; > + break; > + } /* end switch */ > + > + switch (ehdr->e_ident[EI_CLASS]) { > + default: > + die(NULL, "unrecognized ELF class %d %s\n", > + ehdr->e_ident[EI_CLASS], fname); > + break; > + case ELFCLASS32: > + if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr) > + || w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr)) > + die(NULL, "unrecognized ET_REL file: %s\n", fname); > + > + if (w2(ehdr->e_machine) == EM_S390) { > + reltype = R_390_32; > + } > + if (w2(ehdr->e_machine) == EM_MIPS) { > + reltype = R_MIPS_32; > + } > + do_func32(ehdr, fname, reltype); > + break; > + case ELFCLASS64: { > + Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr; > + if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr) > + || w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr)) > + die(NULL, "unrecognized ET_REL file: %s\n", fname); > + > + if (w2(ghdr->e_machine) == EM_S390) > + reltype = R_390_64; > + > +#if 0 > + if (w2(ghdr->e_machine) == EM_MIPS) { > + reltype = R_MIPS_64; > + Elf64_r_sym = MIPS64_r_sym; > + } > +#endif > + do_func64(ghdr, fname, reltype); > + break; > + } > + } /* end switch */ > + > + munmap_file(ehdr); > + return 0; > +} > + > +int main (int argc, char **argv) > +{ > + if (argc != 2) > + usage(argv); > + > + return do_file(argv[1]); > +} > + > diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h > new file mode 100644 > index 0000000..6ff9846 > --- /dev/null > +++ b/scripts/update_jump_label.h > @@ -0,0 +1,322 @@ > +/* > + * recordmcount.h > + * > + * This code was taken out of recordmcount.c written by > + * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>. All rights reserved. > + * > + * The original code had the same algorithms for both 32bit > + * and 64bit ELF files, but the code was duplicated to support > + * the difference in structures that were used. This > + * file creates a macro of everything that is different between > + * the 64 and 32 bit code, such that by including this header > + * twice we can create both sets of functions by including this > + * header once with RECORD_MCOUNT_64 undefined, and again with > + * it defined. > + * > + * This conversion to macros was done by: > + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc. > + * > + * Licensed under the GNU General Public License, version 2 (GPLv2). > + */ > + > +#undef EBITS > +#undef _w > +#undef _align > +#undef _size > + > +#ifdef UPDATE_JUMP_LABEL_64 > +# define EBITS 64 > +# define _w w8 > +# define _align 7u > +# define _size 8 > +#else > +# define EBITS 32 > +# define _w w > +# define _align 3u > +# define _size 4 > +#endif > + > +#define _FBITS(x, e) x##e > +#define FBITS(x, e) _FBITS(x,e) > +#define FUNC(x) FBITS(x,EBITS) > + > +#undef Elf_Addr > +#undef Elf_Ehdr > +#undef Elf_Shdr > +#undef Elf_Rel > +#undef Elf_Rela > +#undef Elf_Sym > +#undef ELF_R_SYM > +#undef ELF_R_TYPE > + > +#define __ATTACH(x,y,z) x##y##z > +#define ATTACH(x,y,z) __ATTACH(x,y,z) > + > +#define Elf_Addr ATTACH(Elf,EBITS,_Addr) > +#define Elf_Ehdr ATTACH(Elf,EBITS,_Ehdr) > +#define Elf_Shdr ATTACH(Elf,EBITS,_Shdr) > +#define Elf_Rel ATTACH(Elf,EBITS,_Rel) > +#define Elf_Rela ATTACH(Elf,EBITS,_Rela) > +#define Elf_Sym ATTACH(Elf,EBITS,_Sym) > +#define uint_t ATTACH(uint,EBITS,_t) > +#define ELF_R_SYM ATTACH(ELF,EBITS,_R_SYM) > +#define ELF_R_TYPE ATTACH(ELF,EBITS,_R_TYPE) > + > +#undef get_shdr > +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr))) > + > +#undef get_section_loc > +#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr)) > + > +/* Functions and pointers that do_file() may override for specific e_machine. */ > + > +#if 0 > +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp) > +{ > + return ELF_R_SYM(_w(rp->r_info)); > +} > +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM); > +#endif > + > +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr, > + Elf_Ehdr const *const ehdr, > + Elf_Sym const **sym0, > + char const **str0, > + Elf_Rel const **relp) > +{ > + Elf_Shdr *const shdr0 = get_shdr(ehdr); > + unsigned const symsec_sh_link = w(relhdr->sh_link); > + Elf_Shdr const *const symsec = &shdr0[symsec_sh_link]; > + Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)]; > + Elf_Rel const *const rel0 > + (Elf_Rel const *)get_section_loc(ehdr, relhdr); > + > + *sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec); > + > + *str0 = (char const *)get_section_loc(ehdr, strsec); > + > + *relp = rel0; > +} > + > +/* > + * Read the relocation table again, but this time its called on sections > + * that are not going to be traced. The mcount calls here will be converted > + * into nops. > + */ > +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr, > + Elf_Ehdr const *const ehdr, > + const char *const txtname) > +{ > + Elf_Shdr *const shdr0 = get_shdr(ehdr); > + Elf_Sym const *sym0; > + char const *str0; > + Elf_Rel const *relp; > + Elf_Rela const *relap; > + Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)]; > + unsigned rel_entsize = w(relhdr->sh_entsize); > + unsigned const nrel = _w(relhdr->sh_size) / rel_entsize; > + int t; > + > + FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp); > + > + for (t = nrel; t > 0; t -= 3) { > + int ret = -1; > + > + relap = (Elf_Rela const *)relp; > + printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n", > + (long)relap->r_offset, (long)relap->r_info, > + (long)ELF_R_SYM(relap->r_info), > + (long)ELF_R_TYPE(relap->r_info), > + (long)relap->r_addend); > + > + if (0 && make_nop) > + ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset); > + > + /* jump label sections are paired in threes */ > + relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp); > + } > +} > + > +/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */ > +static char const * > +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */ > + Elf_Shdr const *const shdr0, > + char const *const shstrtab, > + char const *const fname) > +{ > + /* .sh_info depends on .sh_type == SHT_REL[,A] */ > + Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)]; > + char const *const txtname = &shstrtab[w(txthdr->sh_name)]; > + > + if (strcmp("__jump_table", txtname) == 0) { > + fprintf(stderr, "warning: __mcount_loc already exists: %s\n", > + fname); > +// succeed_file(); > + } > + if (w(txthdr->sh_type) != SHT_PROGBITS || > + !(w(txthdr->sh_flags) & SHF_EXECINSTR)) > + return NULL; > + return txtname; > +} > + > +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr, > + Elf_Shdr const *const shdr0, > + char const *const shstrtab, > + char const *const fname) > +{ > + if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA) > + return NULL; > + return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname); > +} > + > +/* Find relocation section hdr for a given section */ > +static const Elf_Shdr * > +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + int nhdr = w2(ehdr->e_shnum); > + const Elf_Shdr *hdr; > + int i; > + > + for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) { > + if (w(hdr->sh_type) != SHT_REL && > + w(hdr->sh_type) != SHT_RELA) > + continue; > + > + /* > + * The relocation section''s info field holds > + * the section index that it represents. > + */ > + if (shdr == &shdr0[w(hdr->sh_info)]) > + return hdr; > + } > + return NULL; > +} > + > +/* Find a section headr based on name and type */ > +static const Elf_Shdr * > +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)]; > + const char *shstrtab = (char *)get_section_loc(ehdr, shstr); > + int nhdr = w2(ehdr->e_shnum); > + const Elf_Shdr *hdr; > + const char *hdrname; > + int i; > + > + for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) { > + if (w(hdr->sh_type) != type) > + continue; > + > + /* If we are just looking for a section by type (ie. SYMTAB) */ > + if (!name) > + return hdr; > + > + hdrname = &shstrtab[w(hdr->sh_name)]; > + if (strcmp(hdrname, name) == 0) > + return hdr; > + } > + return NULL; > +} > + > +static void > +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr, > + unsigned shtype, const Elf_Rel *rel, void *data) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + const Elf_Shdr *targethdr; > + const Elf_Rela *rela; > + const Elf_Sym *syment; > + uint_t offset = _w(rel->r_offset); > + uint_t info = _w(rel->r_info); > + uint_t sym = ELF_R_SYM(info); > + uint_t type = ELF_R_TYPE(info); > + uint_t addend; > + uint_t targetloc; > + > + if (shtype == SHT_RELA) { > + rela = (const Elf_Rela *)rel; > + addend = _w(rela->r_addend); > + } else > + addend = _w(*(unsigned short *)(data + offset)); > + > + syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr); > + targethdr = &shdr0[w2(syment[sym].st_shndx)]; > + targetloc = _w(targethdr->sh_offset); > + > + /* TODO, need a separate function for all archs */ > + if (type != R_386_32) > + die(NULL, "Arch relocation type %d not supported", type); > + > + targetloc += addend; > + > +#if 1 > + printf("offset=%x target=%x shoffset=%x add=%x\n", > + offset, targetloc, _w(targethdr->sh_offset), addend); > +#endif > + *(uint_t *)(data + offset) = targetloc; > +} > + > +/* Overall supervision for Elf32 ET_REL file. */ > +static void > +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype) > +{ > + const Elf_Shdr *jlshdr; > + const Elf_Shdr *jlrhdr; > + const Elf_Shdr *symhdr; > + const Elf_Rel *rel; > + unsigned size; > + unsigned cnt; > + unsigned i; > + uint_t type; > + void *jdata; > + void *data; > + > + jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS); > + if (!jlshdr) > + return; > + > + jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr); > + if (!jlrhdr) > + return; > + > + /* > + * Create and fill in the __jump_table section and use it to > + * find the offsets into the text that we want to update. > + * We create it so that we do not depend on the order of the > + * relocations, and use the table directly, as it is broken > + * up into sections. > + */ > + size = _w(jlshdr->sh_size); > + data = umalloc(size); > + > + jdata = (void *)get_section_loc(ehdr, jlshdr); > + memcpy(data, jdata, size); > + > + cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize); > + > + rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr); > + > + /* Is this as Rel or Rela? */ > + type = w(jlrhdr->sh_type); > + > + symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB); > + > + for (i = 0; i < cnt; i++) { > + FUNC(section_update)(ehdr, symhdr, type, rel, data); > + rel = (void *)rel + w(jlrhdr->sh_entsize); > + } > + > + /* > + * This is specific to x86. The jump_table is stored in three > + * long words. The first is the location of the jmp target we > + * must update. > + */ > + cnt = size / sizeof(uint_t); > + > + for (i = 0; i < cnt; i += 3) > + if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t))); > +hmmmm, isn''t this the line that actually writes in the no-ops? why isn''t it disabled?> + free(data); > +} > >Thanks again for doing this...I was still understanding recordmcount.c ;) -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-07  19:21 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, 2011-10-07 at 14:52 -0400, Jason Baron wrote:> On Fri, Oct 07, 2011 at 01:09:32PM -0400, Steven Rostedt wrote: > > Note, this is just hacked together and needs to be cleaned up. Please do > > not comment on formatting or other sloppiness of this patch. I know it''s > > sloppy and I left debug statements in. I want the comments to be on the > > idea of the patch. > > > > I created a new file called scripts/update_jump_label.[ch] based on some > > of the work of recordmcount.c. This is executed at build time on all > > object files just like recordmcount is. But it does not add any new > > sections, it just modifies the code at build time to convert all jump > > labels into nops. > > > > The idea is in arch/x86/include/asm/jump_label.h to not place a nop, but > > instead to insert a jmp to the label. Depending on how gcc optimizes the > > code, the jmp will be either end up being a 2 byte or 5 byte jump. > > > > After an object is compiled, update_jump_label is executed on this file > > and it reads the ELF relocation table to find the jump label locations > > and examines what jump was used. It then converts the jump into either a > > 2 byte or 5 byte nop that is appropriate. > > > > At boot time, the jump labels no longer need to be converted (although > > we may do so in the future to use better nops depending on the machine > > that is used). When jump labels are enabled, the code is examined to see > > if a two byte or 5 byte version was used, and the appropriate update is > > made. > > > > I just booted this patch and it worked. I was able to enable and disable > > trace points using jump labels. Benchmarks are welcomed :) > > > > Comments and thoughts? > > > > Generally, I really like it, I guess b/c I suggested it :)Yeah, I saw your suggestion about using jump for the asm. I didn''t read carefully enough if you suggested the build time changes or not. I thought about recordmcount at that moment, and decided to try it ;)> I''ll try and > run some workloads on it - A real simple one, I used recently was putting > a single jump label in ''getppid()'' and then calling it in a loop - I > wonder if the short nop vs long nop would show up there, as a baseline > test. (fwiw, the jump label vs. no jump label for this test was anywhere > b/w 1-5% improvement). > > Anyways, some comments below. > > > -- Steve > > > > Sloppy-signed-off-by: Steven Rostedt <rostedt@goodmis.org> > > > > diff --git a/Makefile b/Makefile > > index 31f967c..8368f42 100644 > > --- a/Makefile > > +++ b/Makefile > > @@ -245,7 +245,7 @@ CONFIG_SHELL := $(shell if [ -x "$$BASH" ]; then echo $$BASH; \ > > > > HOSTCC = gcc > > HOSTCXX = g++ > > -HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -O2 -fomit-frame-pointer > > +HOSTCFLAGS = -Wall -Wmissing-prototypes -Wstrict-prototypes -g -fomit-frame-pointer > > HOSTCXXFLAGS = -O2 > > > > # Decide whether to build built-in, modular, or both. > > @@ -611,6 +611,13 @@ ifdef CONFIG_DYNAMIC_FTRACE > > endif > > endif > > > > +ifdef CONFIG_JUMP_LABEL > > + ifdef CONFIG_HAVE_BUILD_TIME_JUMP_LABEL > > + BUILD_UPDATE_JUMP_LABEL := y > > + export BUILD_UPDATE_JUMP_LABEL > > + endif > > +endif > > + > > # We trigger additional mismatches with less inlining > > ifdef CONFIG_DEBUG_SECTION_MISMATCH > > KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once) > > diff --git a/arch/Kconfig b/arch/Kconfig > > index 4b0669c..8fa6934 100644 > > --- a/arch/Kconfig > > +++ b/arch/Kconfig > > @@ -169,6 +169,12 @@ config HAVE_PERF_EVENTS_NMI > > subsystem. Also has support for calculating CPU cycle events > > to determine how many clock cycles in a given period. > > > > +config HAVE_BUILD_TIME_JUMP_LABEL > > + bool > > + help > > + If an arch uses scripts/update_jump_label to patch in jump nops > > + at build time, then it must enable this option. > > + > > config HAVE_ARCH_JUMP_LABEL > > bool > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index 6a47bb2..6de726a 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -61,6 +61,7 @@ config X86 > > select HAVE_ARCH_KMEMCHECK > > select HAVE_USER_RETURN_NOTIFIER > > select HAVE_ARCH_JUMP_LABEL > > + select HAVE_BUILD_TIME_JUMP_LABEL > > select HAVE_TEXT_POKE_SMP > > select HAVE_GENERIC_HARDIRQS > > select HAVE_SPARSE_IRQ > > diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h > > index a32b18c..872b3e1 100644 > > --- a/arch/x86/include/asm/jump_label.h > > +++ b/arch/x86/include/asm/jump_label.h > > @@ -14,7 +14,7 @@ > > static __always_inline bool arch_static_branch(struct jump_label_key *key) > > { > > asm goto("1:" > > - JUMP_LABEL_INITIAL_NOP > > + "jmp %l[l_yes]\n" > > ".pushsection __jump_table, \"aw\" \n\t" > > _ASM_ALIGN "\n\t" > > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > > index 3fee346..1f7f88f 100644 > > --- a/arch/x86/kernel/jump_label.c > > +++ b/arch/x86/kernel/jump_label.c > > @@ -16,34 +16,75 @@ > > > > #ifdef HAVE_JUMP_LABEL > > > > +static unsigned char nop_short[] = { P6_NOP2 }; > > + > > union jump_code_union { > > char code[JUMP_LABEL_NOP_SIZE]; > > struct { > > char jump; > > int offset; > > } __attribute__((packed)); > > + struct { > > + char jump_short; > > + char offset_short; > > + } __attribute__((packed)); > > }; > > > > void arch_jump_label_transform(struct jump_entry *entry, > > enum jump_label_type type) > > { > > union jump_code_union code; > > + unsigned char op; > > + unsigned size; > > + unsigned char nop; > > + > > + /* Use probe_kernel_read()? */ > > + op = *(unsigned char *)entry->code; > > + nop = ideal_nops[NOP_ATOMIC5][0]; > > > > if (type == JUMP_LABEL_ENABLE) { > > - code.jump = 0xe9; > > - code.offset = entry->target - > > - (entry->code + JUMP_LABEL_NOP_SIZE); > > - } else > > - memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); > > + if (op == 0xe9 || op == 0xeb) > > + /* Already enabled. Warn? */ > > + return; > > + > > Using the jump_label_inc/dec interface this shouldn''t happen, I would > have it be BUG > > > > + /* FIXME for all archs */ > > + if (op == nop_short[0]) { > > + size = 2; > > + code.jump_short = 0xeb; > > + code.offset = entry->target - > > + (entry->code + 2); > > + /* Check for overflow ? */ > > + } else if (op == nop) { > > + size = JUMP_LABEL_NOP_SIZE; > > + code.jump = 0xe9; > > + code.offset = entry->target - (entry->code + size); > > + } else > > + return; /* WARN ? */ > > same here, at least WARN, more likely BUG()I just don''t like using BUG(). BUG() means that if we continue we will corrupt the filesystem or make you go blind. WARN and returning here should not cause any harm and will even let those with X terminals see oops in /var/log/messages.> > > + > > + } else { > > + if (op == nop_short[0] || nop) > > + /* Already disabled, warn? */ > > + return; > > + > > same here. > > > + if (op == 0xe9) { > > + size = JUMP_LABEL_NOP_SIZE; > > + memcpy(&code, ideal_nops[NOP_ATOMIC5], size); > > + } else if (op == 0xeb) { > > + size = 2; > > + memcpy(&code, nop_short, size); > > + } else > > + return; /* WARN ? */ > > same hereWARN is better.> > > + } > > get_online_cpus(); > > mutex_lock(&text_mutex); > > - text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE); > > + text_poke_smp((void *)entry->code, &code, size); > > mutex_unlock(&text_mutex); > > put_online_cpus(); > > } > > > > void arch_jump_label_text_poke_early(jump_label_t addr) > > { > > + return; > > text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5], > > JUMP_LABEL_NOP_SIZE); > > } > > hmmm...we spent a bunch of time selecting the ''ideal'' run-time noops I > wouldn''t want to drop that work.Read my change log again :)> > > diff --git a/scripts/Makefile b/scripts/Makefile > > index df7678f..738b65c 100644 > > --- a/scripts/Makefile > > +++ b/scripts/Makefile > > @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO) += pnmtologo > > hostprogs-$(CONFIG_VT) += conmakehash > > hostprogs-$(CONFIG_IKCONFIG) += bin2c > > hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount > > +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label > > > > always := $(hostprogs-y) $(hostprogs-m) > > > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > > index a0fd502..bc0d89b 100644 > > --- a/scripts/Makefile.build > > +++ b/scripts/Makefile.build > > @@ -258,6 +258,15 @@ cmd_modversions = \ > > fi; > > endif > > > > +ifdef BUILD_UPDATE_JUMP_LABEL > > +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \ > > + $(srctree)/scripts/update_jump_label.h > > +cmd_update_jump_label = \ > > + if [ $(@) != "scripts/mod/empty.o" ]; then \ > > + $(objtree)/scripts/update_jump_label "$(@)"; \ > > + fi; > > +endif > > + > > ifdef CONFIG_FTRACE_MCOUNT_RECORD > > ifdef BUILD_C_RECORDMCOUNT > > ifeq ("$(origin RECORDMCOUNT_WARN)", "command line") > > @@ -294,6 +303,7 @@ define rule_cc_o_c > > $(cmd_modversions) \ > > $(call echo-cmd,record_mcount) \ > > $(cmd_record_mcount) \ > > + $(cmd_update_jump_label) \ > > scripts/basic/fixdep $(depfile) $@ ''$(call make-cmd,cc_o_c)'' > \ > > $(dot-target).tmp; \ > > rm -f $(depfile); \ > > @@ -301,13 +311,14 @@ define rule_cc_o_c > > endef > > > > # Built-in and composite module parts > > -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > > +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE > > $(call cmd,force_checksrc) > > $(call if_changed_rule,cc_o_c) > > > > # Single-part modules are special since we need to mark them in $(MODVERDIR) > > > > -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > > +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \ > > + $(update_jump_label_source) FORCE > > $(call cmd,force_checksrc) > > $(call if_changed_rule,cc_o_c) > > @{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod) > > diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c > > new file mode 100644 > > index 0000000..86e17bc> > + /* > > + * This is specific to x86. The jump_table is stored in three > > + * long words. The first is the location of the jmp target we > > + * must update. > > + */ > > + cnt = size / sizeof(uint_t); > > + > > + for (i = 0; i < cnt; i += 3) > > + if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t))); > > + > > hmmmm, isn''t this the line that actually writes in the no-ops? why isn''t > it disabled?*DOH!* Oh crap! I added that to dump more debug info. Let me make sure it actually works ;)> > > + free(data); > > +} > > > > > > Thanks again for doing this...I was still understanding recordmcount.c ;)And after this, I plan on giving recordmcount.c an overhaul so it is easier to read. Right now it is written with lots of micro-optimizations, with the sacrifice to simplicity. Modifying this code is nasty, and it needs to be read by mere mortals. I''ll go and test with actual modifications and see if it still works. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-07  19:33 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, 2011-10-07 at 15:21 -0400, Steven Rostedt wrote:> > > + /* > > > + * This is specific to x86. The jump_table is stored in three > > > + * long words. The first is the location of the jmp target we > > > + * must update. > > > + */ > > > + cnt = size / sizeof(uint_t); > > > + > > > + for (i = 0; i < cnt; i += 3) > > > + if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t)));I just compiled and booted the - if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t))); + make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t))); version, and it still works. Phew! -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-07  19:40 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On 10/07/2011 10:09 AM, Steven Rostedt wrote:> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > index 3fee346..1f7f88f 100644 > --- a/arch/x86/kernel/jump_label.c > +++ b/arch/x86/kernel/jump_label.c > @@ -16,34 +16,75 @@ > > #ifdef HAVE_JUMP_LABEL > > +static unsigned char nop_short[] = { P6_NOP2 }; > + > union jump_code_union { > char code[JUMP_LABEL_NOP_SIZE]; > struct { > char jump; > int offset; > } __attribute__((packed)); > + struct { > + char jump_short; > + char offset_short; > + } __attribute__((packed)); > }; > > void arch_jump_label_transform(struct jump_entry *entry, > enum jump_label_type type) > { > union jump_code_union code; > + unsigned char op; > + unsigned size; > + unsigned char nop; > + > + /* Use probe_kernel_read()? */ > + op = *(unsigned char *)entry->code; > + nop = ideal_nops[NOP_ATOMIC5][0]; > > if (type == JUMP_LABEL_ENABLE) { > - code.jump = 0xe9; > - code.offset = entry->target - > - (entry->code + JUMP_LABEL_NOP_SIZE); > - } else > - memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); > + if (op == 0xe9 || op == 0xeb) > + /* Already enabled. Warn? */ > + return; > + > + /* FIXME for all archs */By "archs", do you mean different x86 variants?> + if (op == nop_short[0]) {My gut feeling is that all this "trying to determine the jump size by sniffing the instruction" stuff seems pretty fragile. Couldn''t you store the jump size in the jump_label structure (even as a bit hidden away somewhere)? J> + size = 2; > + code.jump_short = 0xeb; > + code.offset = entry->target - > + (entry->code + 2); > + /* Check for overflow ? */ > + } else if (op == nop) { > + size = JUMP_LABEL_NOP_SIZE; > + code.jump = 0xe9; > + code.offset = entry->target - (entry->code + size); > + } else > + return; /* WARN ? */ > + > + } else { > + if (op == nop_short[0] || nop) > + /* Already disabled, warn? */ > + return; > + > + if (op == 0xe9) { > + size = JUMP_LABEL_NOP_SIZE; > + memcpy(&code, ideal_nops[NOP_ATOMIC5], size); > + } else if (op == 0xeb) { > + size = 2; > + memcpy(&code, nop_short, size); > + } else > + return; /* WARN ? */ > + } > get_online_cpus(); > mutex_lock(&text_mutex); > - text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE); > + text_poke_smp((void *)entry->code, &code, size); > mutex_unlock(&text_mutex); > put_online_cpus(); > } > > void arch_jump_label_text_poke_early(jump_label_t addr) > { > + return; > text_poke_early((void *)addr, ideal_nops[NOP_ATOMIC5], > JUMP_LABEL_NOP_SIZE); > } > diff --git a/scripts/Makefile b/scripts/Makefile > index df7678f..738b65c 100644 > --- a/scripts/Makefile > +++ b/scripts/Makefile > @@ -13,6 +13,7 @@ hostprogs-$(CONFIG_LOGO) += pnmtologo > hostprogs-$(CONFIG_VT) += conmakehash > hostprogs-$(CONFIG_IKCONFIG) += bin2c > hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount > +hostprogs-$(BUILD_UPDATE_JUMP_LABEL) += update_jump_label > > always := $(hostprogs-y) $(hostprogs-m) > > diff --git a/scripts/Makefile.build b/scripts/Makefile.build > index a0fd502..bc0d89b 100644 > --- a/scripts/Makefile.build > +++ b/scripts/Makefile.build > @@ -258,6 +258,15 @@ cmd_modversions = \ > fi; > endif > > +ifdef BUILD_UPDATE_JUMP_LABEL > +update_jump_label_source := $(srctree)/scripts/update_jump_label.c \ > + $(srctree)/scripts/update_jump_label.h > +cmd_update_jump_label = \ > + if [ $(@) != "scripts/mod/empty.o" ]; then \ > + $(objtree)/scripts/update_jump_label "$(@)"; \ > + fi; > +endif > + > ifdef CONFIG_FTRACE_MCOUNT_RECORD > ifdef BUILD_C_RECORDMCOUNT > ifeq ("$(origin RECORDMCOUNT_WARN)", "command line") > @@ -294,6 +303,7 @@ define rule_cc_o_c > $(cmd_modversions) \ > $(call echo-cmd,record_mcount) \ > $(cmd_record_mcount) \ > + $(cmd_update_jump_label) \ > scripts/basic/fixdep $(depfile) $@ ''$(call make-cmd,cc_o_c)'' > \ > $(dot-target).tmp; \ > rm -f $(depfile); \ > @@ -301,13 +311,14 @@ define rule_cc_o_c > endef > > # Built-in and composite module parts > -$(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > +$(obj)/%.o: $(src)/%.c $(recordmcount_source) $(update_jump_label_source) FORCE > $(call cmd,force_checksrc) > $(call if_changed_rule,cc_o_c) > > # Single-part modules are special since we need to mark them in $(MODVERDIR) > > -$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) FORCE > +$(single-used-m): $(obj)/%.o: $(src)/%.c $(recordmcount_source) \ > + $(update_jump_label_source) FORCE > $(call cmd,force_checksrc) > $(call if_changed_rule,cc_o_c) > @{ echo $(@:.o=.ko); echo $@; } > $(MODVERDIR)/$(@F:.o=.mod) > diff --git a/scripts/update_jump_label.c b/scripts/update_jump_label.c > new file mode 100644 > index 0000000..86e17bc > --- /dev/null > +++ b/scripts/update_jump_label.c > @@ -0,0 +1,349 @@ > +/* > + * update_jump_label.c: replace jmps with nops at compile time. > + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc. > + * Parsing of the elf file was influenced by recordmcount.c > + * originally written by and copyright to John F. Reiser <jreiser@BitWagon.com>. > + */ > + > +/* > + * Note, this code is originally designed for x86, but may be used by > + * other archs to do the nop updates at compile time instead of at boot time. > + * X86 uses this as an optimization, as jmps can be either 2 bytes or 5 bytes. > + * Inserting a 2 byte where possible helps with both CPU performance and > + * icache strain. > + */ > +#include <sys/types.h> > +#include <sys/mman.h> > +#include <sys/stat.h> > +#include <getopt.h> > +#include <elf.h> > +#include <fcntl.h> > +#include <setjmp.h> > +#include <stdio.h> > +#include <stdlib.h> > +#include <stdarg.h> > +#include <string.h> > +#include <unistd.h> > + > +static int fd_map; /* File descriptor for file being modified. */ > +static struct stat sb; /* Remember .st_size, etc. */ > +static int mmap_failed; /* Boolean flag. */ > + > +static void die(const char *err, const char *fmt, ...) > +{ > + va_list ap; > + > + if (err) > + perror(err); > + > + if (fmt) { > + va_start(ap, fmt); > + fprintf(stderr, "Fatal error: "); > + vfprintf(stderr, fmt, ap); > + fprintf(stderr, "\n"); > + va_end(ap); > + } > + > + exit(1); > +} > + > +static void usage(char **argv) > +{ > + char *arg = argv[0]; > + char *p = arg+strlen(arg); > + > + while (p >= arg && *p != ''/'') > + p--; > + p++; > + > + printf("usage: %s file\n" > + "\n",p); > + exit(-1); > +} > + > +/* w8rev, w8nat, ...: Handle endianness. */ > + > +static uint64_t w8rev(uint64_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (7 * 8)) > + | ((0xff & (x >> (1 * 8))) << (6 * 8)) > + | ((0xff & (x >> (2 * 8))) << (5 * 8)) > + | ((0xff & (x >> (3 * 8))) << (4 * 8)) > + | ((0xff & (x >> (4 * 8))) << (3 * 8)) > + | ((0xff & (x >> (5 * 8))) << (2 * 8)) > + | ((0xff & (x >> (6 * 8))) << (1 * 8)) > + | ((0xff & (x >> (7 * 8))) << (0 * 8)); > +} > + > +static uint32_t w4rev(uint32_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (3 * 8)) > + | ((0xff & (x >> (1 * 8))) << (2 * 8)) > + | ((0xff & (x >> (2 * 8))) << (1 * 8)) > + | ((0xff & (x >> (3 * 8))) << (0 * 8)); > +} > + > +static uint32_t w2rev(uint16_t const x) > +{ > + return ((0xff & (x >> (0 * 8))) << (1 * 8)) > + | ((0xff & (x >> (1 * 8))) << (0 * 8)); > +} > + > +static uint64_t w8nat(uint64_t const x) > +{ > + return x; > +} > + > +static uint32_t w4nat(uint32_t const x) > +{ > + return x; > +} > + > +static uint32_t w2nat(uint16_t const x) > +{ > + return x; > +} > + > +static uint64_t (*w8)(uint64_t); > +static uint32_t (*w)(uint32_t); > +static uint32_t (*w2)(uint16_t); > + > +/* ulseek, uread, ...: Check return value for errors. */ > + > +static off_t > +ulseek(int const fd, off_t const offset, int const whence) > +{ > + off_t const w = lseek(fd, offset, whence); > + if (w == (off_t)-1) > + die("lseek", NULL); > + > + return w; > +} > + > +static size_t > +uread(int const fd, void *const buf, size_t const count) > +{ > + size_t const n = read(fd, buf, count); > + if (n != count) > + die("read", NULL); > + > + return n; > +} > + > +static size_t > +uwrite(int const fd, void const *const buf, size_t const count) > +{ > + size_t const n = write(fd, buf, count); > + if (n != count) > + die("write", NULL); > + > + return n; > +} > + > +static void * > +umalloc(size_t size) > +{ > + void *const addr = malloc(size); > + if (addr == 0) > + die("malloc", "malloc failed: %zu bytes\n", size); > + > + return addr; > +} > + > +/* > + * Get the whole file as a programming convenience in order to avoid > + * malloc+lseek+read+free of many pieces. If successful, then mmap > + * avoids copying unused pieces; else just read the whole file. > + * Open for both read and write; new info will be appended to the file. > + * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr > + * do not propagate to the file until an explicit overwrite at the last. > + * This preserves most aspects of consistency (all except .st_size) > + * for simultaneous readers of the file while we are appending to it. > + * However, multiple writers still are bad. We choose not to use > + * locking because it is expensive and the use case of kernel build > + * makes multiple writers unlikely. > + */ > +static void *mmap_file(char const *fname) > +{ > + void *addr; > + > + fd_map = open(fname, O_RDWR); > + if (fd_map < 0 || fstat(fd_map, &sb) < 0) > + die(fname, "failed to open file"); > + > + if (!S_ISREG(sb.st_mode)) > + die(NULL, "not a regular file: %s\n", fname); > + > + addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, > + fd_map, 0); > + > + mmap_failed = 0; > + if (addr == MAP_FAILED) { > + mmap_failed = 1; > + addr = umalloc(sb.st_size); > + uread(fd_map, addr, sb.st_size); > + } > + return addr; > +} > + > +static void munmap_file(void *addr) > +{ > + if (!mmap_failed) > + munmap(addr, sb.st_size); > + else > + free(addr); > + close(fd_map); > +} > + > +static unsigned char ideal_nop5_x86_64[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 }; > +static unsigned char ideal_nop5_x86_32[5] = { 0x3e, 0x8d, 0x74, 0x26, 0x00 }; > +static unsigned char ideal_nop2_x86[2] = { 0x66, 0x99 }; > +static unsigned char *ideal_nop; > + > +static int (*make_nop)(void *map, size_t const offset); > + > +static int make_nop_x86(void *map, size_t const offset) > +{ > + unsigned char *op; > + unsigned char *nop; > + int size; > + > + /* Determine which type of jmp this is 2 byte or 5. */ > + op = map + offset; > + switch (*op) { > + case 0xeb: /* 2 byte */ > + size = 2; > + nop = ideal_nop2_x86; > + break; > + case 0xe9: /* 5 byte */ > + size = 5; > + nop = ideal_nop; > + break; > + default: > + die(NULL, "Bad jump label section\n"); > + } > + > + /* convert to nop */ > + ulseek(fd_map, offset, SEEK_SET); > + uwrite(fd_map, nop, size); > + return 0; > +} > + > +/* 32 bit and 64 bit are very similar */ > +#include "update_jump_label.h" > +#define UPDATE_JUMP_LABEL_64 > +#include "update_jump_label.h" > + > +static int do_file(const char *fname) > +{ > + Elf32_Ehdr *const ehdr = mmap_file(fname); > + unsigned int reltype = 0; > + > + w = w4nat; > + w2 = w2nat; > + w8 = w8nat; > + switch (ehdr->e_ident[EI_DATA]) { > + static unsigned int const endian = 1; > + default: > + die(NULL, "unrecognized ELF data encoding %d: %s\n", > + ehdr->e_ident[EI_DATA], fname); > + break; > + case ELFDATA2LSB: > + if (*(unsigned char const *)&endian != 1) { > + /* main() is big endian, file.o is little endian. */ > + w = w4rev; > + w2 = w2rev; > + w8 = w8rev; > + } > + break; > + case ELFDATA2MSB: > + if (*(unsigned char const *)&endian != 0) { > + /* main() is little endian, file.o is big endian. */ > + w = w4rev; > + w2 = w2rev; > + w8 = w8rev; > + } > + break; > + } /* end switch */ > + > + if (memcmp(ELFMAG, ehdr->e_ident, SELFMAG) != 0 || > + w2(ehdr->e_type) != ET_REL || > + ehdr->e_ident[EI_VERSION] != EV_CURRENT) > + die(NULL, "unrecognized ET_REL file %s\n", fname); > + > + switch (w2(ehdr->e_machine)) { > + default: > + die(NULL, "unrecognized e_machine %d %s\n", > + w2(ehdr->e_machine), fname); > + break; > + case EM_386: > + reltype = R_386_32; > + make_nop = make_nop_x86; > + ideal_nop = ideal_nop5_x86_32; > + break; > + case EM_ARM: reltype = R_ARM_ABS32; > + break; > + case EM_IA_64: reltype = R_IA64_IMM64; break; > + case EM_MIPS: /* reltype: e_class */ break; > + case EM_PPC: reltype = R_PPC_ADDR32; break; > + case EM_PPC64: reltype = R_PPC64_ADDR64; break; > + case EM_S390: /* reltype: e_class */ break; > + case EM_SH: reltype = R_SH_DIR32; break; > + case EM_SPARCV9: reltype = R_SPARC_64; break; > + case EM_X86_64: > + make_nop = make_nop_x86; > + ideal_nop = ideal_nop5_x86_64; > + reltype = R_X86_64_64; > + break; > + } /* end switch */ > + > + switch (ehdr->e_ident[EI_CLASS]) { > + default: > + die(NULL, "unrecognized ELF class %d %s\n", > + ehdr->e_ident[EI_CLASS], fname); > + break; > + case ELFCLASS32: > + if (w2(ehdr->e_ehsize) != sizeof(Elf32_Ehdr) > + || w2(ehdr->e_shentsize) != sizeof(Elf32_Shdr)) > + die(NULL, "unrecognized ET_REL file: %s\n", fname); > + > + if (w2(ehdr->e_machine) == EM_S390) { > + reltype = R_390_32; > + } > + if (w2(ehdr->e_machine) == EM_MIPS) { > + reltype = R_MIPS_32; > + } > + do_func32(ehdr, fname, reltype); > + break; > + case ELFCLASS64: { > + Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr; > + if (w2(ghdr->e_ehsize) != sizeof(Elf64_Ehdr) > + || w2(ghdr->e_shentsize) != sizeof(Elf64_Shdr)) > + die(NULL, "unrecognized ET_REL file: %s\n", fname); > + > + if (w2(ghdr->e_machine) == EM_S390) > + reltype = R_390_64; > + > +#if 0 > + if (w2(ghdr->e_machine) == EM_MIPS) { > + reltype = R_MIPS_64; > + Elf64_r_sym = MIPS64_r_sym; > + } > +#endif > + do_func64(ghdr, fname, reltype); > + break; > + } > + } /* end switch */ > + > + munmap_file(ehdr); > + return 0; > +} > + > +int main (int argc, char **argv) > +{ > + if (argc != 2) > + usage(argv); > + > + return do_file(argv[1]); > +} > + > diff --git a/scripts/update_jump_label.h b/scripts/update_jump_label.h > new file mode 100644 > index 0000000..6ff9846 > --- /dev/null > +++ b/scripts/update_jump_label.h > @@ -0,0 +1,322 @@ > +/* > + * recordmcount.h > + * > + * This code was taken out of recordmcount.c written by > + * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>. All rights reserved. > + * > + * The original code had the same algorithms for both 32bit > + * and 64bit ELF files, but the code was duplicated to support > + * the difference in structures that were used. This > + * file creates a macro of everything that is different between > + * the 64 and 32 bit code, such that by including this header > + * twice we can create both sets of functions by including this > + * header once with RECORD_MCOUNT_64 undefined, and again with > + * it defined. > + * > + * This conversion to macros was done by: > + * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc. > + * > + * Licensed under the GNU General Public License, version 2 (GPLv2). > + */ > + > +#undef EBITS > +#undef _w > +#undef _align > +#undef _size > + > +#ifdef UPDATE_JUMP_LABEL_64 > +# define EBITS 64 > +# define _w w8 > +# define _align 7u > +# define _size 8 > +#else > +# define EBITS 32 > +# define _w w > +# define _align 3u > +# define _size 4 > +#endif > + > +#define _FBITS(x, e) x##e > +#define FBITS(x, e) _FBITS(x,e) > +#define FUNC(x) FBITS(x,EBITS) > + > +#undef Elf_Addr > +#undef Elf_Ehdr > +#undef Elf_Shdr > +#undef Elf_Rel > +#undef Elf_Rela > +#undef Elf_Sym > +#undef ELF_R_SYM > +#undef ELF_R_TYPE > + > +#define __ATTACH(x,y,z) x##y##z > +#define ATTACH(x,y,z) __ATTACH(x,y,z) > + > +#define Elf_Addr ATTACH(Elf,EBITS,_Addr) > +#define Elf_Ehdr ATTACH(Elf,EBITS,_Ehdr) > +#define Elf_Shdr ATTACH(Elf,EBITS,_Shdr) > +#define Elf_Rel ATTACH(Elf,EBITS,_Rel) > +#define Elf_Rela ATTACH(Elf,EBITS,_Rela) > +#define Elf_Sym ATTACH(Elf,EBITS,_Sym) > +#define uint_t ATTACH(uint,EBITS,_t) > +#define ELF_R_SYM ATTACH(ELF,EBITS,_R_SYM) > +#define ELF_R_TYPE ATTACH(ELF,EBITS,_R_TYPE) > + > +#undef get_shdr > +#define get_shdr(ehdr) ((Elf_Shdr *)(_w((ehdr)->e_shoff) + (void *)(ehdr))) > + > +#undef get_section_loc > +#define get_section_loc(ehdr, shdr)(_w((shdr)->sh_offset) + (void *)(ehdr)) > + > +/* Functions and pointers that do_file() may override for specific e_machine. */ > + > +#if 0 > +static uint_t FUNC(fn_ELF_R_SYM)(Elf_Rel const *rp) > +{ > + return ELF_R_SYM(_w(rp->r_info)); > +} > +static uint_t (*FUNC(Elf_r_sym))(Elf_Rel const *rp) = FUNC(fn_ELF_R_SYM); > +#endif > + > +static void FUNC(get_sym_str_and_relp)(Elf_Shdr const *const relhdr, > + Elf_Ehdr const *const ehdr, > + Elf_Sym const **sym0, > + char const **str0, > + Elf_Rel const **relp) > +{ > + Elf_Shdr *const shdr0 = get_shdr(ehdr); > + unsigned const symsec_sh_link = w(relhdr->sh_link); > + Elf_Shdr const *const symsec = &shdr0[symsec_sh_link]; > + Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)]; > + Elf_Rel const *const rel0 > + (Elf_Rel const *)get_section_loc(ehdr, relhdr); > + > + *sym0 = (Elf_Sym const *)get_section_loc(ehdr, symsec); > + > + *str0 = (char const *)get_section_loc(ehdr, strsec); > + > + *relp = rel0; > +} > + > +/* > + * Read the relocation table again, but this time its called on sections > + * that are not going to be traced. The mcount calls here will be converted > + * into nops. > + */ > +static void FUNC(nop_jump_label)(Elf_Shdr const *const relhdr, > + Elf_Ehdr const *const ehdr, > + const char *const txtname) > +{ > + Elf_Shdr *const shdr0 = get_shdr(ehdr); > + Elf_Sym const *sym0; > + char const *str0; > + Elf_Rel const *relp; > + Elf_Rela const *relap; > + Elf_Shdr const *const shdr = &shdr0[w(relhdr->sh_info)]; > + unsigned rel_entsize = w(relhdr->sh_entsize); > + unsigned const nrel = _w(relhdr->sh_size) / rel_entsize; > + int t; > + > + FUNC(get_sym_str_and_relp)(relhdr, ehdr, &sym0, &str0, &relp); > + > + for (t = nrel; t > 0; t -= 3) { > + int ret = -1; > + > + relap = (Elf_Rela const *)relp; > + printf("rel offset=%lx info=%lx sym=%lx type=%lx addend=%lx\n", > + (long)relap->r_offset, (long)relap->r_info, > + (long)ELF_R_SYM(relap->r_info), > + (long)ELF_R_TYPE(relap->r_info), > + (long)relap->r_addend); > + > + if (0 && make_nop) > + ret = make_nop((void *)ehdr, shdr->sh_offset + relp->r_offset); > + > + /* jump label sections are paired in threes */ > + relp = (Elf_Rel const *)(rel_entsize * 3 + (void *)relp); > + } > +} > + > +/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */ > +static char const * > +FUNC(__has_rel_jump_table)(Elf_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */ > + Elf_Shdr const *const shdr0, > + char const *const shstrtab, > + char const *const fname) > +{ > + /* .sh_info depends on .sh_type == SHT_REL[,A] */ > + Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)]; > + char const *const txtname = &shstrtab[w(txthdr->sh_name)]; > + > + if (strcmp("__jump_table", txtname) == 0) { > + fprintf(stderr, "warning: __mcount_loc already exists: %s\n", > + fname); > +// succeed_file(); > + } > + if (w(txthdr->sh_type) != SHT_PROGBITS || > + !(w(txthdr->sh_flags) & SHF_EXECINSTR)) > + return NULL; > + return txtname; > +} > + > +static char const *FUNC(has_rel_jump_table)(Elf_Shdr const *const relhdr, > + Elf_Shdr const *const shdr0, > + char const *const shstrtab, > + char const *const fname) > +{ > + if (w(relhdr->sh_type) != SHT_REL && w(relhdr->sh_type) != SHT_RELA) > + return NULL; > + return FUNC(__has_rel_jump_table)(relhdr, shdr0, shstrtab, fname); > +} > + > +/* Find relocation section hdr for a given section */ > +static const Elf_Shdr * > +FUNC(find_relhdr)(const Elf_Ehdr *ehdr, const Elf_Shdr *shdr) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + int nhdr = w2(ehdr->e_shnum); > + const Elf_Shdr *hdr; > + int i; > + > + for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) { > + if (w(hdr->sh_type) != SHT_REL && > + w(hdr->sh_type) != SHT_RELA) > + continue; > + > + /* > + * The relocation section''s info field holds > + * the section index that it represents. > + */ > + if (shdr == &shdr0[w(hdr->sh_info)]) > + return hdr; > + } > + return NULL; > +} > + > +/* Find a section headr based on name and type */ > +static const Elf_Shdr * > +FUNC(find_shdr)(const Elf_Ehdr *ehdr, const char *name, uint_t type) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + const Elf_Shdr *shstr = &shdr0[w2(ehdr->e_shstrndx)]; > + const char *shstrtab = (char *)get_section_loc(ehdr, shstr); > + int nhdr = w2(ehdr->e_shnum); > + const Elf_Shdr *hdr; > + const char *hdrname; > + int i; > + > + for (hdr = shdr0, i = 0; i < nhdr; hdr = &shdr0[++i]) { > + if (w(hdr->sh_type) != type) > + continue; > + > + /* If we are just looking for a section by type (ie. SYMTAB) */ > + if (!name) > + return hdr; > + > + hdrname = &shstrtab[w(hdr->sh_name)]; > + if (strcmp(hdrname, name) == 0) > + return hdr; > + } > + return NULL; > +} > + > +static void > +FUNC(section_update)(const Elf_Ehdr *ehdr, const Elf_Shdr *symhdr, > + unsigned shtype, const Elf_Rel *rel, void *data) > +{ > + const Elf_Shdr *shdr0 = get_shdr(ehdr); > + const Elf_Shdr *targethdr; > + const Elf_Rela *rela; > + const Elf_Sym *syment; > + uint_t offset = _w(rel->r_offset); > + uint_t info = _w(rel->r_info); > + uint_t sym = ELF_R_SYM(info); > + uint_t type = ELF_R_TYPE(info); > + uint_t addend; > + uint_t targetloc; > + > + if (shtype == SHT_RELA) { > + rela = (const Elf_Rela *)rel; > + addend = _w(rela->r_addend); > + } else > + addend = _w(*(unsigned short *)(data + offset)); > + > + syment = (const Elf_Sym *)get_section_loc(ehdr, symhdr); > + targethdr = &shdr0[w2(syment[sym].st_shndx)]; > + targetloc = _w(targethdr->sh_offset); > + > + /* TODO, need a separate function for all archs */ > + if (type != R_386_32) > + die(NULL, "Arch relocation type %d not supported", type); > + > + targetloc += addend; > + > +#if 1 > + printf("offset=%x target=%x shoffset=%x add=%x\n", > + offset, targetloc, _w(targethdr->sh_offset), addend); > +#endif > + *(uint_t *)(data + offset) = targetloc; > +} > + > +/* Overall supervision for Elf32 ET_REL file. */ > +static void > +FUNC(do_func)(Elf_Ehdr *ehdr, char const *const fname, unsigned const reltype) > +{ > + const Elf_Shdr *jlshdr; > + const Elf_Shdr *jlrhdr; > + const Elf_Shdr *symhdr; > + const Elf_Rel *rel; > + unsigned size; > + unsigned cnt; > + unsigned i; > + uint_t type; > + void *jdata; > + void *data; > + > + jlshdr = FUNC(find_shdr)(ehdr, "__jump_table", SHT_PROGBITS); > + if (!jlshdr) > + return; > + > + jlrhdr = FUNC(find_relhdr)(ehdr, jlshdr); > + if (!jlrhdr) > + return; > + > + /* > + * Create and fill in the __jump_table section and use it to > + * find the offsets into the text that we want to update. > + * We create it so that we do not depend on the order of the > + * relocations, and use the table directly, as it is broken > + * up into sections. > + */ > + size = _w(jlshdr->sh_size); > + data = umalloc(size); > + > + jdata = (void *)get_section_loc(ehdr, jlshdr); > + memcpy(data, jdata, size); > + > + cnt = _w(jlrhdr->sh_size) / w(jlrhdr->sh_entsize); > + > + rel = (const Elf_Rel *)get_section_loc(ehdr, jlrhdr); > + > + /* Is this as Rel or Rela? */ > + type = w(jlrhdr->sh_type); > + > + symhdr = FUNC(find_shdr)(ehdr, NULL, SHT_SYMTAB); > + > + for (i = 0; i < cnt; i++) { > + FUNC(section_update)(ehdr, symhdr, type, rel, data); > + rel = (void *)rel + w(jlrhdr->sh_entsize); > + } > + > + /* > + * This is specific to x86. The jump_table is stored in three > + * long words. The first is the location of the jmp target we > + * must update. > + */ > + cnt = size / sizeof(uint_t); > + > + for (i = 0; i < cnt; i += 3) > + if (0)make_nop((void *)ehdr, *(uint_t *)(data + i * sizeof(uint_t))); > + > + free(data); > +} > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-07  19:58 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, 2011-10-07 at 12:40 -0700, Jeremy Fitzhardinge wrote:> On 10/07/2011 10:09 AM, Steven Rostedt wrote: > > diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c > > index 3fee346..1f7f88f 100644 > > --- a/arch/x86/kernel/jump_label.c > > +++ b/arch/x86/kernel/jump_label.c > > @@ -16,34 +16,75 @@ > > > > #ifdef HAVE_JUMP_LABEL > > > > +static unsigned char nop_short[] = { P6_NOP2 }; > > + > > union jump_code_union { > > char code[JUMP_LABEL_NOP_SIZE]; > > struct { > > char jump; > > int offset; > > } __attribute__((packed)); > > + struct { > > + char jump_short; > > + char offset_short; > > + } __attribute__((packed)); > > }; > > > > void arch_jump_label_transform(struct jump_entry *entry, > > enum jump_label_type type) > > { > > union jump_code_union code; > > + unsigned char op; > > + unsigned size; > > + unsigned char nop; > > + > > + /* Use probe_kernel_read()? */ > > + op = *(unsigned char *)entry->code; > > + nop = ideal_nops[NOP_ATOMIC5][0]; > > > > if (type == JUMP_LABEL_ENABLE) { > > - code.jump = 0xe9; > > - code.offset = entry->target - > > - (entry->code + JUMP_LABEL_NOP_SIZE); > > - } else > > - memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE); > > + if (op == 0xe9 || op == 0xeb) > > + /* Already enabled. Warn? */ > > + return; > > + > > + /* FIXME for all archs */ > > By "archs", do you mean different x86 variants?Yeah, that was a confusing use of archs. This was to make sure it works for all nops for different variants of x86.> > > + if (op == nop_short[0]) { > > My gut feeling is that all this "trying to determine the jump size by > sniffing the instruction" stuff seems pretty fragile. Couldn''t you > store the jump size in the jump_label structure (even as a bit hidden > away somewhere)?We could but it''s not as fragile as you think. This is machine code, and it should be a jump or not. I could add more checks, that is, to look at the full nop to make sure it is truly a nop. But for the jump side, a byte instruction that starts with e9 is definitely a jump. I could harden this more like what we do with mcount updates in the function tracer. I actually calculate what I expect to be there before looking at what is there. The entire instruction is checked. If it does not match, then we fail and give big warnings about it. Other than that, it should be quite solid. If we don''t get a match, we should warn and disable jump labels. No BUG()! -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Peter Zijlstra
2011-Oct-07  20:04 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, 2011-10-07 at 12:40 -0700, Jeremy Fitzhardinge wrote:> Jcan we please, pretty please start trimming email replies? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-07  21:48 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On 10/07/2011 12:21 PM, Steven Rostedt wrote:>> >> same here, at least WARN, more likely BUG() > > I just don''t like using BUG(). BUG() means that if we continue we will > corrupt the filesystem or make you go blind. WARN and returning here > should not cause any harm and will even let those with X terminals see > oops in /var/log/messages. >Uh, NO. If this is wrong something in the kernel code stream is corrupted (heck, you might just have caught a rootkit!) Die. NOW. -hpa _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Steven Rostedt
2011-Oct-07  22:00 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On Fri, 2011-10-07 at 14:48 -0700, H. Peter Anvin wrote:> On 10/07/2011 12:21 PM, Steven Rostedt wrote: > >> > >> same here, at least WARN, more likely BUG() > > > > I just don''t like using BUG(). BUG() means that if we continue we will > > corrupt the filesystem or make you go blind. WARN and returning here > > should not cause any harm and will even let those with X terminals see > > oops in /var/log/messages. > > > > Uh, NO. > > If this is wrong something in the kernel code stream is corrupted (heck, > you might just have caught a rootkit!) > > Die. NOW.Ouch, quite shaken by k.org? I guess I should have substituted go blind with being hacked. The thing is, it may be as simple as an out of tree module screwing up the jump table. Or worse, gcc not doing things that we did not expect. If this is the case, jump labels can be disabled from modifying code. But if we just want to do the BUG() case, this will be a big hammer to the code and we just prevent any further progress until the issue is addressed. Which may be tell people to disable jump labels in their code, or use a different compiler. Currently ftrace takes the approach to WARN() and disable itself when it finds an anomaly from what it expects to modify. The times this has triggered has been either a problem with writing to the code, due to securities preventing code modification, or the scan of the relocation tables mistook a data point as code. The later I could foresee happening with jump labels. -- Steve _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
H. Peter Anvin
2011-Oct-07  22:03 UTC
[Xen-devel] Re: [PATCH][RFC] jump_labels/x86: Use either 5 byte or 2 byte jumps
On 10/07/2011 03:00 PM, Steven Rostedt wrote:> > Ouch, quite shaken by k.org? I guess I should have substituted go blind > with being hacked. >Well, yes, but I would have said exactly the same thing before.> The thing is, it may be as simple as an out of tree module screwing up > the jump table. Or worse, gcc not doing things that we did not expect. > If this is the case, jump labels can be disabled from modifying code. > > But if we just want to do the BUG() case, this will be a big hammer to > the code and we just prevent any further progress until the issue is > addressed. Which may be tell people to disable jump labels in their > code, or use a different compiler.That is EXACTLY what should happen. Something is wrong to the point of the kernel is *known* to be executing the wrong code. That is an extremely serious condition and should be treated as such. If you want, you could have a debug option to demote this to WARN, but I really don''t want to see it by default. -hpa _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-10  15:36 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote:> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > > If a key has been enabled before jump_label_init() is called, don''t > nop it out. > > This removes arch_jump_label_text_poke_early() (which can only nop > out a site) and uses arch_jump_label_transform() instead. > > Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> > --- > include/linux/jump_label.h | 3 ++- > kernel/jump_label.c | 20 ++++++++------------ > 2 files changed, 10 insertions(+), 13 deletions(-) > > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h > index 1213e9d..c8fb1b3 100644 > --- a/include/linux/jump_label.h > +++ b/include/linux/jump_label.h > @@ -45,7 +45,8 @@ extern void jump_label_lock(void); > extern void jump_label_unlock(void); > extern void arch_jump_label_transform(struct jump_entry *entry, > enum jump_label_type type); > -extern void arch_jump_label_text_poke_early(jump_label_t addr); > +extern void arch_jump_label_transform_early(struct jump_entry *entry, > + enum jump_label_type type); > extern int jump_label_text_reserved(void *start, void *end); > extern void jump_label_inc(struct jump_label_key *key); > extern void jump_label_dec(struct jump_label_key *key); > diff --git a/kernel/jump_label.c b/kernel/jump_label.c > index a8ce450..059202d5 100644 > --- a/kernel/jump_label.c > +++ b/kernel/jump_label.c > @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, > } > } > > -/* > - * Not all archs need this. > - */ > -void __weak arch_jump_label_text_poke_early(jump_label_t addr) > -{ > -} > - > static __init int jump_label_init(void) > { > struct jump_entry *iter_start = __start___jump_table; > @@ -139,12 +132,15 @@ static __init int jump_label_init(void) > jump_label_sort_entries(iter_start, iter_stop); > > for (iter = iter_start; iter < iter_stop; iter++) { > - arch_jump_label_text_poke_early(iter->code); > - if (iter->key == (jump_label_t)(unsigned long)key) > + struct jump_label_key *iterk; > + > + iterk = (struct jump_label_key *)(unsigned long)iter->key; > + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? > + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE); > + if (iterk == key) > continue; > > - key = (struct jump_label_key *)(unsigned long)iter->key; > - atomic_set(&key->enabled, 0); > + key = iterk; > key->entries = iter; > #ifdef CONFIG_MODULES > key->next = NULL; > @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) > return; > > for (iter = iter_start; iter < iter_stop; iter++) > - arch_jump_label_text_poke_early(iter->code); > + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); > } > > static int jump_label_add_module(struct module *mod) > -- > 1.7.6.2 >Hi, I just realized that the early call to jump_label_inc(), isn''t being honored with this patch until later when we invoke jump_label_init(). That strikes me as being inconsistent. When jump_label_inc() returns we should expect the branch to be updated. Thus, I think what probably want is to add a new ''int jump_label_init'' flag. If its not set we can call ''jump_label_init()'' from jump_label_inc()/dec(). And jump_label_init() can avoid initialization if its already set. Thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2011-Oct-10  19:58 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On 10/10/2011 08:36 AM, Jason Baron wrote:> On Sat, Oct 01, 2011 at 02:55:35PM -0700, Jeremy Fitzhardinge wrote: >> From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> >> If a key has been enabled before jump_label_init() is called, don''t >> nop it out. >> >> This removes arch_jump_label_text_poke_early() (which can only nop >> out a site) and uses arch_jump_label_transform() instead. >> >> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> >> --- >> include/linux/jump_label.h | 3 ++- >> kernel/jump_label.c | 20 ++++++++------------ >> 2 files changed, 10 insertions(+), 13 deletions(-) >> >> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h >> index 1213e9d..c8fb1b3 100644 >> --- a/include/linux/jump_label.h >> +++ b/include/linux/jump_label.h >> @@ -45,7 +45,8 @@ extern void jump_label_lock(void); >> extern void jump_label_unlock(void); >> extern void arch_jump_label_transform(struct jump_entry *entry, >> enum jump_label_type type); >> -extern void arch_jump_label_text_poke_early(jump_label_t addr); >> +extern void arch_jump_label_transform_early(struct jump_entry *entry, >> + enum jump_label_type type); >> extern int jump_label_text_reserved(void *start, void *end); >> extern void jump_label_inc(struct jump_label_key *key); >> extern void jump_label_dec(struct jump_label_key *key); >> diff --git a/kernel/jump_label.c b/kernel/jump_label.c >> index a8ce450..059202d5 100644 >> --- a/kernel/jump_label.c >> +++ b/kernel/jump_label.c >> @@ -121,13 +121,6 @@ static void __jump_label_update(struct jump_label_key *key, >> } >> } >> >> -/* >> - * Not all archs need this. >> - */ >> -void __weak arch_jump_label_text_poke_early(jump_label_t addr) >> -{ >> -} >> - >> static __init int jump_label_init(void) >> { >> struct jump_entry *iter_start = __start___jump_table; >> @@ -139,12 +132,15 @@ static __init int jump_label_init(void) >> jump_label_sort_entries(iter_start, iter_stop); >> >> for (iter = iter_start; iter < iter_stop; iter++) { >> - arch_jump_label_text_poke_early(iter->code); >> - if (iter->key == (jump_label_t)(unsigned long)key) >> + struct jump_label_key *iterk; >> + >> + iterk = (struct jump_label_key *)(unsigned long)iter->key; >> + arch_jump_label_transform(iter, jump_label_enabled(iterk) ? >> + JUMP_LABEL_ENABLE : JUMP_LABEL_DISABLE); >> + if (iterk == key) >> continue; >> >> - key = (struct jump_label_key *)(unsigned long)iter->key; >> - atomic_set(&key->enabled, 0); >> + key = iterk; >> key->entries = iter; >> #ifdef CONFIG_MODULES >> key->next = NULL; >> @@ -212,7 +208,7 @@ void jump_label_apply_nops(struct module *mod) >> return; >> >> for (iter = iter_start; iter < iter_stop; iter++) >> - arch_jump_label_text_poke_early(iter->code); >> + arch_jump_label_transform(iter, JUMP_LABEL_DISABLE); >> } >> >> static int jump_label_add_module(struct module *mod) >> -- >> 1.7.6.2 >> > Hi, > > I just realized that the early call to jump_label_inc(), isn''t being > honored with this patch until later when we invoke jump_label_init(). > That strikes me as being inconsistent. When jump_label_inc() returns we > should expect the branch to be updated.Why is that? It looks to me like it will unconditionally update the instruction, irrespective of whether _init() has been called?> Thus, I think what probably want is to add a new ''int jump_label_init'' > flag. If its not set we can call ''jump_label_init()'' from > jump_label_inc()/dec().Hm. I worry that it may end up calling jump_label_init() in an unexpected context, especially since it may well be config-dependent, or adding a jump_label_inc() later on starts mysteriously failing.> And jump_label_init() can avoid initialization > if its already set.That doesn''t seem worthwhile in itself. I suspect the number of "early" jump_label_incs will be very small (or we should look at doing the init earlier). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jason Baron
2011-Oct-10  20:10 UTC
[Xen-devel] Re: [PATCH RFC V2 3/5] jump_label: if a key has already been initialized, don''t nop it out
On Mon, Oct 10, 2011 at 12:58:19PM -0700, Jeremy Fitzhardinge wrote:> > Hi, > > > > I just realized that the early call to jump_label_inc(), isn''t being > > honored with this patch until later when we invoke jump_label_init(). > > That strikes me as being inconsistent. When jump_label_inc() returns we > > should expect the branch to be updated. > > Why is that? It looks to me like it will unconditionally update the > instruction, irrespective of whether _init() has been called? >No. jump_label_init(), sets up key->entries, to point into the jump table...before that jump_label_update(), doesn''t know where the table is located, and will just return, without doing the update.> > Thus, I think what probably want is to add a new ''int jump_label_init'' > > flag. If its not set we can call ''jump_label_init()'' from > > jump_label_inc()/dec(). > > Hm. I worry that it may end up calling jump_label_init() in an > unexpected context, especially since it may well be config-dependent, or > adding a jump_label_inc() later on starts mysteriously failing.good point.> > > And jump_label_init() can avoid initialization > > if its already set. > > That doesn''t seem worthwhile in itself. I suspect the number of "early" > jump_label_incs will be very small (or we should look at doing the init > earlier). > > JI have it as ''early_initcall()'', but perhaps it should be moved into init/main.c. I don''t think there''s any reason it can''t be done super early. So I think this might be the best answer. It will also simplify your series. Thanks, -Jason _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel