similar to: [PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k

Displaying 20 results from an estimated 1000 matches similar to: "[PATCH V11 04/17] locking/qspinlock: Improve xchg_tail for number of cpus >= 16k"

2014 Jun 15
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
From: Peter Zijlstra <peterz at infradead.org> When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the
2014 Apr 17
0
[PATCH v9 05/19] qspinlock: Optimize for smaller NR_CPUS
When we allow for a max NR_CPUS < 2^14 we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16bit. This means we can use xchg16 for the tail part and do away with all the repeated compxchg() operations. This in turn allows us to unconditionally acquire; the locked state as observed by the wait loops cannot change.
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Sun, Jun 15, 2014 at 02:47:02PM +0200, Peter Zijlstra wrote: > From: Peter Zijlstra <peterz at infradead.org> > > When we allow for a max NR_CPUS < 2^14 we can optimize the pending > wait-acquire and the xchg_tail() operations. > > By growing the pending bit to a byte, we reduce the tail to 16bit. > This means we can use xchg16 for the tail part and do away with
2014 Jun 18
1
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Sun, Jun 15, 2014 at 02:47:02PM +0200, Peter Zijlstra wrote: > From: Peter Zijlstra <peterz at infradead.org> > > When we allow for a max NR_CPUS < 2^14 we can optimize the pending > wait-acquire and the xchg_tail() operations. > > By growing the pending bit to a byte, we reduce the tail to 16bit. > This means we can use xchg16 for the tail part and do away with
2015 Jul 30
0
[LLVMdev] [x86] Prefetch intrinsics and prefetchw
Hi, I am looking at how the PREFETCHW instruction is matched to the IR prefetch intrinsic (and __builtin_prefetch). Consider this C program: char foo[100]; int bar(void) { __builtin_prefetch(foo, 0, 0); __builtin_prefetch(foo, 0, 1); __builtin_prefetch(foo, 0, 2); __builtin_prefetch(foo, 0, 3); __builtin_prefetch(foo, 1, 0); __builtin_prefetch(foo, 1, 1);
2014 Jun 15
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
From: Waiman Long <Waiman.Long at hp.com> This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long <Waiman.Long at hp.com> Signed-off-by: Peter Zijlstra <peterz at infradead.org> --- include/asm-generic/qspinlock_types.h | 2 +
2014 Apr 17
0
[PATCH v9 04/19] qspinlock: Extract out the exchange of tail code word
This patch extracts the logic for the exchange of new and previous tail code words into a new xchg_tail() function which can be optimized in a later patch. Signed-off-by: Waiman Long <Waiman.Long at hp.com> --- include/asm-generic/qspinlock_types.h | 2 + kernel/locking/qspinlock.c | 61 +++++++++++++++++++++------------ 2 files changed, 41 insertions(+), 22 deletions(-)
2014 Jun 18
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
Il 17/06/2014 22:55, Konrad Rzeszutek Wilk ha scritto: > On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote: >> From: Waiman Long <Waiman.Long at hp.com> >> >> This patch extracts the logic for the exchange of new and previous tail >> code words into a new xchg_tail() function which can be optimized in a >> later patch. > > And also adds a
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote: > From: Waiman Long <Waiman.Long at hp.com> > > This patch extracts the logic for the exchange of new and previous tail > code words into a new xchg_tail() function which can be optimized in a > later patch. And also adds a third try on acquiring the lock. That I think should be a seperate patch. And instead of
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote: > From: Waiman Long <Waiman.Long at hp.com> > > This patch extracts the logic for the exchange of new and previous tail > code words into a new xchg_tail() function which can be optimized in a > later patch. And also adds a third try on acquiring the lock. That I think should be a seperate patch. And instead of
2014 Jul 07
0
[PATCH 05/11] qspinlock: Optimize for smaller NR_CPUS
On Mon, Jul 07, 2014 at 05:08:17PM +0200, Paolo Bonzini wrote: > Il 07/07/2014 16:35, Peter Zijlstra ha scritto: > >On Wed, Jun 18, 2014 at 01:39:52PM +0200, Paolo Bonzini wrote: > >>Il 15/06/2014 14:47, Peter Zijlstra ha scritto: > >>> > >>>- for (;;) { > >>>- new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL; > >>>- >
2014 May 21
0
[RFC 08/07] qspinlock: integrate pending bit into queue
2014-05-21 18:49+0200, Radim Kr?m??: > 2014-05-19 16:17-0400, Waiman Long: > > As for now, I will focus on just having one pending bit. > > I'll throw some ideas at it, One of the ideas follows; it seems sound, but I haven't benchmarked it thoroughly. (Wasted a lot of time by writing/playing with various tools and loads.) Dbench on ext4 ramdisk, hackbench and ebizzy
2007 Dec 18
3
[PATCH] finish processor.h integration
What's left in processor_32.h and processor_64.h cannot be cleanly integrated. However, it's just a couple of definitions. They are moved to processor.h around ifdefs, and the original files are deleted. Note that there's much less headers included in the final version. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- include/asm-x86/processor.h | 140
2007 Dec 18
3
[PATCH] finish processor.h integration
What's left in processor_32.h and processor_64.h cannot be cleanly integrated. However, it's just a couple of definitions. They are moved to processor.h around ifdefs, and the original files are deleted. Note that there's much less headers included in the final version. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- include/asm-x86/processor.h | 140
2014 Apr 17
2
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: > There is a problem in the current trylock_pending() function. When the > lock is free, but the pending bit holder hasn't grabbed the lock & > cleared the pending bit yet, the trylock_pending() function will fail. I remember seeing some of this.. > It can be seen that the queue spinlock is slower than the ticket
2014 Apr 17
2
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: > There is a problem in the current trylock_pending() function. When the > lock is free, but the pending bit holder hasn't grabbed the lock & > cleared the pending bit yet, the trylock_pending() function will fail. I remember seeing some of this.. > It can be seen that the queue spinlock is slower than the ticket
2014 Apr 18
0
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
On 04/17/2014 12:36 PM, Peter Zijlstra wrote: > On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: >> There is a problem in the current trylock_pending() function. When the >> lock is free, but the pending bit holder hasn't grabbed the lock& >> cleared the pending bit yet, the trylock_pending() function will fail. > I remember seeing some of this.. >
2014 Jun 15
0
[PATCH 07/11] qspinlock: Use a simple write to grab the lock, if applicable
From: Waiman Long <Waiman.Long at hp.com> Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head don't need to reset the queue code word. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that
2014 Apr 17
0
[PATCH v9 07/19] qspinlock: Use a simple write to grab the lock, if applicable
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head don't need to reset the queue code word. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The
2014 May 07
0
[PATCH v10 07/19] qspinlock: Use a simple write to grab the lock, if applicable
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head don't need to reset the queue code word. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock as long as it checks that both the lock and pending bits are not set. The