similar to: [PATCH] linux/x86: Use cpu_relax() rather than barrier() in smp_call_function()

Displaying 20 results from an estimated 5000 matches similar to: "[PATCH] linux/x86: Use cpu_relax() rather than barrier() in smp_call_function()"

2007 Apr 18
2
[PATCH] Simplify smp_call_function*() by using common implementation
smp_call_function and smp_call_function_single are almost complete duplicates of the same logic. This patch combines them by implementing them in terms of the more general smp_call_function_mask(). [ Jan, Andi: This only changes arch/i386; can x86_64 be changed in the same way? ] [ Rebased onto Jan's x86_64-mm-consolidate-smp_send_stop patch ] Signed-off-by: Jeremy Fitzhardinge
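As a rough sketch of what "implementing them in terms of the more general smp_call_function_mask()" can look like (the 2007-era i386 prototypes and locking differ, so the signatures below are illustrative, not the actual patch):

```c
/* Illustrative only: both public entry points funnel into the mask variant. */
int smp_call_function(void (*func)(void *), void *info, int wait)
{
	cpumask_t mask = cpu_online_map;

	cpu_clear(smp_processor_id(), mask);	/* every online CPU but ourselves */
	return smp_call_function_mask(mask, func, info, wait);
}

int smp_call_function_single(int cpu, void (*func)(void *), void *info, int wait)
{
	return smp_call_function_mask(cpumask_of_cpu(cpu), func, info, wait);
}
```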
2013 May 07
1
[PATCH V2] xen/arm: implement smp_call_function
From: Julien Grall <julien.grall@citrix.com> Move smp_call_function and on_selected_cpus to common code. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> --- Changes in V2: - Add copyright header in xen/common/smp.c xen/arch/arm/gic.c | 3 ++ xen/arch/arm/smp.c
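The snippet cuts off in the diffstat; as a hedged sketch of the idea (not the actual xen/common/smp.c), a common-code smp_call_function can simply exclude the calling CPU and defer to on_selected_cpus():

```c
/* Hedged sketch only; the real common implementation may differ in detail. */
void smp_call_function(void (*func)(void *info), void *info, int wait)
{
    cpumask_t allbutself;

    cpumask_andnot(&allbutself, &cpu_online_map, cpumask_of(smp_processor_id()));
    on_selected_cpus(&allbutself, func, info, wait);
}
```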
2016 Oct 25
0
[GIT PULL v2 3/5] s390: make cpu_relax a barrier again
stop_machine seemed to be the only important place for yielding during cpu_relax. This was fixed by using cpu_relax_yield. Therefore, we can now redefine cpu_relax to be a barrier instead on s390, making s390 identical to all other architectures. Signed-off-by: Christian Borntraeger <borntraeger at de.ibm.com> --- arch/s390/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1
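The shape of that change, as a hedged sketch (the real patch edits arch/s390/include/asm/processor.h, and the name of the yield primitive underneath is an assumption here):

```c
/*
 * After the change, cpu_relax() on s390 is just a compiler barrier like on
 * every other architecture; spinners that genuinely want to give the CPU
 * back to the hypervisor (e.g. stop_machine) call the new helper instead.
 */
#define cpu_relax()	barrier()

void cpu_relax_yield(void);	/* the s390-specific yield cost lives here */
```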
2016 Oct 21
1
[PATCH/RFC 0/5] cpu_relax: introduce yield, remove lowlatency
On 10/21/2016 04:57 PM, David Miller wrote: > From: Christian Borntraeger <borntraeger at de.ibm.com> > Date: Fri, 21 Oct 2016 13:58:53 +0200 > >> For spinning loops people did often use barrier() or cpu_relax(). >> For most architectures cpu_relax and barrier are the same, but on >> some architectures cpu_relax can add some latency. For example on s390 >>
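For context, this is the kind of loop the thread is about (a sketch; "flag" is a stand-in variable, not something from the patch series):

```c
/*
 * Typical spin-wait: barrier() only keeps the compiler from caching 'flag',
 * while cpu_relax() additionally hints the CPU -- and, on architectures like
 * s390 before this series, could yield to the hypervisor and add latency.
 */
while (!READ_ONCE(flag))
	cpu_relax();
```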
2014 Jun 16
4
[PATCH 10/11] qspinlock: Paravirt support
On 06/15/2014 08:47 AM, Peter Zijlstra wrote: > > > > +#ifdef CONFIG_PARAVIRT_SPINLOCKS > + > +/* > + * Write a comment about how all this works... > + */ > + > +#define _Q_LOCKED_SLOW (2U << _Q_LOCKED_OFFSET) > + > +struct pv_node { > + struct mcs_spinlock mcs; > + struct mcs_spinlock __offset[3]; > + int cpu, head; > +}; I am wondering why
2016 Oct 25
7
[GIT PULL v2 0/5] cpu_relax: drop lowlatency, introduce yield
Peter, here is v2 with some improved patch descriptions and some fixes. The previous version has survived one day of linux-next and I only changed small parts. So unless there is some other issue, feel free to pull (or to apply the patches) to tip/locking. The following changes since commit 07d9a380680d1c0eb51ef87ff2eab5c994949e69: Linux 4.9-rc2 (2016-10-23 17:10:14 -0700) are available in
2014 Apr 18
0
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
On 04/17/2014 12:36 PM, Peter Zijlstra wrote: > On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: >> There is a problem in the current trylock_pending() function. When the >> lock is free, but the pending bit holder hasn't grabbed the lock & >> cleared the pending bit yet, the trylock_pending() function will fail. > I remember seeing some of this.. >
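The window being described, as a simplified sketch (the constants are the real qspinlock field encodings, but the loop is an illustration, not the patched trylock_pending()): while the pending-bit holder is between observing the lock free and actually taking it, the lock word reads as "pending set, locked and tail clear", and giving up to the MCS queue in that window is what the patch tries to avoid.

```c
	/* Spin briefly while only the pending bit is set instead of queueing. */
	val = atomic_read(&lock->val);
	while (val == _Q_PENDING_VAL) {		/* pending=1, locked=0, tail=0 */
		cpu_relax();
		val = atomic_read(&lock->val);
	}
```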
2014 Jun 15
0
[PATCH 10/11] qspinlock: Paravirt support
Add minimal paravirt support. The code aims for minimal impact on the native case. On the lock side we add one jump label (asm_goto) and 4 paravirt callee-saved calls that default to NOPs. The only effects are the extra NOPs and some pointless MOVs to accommodate the calling convention. No register spills happen because of this (x86_64). On the unlock side we have one paravirt callee-saved call,
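A deliberately simplified picture of "paravirt calls that default to NOPs" (the kernel uses patched call sites and callee-saved thunks rather than plain function pointers, so this shows only the shape of the interface):

```c
/*
 * On native hardware the hooks do nothing; a hypervisor backend can replace
 * them with "halt this vCPU" / "kick that vCPU" operations.
 */
struct pv_lock_hooks {
	void (*wait)(u8 *ptr, u8 val);	/* block until *ptr changes from val */
	void (*kick)(int cpu);		/* wake a vCPU blocked in ->wait()   */
};

static void nop_wait(u8 *ptr, u8 val) { }
static void nop_kick(int cpu) { }

static struct pv_lock_hooks pv_lock_hooks = {
	.wait = nop_wait,
	.kick = nop_kick,
};
```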
2008 May 08
2
[PATCH/RFC] stop_machine: make stop_machine_run more virtualization friendly
On kvm I have seen some rare hangs in stop_machine when I used more guest cpus than host cpus. e.g. 32 guest cpus on 1 host cpu triggered the hang quite often. I could also reproduce the problem on a 4-way z/VM host with a 64-way guest. It turned out that the guest was consuming all available cpus mostly for spinning on scheduler locks like rq->lock. This is expected as the threads are
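The mechanism behind those hangs, shown abstractly (this is not the stop_machine patch itself, and the counter name is made up): a vCPU that busy-waits for the other stop_machine threads burns host CPU time that those very threads need in an overcommitted guest, so the wait should give the time slice back.

```c
	/* Abstract illustration: yield instead of burning the vCPU's slice. */
	while (atomic_read(&threads_ready) != num_online_cpus())
		yield();
```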
2014 Apr 17
2
[PATCH v9 06/19] qspinlock: prolong the stay in the pending bit path
On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: > There is a problem in the current trylock_pending() function. When the > lock is free, but the pending bit holder hasn't grabbed the lock & > cleared the pending bit yet, the trylock_pending() function will fail. I remember seeing some of this.. > It can be seen that the queue spinlock is slower than the ticket
2015 Mar 19
0
[PATCH 8/9] qspinlock: Generic paravirt support
On Wed, Mar 18, 2015 at 04:50:37PM -0400, Waiman Long wrote: > >+ this_cpu_write(__pv_lock_wait, lock); > > We may run into the same problem of needing to have 4 queue nodes per CPU. > If an interrupt happens just after the write and before the actual wait and > it goes through the same sequence, it will overwrite the __pv_lock_wait[] > entry. So we may have lost wakeup.
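Restating the quoted concern as a sketch (the per-CPU variable name follows the quoted patch; the wait primitive is a placeholder): a single per-CPU slot recording "which lock this CPU is halted on" can be overwritten by an interrupt that contends on another lock before the CPU actually halts, so the eventual kick targets the wrong lock and the wakeup is lost.

```c
/* One slot per CPU: which lock this CPU is (about to be) halted on. */
static DEFINE_PER_CPU(struct qspinlock *, __pv_lock_wait);

static void pv_wait_head(struct qspinlock *lock)	/* simplified sketch */
{
	this_cpu_write(__pv_lock_wait, lock);
	/*
	 * The race: an interrupt arriving here that spins on a different lock
	 * runs the same sequence and overwrites the slot above, so the holder
	 * of 'lock' later kicks the wrong entry -- a lost wakeup.
	 */
	pv_wait_for(lock);			/* placeholder for the halt/wait step */
	this_cpu_write(__pv_lock_wait, NULL);
}
```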
2014 May 08
2
[PATCH v10 06/19] qspinlock: prolong the stay in the pending bit path
On Wed, May 07, 2014 at 11:01:34AM -0400, Waiman Long wrote: > @@ -221,11 +222,37 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval) > */ > for (;;) { > /* > - * If we observe any contention; queue. > + * If we observe that the queue is not empty, > + * return and be queued. > */ > - if (val & ~_Q_LOCKED_MASK) > + if (val
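Reconstructed from the line-wrapped hunk above (hedged, since the quoted diff is cut off mid-line): the old check queued on any contention, and the change narrows it to queue only when the tail field shows other CPUs are already queued.

```c
	for (;;) {
		/*
		 * Old: if (val & ~_Q_LOCKED_MASK) -> queue on any contention.
		 * New (as described): only fall back to the MCS queue when the
		 * tail field says the queue is not empty.
		 */
		if (val & _Q_TAIL_MASK)
			return 0;	/* go queue */

		/* ...otherwise keep trying the pending-bit path (abbreviated) */
		break;
	}
```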