search for: xchg

Displaying 20 results from an estimated 396 matches for "xchg".

2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
...libc/arch/i386/libgcc/__lshrdi3.S
>> use the following code sequences for shift counts greater than 31:
>>
>> 1:                          1:
>>     xorl  %edx,%edx             shrl  %cl,%edx
>>     shl   %cl,%eax              xorl  %eax,%eax
>>     xchgl %edx,%eax             xchgl %edx,%eax
>>     ret                         ret
>>
>> At least and especially on Intel processors XCHG was and
>> still is a rather slow instruction and should be avoided.
>> Use the following better code sequences instead: ...
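The suggested replacement is cut off above, but the gist is recoverable: build each half of the result in its final register with MOV/XOR instead of swapping the pair afterwards, since register-to-register XCHG decodes to several uops on many Intel CPUs while MOV is a single uop (and may be eliminated at rename). A minimal sketch of that idea for the count >= 32 case of __lshrdi3, as a C function with hypothetical inline asm (not the poster's actual sequence):

	/* Sketch only: __lshrdi3(v, cnt) for 32 <= cnt <= 63 without XCHG.
	 * On i386 the result lives in EDX:EAX; register choice is left to
	 * the compiler here. */
	unsigned long long lshr_ge32(unsigned long long v, unsigned char cnt)
	{
		unsigned int lo, hi = (unsigned int)(v >> 32);

	#if defined(__GNUC__) && defined(__i386__)
		__asm__ ("shrl %%cl, %[hi]\n\t"   /* hi >>= (cnt & 31)            */
			 "movl %[hi], %[lo]\n\t"  /* low result = shifted hi ...  */
			 "xorl %[hi], %[hi]"      /* ... high result = 0; no XCHG */
			 : [hi] "+r" (hi), [lo] "=&r" (lo)
			 : "c" (cnt));
	#else
		lo = hi >> (cnt & 31);            /* portable equivalent */
		hi = 0;
	#endif
		return ((unsigned long long)hi << 32) | lo;
	}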
2007 Apr 18
1
[PATCH 3/4] Pte xchg optimization.patch
In situations where page table updates need only be made locally, and there are no cross-processor A/D bit races involved, we need not use the heavyweight xchg instruction to atomically fetch and clear page table entries. Instead, we can just read and clear them directly. This introduces a neat optimization for non-SMP kernels; drop the atomic xchg operations from page table updates. Thanks to Michel Lespinasse for noting this potential optimization....
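As a rough sketch of what the patch describes (accessor names modelled on the kernel's i386 pte helpers, signatures simplified; this is not the patch itself): on a UP build no other CPU can set the A/D bits concurrently, so the locked XCHG can become a plain load and store.

	#ifdef CONFIG_SMP
	static inline pte_t ptep_get_and_clear(pte_t *ptep)
	{
		/* Another CPU may set the A/D bits concurrently: the fetch
		 * and clear must be one atomic (implicitly LOCKed) XCHG. */
		return __pte(xchg(&ptep->pte_low, 0));
	}
	#else
	static inline pte_t ptep_get_and_clear(pte_t *ptep)
	{
		pte_t pte = *ptep;	/* plain read ...               */
		pte_clear(ptep);	/* ... and plain write, no LOCK */
		return pte;
	}
	#endif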
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
...https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
use the following code sequences for shift counts greater than 31:

1:                          1:
    xorl  %edx,%edx             shrl  %cl,%edx
    shl   %cl,%eax              xorl  %eax,%eax
    xchgl %edx,%eax             xchgl %edx,%eax
    ret                         ret

At least and especially on Intel processors XCHG was and still is a
rather slow instruction and should be avoided. Use the following
better code sequences instead:

1:                          1:
    shll  %cl,%eax ...
2008 Jun 10
1
[PATCH] xen: Use wmb instead of rmb in xen_evtchn_do_upcall().
...changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 73d78dc..332dd63 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -529,7 +529,7 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
 #ifndef CONFIG_X86 /* No need for a barrier -- XCHG is a barrier on x86. */
 	/* Clear master flag /before/ clearing selector flag. */
-	rmb();
+	wmb();
 #endif
 	pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
 	while (pending_words != 0) {
--
1.5.3
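Condensed, the code path the patch touches looks roughly like this (a sketch using the Xen interface field names; the real function does more):

	static unsigned long fetch_pending_words(struct vcpu_info *v)
	{
		v->evtchn_upcall_pending = 0;	/* store 1: clear master flag */
	#ifndef CONFIG_X86
		/* Both operations are stores, so the ordering needed is
		 * store->store: wmb(), not rmb(). */
		wmb();
	#endif
		/* store 2; on x86 the locked XCHG is itself a full barrier,
		 * which is why no explicit barrier is needed there. */
		return xchg(&v->evtchn_pending_sel, 0);
	}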
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > constantly cheaper (by at least half the latency) than MFENCE. While there > > was a decent amount of variation, this difference remained rather constant. > > Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we > use on old cpu's with...
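The two implementations being compared, as hypothetical macro sketches (the third contender, lock addq $0,(%rsp), exploits the fact that any LOCKed operation is a full barrier):

	/* Plain store, then a standalone fence instruction. */
	#define smp_store_mb_mfence(var, val)				\
	do {								\
		WRITE_ONCE(var, val);					\
		asm volatile("mfence" ::: "memory");			\
	} while (0)

	/* XCHG with a memory operand is implicitly LOCKed, so the store
	 * and the full barrier are one instruction. */
	#define smp_store_mb_xchg(var, val)				\
	do {								\
		(void)xchg(&(var), (val));				\
	} while (0)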
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote: > From: Waiman Long <Waiman.Long at hp.com> > > This patch extracts the logic for the exchange of new and previous tail > code words into a new xchg_tail() function which can be optimized in a > later patch. And also adds a third try on acquiring the lock. That I think should be a separate patch. And instead of saying 'later patch' you should spell out the name of the patch. Especially as this might not be obvious from somebody doi...
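For reference, a sketch of the shape the extracted helper takes (constant and field names as in the qspinlock series, assumed here): atomically publish the new tail code word and hand back the previous one so the caller can link behind the old tail.

	static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
	{
		/* The tail code word occupies the upper 16 bits of the
		 * 32-bit lock word; swap it in one atomic XCHG. */
		return (u32)xchg(&lock->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
	}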
2019 Aug 19
0
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
.../klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
> use the following code sequences for shift counts greater than 31:
>
> 1:                          1:
>     xorl  %edx,%edx             shrl  %cl,%edx
>     shl   %cl,%eax              xorl  %eax,%eax
>     xchgl %edx,%eax             xchgl %edx,%eax
>     ret                         ret
>
> At least and especially on Intel processors XCHG was and
> still is a rather slow instruction and should be avoided.
> Use the following better code sequences instead:
>
> 1: ...
2006 May 26
0
[PATCH]: Fixing the xchg emulation bug
The following patch fixes a bug in the emulation of the xchg instruction. This bug has prevented us from booting fully virtualized SMP guests that write to the APIC using the xchg instruction (when CONFIG_X86_GOOD_APIC is not set). On 32 bit platforms, SLES 10 kernels are built with CONFIG_X86_GOOD_APIC not set, and hence we have had problems booting full...
2015 Apr 02
3
[PATCH 8/9] qspinlock: Generic paravirt support
...but can probe until we find the entry. > >This will be bound in cost to the same number of probes we required for > >insertion and avoids the full array scan. > > > >Now I think we can indeed do this, if as said earlier we do not clear > >the bucket on insert if the cmpxchg succeeds; in that case the unlock > >will observe _Q_SLOW_VAL and do the lookup, the lookup will then find > >the entry. And we then need the unlock to clear the entry. > >Does that explain this? Or should I try again with code? OK, I got your propo...
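A hypothetical sketch of the probing scheme under discussion (hash() and the table layout are assumptions): insertion claims an empty bucket with cmpxchg and deliberately leaves it set, so the unlocker, after observing _Q_SLOW_VAL, can probe from the home bucket, find the entry, and clear it, bounding its cost by the same walk the insertion took.

	struct pv_hash_entry {
		struct qspinlock *lock;
		int		  cpu;
	};

	/* Claim the first empty bucket; it stays set until unlock clears it. */
	static int pv_hash_insert(struct pv_hash_entry *tab, int size,
				  struct qspinlock *lock, int cpu)
	{
		int i;

		for (i = hash(lock) % size; ; i = (i + 1) % size)
			if (cmpxchg(&tab[i].lock, NULL, lock) == NULL) {
				tab[i].cpu = cpu;
				return i;
			}
	}

	/* Unlock side: find the entry, read its payload, then free the bucket. */
	static int pv_hash_find_and_clear(struct pv_hash_entry *tab, int size,
					  struct qspinlock *lock)
	{
		int i, cpu;

		for (i = hash(lock) % size; tab[i].lock != lock; i = (i + 1) % size)
			;
		cpu = tab[i].cpu;
		WRITE_ONCE(tab[i].lock, NULL);	/* bucket is reusable again */
		return cpu;
	}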
2014 Jun 18
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On 17/06/2014 22:55, Konrad Rzeszutek Wilk wrote:
> On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
>> From: Waiman Long <Waiman.Long at hp.com>
>>
>> This patch extracts the logic for the exchange of new and previous tail
>> code words into a new xchg_tail() function which can be optimized in a
>> later patch.
>
> And also adds a third try on acquiring the lock. That I think should
> be a separate patch.

It doesn't really add a new try, the old code is:

-	for (;;) {
-		new = _Q_LOCKED_VAL;
-		if (val)
-			new = tail | (val...
2009 Apr 24
1
Bugs in pxelinux.asm - syslinux 3.75
In pxelinux.asm the xchg instruction in

	xchg ax,ax
.data_on_top:

should be xchg ax,dx, I think. At the end of the pxe_get_cached_info routine, there is

	and ax,ax
	jnz .err

It is supposed to test the status in AX, but since pxenv does pushad and popad, AX doesn't contain statu...
2015 Apr 29
4
[PATCH v16 13/14] pvqspinlock: Improve slowpath performance by avoiding cmpxchg
On Fri, Apr 24, 2015 at 02:56:42PM -0400, Waiman Long wrote: > In the pv_scan_next() function, the slow cmpxchg atomic operation is > performed even if the other CPU is not even close to being halted. This > extra cmpxchg can harm slowpath performance. > > This patch introduces the new mayhalt flag to indicate if the other > spinning CPU is close to being halted or not. The current threshold...
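A sketch of the mayhalt idea (field and constant names taken loosely from the pvqspinlock series, not verbatim):

	static void pv_mark_hashed(struct pv_node *pn)
	{
		if (READ_ONCE(pn->mayhalt))
			/* The spinning vCPU may be about to halt: the state
			 * transition must win or lose atomically. */
			(void)cmpxchg(&pn->state, vcpu_running, vcpu_hashed);
		else
			/* Nowhere near halting: skip the slow cmpxchg and
			 * publish the new state with a plain store. */
			WRITE_ONCE(pn->state, vcpu_hashed);
	}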
2014 May 08
1
[PATCH v10 03/19] qspinlock: Add pending bit
...e_tail(smp_processor_id(), idx);
> @@ -119,15 +196,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  	node->next = NULL;
>
>  	/*
> +	 * we already touched the queueing cacheline; don't bother with pending
> +	 * stuff.
> +	 *
>  	 * trylock || xchg(lock, node)
>  	 *
> -	 * 0,0 -> 0,1 ; trylock
> -	 * p,x -> n,x ; prev = xchg(lock, node)
> +	 * 0,0,0 -> 0,0,1 ; trylock
> +	 * p,y,x -> n,y,x ; prev = xchg(lock, node)
>  	 */

And any value of @val we might have had here is completely out-dated. The only thing that...
2020 Feb 10
3
atomic ops are optimized with incorrect semantics .
Hi All,

With the https://gcc.godbolt.org/z/yBYTrd case, the atomic operation is converted to a non-atomic one for x86, i.e. from

	xchg dword ptr [100], eax

to

	mov dword ptr [100], 1

The pass responsible for this transformation is InstCombine, i.e. InstCombiner::visitAtomicRMWInst, which converts IR like

	%0 = atomicrmw xchg i32* inttoptr (i64 100 to i32*), i32 1 monotonic

to

	store atomic i32 1, i32* inttoptr (i64 100 to i32...
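For context, a source-level pattern that lowers to the quoted IR (assumed here, for illustration) is an exchange whose old value is discarded. Since the ordering is monotonic/relaxed, rewriting it as an atomic store is legal, and on x86 a relaxed atomic store to an aligned dword is a plain MOV:

	#include <stdatomic.h>

	void set_flag(_Atomic int *p)
	{
		/* Old value unused, relaxed ordering: InstCombine may turn
		 * this atomicrmw xchg into "store atomic ... monotonic",
		 * which x86 lowers to an ordinary MOV -- still one atomic
		 * store, just without the LOCKed XCHG. */
		atomic_exchange_explicit(p, 1, memory_order_relaxed);
	}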