Displaying 20 results from an estimated 396 matches for "xchg".
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
...libc/arch/i386/libgcc/__lshrdi3.S
>> use the following code sequences for shift counts greater than 31:
>>
>> 1: 1:
>> xorl %edx,%edx shrl %cl,%edx
>> shl %cl,%eax xorl %eax,%eax
>> ^
>> xchgl %edx,%eax xchgl %edx,%eax
>> ret ret
>>
>> At least and especially on Intel processors XCHG was and
>> still is a rather slow instruction and should be avoided.
>> Use the following better code sequences instead:
>>
>&...
2007 Apr 18
1
[PATCH 3/4] Pte xchg optimization.patch
In situations where page table updates need only be made locally, and there
are no cross-processor A/D bit races involved, we need not use the heavyweight
xchg instruction to atomically fetch and clear page table entries. Instead,
we can just read and clear them directly.
This introduces a neat optimization for non-SMP kernels; drop the atomic
xchg operations from page table updates.
Thanks to Michel Lespinasse for noting this potential optimization....
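The optimization described above can be sketched in C11 (a hedged sketch with simplified types, not the actual kernel ptep_get_and_clear):

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t pte_t;  /* simplified PTE, for illustration only */

/* SMP case: atomically fetch and clear, so a concurrent CPU setting
 * Accessed/Dirty bits cannot race with the clear. On x86 this compiles
 * to an (implicitly locked) XCHG. */
static pte_t pte_get_and_clear_smp(_Atomic pte_t *ptep)
{
    return atomic_exchange(ptep, 0);
}

/* UP case: no cross-processor A/D races are possible, so a plain read
 * followed by a plain write is enough -- no locked instruction needed. */
static pte_t pte_get_and_clear_up(pte_t *ptep)
{
    pte_t pte = *ptep;
    *ptep = 0;
    return pte;
}
```

Both return the old entry; only the SMP variant pays for atomicity.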
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
...https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
use the following code sequences for shift counts greater than 31:
1: 1:
xorl %edx,%edx shrl %cl,%edx
shl %cl,%eax xorl %eax,%eax
^
xchgl %edx,%eax xchgl %edx,%eax
ret ret
At least and especially on Intel processors XCHG was and
still is a rather slow instruction and should be avoided.
Use the following better code sequences instead:
1: 1:
shll %cl,%eax...
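The improved sequences quoted above avoid the register swap entirely by producing the result in the right registers to begin with. A hedged C sketch of what __lshrdi3 computes (the name and structure are illustrative, not the klibc code):

```c
#include <stdint.h>

/* Sketch of a 64-bit logical shift right on a 32-bit target, mirroring
 * __lshrdi3. For counts > 31 the low word receives the high word shifted
 * by (count - 32) and the high word becomes zero -- no xchg is needed. */
static uint64_t lshrdi3_sketch(uint64_t value, unsigned count)
{
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    if (count >= 32) {
        lo = hi >> (count - 32);  /* high word shifted into the low word */
        hi = 0;
    } else if (count > 0) {
        lo = (lo >> count) | (hi << (32 - count));
        hi >>= count;
    }
    return ((uint64_t)hi << 32) | lo;
}
```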
2008 Jun 10
1
[PATCH] xen: Use wmb instead of rmb in xen_evtchn_do_upcall().
...hanged, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 73d78dc..332dd63 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -529,7 +529,7 @@ void xen_evtchn_do_upcall(struct pt_regs *regs)
#ifndef CONFIG_X86 /* No need for a barrier -- XCHG is a barrier on x86. */
/* Clear master flag /before/ clearing selector flag. */
- rmb();
+ wmb();
#endif
pending_words = xchg(&vcpu_info->evtchn_pending_sel, 0);
while (pending_words != 0) {
--
1.5.3
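The xchg() in the hunk above doubles as a full barrier on x86, which is why the explicit barrier is compiled out there. A hedged C11 sketch of the pattern (field names borrowed from the diff; types simplified):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sketch of the upcall pattern: clear the per-vcpu master flag, then
 * atomically fetch-and-clear the selector word. */
static uint32_t fetch_pending_words(_Atomic uint8_t *evtchn_upcall_pending,
                                    _Atomic uint32_t *evtchn_pending_sel)
{
    /* Clear master flag /before/ clearing selector flag. */
    atomic_store_explicit(evtchn_upcall_pending, 0, memory_order_relaxed);

    /* The wmb() from the patch: order the two clears. On x86 the
     * exchange below is itself a full barrier, so the kernel compiles
     * this fence out there. */
    atomic_thread_fence(memory_order_release);

    return atomic_exchange(evtchn_pending_sel, 0);
}
```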
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > constantly cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference remained rather constant.
>
> Mind testing "lock addq $0,0(%rsp)" instead of mfence? That's what we
> use on old cpu's with...
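The trade-off being measured above is how to implement a store followed by a full barrier. A hedged C11 sketch of the two strategies (illustrative only; the kernel's smp_store_mb() is arch-specific assembly):

```c
#include <stdatomic.h>

/* Strategy 1: plain store, then a standalone full fence. On x86 this
 * becomes MOV + MFENCE (or MOV + "lock addq $0,0(%rsp)" on older CPUs). */
static void store_mb_fence(_Atomic int *p, int v)
{
    atomic_store_explicit(p, v, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Strategy 2: fold the barrier into the store itself by exchanging and
 * discarding the old value. XCHG with memory is implicitly locked and
 * therefore a full barrier on x86 -- the variant the thread above
 * measured as cheaper than MFENCE on IvyBridge. */
static void store_mb_xchg(_Atomic int *p, int v)
{
    (void)atomic_exchange(p, v);
}
```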
2014 Jun 17
3
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
> From: Waiman Long <Waiman.Long at hp.com>
>
> This patch extracts the logic for the exchange of new and previous tail
> code words into a new xchg_tail() function which can be optimized in a
> later patch.
And also adds a third try on acquiring the lock. That I think should
be a separate patch.
And instead of saying 'later patch' you should spell out the name
of the patch. Especially as this might not be obvious from somebody
doi...
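The extracted helper amounts to an unconditional atomic exchange of the tail field. A hedged sketch (simplified 32-bit lock word; the field layout here is hypothetical, not the qspinlock encoding):

```c
#include <stdatomic.h>
#include <stdint.h>

#define TAIL_SHIFT 16  /* hypothetical: tail code word in the upper 16 bits */

/* Sketch of xchg_tail(): publish our node as the new queue tail and
 * return the previous tail code word, using one atomic exchange on the
 * 16-bit tail halfword instead of a cmpxchg loop on the whole word. */
static uint32_t xchg_tail_sketch(_Atomic uint16_t *tail_half, uint32_t tail)
{
    uint16_t old = atomic_exchange(tail_half, (uint16_t)(tail >> TAIL_SHIFT));
    return (uint32_t)old << TAIL_SHIFT;
}
```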
2019 Aug 19
0
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
.../klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
> use the following code sequences for shift counts greater than 31:
>
> 1: 1:
> xorl %edx,%edx shrl %cl,%edx
> shl %cl,%eax xorl %eax,%eax
> ^
> xchgl %edx,%eax xchgl %edx,%eax
> ret ret
>
> At least and especially on Intel processors XCHG was and
> still is a rather slow instruction and should be avoided.
> Use the following better code sequences instead:
>
> 1:...
2006 May 26
0
[PATCH]: Fixing the xchg emulation bug
The following patch fixes a bug in the emulation of the xchg
instruction. This bug has prevented us from booting fully virtualized
SMP guests that write to the APIC using the xchg instruction (when
CONFIG_X86_GOOD_APIC is not set). On 32-bit platforms, SLES 10 kernels
are built with CONFIG_X86_GOOD_APIC not set, and hence we have had
problems booting full...
2015 Apr 02
3
[PATCH 8/9] qspinlock: Generic paravirt support
...but can probe until we find the entry.
> >This will be bound in cost to the same number of probes we required for
> >insertion and avoids the full array scan.
> >
> >Now I think we can indeed do this, if as said earlier we do not clear
> >the bucket on insert if the cmpxchg succeeds, in that case the unlock
> >will observe _Q_SLOW_VAL and do the lookup, the lookup will then find
> >the entry. And we then need the unlock to clear the entry.
> >_Q_SLOW_VAL
> >Does that explain this? Or should I try again with code?
>
> OK, I got your propo...
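The scheme under discussion can be sketched as follows (a hedged sketch with hypothetical names and a fixed-size table, not the pvqspinlock code): insert with cmpxchg into an empty bucket, deliberately do not clear the bucket on success, and let the unlock side look the entry up and clear it.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 64  /* hypothetical open-addressed hash of halted waiters */
static _Atomic(void *) buckets[NBUCKETS];

/* Insert: probe linearly for an empty slot; the cmpxchg claims it.
 * The inserter does NOT clear the slot on success. */
static int pv_hash_insert(size_t h, void *node)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        void *expected = NULL;
        size_t idx = (h + i) % NBUCKETS;
        if (atomic_compare_exchange_strong(&buckets[idx], &expected, node))
            return (int)idx;
    }
    return -1;  /* table full */
}

/* Unlock side: probe from the hash point until the node is found --
 * bounded by the same number of probes insertion used -- then clear. */
static void *pv_hash_find_and_clear(size_t h, void *node)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        size_t idx = (h + i) % NBUCKETS;
        if (atomic_load(&buckets[idx]) == node) {
            atomic_store(&buckets[idx], NULL);
            return node;
        }
    }
    return NULL;
}
```

This avoids the full array scan the thread is worried about, at the cost of moving the clear to the unlock path.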
2014 Jun 18
0
[PATCH 04/11] qspinlock: Extract out the exchange of tail code word
...17/06/2014 22:55, Konrad Rzeszutek Wilk ha scritto:
> On Sun, Jun 15, 2014 at 02:47:01PM +0200, Peter Zijlstra wrote:
>> From: Waiman Long <Waiman.Long at hp.com>
>>
>> This patch extracts the logic for the exchange of new and previous tail
>> code words into a new xchg_tail() function which can be optimized in a
>> later patch.
>
> And also adds a third try on acquiring the lock. That I think should
> be a separate patch.
It doesn't really add a new try, the old code is:
- for (;;) {
- new = _Q_LOCKED_VAL;
- if (val)
- new = tail | (val...
2009 Apr 24
1
Bugs in pxelinux.asm - syslinux 3.75
In pxelinux.asm
the xchg instruction in
xchg ax,ax
.
data_on_top:
should be xchg ax,dx, I think
At the end of
pxe_get_cached_info routine, there is
and ax,ax
jnz .err
It is supposed to test for AX status, but
since pxenv does pushad and popad, AX doesn't contain statu...
2015 Apr 29
4
[PATCH v16 13/14] pvqspinlock: Improve slowpath performance by avoiding cmpxchg
On Fri, Apr 24, 2015 at 02:56:42PM -0400, Waiman Long wrote:
> In the pv_scan_next() function, the slow cmpxchg atomic operation is
> performed even if the other CPU is not even close to being halted. This
> extra cmpxchg can harm slowpath performance.
>
> This patch introduces the new mayhalt flag to indicate if the other
> spinning CPU is close to being halted or not. The current threshold...
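The optimization under review amounts to gating the expensive cmpxchg on a cheap flag read. A hedged sketch (mayhalt modeled as a plain atomic flag; names and thresholds are illustrative, not the patch's code):

```c
#include <stdatomic.h>
#include <stdbool.h>

enum pv_state { PV_RUNNING = 0, PV_HALTED = 1 };

/* Sketch: only attempt the cmpxchg handoff when the spinning CPU has
 * signalled, via mayhalt, that it is close to halting; otherwise skip
 * the atomic entirely, keeping the common slowpath cheap. */
static bool pv_scan_next_sketch(_Atomic int *state, _Atomic bool *mayhalt)
{
    if (!atomic_load_explicit(mayhalt, memory_order_acquire))
        return false;  /* other CPU still spinning: no cmpxchg needed */

    int expected = PV_HALTED;
    /* cmpxchg: claim the kick only if the CPU really halted */
    return atomic_compare_exchange_strong(state, &expected, PV_RUNNING);
}
```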
2014 May 08
1
[PATCH v10 03/19] qspinlock: Add pending bit
...e_tail(smp_processor_id(), idx);
> @@ -119,15 +196,18 @@ void queue_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> node->next = NULL;
>
> /*
> + * we already touched the queueing cacheline; don't bother with pending
> + * stuff.
> + *
> * trylock || xchg(lock, node)
> *
> - * 0,0 -> 0,1 ; trylock
> - * p,x -> n,x ; prev = xchg(lock, node)
> + * 0,0,0 -> 0,0,1 ; trylock
> + * p,y,x -> n,y,x ; prev = xchg(lock, node)
> */
And any value of @val we might have had here is completely out-dated.
The only thing that...
2020 Feb 10
3
Atomic ops are optimized with incorrect semantics.
Hi All,
With the "https://gcc.godbolt.org/z/yBYTrd" case,
the atomic op is converted to a non-atomic op for x86, like
from
xchg dword ptr [100], eax
to
mov dword ptr [100], 1
The pass responsible for this transformation is InstCombine,
i.e. InstCombiner::visitAtomicRMWInst,
which converts IR like
%0 = atomicrmw xchg i32* inttoptr (i64 100 to i32*), i32 1 monotonic
to
store atomic i32 1, i32* inttoptr (i64 100 to i32...
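In C terms, the transformation turns an exchange whose result is unused into a plain atomic store at the same ordering; the complaint is about what that does to the emitted x86. A hedged sketch:

```c
#include <stdatomic.h>

static _Atomic int shared;

/* Before: an exchange whose old value is discarded. On x86 this emits
 * XCHG, which is implicitly locked and thus a serializing full barrier. */
static void write_with_xchg(int v)
{
    (void)atomic_exchange_explicit(&shared, v, memory_order_relaxed);
}

/* After InstCombine: a monotonic (relaxed) atomic store has the same
 * memory-model semantics, but on x86 it is just a MOV -- still atomic,
 * no longer a serializing instruction. */
static void write_with_store(int v)
{
    atomic_store_explicit(&shared, v, memory_order_relaxed);
}
```

Both are valid relaxed atomics at the language level; the thread's question is whether dropping the incidental hardware barrier is acceptable.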