search for: addl

Displaying results from an estimated 418 matches for "addl".

2003 Aug 22
2
kernel: locore.s doesn't assemble (fillkpt, $PAGE_SHIFT, $PTESHIFT)
...alid for `shr' shrl $PAGE_SHIFT,%ecx /tmp/ccOO8Chb.s:2496: Error: suffix or operands invalid for `shr' /tmp/ccOO8Chb.s:2496: Error: suffix or operands invalid for `shl' movl %eax, %ebx ; shrl $PAGE_SHIFT, %ebx ; shll $PTESHIFT,%ebx ; addl (( KPTphys )-KERNBASE) ,%ebx ; orl $0x001 ,%eax ; orl %edx ,%eax ; 1: movl %eax,(%ebx) ; addl $PAGE_SIZE,%eax ; addl $PTESIZE,%ebx ; loop 1b /tmp/ccOO8Chb.s:2512: Error: suffix or operands invalid for `shr' shrl...
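
The expanded fillkpt loop quoted above just fills a run of kernel page-table entries. A rough C sketch of what it is doing (illustrative only: PAGE_SHIFT, PTESHIFT and the "present" bit value are assumptions standing in for the kernel's real definitions):

    #include <stdint.h>

    #define PAGE_SHIFT 12                 /* assumed: 4 KiB pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define PTESHIFT   2                  /* assumed: 4-byte page-table entries */
    #define PTE_P      0x001u             /* the orl $0x001 "present" bit */

    /* kpt: base of the kernel page table (what the addl of KPTphys-KERNBASE supplies) */
    static void fillkpt(uint32_t *kpt, uint32_t phys, uint32_t prot, uint32_t count)
    {
        uint32_t *pte   = kpt + (phys >> PAGE_SHIFT); /* shrl/shll/addl: locate the PTE slot */
        uint32_t  entry = phys | PTE_P | prot;        /* orl $0x001 ; orl %edx */

        while (count--) {                 /* loop 1b, count held in %ecx */
            *pte++  = entry;              /* movl %eax,(%ebx) ; addl $PTESIZE,%ebx */
            entry  += PAGE_SIZE;          /* addl $PAGE_SIZE,%eax */
        }
    }
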
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
...d is easy to reproduce by sticking a barrier in a small non-inline function. So let's use a negative offset - which avoids this problem since we build with the red zone disabled. For userspace, use an address just below the red zone. The one difference between lock+add and mfence is that lock+addl does not affect clflush; previous patches converted all uses of clflush to call mb(), such that changes to smp_mb won't affect it. Update mb/rmb/wmb on 32-bit to use the negative offset, too, for consistency. As a follow-up, it might be worth considering switching users of clflush to another...
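
A minimal userspace sketch of the negative-offset idea described above (these are not the patch's macros; the -132 offset is an assumption intended to land just below the 128-byte x86-64 red zone, while 32-bit has no red zone and only needs to stay off the word at 0(%esp)):

    static inline void smp_mb_lock_addl(void)
    {
    #if defined(__x86_64__)
        /* stay below the 128-byte red zone so the dummy locked add cannot
           overlap data the compiler keeps there */
        asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc");
    #elif defined(__i386__)
        /* no red zone on 32-bit; just avoid 0(%esp), which may hold a return
           address the caller needs immediately afterwards */
        asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc");
    #endif
    }
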
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h The documentation fixes are included first - I verified that they do not change the generated code at all. They should be safe...
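
The kind of micro-benchmark referred to above can be approximated in userspace along these lines (a sketch only: x86-64, TSC-based timing, illustrative iteration count; the -132(%rsp) offset keeps the dummy add below the red zone, and absolute numbers vary widely across CPUs):

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>              /* __rdtsc() */

    #define ITERS 10000000ULL

    static void do_mfence(void)    { asm volatile("mfence" ::: "memory"); }
    static void do_lock_addl(void) { asm volatile("lock; addl $0,-132(%%rsp)" ::: "memory", "cc"); }

    static double cycles_per_op(void (*fn)(void))
    {
        uint64_t t0 = __rdtsc();
        for (uint64_t i = 0; i < ITERS; i++)
            fn();
        return (double)(__rdtsc() - t0) / ITERS;
    }

    int main(void)
    {
        printf("mfence     : %6.1f cycles/op\n", cycles_per_op(do_mfence));
        printf("lock; addl : %6.1f cycles/op\n", cycles_per_op(do_lock_addl));
        return 0;
    }
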
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around this code. Fortunatel...
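
The clflush caveat above is why flushes keep a real fence around them rather than relying on a lock;addl-style barrier. A small illustration using userspace SSE2 intrinsics (not code from the series):

    #include <emmintrin.h>      /* _mm_clflush, _mm_mfence */

    static void flush_line(const void *p)
    {
        _mm_mfence();           /* order earlier stores before the flush */
        _mm_clflush(p);         /* flush the cache line containing p */
        _mm_mfence();           /* clflush is only guaranteed to be ordered by
                                   mfence, so a locked add is not enough here */
    }
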
2013 Feb 14
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
...llc < ~/temp/z.ll -march=x86 -tailcallopt -O3

The produced code is (cleaned up a bit)

tailcallee:                             # @tailcallee
	movl 4(%esp), %eax
	ret $12

tailcaller:                             # @tailcaller
	subl $12, %esp
	movl %edx, 20(%esp)
	movl %ecx, 16(%esp)
	addl $12, %esp
	jmp tailcallee                  # TAILCALL

foo:                                    # @foo
	subl $12, %esp
	movl 20(%esp), %ecx
	movl 16(%esp), %edx
	calll tailcaller
	subl $12, %esp
	addl $-6, %eax
	addl $12, %esp
	ret

A number of questions arise here: 1) Notice that ...
2016 Jan 27
0
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
...), 4 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index bfb28ca..7ab9581 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,11 +11,11 @@
  */
 #ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "mfence", \
				X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence"...
2016 Jan 28
0
[PATCH v5 1/5] x86: add cc clobber for addl
addl clobbers flags (such as CF) but barrier.h didn't tell this to gcc. Historically, gcc doesn't need one on x86, and always considers flags clobbered. We are probably missing the cc clobber in a *lot* of places for this reason. But even if not necessary, it's probably a good thing to add...
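
What that change amounts to, sketched with illustrative macro names rather than the actual patch hunks: the addl rewrites EFLAGS (CF, ZF, ...), so "cc" joins "memory" in the clobber list, even though x86 gcc has historically treated the flags as always clobbered.

    /* before: flags clobber left implicit */
    #define smp_mb_old() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory")

    /* after: EFLAGS explicitly declared clobbered */
    #define smp_mb_new() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
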
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around this code. Fortunatel...
2013 Feb 15
0
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
...> The produced code is (cleaned up a bit)
>
> tailcallee:                           # @tailcallee
> 	movl 4(%esp), %eax
> 	ret $12
>
> tailcaller:                           # @tailcaller
> 	subl $12, %esp
> 	movl %edx, 20(%esp)
> 	movl %ecx, 16(%esp)
> 	addl $12, %esp
> 	jmp tailcallee                  # TAILCALL
>
> foo:                                  # @foo
> 	subl $12, %esp
> 	movl 20(%esp), %ecx
> 	movl 16(%esp), %edx
> 	calll tailcaller
> 	subl $12, %esp
> 	addl $-6, %eax
> 	addl $12, %esp
> 	ret...
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs. So let's use the locked variant everywhere - helps keep the code simple as well. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h I hope I'm not splitting this up too much - the reason is I wanted to iso...
2016 Apr 04
2
How to call an (x86) cleanup/catchpad funclet
...atible SEH structures for my personality on x86/Windows and my handler works fine, but the only thing I can't figure out is how to call these funclets, they look like:

Catch:
"?catch$3@?0?m3@4HA":
LBB4_3:                                 # %BasicBlock26
	pushl %ebp
	pushl %eax
	addl $12, %ebp
	movl %esp, -28(%ebp)
	movl $LBB4_5, %eax
	addl $4, %esp
	popl %ebp
	retl                            # CATCHRET

cleanup:
"?dtor$2@?0?m2@4HA":
LBB3_2:
	pushl %ebp
	subl $8, %esp
	addl $12, %ebp
	movl %ebp, %eax
	movl %esp, %ecx
	movl %eax, 4(%ecx)
	movl $1, (%ecx)
	calll m2$...
2013 Feb 15
2
[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates
...t)
>>
>> tailcallee:                          # @tailcallee
>> 	movl 4(%esp), %eax
>> 	ret $12
>>
>> tailcaller:                          # @tailcaller
>> 	subl $12, %esp
>> 	movl %edx, 20(%esp)
>> 	movl %ecx, 16(%esp)
>> 	addl $12, %esp
>> 	jmp tailcallee                  # TAILCALL
>>
>> foo:                                 # @foo
>> 	subl $12, %esp
>> 	movl 20(%esp), %ecx
>> 	movl 16(%esp), %edx
>> 	calll tailcaller
>> 	subl $12, %esp
>> 	addl $-6, %ea...
2005 Feb 22
0
[LLVMdev] Area for improvement
When I increased COLS to the point where the loop could no longer be unrolled, the selection dag code generator generated effectively the same code as the default X86 code generator. Lots of redundant imul/movl/addl sequences. It can't clean it up either. Only unrolling all nested loops permits it to be optimized away, regardless of code generator. Jeff Cohen wrote: > I noticed that fourinarow is one of the programs in which LLVM is much > slower than GCC, so I decided to take a look and see w...
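
A hypothetical loop of the shape being discussed (the fourinarow source itself is not quoted here): once COLS is too large for full unrolling, the board[r][c] address computation tends to survive as repeated imul/addl/movl sequences unless the code generator strength-reduces it.

    #define ROWS 8
    #define COLS 1024             /* assumed: large enough to defeat full unrolling */

    static int board[ROWS][COLS];

    int count_all(void)
    {
        int sum = 0;
        for (int r = 0; r < ROWS; r++)
            for (int c = 0; c < COLS; c++)
                sum += board[r][c];   /* address = board + (r*COLS + c)*sizeof(int) -> imul/addl */
        return sum;
    }
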
2016 Jan 27
1
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
On Wed, Jan 27, 2016 at 7:10 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> -#define __smp_mb() mb()
> +#define __smp_mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")

So this doesn't look right for x86-64. Using %esp rather than %rsp. How did that even work for you?

Linus
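
The fix Linus is pointing at is to make the operand arch-specific; a sketch of the shape the macro takes (close to, but not quoted from, the final barrier.h):

    #ifdef CONFIG_X86_32
    #define __smp_mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
    #else
    #define __smp_mb() asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
    #endif
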