thr3ads.net - search: "addls"

kernel: locore.s doesn't assemble (fillkpt, $PAGE_SHIFT, $PTESHIFT)

2003 Aug 22

2

Since August 8th, 2003, the kernel on my i386 Pentium III won't compile. The problem arises in locore.s with the definition of the constants $PAGE_SHIFT and $PTESHIFT used in `shr' and `shl' instructions within the macros `fillkpt' and `fillkptphys'. I've tried to cvsup(1) RELENG_4 and RELENG_4_8 every day for over a week now, but kernel builds (as part of a buildworld)

[PATCH v6] x86: use lock+addl for smp_mb()

2017 Oct 27

1

mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit.
Results: perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before: 0.922565990 seconds time elapsed ( +- 1.15% )
After: 0.578667024 seconds time elapsed

[PATCH v3 0/4] x86: faster mb()+documentation tweaks

2016 Jan 13

6

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h. The documentation fixes are included first - I verified that they do not change the generated code at all.

[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks

2016 Jan 27

6

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around

[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates

2013 Feb 14

2

Hello, While investigating one of the existing tests (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some interesting code. The IR is very straightforward:
define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
entry:
  ret i32 %a3
}
define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
entry:
  %tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32

[PATCH v4 5/5] x86: drop mfence in favor of lock+addl

2016 Jan 27

0

mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit. Just poking at SP would be the most natural, but if we then read the value from SP, we get a false dependency which will slow us down. This was noted in this article: http://shipilev.net/blog/2014/on-the-fence-with-dependencies/ And is easy to reproduce by sticking

[PATCH v5 1/5] x86: add cc clobber for addl

2016 Jan 28

0

addl clobbers flags (such as CF) but barrier.h didn't tell this to gcc. Historically, gcc doesn't need one on x86, and always considers flags clobbered. We are probably missing the cc clobber in a *lot* of places for this reason. But even if not necessary, it's probably a good thing to add for documentation, and in case gcc semantics ever change. Reported-by: Borislav Petkov <bp at

[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks

2016 Jan 28

10

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around

[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates

2013 Feb 15

0

Hey Eli, On Thu, Feb 14, 2013 at 5:45 PM, Eli Bendersky <eliben at google.com> wrote: > Hello, > > While investigating one of the existing tests > (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some > interesting code. The IR is very straightforward: > > define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 > %a4) { > entry: >

[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks

2016 Jan 12

7

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs. So let's use the locked variant everywhere - helps keep the code simple as well. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h I hope I'm not splitting this up too much - the reason

How to call an (x86) cleanup/catchpad funclet

2016 Apr 04

2

I've modified LLVM to emit VC++ compatible SEH structures for my personality on x86/Windows and my handler works fine. The only thing I can't figure out is how to call these funclets. They look like:
Catch:
"?catch$3@?0?m3@4HA":
LBB4_3: # %BasicBlock26
    pushl %ebp
    pushl %eax
    addl $12, %ebp
    movl %esp, -28(%ebp)
    movl $LBB4_5, %eax

[LLVMdev] Question about fastcc assumptions and seemingly superfluous %esp updates

2013 Feb 15

2

>> While investigating one of the existing tests >> (test/CodeGen/X86/tailcallpic2.ll), I ran into IR that produces some >> interesting code. The IR is very straightforward: >> >> define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 >> %a4) { >> entry: >> ret i32 %a3 >> } >> >> define fastcc i32 @tailcaller(i32

[LLVMdev] Area for improvement

2005 Feb 22

0

When I increased COLS to the point where the loop could no longer be unrolled, the selection dag code generator generated effectively the same code as the default X86 code generator. Lots of redundant imul/movl/addl sequences. It can't clean it up either. Only unrolling all nested loops permits it to be optimized away, regardless of code generator. Jeff Cohen wrote: > I noticed

[PATCH v4 5/5] x86: drop mfence in favor of lock+addl

2016 Jan 27

1

On Wed, Jan 27, 2016 at 7:10 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
>
> -#define __smp_mb() mb()
> +#define __smp_mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
So this doesn't look right for x86-64. Using %esp rather than %rsp. How did that even work for you?
Linus
