Archive search results similar to: "llvm, new language and inline assembly"
2016 Jan 12 (3 replies)
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > consistently cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
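For context, the rework being discussed has roughly this shape; a minimal sketch assuming the kernel's xchg() helper, not the literal patch text:

/* Implement smp_store_mb() as a single XCHG: the implicitly locked
 * exchange both performs the store and acts as a full memory barrier,
 * replacing a plain store followed by MFENCE. */
#define smp_store_mb(var, value) \
        do { (void)xchg(&(var), value); } while (0)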
2016 Jan 12 (0 replies)
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> #ifdef xchgrz
> /* same as xchg but poking at gcc red zone */
> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0)
> #endif
That's not safe in general. gcc might be using its
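A red-zone-safe variant is easy to sketch, for illustration only; this is not code from the thread. The scratch slot is a compiler-allocated local instead of a location below SP:

/* xchg with a memory operand is implicitly locked and therefore a full
 * barrier. The memory operand here is a local inside our own frame, so
 * it cannot collide with data gcc keeps in the 128-byte red zone below
 * the stack pointer. */
static inline void xchg_barrier(void)
{
        unsigned long reg = 0, slot = 0;
        asm volatile ("xchg %0, %1"
                      : "+r" (reg), "+m" (slot)
                      : : "memory");
}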
2015 Nov 18 (2 replies)
Meaning of IR inline assembly
Thanks, but I could not find the imr, dirflag, or fpsr constraints there, just the usual gcc/clang inline assembly constraints.
Those were exactly the ones I was asking about, actually :)
--
Alex
18.11.2015, 17:11, "David Siegel" <agnat at icloud.com>:
>> On 18.11.2015, at 16:28, AlexandreFressange via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> I reduced the above
2016 Jan 12 (1 reply)
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > #ifdef xchgrz
> > /* same as xchg but poking at gcc red zone */
> > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }
2015 Nov 18 (2 replies)
Meaning of IR inline assembly
Hello,
Most of the IR language is well documented, but when it comes to inline assembly I feel I am on my own:
define i32 @main(i32 %argc, i8** %argv) #0 {
  ; ... some uninteresting bloat here ...
  call void asm sideeffect "outw %eax, $0", "imr,~{dirflag},~{fpsr},~{flags}"(i32 %8) #2, !srcloc !2
  ret i32 0
}
I reduced the above code to the offending line containing:
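For what it's worth, those constraints come from clang's C frontend rather than from LLVM itself: "imr" is a gcc multi-alternative constraint (immediate, memory, or register), and the ~{dirflag},~{fpsr},~{flags} entries are clobbers clang appends to every x86 inline asm, covering the direction flag, the x87 status word, and EFLAGS. A minimal C reproduction (compile with clang -S -emit-llvm; the asm body is just a comment so that all three "imr" alternatives assemble):

void consume(int v)
{
        /* '#' starts an AT&T-syntax comment, so %0 is substituted but
         * never executed; the IR call still carries the constraints. */
        __asm__ volatile ("# consumes %0" : : "imr" (v));
}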
2016 Jan 27 (0 replies)
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Just poking at SP would be the most natural, but if we
then read the value from SP, we get a false dependency
which will slow us down.
This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
And is easy to reproduce by sticking
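The barrier itself looks roughly like this; a sketch assuming kernel context, where there is no red zone (the kernel builds with -mno-red-zone, so the slot just below SP is free), and using the thread's SP macro for the stack-pointer register name:

/* Full barrier via a locked, data-preserving add to the dummy slot just
 * below the stack pointer. Using -4(SP) instead of 0(SP) means we never
 * touch a location we later read back, avoiding the false dependency
 * described in the article above. */
#define smp_mb_lock_add() \
        asm volatile ("lock; addl $0, -4(%%" SP ")" ::: "memory", "cc")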
2016 Jan 28 (0 replies)
[PATCH v5 1/5] x86: add cc clobber for addl
addl clobbers flags (such as CF), but barrier.h didn't tell gcc about that.
Historically gcc hasn't needed the clobber on x86: it always considers the
flags clobbered by inline asm. We are probably missing the cc clobber in a
*lot* of places for this reason.
But even if it isn't necessary, it's probably a good thing to add for
documentation, and in case gcc semantics ever change.
Reported-by: Borislav Petkov <bp at
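The shape of the change, as a sketch (the actual patch is against arch/x86/include/asm/barrier.h):

/* before: asm volatile ("lock; addl $0,0(%%esp)" ::: "memory");
 * after, spelling out that addl writes EFLAGS even though gcc on x86
 * currently assumes flags are clobbered anyway: */
asm volatile ("lock; addl $0,0(%%esp)" ::: "memory", "cc");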
2017 Oct 27 (1 reply)
[PATCH v6] x86: use lock+addl for smp_mb()
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Results:
perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0
Before:
0.922565990 seconds time elapsed ( +- 1.15% )
After:
0.578667024 seconds time elapsed
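The raw cost difference is easy to see outside the virtio ring benchmark as well. A standalone userspace sketch (an assumed harness, not the benchmark above); "lock; addl $0" preserves the data it touches, so hitting the top of the stack is harmless even with a red zone:

#include <stdio.h>
#include <time.h>

#define N 100000000UL

static void b_mfence(void)  { asm volatile ("mfence" ::: "memory"); }
static void b_lockadd(void) { asm volatile ("lock; addl $0, (%%rsp)" ::: "memory", "cc"); }

static double bench(void (*barrier)(void))
{
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < N; i++)
                barrier();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
        printf("mfence:     %.3fs\n", bench(b_mfence));
        printf("lock; addl: %.3fs\n", bench(b_lockadd));
        return 0;
}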
2008 Oct 17 (0 replies)
[LLVMdev] MFENCE encoding
Hmm. mfence and lfence need special handling. I'll take a look.
Evan
On Oct 16, 2008, at 10:46 PM, Mon Ping Wang wrote:
> Hi,
>
> I have a problem with creating a MFENCE on X86 with SSE
>
> In X86InstrSSE.td, a MFENCE is
> def MFENCE : I<0xAE, MRM6m, (outs), (ins),
>                "mfence", [(int_x86_sse2_mfence)]>, TB, Requires<[HasSSE2]>;
2008 Oct 17 (1 reply)
[LLVMdev] MFENCE encoding
I've fixed this (untested though).
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081013/068611.html
Evan
On Oct 17, 2008, at 9:51 AM, Evan Cheng wrote:
> Hmm. mfence and lfence need special handling. I'll take a look.
>
> Evan
>
> On Oct 16, 2008, at 10:46 PM, Mon Ping Wang wrote:
>
>> Hi,
>>
>> I have a problem with creating a MFENCE
2017 Feb 14 (2 replies)
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
Thomas Gleixner <tglx at linutronix.de> writes:
> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote:
>
>> Hi,
>>
>> while we're still waiting for a definitive ACK from Microsoft that the
>> algorithm is good for the SMP case (as we can't prevent the code in vdso from
>> migrating between CPUs) I'd like to send v2 with some modifications to keep
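The vdso read loop under discussion is seqlock-like: retry until the sequence is stable, so a concurrent hypervisor update of the page is detected. A userspace-flavored sketch (field layout per the Hyper-V TLFS; the names and helpers here are illustrative, and real code must also treat an invalid sequence as "fall back to the MSR clocksource"):

#include <stdint.h>

struct hv_tsc_page {
        volatile uint32_t tsc_sequence;
        uint32_t reserved1;
        volatile uint64_t tsc_scale;
        volatile int64_t tsc_offset;
};

static inline uint64_t rdtsc_(void)
{
        uint32_t lo, hi;
        asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t)hi << 32) | lo;
}

static uint64_t read_hv_clock(const struct hv_tsc_page *pg)
{
        uint64_t scale, tsc;
        int64_t offset;
        uint32_t seq;

        do {
                seq = pg->tsc_sequence;
                tsc = rdtsc_();
                scale = pg->tsc_scale;
                offset = pg->tsc_offset;
        } while (pg->tsc_sequence != seq);

        /* time = ((tsc * scale) >> 64) + offset, in 128-bit arithmetic */
        return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + offset;
}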
2017 Feb 14 (0 replies)
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
On Tue, Feb 14, 2017 at 7:50 AM, Vitaly Kuznetsov <vkuznets at redhat.com> wrote:
> Thomas Gleixner <tglx at linutronix.de> writes:
>
>> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote:
>>
>>> Hi,
>>>
>>> while we're still waiting for a definitive ACK from Microsoft that the
>>> algorithm is good for the SMP case (as we can't
2008 Oct 17 (2 replies)
[LLVMdev] MFENCE encoding
Hi,
I have a problem with creating a MFENCE on X86 with SSE
In X86InstrSSE.td, a MFENCE is
def MFENCE : I<0xAE, MRM6m, (outs), (ins),
               "mfence", [(int_x86_sse2_mfence)]>, TB, Requires<[HasSSE2]>;
In X86CodeEmitter.cpp, in emitInstruction:
case X86II::MRM6m: case X86II::MRM7m: {
  intptr_t PCAdj = (CurOp+4 != NumOps) ?
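The wrinkle is that MRM6m/MRM7m normally mean "memory operand, with ModRM reg field 6 or 7", but mfence/lfence/sfence reuse those reg-field values purely as opcode extensions and take no operands, so the generic memory-form path has nothing to emit. The target encodings, per the Intel SDM:

/* 0F AE /6, ModRM mod=11 reg=110 rm=000 -> 0xF0 */
static const unsigned char MFENCE_ENC[] = { 0x0F, 0xAE, 0xF0 };
/* 0F AE /5, ModRM mod=11 reg=101 rm=000 -> 0xE8 */
static const unsigned char LFENCE_ENC[] = { 0x0F, 0xAE, 0xE8 };
/* 0F AE /7, ModRM mod=11 reg=111 rm=000 -> 0xF8 */
static const unsigned char SFENCE_ENC[] = { 0x0F, 0xAE, 0xF8 };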
2016 Jan 27 (6 replies)
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel
manual says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
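The resulting split can be sketched as follows (illustrative, not the literal patch): keep mfence where clflush ordering is required, and use the cheaper locked add for the ordinary SMP barrier path:

/* mb() stays mfence because the SDM only guarantees clflush is ordered
 * by mfence; smp_mb() takes the faster locked add. */
#define mb()        asm volatile ("mfence" ::: "memory")
#define __smp_mb()  asm volatile ("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")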
2016 Jan 14 (2 replies)
RFC: non-temporal fencing in LLVM IR
Hi JF, Philip,
Clang currently has __builtin_nontemporal_store and __builtin_nontemporal_load. How will the usage model for those change?
Thanks again,
Hal
----- Original Message -----
> From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "JF Bastien" <jfb at google.com>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
>
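For reference, a minimal example of the builtins mentioned above (clang; note the store takes the value first, then the pointer):

#include <stdint.h>

/* Non-temporal store/load; on x86 these typically lower to movnti-style
 * instructions that bypass the cache hierarchy. */
void stream_store(uint32_t *dst, uint32_t v)
{
        __builtin_nontemporal_store(v, dst);
}

uint32_t stream_load(const uint32_t *src)
{
        return __builtin_nontemporal_load(src);
}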