search for: lfences

Displaying 20 results from an estimated 83 matches for "lfences".

2020 Jul 28
2
_mm_lfence calls in both paths of an if/else are hoisted by SimplifyCFG, potentially breaking use as a speculation barrier
_mm_lfence was originally documented as a load fence. But in light of speculative execution vulnerabilities it has started being advertised as a way to prevent speculative execution. The current Intel Software Developer's Manual documents it as "Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE
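For context, a minimal C sketch of the pattern this thread is about (function and variable names are illustrative, not taken from the thread): the programmer places _mm_lfence on both sides of a bounds check so the dependent load cannot start until the check resolves; if the optimizer notices both branches begin with the same call and hoists it above the branch, the fence no longer sits between the compare and the load.

    #include <emmintrin.h>   /* _mm_lfence */
    #include <stddef.h>

    extern int table[256];

    int lookup(size_t idx, size_t limit)
    {
        if (idx < limit) {
            _mm_lfence();        /* speculation barrier on the in-bounds path */
            return table[idx];   /* must not run under a mispredicted check */
        } else {
            _mm_lfence();        /* same barrier on the other path */
            return -1;
        }
    }
    /* If both fences are hoisted above the branch, the load is again free
       to execute speculatively with an out-of-bounds idx. */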
2020 Aug 09
2
_mm_lfence calls in both paths of an if/else are hoisted by SimplifyCFG, potentially breaking use as a speculation barrier
...c, label %then, label %else then: call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token%token) ] ... else: call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ] ... ... and this would prevent the hoisting of the lfences. The puzzle to me is whether one can justify this use of the convergence tokens from a theoretical point of view. We describe convergence control in terms of threads that communicate, which is a faithful description of what's happening in the GPU use case. I wonder whether for the speculative...
2007 Oct 16
1
LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Nick Piggin <npiggin@suse.de> wrote: > > Also, for non-wb memory. I don't think the Intel document referenced > says anything about this, but the AMD document says that loads can pass > loads (page 8, rule b). > > This is why our rmb() is still an lfence. BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb instead of smp_mb/smp_rmb/smp_wmb when it
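As a reminder of what is being contrasted here, a rough sketch of the mandatory x86 barrier macros (simplified, not copied from arch/x86/include/asm/barrier.h): they expand to real fence instructions precisely so that they also order accesses to non-write-back memory, which is why rmb() stays an lfence even though ordinary WB loads are already ordered.

    /* Sketch only -- the shape of the mandatory barriers being discussed. */
    #define mb()   __asm__ __volatile__("mfence" ::: "memory")
    #define rmb()  __asm__ __volatile__("lfence" ::: "memory")
    #define wmb()  __asm__ __volatile__("sfence" ::: "memory")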
2017 Feb 14
2
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
Thomas Gleixner <tglx at linutronix.de> writes: > On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote: > >> Hi, >> >> while we're still waiting for a definitive ACK from Microsoft that the >> algorithm is good for SMP case (as we can't prevent the code in vdso from >> migrating between CPUs) I'd like to send v2 with some modifications to keep
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > constantly cheaper (by at least half the latency) than MFENCE. While there > > was a decent amount of variation, this difference
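The kind of micro-benchmark being compared is easy to reproduce. The sketch below is illustrative only (loop count, timing method and asm constraints are mine, not the poster's code): it times a tight loop of mfence against a tight loop of xchg to a compiler-allocated stack slot.

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>          /* __rdtsc */

    #define ITERS 10000000UL

    int main(void)
    {
        unsigned long i;
        uint64_t t0, t1;
        int slot = 0;

        t0 = __rdtsc();
        for (i = 0; i < ITERS; i++)
            __asm__ __volatile__("mfence" ::: "memory");
        t1 = __rdtsc();
        printf("mfence: %.2f cycles/iter\n", (double)(t1 - t0) / ITERS);

        t0 = __rdtsc();
        for (i = 0; i < ITERS; i++) {
            int tmp = 0;
            __asm__ __volatile__("xchgl %0, %1"
                                 : "+r"(tmp), "+m"(slot) : : "memory");
        }
        t1 = __rdtsc();
        printf("xchg:   %.2f cycles/iter\n", (double)(t1 - t0) / ITERS);
        return 0;
    }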
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h The documentation fixes are included first - I verified that they do not change the generated code at all.
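For readers unfamiliar with the "lock; addl" idiom: it is a locked add of zero to a stack location, which is architecturally a full barrier but avoids executing mfence. A minimal sketch (the exact operand and offset used by the patch may differ):

    /* Sketch of the locked-add barrier discussed above: a locked
       read-modify-write of a dummy stack location acts as a full memory
       barrier; adding zero leaves the location unchanged. */
    #define locked_add_mb() \
        __asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")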
2017 Feb 14
0
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
On Tue, Feb 14, 2017 at 7:50 AM, Vitaly Kuznetsov <vkuznets at redhat.com> wrote: > Thomas Gleixner <tglx at linutronix.de> writes: > >> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote: >> >>> Hi, >>> >>> while we're still waiting for a definitive ACK from Microsoft that the >>> algorithm is good for SMP case (as we can't
2018 Feb 03
0
retpoline mitigation and 6.0
On Sat, 2018-02-03 at 00:23 +0000, Chandler Carruth wrote: > > Two aspects to this... > > One, we're somewhat reluctant to guarantee an ABI here. At least I > am. While we don't *expect* rampant divergence here, I don't want > this to become something we cannot change if there are good reasons > to do so. We've already changed the thunks once based on
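For context, the "thunks" in question are the retpoline sequences the compiler emits in place of indirect branches. The sketch below shows the well-known shape of such a thunk, written here as a top-level GNU C asm block; the capture-loop details are illustrative and the symbol name __x86_indirect_thunk_rax is the conventional one, not a statement about either compiler's ABI.

    /* Illustrative retpoline-style thunk: the return address pushed by
       "call" is overwritten with the real target so the "ret" jumps there,
       while speculation down the fall-through path is trapped in the
       pause/lfence loop instead of consulting the indirect predictor. */
    __asm__(
        ".globl __x86_indirect_thunk_rax\n"
        "__x86_indirect_thunk_rax:\n"
        "       call    1f\n"
        "2:     pause\n"
        "       lfence\n"
        "       jmp     2b\n"
        "1:     mov     %rax, (%rsp)\n"
        "       ret\n"
    );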
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around
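The clflush caveat mentioned above matters for flush-then-signal sequences; a small sketch of the safe pattern (names are illustrative):

    /* Per the concern above: clflush is ordered by mfence, not by sfence
       alone, so publishing a flag after flushing a line needs mfence even
       where a store fence would otherwise suffice. Sketch only. */
    static inline void flush_then_publish(void *line, volatile int *flag)
    {
        __asm__ __volatile__("clflush %0"
                             : "+m" (*(volatile char *)line));
        __asm__ __volatile__("mfence" ::: "memory");   /* orders the clflush */
        *flag = 1;
    }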
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl that we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote: > > #ifdef xchgrz > > /* same as xchg but poking at gcc red zone */ > > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
Hello all, I've been working for the last month or so on a comprehensive mitigation approach to variant #1 of Spectre. There are a bunch of reasons why this is desirable: - Critical software that is unlikely to be easily hand-mitigated (or where the performance tradeoff isn't worth it) will have a compelling option. - It gives us a baseline on performance for hand-mitigation. - Combined
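Without reproducing the RFC's actual design, the general flavor of "hardening" a load rather than fencing it can be illustrated at the C level (a hand-written analogy, not the LLVM pass): derive an all-ones/all-zero mask from the bounds check as a data dependency and use it to squash the index, so a misspeculated out-of-bounds access reads a harmless slot instead of attacker-chosen memory.

    #include <stddef.h>

    /* Hand-written illustration only -- not the SLH pass itself. */
    int masked_lookup(const int *table, size_t idx, size_t limit)
    {
        /* all ones when idx < limit, zero otherwise; computed as a data
           dependency, so it is right even if the branch mispredicts */
        size_t in_bounds = (size_t)0 - (size_t)(idx < limit);

        if (idx < limit)
            return table[idx & in_bounds];  /* index forced to 0 when out of bounds */
        return -1;
    }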
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote: > #ifdef xchgrz > /* same as xchg but poking at gcc red zone */ > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0) > #endif That's not safe in general. gcc might be using its
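The objection here concerns the x86-64 red zone named in the macro's own comment: the 128 bytes below %rsp may hold live data in a leaf function, so an xchg that writes a register value to -4(%rsp) can clobber it. A red-zone-safe variant of the same trick (a sketch, not anything proposed in the thread) exchanges with a slot the compiler itself allocated:

    /* Sketch: exchange with a local the compiler knows about instead of
       scribbling below %rsp, where gcc may have spilled live values. */
    static inline void xchg_barrier(void)
    {
        unsigned long tmp = 0;
        unsigned long slot = 0;
        __asm__ __volatile__("xchgq %0, %1"
                             : "+r"(tmp), "+m"(slot)
                             : : "memory");
    }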