thr3ads.net - similar to: "_mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier"

_mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier

2020 Aug 09

2

_mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier

Hi Craig, The review for the similar GPU problem is now up here: https://reviews.llvm.org/D85603 (+ some other patches on the Phabricator stack). >From a pragmatic perspective, the constraints added to program transforms there are sufficient for what you need. You'd produce IR such as: %token = call token @llvm.experimental.convergence.anchor() br i1 %c, label %then, label %else

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016 Jan 12

3

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > constantly cheaper (by at least half the latency) than MFENCE. While there > > was a decent amount of variation, this difference

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016 Jan 12

3

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > constantly cheaper (by at least half the latency) than MFENCE. While there > > was a decent amount of variation, this difference

[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support

2017 Feb 14

2

[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support

Thomas Gleixner <tglx at linutronix.de> writes: > On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote: > >> Hi, >> >> while we're still waiting for a definitive ACK from Microsoft that the >> algorithm is good for SMP case (as we can't prevent the code in vdso from >> migrating between CPUs) I'd like to send v2 with some modifications to keep

[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support

2017 Feb 14

2

[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support

Thomas Gleixner <tglx at linutronix.de> writes: > On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote: > >> Hi, >> >> while we're still waiting for a definitive ACK from Microsoft that the >> algorithm is good for SMP case (as we can't prevent the code in vdso from >> migrating between CPUs) I'd like to send v2 with some modifications to keep

RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

2018 Mar 23

5

RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

Hello all, I've been working for the last month or so on a comprehensive mitigation approach to variant #1 of Spectre. There are a bunch of reasons why this is desirable: - Critical software that is unlikely to be easily hand-mitigated (or where the performance tradeoff isn't worth it) will have a compelling option. - It gives us a baseline on performance for hand-mitigation. - Combined

retpoline mitigation and 6.0

2018 Feb 03

0

retpoline mitigation and 6.0

On Sat, 2018-02-03 at 00:23 +0000, Chandler Carruth wrote: > > Two aspects to this... > > One, we're somewhat reluctant to guarantee an ABI here. At least I > am. While we don't *expect* rampant divergence here, I don't want > this to become something we cannot change if there are good reasons > to do so. We've already changed the thunks once based on

[PATCH v6] x86: use lock+addl for smp_mb()

2017 Oct 27

1

[PATCH v6] x86: use lock+addl for smp_mb()

mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit. Results: perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0 Before: 0.922565990 seconds time elapsed ( +- 1.15% ) After: 0.578667024 seconds time elapsed

[PATCH v6] x86: use lock+addl for smp_mb()

2017 Oct 27

1

[PATCH v6] x86: use lock+addl for smp_mb()

mfence appears to be way slower than a locked instruction - let's use lock+add unconditionally, as we always did on old 32-bit. Results: perf stat -r 10 -- ./virtio_ring_0_9 --sleep --host-affinity 0 --guest-affinity 0 Before: 0.922565990 seconds time elapsed ( +- 1.15% ) After: 0.578667024 seconds time elapsed

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016 Jan 12

1

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote: > > #ifdef xchgrz > > /* same as xchg but poking at gcc red zone */ > > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016 Jan 12

1

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote: > On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote: > > #ifdef xchgrz > > /* same as xchg but poking at gcc red zone */ > > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007 Oct 16

1

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

Nick Piggin <npiggin@suse.de> wrote: > > Also, for non-wb memory. I don't think the Intel document referenced > says anything about this, but the AMD document says that loads can pass > loads (page 8, rule b). > > This is why our rmb() is still an lfence. BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb instead of smp_mb/smp_rmb/smp_wmb when it

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2007 Oct 16

1

LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

Nick Piggin <npiggin@suse.de> wrote: > > Also, for non-wb memory. I don't think the Intel document referenced > says anything about this, but the AMD document says that loads can pass > loads (page 8, rule b). > > This is why our rmb() is still an lfence. BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb instead of smp_mb/smp_rmb/smp_wmb when it

retpoline mitigation and 6.0

2018 Feb 03

4

retpoline mitigation and 6.0

On Fri, Feb 2, 2018 at 4:03 PM David Woodhouse <dwmw2 at infradead.org> wrote: > On Thu, 2018-02-01 at 10:10 +0100, Hans Wennborg via llvm-dev wrote: > > > > I saw the retpoline mitigation landed in r323155. Are we ready to > > merge this to 6.0, or are there any open issues that we're waiting > > for? Also, were there any followups I should know about? Also,

[LLVMdev] MFENCE encoding

2008 Oct 17

0

[LLVMdev] MFENCE encoding

Hmm. mfence and lfence needs special handling. I'll take a look. Evan On Oct 16, 2008, at 10:46 PM, Mon Ping Wang wrote: > Hi, > > I have a problem with creating a MFENCE on X86 with SSE > > In X86InstrSSE.td, a MFENCE is > def MFENCE : I<0xAE, MRM6m, (outs), (ins), > "mfence", [(int_x86_sse2_mfence)]>, TB, Requires< > [HasSSE2]>;

[RFC] Introducing convergence control bundles and intrinsics

2020 Aug 09

2

[RFC] Introducing convergence control bundles and intrinsics

Hi all, please see https://reviews.llvm.org/D85603 and its related changes for our most recent and hopefully final attempt at putting the `convergent` attribute on a solid theoretical foundation in a way that is useful for modern GPU compiler use cases. We have clear line of sight to enabling a new control flow implementation in the AMDGPU backend which is built on this foundation. I have

[LLVMdev] MFENCE encoding

2008 Oct 17

1

[LLVMdev] MFENCE encoding

I've fixed this (untested though). http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20081013/068611.html Evan On Oct 17, 2008, at 9:51 AM, Evan Cheng wrote: > Hmm. mfence and lfence needs special handling. I'll take a look. > > Evan > > On Oct 16, 2008, at 10:46 PM, Mon Ping Wang wrote: > >> Hi, >> >> I have a problem with creating a MFENCE

[RFC] Speculative Execution Side Effect Suppression for Mitigating Load Value Injection

2020 Mar 10

2

[RFC] Speculative Execution Side Effect Suppression for Mitigating Load Value Injection

Hi everyone, Some Intel processors have a newly disclosed vulnerability named Load Value Injection. One pager on Load Value Injection: https://software.intel.com/security-software-guidance/software-guidance/load-value-injection Deep dive on Load Value Injection: https://software.intel.com/security-software-guidance/insights/deep-dive-load-value-injection I wrote this compiler pass that can

[RFC] Introducing convergence control bundles and intrinsics

2020 Aug 17

2

[RFC] Introducing convergence control bundles and intrinsics

Hi Hal, On Mon, Aug 17, 2020 at 2:13 AM Hal Finkel <hfinkel at anl.gov> wrote: > Thanks for sending this. What do you think that we should do with the > existing convergent attribute? My preference, which is implicitly expressed in the review, is to use `convergent` both for the new and the old thing. They are implicitly distinguished via the "convergencectrl" operand

[RFC] Introducing convergence control bundles and intrinsics

2020 Aug 17

2

[RFC] Introducing convergence control bundles and intrinsics

On Mon, Aug 17, 2020 at 7:14 PM Hal Finkel <hfinkel at anl.gov> wrote: > On 8/17/20 11:51 AM, Nicolai Hähnle wrote: > > Hi Hal, > > > > On Mon, Aug 17, 2020 at 2:13 AM Hal Finkel <hfinkel at anl.gov> wrote: > >> Thanks for sending this. What do you think that we should do with the > >> existing convergent attribute? > > My preference, which

similar to: _mm_lfence in both pathes of an if/else are hoisted by SimplfyCFG potentially breaking use as a speculation barrier