Displaying 20 results from an estimated 83 matches for "lfences".
2020 Jul 28
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
_mm_lfence was originally documented as a load fence. But in light of
speculative execution vulnerabilities, it has started being advertised as a
way to prevent speculative execution. The current Intel Software Developer's
Manual documents it as "Specifically, LFENCE does not execute until all
prior instructions have completed locally, and no later instruction begins
execution until LFENCE
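As an editorial aside, here is a minimal C sketch (mine, not from the original report; the function and variable names are made up) of the pattern at issue: a fence at the top of each arm of a branch, meant as a speculation barrier, which SimplifyCFG may hoist above the branch so the bounds check itself is no longer guarded.

#include <emmintrin.h>   /* _mm_lfence */
#include <stddef.h>

/* Hypothetical example; the array, length and index are illustrative. */
static int read_element(const int *arr, size_t len, size_t i)
{
    if (i < len) {
        _mm_lfence();   /* intended: stop speculation before the dependent load */
        return arr[i];
    } else {
        _mm_lfence();   /* same barrier on the other path */
        return -1;
    }
    /* If both fences are hoisted above the 'if', the bounds check can still
     * be speculated past, defeating the intended mitigation. */
}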
2020 Aug 09
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
br i1 %c, label %then, label %else
then:
  call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
  ...
else:
  call void @llvm.x86.sse2.lfence() convergent [ "convergencectrl"(token %token) ]
  ...
... and this would prevent the hoisting of the lfences.
The puzzle to me is whether one can justify this use of the
convergence tokens from a theoretical point of view. We describe
convergence control in terms of threads that communicate, which is a
faithful description of what's happening in the GPU use case. I wonder
whether for the speculative...
2007 Oct 16
1
LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Nick Piggin <npiggin@suse.de> wrote:
>
> Also, for non-WB memory: I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.
BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it
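As background for the rmb()/smp_rmb() distinction discussed above, a simplified sketch under assumed my_-prefixed names; the real definitions live in arch/x86/include/asm/barrier.h and are selected per CPU with alternatives, so treat this only as an illustration of the idea.

/* Compiler-only barrier: stops compiler reordering, emits no instruction. */
#define my_compiler_barrier()  __asm__ __volatile__("" ::: "memory")

/* rmb(): must order loads even for weakly ordered (non-WB/WC) accesses,
 * hence a real LFENCE. */
#define my_rmb()               __asm__ __volatile__("lfence" ::: "memory")

/* smp_rmb(): for ordinary write-back memory, x86 already keeps loads ordered
 * with respect to other loads, so a compiler barrier suffices. */
#define my_smp_rmb()           my_compiler_barrier()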
2017 Feb 14
2
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
Thomas Gleixner <tglx at linutronix.de> writes:
> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote:
>
>> Hi,
>>
>> while we're still waiting for a definitive ACK from Microsoft that the
>> algorithm is good for the SMP case (as we can't prevent the code in the vdso from
>> migrating between CPUs), I'd like to send v2 with some modifications to keep
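For context, a hedged sketch of the TSC-page read algorithm being discussed. The field layout and the ((tsc * scale) >> 64) + offset formula follow the published Hyper-V TLFS; the retry loop is why migration between CPUs while executing vDSO code is the worry, since the TSC value and the page contents must be observed consistently. The invalid-sequence check and the omission of extra compiler barriers are simplifications.

#include <stdint.h>

/* Reference TSC page layout per the Hyper-V TLFS (shown only to illustrate
 * the algorithm). */
struct hv_tsc_page {
    volatile uint32_t tsc_sequence;
    uint32_t reserved;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
};

static inline uint64_t rdtsc_now(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Returns the partition reference time, or 0 to tell the caller to fall back
 * to the slow (MSR/hypercall) path while the page is being updated. */
static uint64_t read_hv_ref_time(const struct hv_tsc_page *p)
{
    uint32_t seq;
    uint64_t tsc, scale;
    int64_t offset;

    do {
        seq = p->tsc_sequence;
        if (seq == 0)                 /* treated as "page not valid" here */
            return 0;
        tsc    = rdtsc_now();
        scale  = p->tsc_scale;
        offset = p->tsc_offset;
    } while (p->tsc_sequence != seq); /* retry if the page changed under us */

    /* 128-bit intermediate: ref_time = ((tsc * scale) >> 64) + offset. */
    return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + (uint64_t)offset;
}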
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz), and XCHG is
> > consistently cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference
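A rough reconstruction (not Davidlohr's actual benchmark) of how such a comparison can be run: time tight loops of each primitive with RDTSC and compare the per-iteration cost. RDTSC is not serializing, so the numbers are only indicative.

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc_now(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

#define ITERS 10000000UL

int main(void)
{
    volatile int slot = 0;
    int tmp = 1;
    uint64_t t0, t1, t2;

    t0 = rdtsc_now();
    for (unsigned long i = 0; i < ITERS; i++)
        __asm__ __volatile__("mfence" ::: "memory");
    t1 = rdtsc_now();
    for (unsigned long i = 0; i < ITERS; i++)
        __asm__ __volatile__("xchgl %0, %1"           /* implicitly locked */
                             : "+r"(tmp), "+m"(slot) :: "memory");
    t2 = rdtsc_now();

    printf("mfence: %.1f cycles/iter, xchg: %.1f cycles/iter\n",
           (double)(t1 - t0) / ITERS, (double)(t2 - t1) / ITERS);
    return 0;
}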
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl variant we use on older CPUs.
So let's use the locked variant everywhere.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentation fixes are included first - I verified that
they do not change the generated code at all.
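To make the comparison concrete, here is an illustrative version of the two full-barrier flavours being weighed (assumed names, x86-64 shown; the kernel chooses between the real ones at boot with ALTERNATIVE()):

/* mfence-based full barrier */
#define my_mb_mfence()     __asm__ __volatile__("mfence" ::: "memory")

/* Locked-RMW full barrier: any lock-prefixed read-modify-write is fully
 * ordering on x86, and adding 0 leaves the stack word unchanged. */
#define my_mb_lock_addl()  \
    __asm__ __volatile__("lock; addl $0, -4(%%rsp)" ::: "memory", "cc")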
2017 Feb 14
0
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
On Tue, Feb 14, 2017 at 7:50 AM, Vitaly Kuznetsov <vkuznets at redhat.com> wrote:
> Thomas Gleixner <tglx at linutronix.de> writes:
>
>> On Tue, 14 Feb 2017, Vitaly Kuznetsov wrote:
>>
>>> Hi,
>>>
>>> while we're still waiting for a definitive ACK from Microsoft that the
>>> algorithm is good for the SMP case (as we can't
2018 Feb 03
0
retpoline mitigation and 6.0
On Sat, 2018-02-03 at 00:23 +0000, Chandler Carruth wrote:
>
> Two aspects to this...
>
> One, we're somewhat reluctant to guarantee an ABI here. At least I
> am. While we don't *expect* rampant divergence here, I don't want
> this to become something we cannot change if there are good reasons
> to do so. We've already changed the thunks once based on
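For readers unfamiliar with the thunks whose naming and ABI are being debated, here is a sketch of the published retpoline sequence for an indirect transfer through %r11 (x86-64, AT&T syntax). The symbol name below is a placeholder, not the ABI under discussion.

/* Sketch of a retpoline thunk; the label is illustrative only. */
__asm__(
    ".text\n"
    ".globl example_retpoline_thunk_r11\n"
    "example_retpoline_thunk_r11:\n"
    "    call 1f\n"            /* push the address of the trap below, jump past it */
    "2:  pause\n"              /* a speculated 'ret' lands here and spins harmlessly */
    "    lfence\n"
    "    jmp 2b\n"
    "1:  mov %r11, (%rsp)\n"   /* replace the pushed address with the real target */
    "    ret\n"                /* architecturally 'returns' to *%r11 */
);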
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl variant we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
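To illustrate the clflush point above: to know a flush is ordered before later operations, it has to be followed by mfence; callers that assume sfence is enough are the suspected bugs. A small sketch (assumed helper name; 64-byte cache lines and a line-aligned buffer are assumed):

#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#include <stddef.h>

static void flush_range(const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    for (size_t off = 0; off < len; off += 64)
        _mm_clflush(p + off);
    _mm_mfence();   /* per the SDM, clflush is ordered by mfence, not sfence */
}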
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl variant we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel manual
says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 09:20:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> > #ifdef xchgrz
> > /* same as xchg but poking at gcc red zone */
> > #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); }
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
Hello all,
I've been working for the last month or so on a comprehensive mitigation
approach to variant #1 of Spectre. There are a bunch of reasons why this is
desirable:
- Critical software that is unlikely to be easily hand-mitigated (or where
the performance tradeoff isn't worth it) will have a compelling option.
- It gives us a baseline on performance for hand-mitigation.
- Combined
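A C-level caricature (mine, not the actual implementation, which works on machine IR with CMOV so the compiler cannot undo it) of the core idea in speculative load hardening: carry a predicate mask that becomes all-ones only along a mis-speculated path, and fold it into addresses so mis-speculated loads cannot form secret-dependent addresses.

#include <stdint.h>
#include <stddef.h>

/* Illustration only: written in C, an optimizer could simplify this away;
 * real SLH emits the mask update as a CMOV in the backend. */
static uint8_t hardened_read(const uint8_t *arr, size_t len, size_t i)
{
    uintptr_t mask = 0;                        /* stays 0 on the correct path */

    if (i < len) {
        /* On a mispredicted branch, this (conceptually CMOV-based) update
         * makes the mask all-ones along the mis-speculated path. */
        mask = (i < len) ? mask : (uintptr_t)-1;

        /* OR the mask into the pointer: when mis-speculating, the load
         * targets an all-ones address rather than attacker-chosen data. */
        const uint8_t *p = (const uint8_t *)((uintptr_t)(arr + i) | mask);
        return *p;
    }
    return 0;
}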
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Tue, Jan 12, 2016 at 5:57 AM, Michael S. Tsirkin <mst at redhat.com> wrote:
> #ifdef xchgrz
> /* same as xchg but poking at gcc red zone */
> #define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0)
> #endif
That's not safe in general. gcc might be using its
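The cut-off point is presumably the x86-64 red zone: the 128 bytes below %rsp belong to the current function, so an asm that exchanges data at -4(%rsp) can clobber live values in user space. A hedged sketch of the alternative weighed in this thread, doing the store itself with a locked XCHG so it is its own full barrier and never touches the stack (names are illustrative, not the kernel's):

/* xchg with a memory operand is implicitly locked and therefore a full
 * memory barrier on x86, so the store and the barrier are one instruction. */
#define my_smp_store_mb(var, value)                              \
    do {                                                         \
        __typeof__(var) __new = (value);                         \
        __asm__ __volatile__("xchg %0, %1"                       \
                             : "+r"(__new), "+m"(var)            \
                             :: "memory");                       \
    } while (0)

Used as my_smp_store_mb(flag, 1), the write to flag is then ordered against all earlier and later memory accesses without any extra fence instruction.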