2020 Jul 28
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
_mm_lfence was originally documented as a load fence. But in light of
speculative execution vulnerabilities, it has started being advertised as a
way to prevent speculative execution. The current Intel Software Developer's
Manual documents it as "Specifically, LFENCE does not execute until all
prior instructi...
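For context, a minimal C sketch of the pattern in question (hypothetical function and variable names, not code from the report): _mm_lfence() used in both arms of a branch as a speculation barrier, where hoisting the two identical calls above the branch leaves nothing between the predicted branch and the dependent load.

```c
#include <emmintrin.h>  /* _mm_lfence */

/* Hypothetical example; 'table', 'len' and 'idx' are illustrative names. */
int read_checked(const int *table, unsigned long len, unsigned long idx)
{
    if (idx < len) {
        _mm_lfence();   /* intended to keep the load below from executing
                           speculatively before the bounds check resolves */
        return table[idx];
    } else {
        _mm_lfence();   /* identical call in the other arm: SimplifyCFG can
                           hoist the common fence above the branch, where it
                           no longer separates the predicted branch from the
                           dependent load */
        return -1;
    }
}
```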
2020 Aug 09
2
_mm_lfence in both paths of an if/else is hoisted by SimplifyCFG, potentially breaking its use as a speculation barrier
...stack).
From a pragmatic perspective, the constraints added to program
transforms there are sufficient for what you need. You'd produce IR
such as:
%token = call token @llvm.experimental.convergence.anchor()
br i1 %c, label %then, label %else
then:
call void @llvm.x86.sse2.lfence() convergent [
"convergencectrl"(token%token) ]
...
else:
call void @llvm.x86.sse2.lfence() convergent [
"convergencectrl"(token %token) ]
...
... and this would prevent the hoisting of the lfences.
The puzzle to me is whether one can justify this use of the
co...
2007 Oct 16
1
LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)
Nick Piggin <npiggin@suse.de> wrote:
>
> Also, for non-wb memory. I don't think the Intel document referenced
> says anything about this, but the AMD document says that loads can pass
> loads (page 8, rule b).
>
> This is why our rmb() is still an lfence.
BTW, Xen (in particular, the code in drivers/xen) uses mb/rmb/wmb
instead of smp_mb/smp_rmb/smp_wmb when it accesses memory that's
shared with other Xen domains or the hypervisor.
The reason this is necessary is that even if a Xen domain is
UP, the hypervisor might be SMP.
It would be nic...
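To illustrate the distinction being made, a sketch under assumed macro definitions (not the kernel's real ones): the mandatory barrier always emits a fence, while the smp_* variant may be weakened, e.g. to a pure compiler barrier on a UP build, which is not safe for memory shared with a hypervisor that is itself SMP.

```c
/* Illustrative sketch, not the kernel's actual macro definitions. */
#define barrier()  __asm__ __volatile__("" ::: "memory")        /* compiler only */
#define rmb()      __asm__ __volatile__("lfence" ::: "memory")  /* always a fence */
/* smp_rmb() may collapse to barrier() on a UP build, so it cannot be relied
 * on for memory shared with other domains or the hypervisor. */

/* Consuming an entry from a shared ring: the producer index must be observed
 * before the slot it guards, using the mandatory barrier. */
unsigned int ring_consume(const volatile unsigned int *prod_idx,
                          const volatile unsigned int ring[],
                          unsigned int mask)
{
    unsigned int prod = *prod_idx;
    rmb();                        /* rmb(), not smp_rmb() */
    return ring[prod & mask];
}
```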
2017 Feb 14
2
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
...e completed before the
> second read of the sequence counter. I am working with the Windows team to correctly
> reflect this algorithm in the Hyper-V specification.
Thank you,
do I get it right that, combining the above, I only need to replace
virt_rmb() barriers with plain rmb() to get 'lfence' in hv_read_tsc_page
(PATCH 2)? As the members of struct ms_hyperv_tsc_page are volatile, we don't
need READ_ONCE(); compilers are not allowed to merge accesses. The
resulting code looks good to me:
(gdb) disassemble read_hv_clock_tsc
Dump of assembler code for function read_hv_clock_tsc:
0...
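For reference, a sketch of the read loop being discussed, with rmb() (lfence) ordering the sequence reads against the data reads. The field names follow the Hyper-V TSC page layout mentioned in the thread, but this is an illustration rather than the kernel's hv_read_tsc_page().

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc */

/* Layout as discussed in the thread; treat this as an illustration. */
struct ms_hyperv_tsc_page {
    volatile uint32_t tsc_sequence;
    uint32_t reserved;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
};

#define rmb() __asm__ __volatile__("lfence" ::: "memory")

static uint64_t read_hv_clock_tsc(const struct ms_hyperv_tsc_page *pg)
{
    uint32_t seq;
    uint64_t scale, tsc;
    int64_t offset;

    do {
        seq = pg->tsc_sequence;        /* first read of the sequence counter */
        rmb();                         /* order it before the data reads     */
        scale  = pg->tsc_scale;
        offset = pg->tsc_offset;
        tsc    = __rdtsc();
        rmb();                         /* data reads complete before re-check */
    } while (seq != pg->tsc_sequence); /* retry if the page changed meanwhile */

    /* reading = ((tsc * scale) >> 64) + offset, using the 128-bit extension */
    return (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + (uint64_t)offset;
}
```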
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...xchg but poking at gcc red zone */
#define barrier() do { int ret; asm volatile ("xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0)
#endif
#ifdef mfence
#define barrier() asm("mfence" ::: "memory")
#endif
#ifdef lfence
#define barrier() asm("lfence" ::: "memory")
#endif
#ifdef sfence
#define barrier() asm("sfence" ::: "memory")
#endif
int main(int argc, char **argv)
{
	int i;
	int j = 1234;
	/*
	 * Test barrier in a loop. We also poke at a volatile variable in an
	 * atte...
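The excerpt cuts off inside main(). Below is a purely hypothetical reconstruction of the kind of timing loop such a benchmark would run (iteration count, timing method and variable names are guesses, not the original code); the barrier() variant is selected at compile time, e.g. cc -O2 -Dmfence bench.c, matching the #ifdef blocks above.

```c
#include <stdio.h>
#include <x86intrin.h>          /* __rdtsc */

#ifndef barrier                 /* fall back if no -D<fence> was given */
#define barrier() __asm__ __volatile__("" ::: "memory")
#endif

volatile int bench_sink;        /* stand-in for the volatile the comment mentions */

int run_bench(void)
{
    enum { ITERS = 10 * 1000 * 1000 };
    unsigned long long start = __rdtsc();

    for (int i = 0; i < ITERS; i++) {
        barrier();              /* whichever variant was selected with -D... */
        bench_sink = i;         /* poke a volatile so the loop has real work */
    }

    unsigned long long cycles = __rdtsc() - start;
    printf("%.2f cycles per iteration\n", (double)cycles / ITERS);
    return 0;
}
```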
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl that we use on older CPUs.
So let's use the locked variant everywhere.
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentation fixes are included first - I verified that
they do not change the generated code at all.
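A sketch of the two mb() flavours being compared (operand choice illustrative, not copied from the patch): the mfence form versus a lock-prefixed add to a dummy stack slot, which the micro-benchmark found substantially cheaper.

```c
/* Full barrier via mfence. */
#define mb_mfence()  __asm__ __volatile__("mfence" ::: "memory")

/* Full barrier via a lock-prefixed RMW on a dummy stack slot; any locked
 * operation is a full barrier on x86.  The slot just below the stack pointer
 * sits in the red zone in user space, as the related benchmark thread notes. */
#define mb_locked()  __asm__ __volatile__("lock; addl $0,-4(%%rsp)" \
                                          ::: "memory", "cc")
```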
2017 Feb 14
0
[PATCH v2 0/3] x86/vdso: Add Hyper-V TSC page clocksource support
...ond read of the sequence counter. I am working with the Windows team to correctly
>> reflect this algorithm in the Hyper-V specification.
>
>
> Thank you,
>
> do I get it right that, combining the above, I only need to replace
> virt_rmb() barriers with plain rmb() to get 'lfence' in hv_read_tsc_page
> (PATCH 2)? As the members of struct ms_hyperv_tsc_page are volatile, we don't
> need READ_ONCE(); compilers are not allowed to merge accesses. The
> resulting code looks good to me:
No, on multiple counts, unfortunately.
1. LFENCE is basically useless except for...
2018 Feb 03
0
retpoline mitigation and 6.0
...re somewhat reluctant to guarantee an ABI here. At least I
> am. While we don't *expect* rampant divergence here, I don't want
> this to become something we cannot change if there are good reasons
> to do so. We've already changed the thunks once based on feedback
> (putting LFENCE after the PAUSE).
Surely adding the lfence was changing your implementation, not the ABI?
And if we really are talking about the *ABI*, not the implementation,
I'm not sure I understand your concern.
The ABI for each thunk is that it is identical in all respects, apart
from speculation, to ...
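For reference, a sketch of the thunk shape under discussion (register choice and labels are illustrative), with the pause; lfence pair forming the speculation trap that the quoted feedback added lfence to:

```c
/* Sketch of a retpoline thunk for an indirect call through %rax; an indirect
 * 'call *%rax' is rewritten as 'call __x86_indirect_thunk_rax'. */
__asm__(
    ".globl __x86_indirect_thunk_rax\n"
    "__x86_indirect_thunk_rax:\n"
    "    call  1f\n"             /* push a return address pointing below       */
    "2:  pause\n"                /* speculation is trapped here...             */
    "    lfence\n"               /* ...and the added lfence stops it cheaply   */
    "    jmp   2b\n"
    "1:  mov   %rax, (%rsp)\n"   /* replace the return address with the target */
    "    ret\n"                  /* architecturally 'returns' to *%rax         */
);
```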
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel
manual says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
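The ordering constraint described above, sketched with the SSE2 intrinsics (function and flag names are illustrative):

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */

/* Illustrative: flush a line and only then signal completion. */
void flush_then_publish(const void *line, volatile int *done)
{
    _mm_clflush(line);   /* evict/write back the cache line                */
    _mm_mfence();        /* mfence is what orders clflush per the note
                            above; relying on sfence here is the pattern
                            flagged as a potential existing bug            */
    *done = 1;           /* safe to advertise only after the fence         */
}
```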
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than the lock; addl that we use on older CPUs.
So we really should use the locked variant everywhere, except that the Intel
manual says that clflush is only ordered by mfence, so we can't.
Note: some callers of clflush seem to assume sfence will
order it, so there could be existing bugs around
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...w loop is fine. And while the xchg into
> the red zone wouldn't be acceptable as a real implementation, for
> timing testing it's likely fine (ie you aren't hitting the problem it
> can cause).
>
> > So mfence is more expensive than locked instructions/xchg, but sfence/lfence
> > are slightly faster, and xchg and locked instructions are very close if
> > not the same.
>
> Note that we never actually *use* lfence/sfence. They are pointless
> instructions when looking at CPU memory ordering, because for pure CPU
> memory ordering stores and loads...
2018 Mar 23
5
RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)
...movq (%rcx), %rdi # Hardened load.
movl (%rdi), %edx # Unhardened load due to dependent addr.
```
This doesn't check the load through `%rdi` as that pointer is dependent on a
checked load already.
###### Protect large, load-heavy blocks with a single lfence
It may be worth using a single `lfence` instruction at the start of a block
which begins with a (very) large number of loads that require independent
protection *and* which require hardening the address of the load. However, this
is unlikely to be profitable in practice. The latency hit of the har...
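At the source level, the alternative described here amounts to something like the following sketch (hypothetical names, not code from the RFC): one fence at the top of the guarded block instead of hardening every loaded address.

```c
#include <emmintrin.h>   /* _mm_lfence */

/* Hypothetical example, not code from the RFC. */
long sum_region(const long *table, unsigned long idx, unsigned long len)
{
    long sum = 0;

    if (idx + 3 < len) {
        _mm_lfence();    /* single fence: nothing below runs speculatively
                            until the bounds check above has resolved, so the
                            individual loads need no per-address hardening */
        sum += table[idx];
        sum += table[idx + 1];
        sum += table[idx + 2];
        sum += table[idx + 3];
    }
    return sum;
}
```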
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...gs go" the stupid raw loop is fine. And while the xchg into
the red zone wouldn't be acceptable as a real implementation, for
timing testing it's likely fine (ie you aren't hitting the problem it
can cause).
> So mfence is more expensive than locked instructions/xchg, but sfence/lfence
> are slightly faster, and xchg and locked instructions are very close if
> not the same.
Note that we never actually *use* lfence/sfence. They are pointless
instructions when looking at CPU memory ordering, because for pure CPU
memory ordering stores and loads are already ordered.
The only...