search for: sfenc

Displaying 20 results from an estimated 47 matches for "sfenc".

2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...;xchgl %0, -4(%%" SP ");": "=r"(ret) :: "memory", "cc"); } while (0) #endif #ifdef mfence #define barrier() asm("mfence" ::: "memory") #endif #ifdef lfence #define barrier() asm("lfence" ::: "memory") #endif #ifdef sfence #define barrier() asm("sfence" ::: "memory") #endif int main(int argc, char **argv) { int i; int j = 1234; /* * Test barrier in a loop. We also poke at a volatile variable in an * attempt to make it a bit more realistic - this way there's something * in the store...
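The attached test program is cut off by the search index. As a rough, self-contained reconstruction of the idea (a sketch, not the actual attachment; the barrier flavor is assumed to be selected at compile time with -Dmfence, -Dlfence, or -Dsfence, as the #ifdefs suggest):

    /* barrier-bench.c - spin in a loop executing one barrier flavor.
     * Build e.g.: gcc -O2 -Dmfence barrier-bench.c */
    #if defined(mfence)
    #define barrier() asm volatile("mfence" ::: "memory")
    #elif defined(lfence)
    #define barrier() asm volatile("lfence" ::: "memory")
    #elif defined(sfence)
    #define barrier() asm volatile("sfence" ::: "memory")
    #else
    #define barrier() asm volatile("" ::: "memory") /* compiler barrier only */
    #endif

    volatile int j; /* poked each iteration so the store buffer is not idle */

    int main(void)
    {
        for (int i = 0; i < 100000000; i++) {
            j = i;      /* volatile store: gives the barrier work to order */
            barrier();
        }
        return 0;
    }

Timing the loop (perf stat, rdtsc, or time(1)) is left to the harness.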
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
...of the fences associated >>>> with x86 non-temporal accesses. >>>> >>>> AFAICT, nontemporal loads and stores seem to have different fencing >>>> rules on x86, none of them very clear. Nontemporal stores should probably >>>> ideally use an SFENCE. Locked instructions seem to be documented to work >>>> with MOVNTDQA. In both cases, there seems to be only empirical evidence as >>>> to which side(s) of the nontemporal operations they should go on? >>>> >>>> I finally decided that I was OK with...
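To make the store-side rule concrete, a minimal sketch with SSE intrinsics (an illustration of the rule under discussion, not code from the thread; publish() and the flag protocol are hypothetical):

    #include <emmintrin.h>  /* _mm_stream_si32 (SSE2) */
    #include <xmmintrin.h>  /* _mm_sfence */

    /* A non-temporal store is weakly ordered: without the sfence, the
     * flag store could become visible to another core before the
     * streamed data does. */
    void publish(int *data, volatile int *flag)
    {
        _mm_stream_si32(data, 42); /* movnti: bypasses the cache */
        _mm_sfence();              /* drain WC buffers before signalling */
        *flag = 1;                 /* ordinary store, now safely ordered */
    }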
2010 Jan 05
1
[LLVMdev] Non-temporal moves in memset [Was: ASM output with JIT / codegen barriers]
...;>>>> temporal >>>>> store to memory. This exempts it from all ordering considerations Hm...off topic from my original email since I think this is only relevant for multithreaded code... But from what I can tell, an implementation of memset that does not contain an sfence after using movnti is considered broken. Callers of memset would not (and should not need to) know that they must use an actual memory barrier (sfence) after the memset call to get the usual x86 store-store guarantee. Thread describing that bug in glibc memset implementation: http://sourcew...
2016 Jan 12
1
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...upid raw loop is fine. And while the xchg into > the redzone wouldn't be acceptable as a real implementation, for > timing testing it's likely fine (i.e. you aren't hitting the problem it > can cause). > > > So mfence is more expensive than locked instructions/xchg, but sfence/lfence > > are slightly faster, and xchg and locked instructions are very close if > > not the same. > > Note that we never actually *use* lfence/sfence. They are pointless > instructions when looking at CPU memory ordering, because for pure CPU > memory ordering stores an...
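For reference, smp_store_mb(var, value) stores a value and implies a full memory barrier; the xchg variant being timed exploits the implicit lock prefix of xchg with a memory operand. A user-space sketch of that approach (not the kernel macro):

    /* xchg with a memory operand is implicitly locked, so a single
     * instruction both performs the store and acts as a full barrier. */
    #define store_mb(var, value) do {                          \
        __typeof__(var) tmp__ = (value);                       \
        asm volatile("xchg %0, %1"                             \
                     : "+r"(tmp__), "+m"(var) :: "memory");    \
    } while (0)

    /* usage: store_mb(flag, 1); -- flag is written, and no memory
     * operation can be reordered across the xchg in either direction */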
2016 Jan 15
3
RFC: non-temporal fencing in LLVM IR
...iated with x86 non-temporal accesses. > > AFAICT, nontemporal loads and stores seem to have > different fencing rules on x86, none of them very > clear. Nontemporal stores should probably ideally > use an SFENCE. Locked instructions seem to be > documented to work with MOVNTDQA. In both cases, > there seems to be only empirical evidence as to > which side(s) of the nontemporal operations they > should go on? >...
2016 Jan 13
6
[PATCH v3 0/4] x86: faster mb()+documentation tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl we use on older CPUs. So let's use the locked variant everywhere. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h. The documentation fixes are included first - I verified that they do not change the generated code at all.
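A rough user-space analogue of such a micro-benchmark (a sketch assuming a 64-bit System V target with a red zone, not the benchmark used for the patch) could compare the two primitives with the TSC:

    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>  /* __rdtsc */

    #define N 100000000UL

    static inline void mb_mfence(void)
    {
        asm volatile("mfence" ::: "memory");
    }

    static inline void mb_lock_addl(void)
    {
        /* lock; addl $0 to a dead stack slot: a full barrier on x86 */
        asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
    }

    int main(void)
    {
        uint64_t t0 = __rdtsc();
        for (unsigned long i = 0; i < N; i++)
            mb_mfence();
        uint64_t t1 = __rdtsc();
        printf("mfence:     %.1f cycles\n", (double)(t1 - t0) / N);

        t0 = __rdtsc();
        for (unsigned long i = 0; i < N; i++)
            mb_lock_addl();
        t1 = __rdtsc();
        printf("lock; addl: %.1f cycles\n", (double)(t1 - t0) / N);
        return 0;
    }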
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
...ll in favor of more systematic handling of the fences associated >> with x86 non-temporal accesses. >> >> AFAICT, nontemporal loads and stores seem to have different fencing rules >> on x86, none of them very clear. Nontemporal stores should probably >> ideally use an SFENCE. Locked instructions seem to be documented to work >> with MOVNTDQA. In both cases, there seems to be only empirical evidence as >> to which side(s) of the nontemporal operations they should go on? >> >> I finally decided that I was OK with using a LOCKed top-of-stack upd...
2016 Jan 12
0
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
...an things go" the stupid raw loop is fine. And while the xchg into the redzone wouldn't be acceptable as a real implementation, for timing testing it's likely fine (i.e. you aren't hitting the problem it can cause). > So mfence is more expensive than locked instructions/xchg, but sfence/lfence > are slightly faster, and xchg and locked instructions are very close if > not the same. Note that we never actually *use* lfence/sfence. They are pointless instructions when looking at CPU memory ordering, because for pure CPU memory ordering stores and loads are already ordered....
2017 Oct 27
1
[PATCH v6] x86: use lock+addl for smp_mb()
...k; addl $0,0(%%esp)", "lfence", \ +#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \ X86_FEATURE_XMM2) ::: "memory", "cc") -#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \ +#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \ X86_FEATURE_XMM2) ::: "memory", "cc") #else #define mb() asm volatile("mfence":::"memory") @@ -30,7 +30,11 @@ #endif #define dma_wmb(...
2016 Jan 27
6
[PATCH v4 0/5] x86: faster smp_mb()+documentation tweaks
...x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around this code. Fortunately no callers of clflush (except one) order it using smp_mb(), so after fixing that one caller, it seems safe to override smp_mb straight away. Down the road, it might make sense to introduce clflush_mb() and switch to tha...
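The clflush caveat in intrinsic form (an illustration of the manual's rule, not kernel code; flush_then_signal is hypothetical):

    #include <emmintrin.h>  /* _mm_clflush, _mm_mfence */

    /* Per the Intel manual, clflush is ordered by mfence but not
     * reliably by sfence or by a locked instruction, which is exactly
     * why the cover letter cannot blindly switch mb() to lock; addl. */
    void flush_then_signal(void *line, volatile int *done)
    {
        _mm_clflush(line);
        _mm_mfence(); /* ensure the flush completes before signalling */
        *done = 1;
    }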
2016 Jan 28
10
[PATCH v5 0/5] x86: faster smp_mb()+documentation tweaks
...x86, but a micro-benchmark shows that it's 2 to 3 times slower than the lock; addl we use on older CPUs. So we really should use the locked variant everywhere, except that the Intel manual says that clflush is only ordered by mfence, so we can't. Note: some callers of clflush seem to assume sfence will order it, so there could be existing bugs around this code. Fortunately no callers of clflush (except one) order it using smp_mb(), so after fixing that one caller, it seems safe to override smp_mb straight away. Down the road, it might make sense to introduce clflush_mb() and switch to tha...
2016 Jan 14
4
RFC: non-temporal fencing in LLVM IR
...he same problem for normal loads. I'm all in favor of more systematic handling of the fences associated with x86 non-temporal accesses. AFAICT, nontemporal loads and stores seem to have different fencing rules on x86, none of them very clear. Nontemporal stores should probably ideally use an SFENCE. Locked instructions seem to be documented to work with MOVNTDQA. In both cases, there seems to be only empirical evidence as to which side(s) of the nontemporal operations they should go on? I finally decided that I was OK with using a LOCKed top-of-stack update as a fence in Java on x86. I...
2016 Jan 27
0
[PATCH v4 5/5] x86: drop mfence in favor of lock+addl
...k; addl $0,0(%%esp)", "lfence", \ +#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \ X86_FEATURE_XMM2) ::: "memory", "cc") -#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \ +#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \ X86_FEATURE_XMM2) ::: "memory", "cc") #else #define mb() asm volatile("mfence":::"memory") @@ -30,7 +30,7 @@ #endif #define dma_wmb()...
2016 Jan 28
0
[PATCH v5 1/5] x86: add cc clobber for addl
...be a * nop for these. */ -#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2) -#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2) -#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM) +#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \ + X86_FEATURE_XMM2) ::: "memory", "cc") +#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \ +...
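The point of this hunk is the added "cc" clobber: lock; addl writes EFLAGS even though it adds zero, so the asm should declare that the condition codes are clobbered. A stand-alone illustration (user-space sketch, not the kernel macro):

    /* addl $0 leaves memory unchanged but still writes EFLAGS, hence
     * the "cc" clobber (GCC on x86 treats flags as clobbered by asm
     * anyway, but being explicit is safer and self-documenting). */
    static inline void mb_locked(void)
    {
    #ifdef __x86_64__
        asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
    #else
        asm volatile("lock; addl $0,0(%%esp)" ::: "memory", "cc");
    #endif
    }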