similar to: RFC: non-temporal fencing in LLVM IR

Displaying 20 results from an estimated 8000 matches similar to: "RFC: non-temporal fencing in LLVM IR"

2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
Hi JF, Philip, Clang currently has __builtin_nontemporal_store and __builtin_nontemporal_load. How will the usage model for those change? Thanks again, Hal ----- Original Message ----- > From: "Philip Reames via llvm-dev" <llvm-dev at lists.llvm.org> > To: "JF Bastien" <jfb at google.com>, "llvm-dev" > <llvm-dev at lists.llvm.org> >
2016 Jan 13
2
RFC: non-temporal fencing in LLVM IR
On Wed, Jan 13, 2016 at 10:32 AM, John Brawn <John.Brawn at arm.com> wrote: > *What about non-x86 architectures?* > > > > Architectures such as ARMv8 support non-temporal instructions and require > barriers such as DMB nshld to order loads and DMB nshst to order stores. > > > > Even ARM's address-dependency rule (a.k.a. the ill-fated >
2016 Jan 14
4
RFC: non-temporal fencing in LLVM IR
I agree with Tim's assessment for ARM. That's interesting; I wasn't previously aware of that instruction. My understanding is that Alpha would have the same problem for normal loads. I'm all in favor of more systematic handling of the fences associated with x86 non-temporal accesses. AFAICT, nontemporal loads and stores seem to have different fencing rules on x86, none of them
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev < llvm-dev at lists.llvm.org> wrote: > > > On Wed, Jan 13, 2016 at 7:00 PM, Hans Boehm via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> I agree with Tim's assessment for ARM. That's interesting; I wasn't >> previously aware of that instruction. >> >> My
2016 Jan 14
2
RFC: non-temporal fencing in LLVM IR
On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer <david.majnemer at gmail.com> wrote: > > > On Thu, Jan 14, 2016 at 1:13 PM, JF Bastien <jfb at google.com> wrote: > >> On Thu, Jan 14, 2016 at 1:10 PM, David Majnemer via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> >>> >>> On Wed, Jan 13, 2016 at 7:00 PM, Hans
2016 Jan 15
3
RFC: non-temporal fencing in LLVM IR
On 01/14/2016 04:05 PM, Hans Boehm via llvm-dev wrote: > > > On Thu, Jan 14, 2016 at 1:37 PM, JF Bastien <jfb at google.com > <mailto:jfb at google.com>> wrote: > > On Thu, Jan 14, 2016 at 1:35 PM, David Majnemer > <david.majnemer at gmail.com <mailto:david.majnemer at gmail.com>> wrote: > > > > On Thu, Jan 14, 2016 at 1:13
2016 Jan 12
3
[PATCH 3/4] x86,asm: Re-work smp_store_mb()
On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote: > > > > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is > > constantly cheaper (by at least half the latency) than MFENCE. While there > > was a decent amount of variation, this difference
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
I have already seen usage of __builtin_nontemporal_store, but I want to automate identification of non-temporal loads/stores. I think I need to write a pass. Is it possible to detect non-temporal loops without Polly? On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 20/01/2018 18:16, hameeza ahmed wrote: > > Actually i am working on vector
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
Actually, I am working on a vector accelerator that will execute the instructions which are non-temporal. For instance, if I have this loop: for(i=0;i<2048;i++) a[i]=b[i]+c[i]; it currently emits the following IR: %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index %1 = bitcast i32* %0 to <16 x i32>* %wide.load = load <16 x i32>, <16 x i32>* %1,
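For reference, the loop in this message can be annotated by hand. The following is a minimal sketch, assuming Clang's `__builtin_nontemporal_store` and falling back to a plain store on compilers that lack it. The helper name `add_arrays_nt` and the `__has_builtin` guard are illustrative and do not come from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#define N 2048
static int a[N], b[N], c[N];

/* Hypothetical helper: a[i] = b[i] + c[i], hinting that a[] will not be
   re-read soon, so the stores may bypass the cache. */
static void add_arrays_nt(size_t n) {
    assert(n <= N);
    for (size_t i = 0; i < n; i++) {
#if __has_builtin(__builtin_nontemporal_store)
        /* Clang: the store is tagged with !nontemporal metadata in IR. */
        __builtin_nontemporal_store(b[i] + c[i], &a[i]);
#else
        /* Assumption: no equivalent builtin here, so use a plain store. */
        a[i] = b[i] + c[i];
#endif
    }
}
```

Calling `add_arrays_nt(N)` after filling `b` and `c` produces the same values in `a` as the plain loop; only the cache hint differs.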
2016 Jan 12
7
[PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs. So let's use the locked variant everywhere - helps keep the code simple as well. While I was at it, I found some inconsistencies in comments in arch/x86/include/asm/barrier.h I hope I'm not splitting this up too much - the reason
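The two barrier forms compared in this message can be written side by side. This is a rough sketch, assuming GCC or Clang inline assembly on x86-64, with a generic sequentially consistent fence elsewhere. The names `mb_mfence`, `mb_locked_add`, and `store_x_then_load_y` are hypothetical and are not the kernel's actual barrier.h definitions.

```c
#include <assert.h>

/* Full barrier using MFENCE (inline asm only on x86-64). */
static inline void mb_mfence(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* generic fallback */
#endif
}

/* Full barrier using a locked add of zero to the top of the stack.
   Adding 0 leaves the stack contents unchanged. */
static inline void mb_locked_add(void) {
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* generic fallback */
#endif
}

static volatile int x, y;

/* Store to x, issue the chosen barrier, then load y. The barrier keeps
   the load from being reordered ahead of the store. */
static int store_x_then_load_y(void (*mb)(void)) {
    assert(mb != 0);
    x = 1;
    mb();
    return y;
}
```

Either function can be passed to `store_x_then_load_y`; the sketch shows only the shape of the two barriers and makes no timing claim of its own.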
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
Hello, My work deals with non-temporal loads and stores. I found non-temporal metadata in the LLVM documentation, but it is not shown in the IR. How do I get non-temporal metadata?
2018 Jan 21
0
Non-Temporal hints from Loop Vectorizer
On 01/20/2018 12:29 PM, hameeza ahmed via llvm-dev wrote: > i have already seen usage of __builtin_nontemporal_store but i want to > automate identification of non temporal loads/stores. i think i need > to go for a pass. is it possiblee to detect non temporal loops without > polly? Yes, but we don't have anything that does that right now. The cost modeling is non-trivial,
2018 Jan 20
0
Non-Temporal hints from Loop Vectorizer
On 20/01/2018 18:16, hameeza ahmed wrote: > Actually i am working on vector accelerator which will perform those > instructions which are non temporal. > > for instance if i have this loop > > for(i=0;i<2048;i++) > a[i]=b[i]+c[i]; > > currently it emits following IR; > > >   %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, > i64 %index
2018 Jan 20
0
Non-Temporal hints from Loop Vectorizer
On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote: > Hello, > > My work deals with non-temporal loads and stores i found non-temporal > meta data in llvm documentation but its not shown in IR. > > How to get non-temporal meta data? llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to create nt vector loads in IR - is that what you're after? Simon.
2020 Apr 28
2
Nontemporal memory accesses and fences
The current specification of the behavior of the !nontemporal attribute in LLVM, and the __builtin_nontemporal_* functions in Clang, is rather spartan and underspecified. In effect, it says the following things: * Atomic !nontemporal has no defined semantics * !nontemporal may use special instructions to save cache bandwidth, such as "MOVNT" on x86. What is crucially lacking
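The snippet above is cut off, so the following is not the proposal from that thread. It is only a sketch of the common hand-written x86 idiom: MOVNT stores are weakly ordered, so an SFENCE is placed between the non-temporal store and the store that publishes it. It assumes the SSE2 intrinsics `_mm_stream_si32` and `_mm_sfence` where available and a plain store otherwise; `payload`, `ready`, `publish`, and `consume` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32; also pulls in _mm_sfence */
#endif

static int payload;
static atomic_int ready;

/* Write the payload with a non-temporal store, then publish a flag. */
static void publish(int value) {
#if defined(__SSE2__)
    _mm_stream_si32(&payload, value); /* MOVNTI: weakly ordered store */
    _mm_sfence();                     /* order it before the flag store */
#else
    payload = value;                  /* assumption: no NT store available */
#endif
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Returns 1 and fills *out once the flag is visible, otherwise 0. */
static int consume(int *out) {
    assert(out != 0);
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

Without the `_mm_sfence()` call, the release store to `ready` would not by itself order the earlier non-temporal store on x86, which is the kind of gap the fencing discussions in these threads are concerned with.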
2016 May 03
6
[RFC] Non-Temporal hints from Loop Vectorizer
Hello all, I've been wondering why Clang doesn't generate non-temporal stores when compiling the STREAM benchmark [1] and therefore doesn't yield optimal results. It turned out that the Loop Vectorizer correctly vectorizes the arithmetic operations and also merges the loads and stores into vector operations. However it doesn't add the '!nontemporal' metadata which would
2020 Apr 29
2
Nontemporal memory accesses and fences
________________________________ From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of JF Bastien via llvm-dev <llvm-dev at lists.llvm.org> Sent: Tuesday, April 28, 2020 4:54 PM To: Cranmer, Joshua <joshua.cranmer at intel.com> Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Nontemporal memory accesses and fences I see
2010 Feb 11
3
[LLVMdev] Adding NonTemporal
While hacking around in the SelectionDAG build code, I've made the isVolatile, (new) isNonTemporal and Alignment parameters to SelectionDAG::getLoad/getStore and friends non-default. I've already caught one bug in the XCore backend by doing this: if (Offset % 4 == 0) { // We've managed to infer better alignment information than the load // already has. Use an aligned