hameeza ahmed via llvm-dev
2018-Jan-20 18:16 UTC
[llvm-dev] Non-Temporal hints from Loop Vectorizer
Actually, I am working on a vector accelerator that will execute the instructions marked as non-temporal.

For instance, if I have this loop:

  for(i=0;i<2048;i++)
    a[i]=b[i]+c[i];

it currently emits the following IR:

  %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
  %1 = bitcast i32* %0 to <16 x i32>*
  %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1
  %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
  %9 = bitcast i32* %8 to <16 x i32>*
  %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1
  %16 = add nsw <16 x i32> %wide.load14, %wide.load
  %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
  %21 = bitcast i32* %20 to <16 x i32>*
  store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1

However, I want it to emit the following IR:

  %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
  %1 = bitcast i32* %0 to <16 x i32>*
  %wide.load = load <16 x i32>, <16 x i32>* %1, align 16, !tbaa !1, !nontemporal !1
  %8 = getelementptr inbounds [2048 x i32], [2048 x i32]* @c, i64 0, i64 %index
  %9 = bitcast i32* %8 to <16 x i32>*
  %wide.load14 = load <16 x i32>, <16 x i32>* %9, align 16, !tbaa !1, !nontemporal !1
  %16 = add nsw <16 x i32> %wide.load14, %wide.load, !nontemporal !1
  %20 = getelementptr inbounds [2048 x i32], [2048 x i32]* @a, i64 0, i64 %index
  %21 = bitcast i32* %20 to <16 x i32>*
  store <16 x i32> %16, <16 x i32>* %21, align 16, !tbaa !1, !nontemporal !1

so that I can offload the load, add, and store to the accelerator hardware. Is that possible here? Do I need a separate pass to detect whether the loop has non-temporal data, or will Polly help here? What do you say?

On Sat, Jan 20, 2018 at 11:02 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote:
>
>> Hello,
>>
>> My work deals with non-temporal loads and stores. I found non-temporal
>> metadata in the LLVM documentation, but it's not shown in the IR.
>>
>> How do I get non-temporal metadata?
>>
> llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to create nt vector
> loads in IR - is that what you're after?
>
> Simon.
Simon Pilgrim via llvm-dev
2018-Jan-20 18:26 UTC
[llvm-dev] Non-Temporal hints from Loop Vectorizer
On 20/01/2018 18:16, hameeza ahmed wrote:
> so that i can offload load, add, store to accelerator hardware. is it
> possible here? do i need a separate pass to detect whether the loop
> has non temporal data or polly will help here? what do you say?

From C/C++ you just need to use the __builtin_nontemporal_store/__builtin_nontemporal_load builtins to tag the stores/loads with the nontemporal flag.

  for(i=0;i<2048;i++) {
    __builtin_nontemporal_store( __builtin_nontemporal_load(b+i) + __builtin_nontemporal_load(c + i), a + i );
  }

There may be an attribute you can tag pointers with instead but I don't know off hand.
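For readers who want to try the builtins from the reply above, here is a minimal sketch of the same loop as a complete function. The function name `add_nontemporal` and the `NT_LOAD`/`NT_STORE` wrapper macros are hypothetical names introduced for this illustration. The builtins are a Clang extension, so the sketch assumes a fallback to plain loads and stores on compilers that do not provide them.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_builtin: treat every builtin as unavailable. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* NT_LOAD / NT_STORE are hypothetical wrappers for this sketch.
 * With Clang they expand to the nontemporal builtins; elsewhere they
 * fall back to ordinary memory accesses so the code still compiles. */
#if __has_builtin(__builtin_nontemporal_load) && \
    __has_builtin(__builtin_nontemporal_store)
#define NT_LOAD(p)     __builtin_nontemporal_load(p)
#define NT_STORE(v, p) __builtin_nontemporal_store((v), (p))
#else
#define NT_LOAD(p)     (*(p))
#define NT_STORE(v, p) (*(p) = (v))
#endif

/* Computes a[i] = b[i] + c[i] for i in [0, n), tagging each access
 * as nontemporal when the builtins are available. */
void add_nontemporal(int *restrict a, const int *restrict b,
                     const int *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        NT_STORE(NT_LOAD(b + i) + NT_LOAD(c + i), a + i);
}
```

Compiling this with Clang and `-S -emit-llvm` should show `!nontemporal` attached to the loads and stores, referencing a metadata node that holds a single `i32 1`. The LLVM Language Reference defines this metadata for `load` and `store` instructions only. Whether the tag survives loop vectorization may depend on the LLVM version, so it is worth checking the vectorized IR rather than assuming it.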
hameeza ahmed via llvm-dev
2018-Jan-20 18:29 UTC
[llvm-dev] Non-Temporal hints from Loop Vectorizer
I have already seen the usage of __builtin_nontemporal_store, but I want to automate the identification of non-temporal loads/stores. I think I need to write a pass. Is it possible to detect non-temporal loops without Polly?

On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> From C/C++ you just need to use the __builtin_nontemporal_store/__builtin_nontemporal_load
> builtins to tag the stores/loads with the nontemporal flag.