similar to: Stream loop in llvm

Displaying 20 results from an estimated 200000 matches similar to: "Stream loop in llvm"

2018 Jan 29
1
Polly loop offloading to Accelerator
Thank you. I used -polly-ast-detect-parallel, but no coincident info is generated. My C code is a simple vec-sum, as follows: #include <stdio.h> int a[2048], b[2048], c[2048]; foo () { int i; for (i=0; i<2048; i++) { a[i]=b[5] + c[i]; } } I executed the following commands: $clang -S -emit-llvm vec-sum.cpp -march=native -O3 -mllvm -disable-llvm-optzns -o vec-sum.s $opt -S
2018 Jan 21
0
Non-Temporal hints from Loop Vectorizer
On 01/20/2018 12:29 PM, hameeza ahmed via llvm-dev wrote: > I have already seen usage of __builtin_nontemporal_store, but I want to > automate identification of non-temporal loads/stores. I think I need > to write a pass. Is it possible to detect non-temporal loops without > Polly? Yes, but we don't have anything that does that right now. The cost modeling is non-trivial,
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
I have already seen usage of __builtin_nontemporal_store, but I want to automate identification of non-temporal loads/stores. I think I need to write a pass. Is it possible to detect non-temporal loops without Polly? On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote: > On 20/01/2018 18:16, hameeza ahmed wrote: > > Actually I am working on vector
2018 Jan 20
0
Non-Temporal hints from Loop Vectorizer
On 20/01/2018 18:16, hameeza ahmed wrote: > Actually, I am working on a vector accelerator that will execute those > instructions which are non-temporal. > > For instance, if I have this loop: > > for(i=0;i<2048;i++) > a[i]=b[i]+c[i]; > > it currently emits the following IR: > > > %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, > i64 %index
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
Actually, I am working on a vector accelerator that will execute those instructions which are non-temporal. For instance, if I have this loop: for(i=0;i<2048;i++) a[i]=b[i]+c[i]; it currently emits the following IR: %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index %1 = bitcast i32* %0 to <16 x i32>* %wide.load = load <16 x i32>, <16 x i32>* %1,
2018 Jan 20
1
Polly loop offloading to Accelerator
Hello, I have been working on an accelerator backend. The accelerator has large vector/SIMD units. I want vectorized streaming (non-temporal) loops in the code to be offloaded to the accelerator's SIMD units. I find Polly really suitable for this. My idea is to pass the generated IR to Polly, which then analyzes each loop to determine whether it has no reuse; if such a loop is identified, accelerator
2018 Jan 20
0
Non-Temporal hints from Loop Vectorizer
On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote: > Hello, > > My work deals with non-temporal loads and stores. I found non-temporal > metadata in the LLVM documentation, but it's not shown in the IR. > > How do I get non-temporal metadata? llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to create nt vector loads in IR - is that what you're after? Simon.
2018 Jan 20
2
Non-Temporal hints from Loop Vectorizer
Hello, My work deals with non-temporal loads and stores. I found non-temporal metadata in the LLVM documentation, but it's not shown in the IR. How do I get non-temporal metadata?
2016 May 03
2
[RFC] Non-Temporal hints from Loop Vectorizer
Steve Canon is on vacation, so I’m going to word for word quote his take on the compiler autogenerating nontemporal hints: "nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope n” — Steve Canon —escha > On May 3, 2016, at 10:26 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Non-temporal hints
2017 Jun 27
3
LLVM Matrix Multiplication Loop Vectorizer
Hello, I am trying to vectorize a simple matrix multiplication in LLVM. Here is my code: #include <stdio.h> #define N 1000 // This function multiplies A[][] and B[][], and stores // the result in C[][] void multiply(int A[][N], int B[][N], int C[][N]) { int i, j, k; for (i = 0; i < N; i++) { for (j = 0; j < N; j++) { C[i][j] = 0;
2016 May 03
6
[RFC] Non-Temporal hints from Loop Vectorizer
Hello all, I've been wondering why Clang doesn't generate non-temporal stores when compiling the STREAM benchmark [1] and therefore doesn't yield optimal results. It turned out that the Loop Vectorizer correctly vectorizes the arithmetic operations and also merges the loads and stores into vector operations. However it doesn't add the '!nontemporal' metadata which would
2018 May 28
0
[RFC] A New Divergence Analysis for LLVM
TL;DR This RFC is a joint effort by Intel and Saarland University to bring the divergence analysis of the Region Vectorizer [1,2,3,4,5] (dubbed the vectorization analysis of RV) to LLVM. The implementation is available on github for feedback [0]. The existing divergence analysis infrastructure in LLVM has conceptual limitations (structured control, SCEV based). The new analysis resolves bugs
2018 Aug 02
2
Vectorizing remainder loop
Hi Hameeza, Aside from Ashutosh's patch..... When the vector width is that large, we can't keep vectorizing remainder like below. It'll be a huge code size if nothing else ---- hitting ITLB miss because of this is very bad, for example. VF=2048 // main vector loop VF=1024 // vectorized remainder 1 VF=512 // vectorized remainder 2 ... Vectorize remainder until trip count is
2017 Jul 10
2
Conditional Register Assignment based on the no of loop iterations
Basically, my problem is the vector width, since I have used v64i32 in my backend. If the vector width is 64, I want Reg_B class registers to be assigned, and if the vector width is 2048, I want Reg_A registers to be assigned to the instruction. Should I incorporate the solution in the lowering stage? Something like: addRegisterClass(MVT::v2048i32, &X86::Reg_B);
2018 Jul 29
2
Vectorizing remainder loop
Hello, I'm working on hardware with very large vector widths, up to v2048. When I vectorize using LLVM's default vectorizer, up to 2047 iterations end up in the scalar remainder loop. These are not vectorized by LLVM, which increases the cost. However, they should be vectorized using the next available vector width, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has
2018 Aug 03
2
Vectorizing remainder loop
>it cannot afford large size masks for large vectors So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s
2005 Nov 15
0
OggPCM2 : chunked vs interleaved data
On 2005-11-16, Jean-Marc Valin wrote: > Otherwise, what do you feel should be changed? One obvious thing that seems to be lacking is the granulepos mapping. As suggested in Ogg documentation, for audio a simple sampling frame number ought to suffice, but I think the convention should still be spelled out. Secondly, I'd like to see the channel map fleshed out in more detail. (Beware
2007 Sep 11
0
Ogg stream URIs
So, I thought I'd split off a few discussions from the ongoing discussion of the details of a metadata format. A few other things are needed in support, one of these is a scheme for referring to IDs on Ogg files. Currently there is CMML, you can refer to temporal fragments at the stream level. These may be CMML IDs or time interval queries/fragments as in
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
Even if I change my code as follows, vectorized instructions are not emitted. What should I do? int main(int argc, char** argv) { int a[1000], b[1000], c[1000]; int g=0; int aa=atoi(argv[1]), bb=atoi(argv[2]); for (int i=0; i<1000; i++) { a[i]=aa, b[i]=bb; c[i]=a[i] + b[i]; g+=c[i]; } printf("sum: %d\n", g); return 0; } On Thu, Aug 17, 2017 at 10:03 PM, Craig Topper <craig.topper at
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
OK. I have managed to vectorize the second loop in the following code, but the first loop is still not vectorized. Why? int main(int argc, char** argv) { int a[1000], b[1000], c[1000]; int g=0; int aa=atoi(argv[1]), bb=atoi(argv[2]); for (int i=0; i<1000; i++) { a[i]=aa+i, b[i]=bb+i;} for (int i=0; i<1000; i++) { c[i]=a[i] + b[i]; g+=c[i]; } printf("sum: %d\n", g); return 0;