thr3ads.net - similar to: "Stream loop in llvm"

Displaying 20 results from an estimated 200000 matches similar to: "Stream loop in llvm"

2018 Jan 29

Polly loop offloading to Accelerator

Thank you. I used -polly-ast-detect-parallel, but no coincident info is generated. My C code is a simple vec-sum, as follows:

    #include <stdio.h>
    int a[2048], b[2048], c[2048];
    foo () {
      int i;
      for (i=0; i<2048; i++) {
        a[i]=b[5] + c[i];
      }
    }

I executed the following commands:

    $clang -S -emit-llvm vec-sum.cpp -march=native -O3 -mllvm -disable-llvm-optzns -o vec-sum.s
    $opt -S

2018 Jan 21

Non-Temporal hints from Loop Vectorizer

On 01/20/2018 12:29 PM, hameeza ahmed via llvm-dev wrote:
> i have already seen usage of __builtin_nontemporal_store but i want to
> automate identification of non temporal loads/stores. i think i need
> to go for a pass. is it possible to detect non temporal loops without
> polly?

Yes, but we don't have anything that does that right now. The cost modeling is non-trivial,

2018 Jan 20

Non-Temporal hints from Loop Vectorizer

I have already seen usage of __builtin_nontemporal_store, but I want to automate identification of non-temporal loads/stores. I think I need to write a pass. Is it possible to detect non-temporal loops without Polly?

On Sat, Jan 20, 2018 at 11:26 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> On 20/01/2018 18:16, hameeza ahmed wrote:
> > Actually i am working on vector

2018 Jan 20

Non-Temporal hints from Loop Vectorizer

On 20/01/2018 18:16, hameeza ahmed wrote:
> Actually i am working on vector accelerator which will perform those
> instructions which are non temporal.
>
> for instance if i have this loop
>
> for(i=0;i<2048;i++)
> a[i]=b[i]+c[i];
>
> currently it emits following IR;
>
>
> %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0,
> i64 %index

2018 Jan 20

Non-Temporal hints from Loop Vectorizer

Actually, I am working on a vector accelerator that will execute the instructions that are non-temporal. For instance, if I have this loop:

    for(i=0;i<2048;i++)
      a[i]=b[i]+c[i];

it currently emits the following IR:

    %0 = getelementptr inbounds [2048 x i32], [2048 x i32]* @b, i64 0, i64 %index
    %1 = bitcast i32* %0 to <16 x i32>*
    %wide.load = load <16 x i32>, <16 x i32>* %1,

2018 Jan 20

Polly loop offloading to Accelerator

Hello, I have been working with an accelerator backend. The accelerator has large vector/SIMD units. I want the vectorized streaming (non-temporal) loops present in the code to be offloaded to the accelerator's SIMD units. I find Polly really suitable for this. I am thinking the generated IR could be passed to Polly, which then analyzes a loop to determine that it possesses no reuse; if such a loop is identified, accelerator

2018 Jan 20

Non-Temporal hints from Loop Vectorizer

On 20/01/2018 17:44, hameeza ahmed via llvm-dev wrote:
> Hello,
>
> My work deals with non-temporal loads and stores i found non-temporal
> meta data in llvm documentation but its not shown in IR.
>
> How to get non-temporal meta data?

llvm\test\CodeGen\X86\nontemporal-loads.ll shows how to create nt vector loads in IR - is that what you're after?

Simon.

2018 Jan 20

Non-Temporal hints from Loop Vectorizer

Hello, my work deals with non-temporal loads and stores. I found the non-temporal metadata in the LLVM documentation, but it's not shown in the IR. How do I get non-temporal metadata?

2016 May 03

[RFC] Non-Temporal hints from Loop Vectorizer

Steve Canon is on vacation, so I’m going to quote, word for word, his take on the compiler autogenerating nontemporal hints:

"nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope nope n"
-- Steve Canon

--escha

> On May 3, 2016, at 10:26 AM, via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Non-temporal hints

2017 Jun 27

LLVM Matrix Multiplication Loop Vectorizer

Hello, I am trying to vectorize a simple matrix multiplication in LLVM. Here is my code:

    #include <stdio.h>
    #define N 1000

    // This function multiplies A[][] and B[][], and stores
    // the result in C[][]
    void multiply(int A[][N], int B[][N], int C[][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
            {
                C[i][j] = 0;

2016 May 03

[RFC] Non-Temporal hints from Loop Vectorizer

Hello all, I've been wondering why Clang doesn't generate non-temporal stores when compiling the STREAM benchmark [1] and therefore doesn't yield optimal results. It turned out that the Loop Vectorizer correctly vectorizes the arithmetic operations and also merges the loads and stores into vector operations. However it doesn't add the '!nontemporal' metadata which would
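
For comparison with the automatic approach this RFC asks about, here is a minimal sketch of marking a STREAM-style store as non-temporal by hand, using the `__builtin_nontemporal_store` builtin that other threads in this list mention. The function name `stream_add` and its signature are hypothetical, chosen only for illustration. The fallback branch is an assumption that lets the code build on compilers that lack the builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical example: a[i] = b[i] + c[i], with the store to a[] hinted
 * as non-temporal. Each element of a[] is written once and not read back
 * inside the loop. */
void stream_add(int *restrict a, const int *restrict b,
                const int *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        int sum = b[i] + c[i];
#if __has_builtin(__builtin_nontemporal_store)
        /* Clang builtin: store the value with a non-temporal hint. */
        __builtin_nontemporal_store(sum, &a[i]);
#else
        /* Fallback: an ordinary store with the same result. */
        a[i] = sum;
#endif
    }
}
```

The hint changes only how the store interacts with the cache, not the values written, so the result in `a` is the same on either path.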

2018 May 28

[RFC] A New Divergence Analysis for LLVM

TL;DR This RFC is a joint effort by Intel and Saarland University to bring the divergence analysis of the Region Vectorizer [1,2,3,4,5] (dubbed the vectorization analysis of RV) to LLVM. The implementation is available on github for feedback [0]. The existing divergence analysis infrastructure in LLVM has conceptual limitations (structured control, SCEV based). The new analysis resolves bugs

2018 Aug 02

Vectorizing remainder loop

Hi Hameeza,

Aside from Ashutosh's patch.....

When the vector width is that large, we can't keep vectorizing remainder like below. It'll be a huge code size if nothing else ---- hitting ITLB miss because of this is very bad, for example.

    VF=2048 // main vector loop
    VF=1024 // vectorized remainder 1
    VF=512 // vectorized remainder 2
    ...

Vectorize remainder until trip count is

2017 Jul 10

Conditional Register Assignment based on the no of loop iterations

Basically, my problem here is the vector width, since I have used v64i32 in my backend. If the vector width is 64, I want Reg_B class registers to be assigned, and if the vector width is 2048, I want Reg_A registers to be assigned to the instruction. Should I incorporate the solution in the lowering stage? Something like: addRegisterClass(MVT::v2048i32, &X86::Reg_B);

2018 Jul 29

Vectorizing remainder loop

Hello, I'm working on hardware with very large vector widths, up to v2048. When I vectorize using LLVM's default vectorizer, up to 2047 iterations are left in the scalar remainder loop. LLVM does not vectorize these, which increases the cost. However, they should be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4..... The issue of scalar remainder loop has
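
The arithmetic behind this request can be sketched as follows. This is only an illustration of how a trip count splits across successively halved vector widths. It is not LLVM code, and the function name `cascade_remainder` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walk vector widths max_vf, max_vf/2, ... down to
 * min_vf, peeling off as many full vector iterations as fit at each width.
 * Returns the number of iterations left for the scalar remainder loop and
 * stores the total number of vector iterations in *vector_iters. */
size_t cascade_remainder(size_t trip_count, size_t max_vf, size_t min_vf,
                         size_t *vector_iters)
{
    assert(max_vf > 0 && min_vf > 0 && vector_iters != NULL);
    size_t remaining = trip_count;
    size_t total = 0;
    /* vf > 0 guards against looping forever once vf / 2 reaches zero. */
    for (size_t vf = max_vf; vf >= min_vf && vf > 0; vf /= 2) {
        total += remaining / vf;  /* full vector iterations at this width */
        remaining %= vf;          /* what is left for narrower widths */
    }
    *vector_iters = total;
    return remaining;
}
```

With `max_vf` and `min_vf` both set to 2048 (no cascade), a trip count of 4095 leaves 2047 scalar iterations, which is the worst case described above. Allowing widths down to 4 leaves only 3 scalar iterations, at the cost of one extra vector loop body per width.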

2018 Aug 03

Vectorizing remainder loop

>it cannot afford large size masks for large vectors

So, even a standard way of vectorizing remainder in masked or unmasked fashion wouldn’t work, I suppose. Ouch. I suppose VPlan should be able to model this kind of gigantic remainder vector code (when the time comes). Not pretty at all, though. Now, be fully aware that Direction #2 is really a poor (or rather extremely poor) person’s

2005 Nov 15

OggPCM2 : chunked vs interleaved data

On 2005-11-16, Jean-Marc Valin wrote:
> Otherwise, what do you feel should be changed?

One obvious thing that seems to be lacking is the granulepos mapping. As suggested in Ogg documentation, for audio a simple sampling frame number ought to suffice, but I think the convention should still be spelled out. Secondly, I'd like to see the channel map fleshed out in more detail. (Beware

2007 Sep 11

Ogg stream URIs

So, I thought I'd split off a few discussions from the ongoing discussion of the details of a metadata format. A few other things are needed in support; one of these is a scheme for referring to IDs on Ogg files. Currently there is CMML, where you can refer to temporal fragments at the stream level. These may be CMML IDs or time interval queries/fragments as in

2017 Aug 17

unable to emit vectorized code in LLVM IR

Even if I write my code as follows, vectorized instructions are not emitted. What should I do?

    int main(int argc, char** argv) {
      int a[1000], b[1000], c[1000];
      int g=0;
      int aa=atoi(argv[1]), bb=atoi(argv[2]);
      for (int i=0; i<1000; i++) {
        a[i]=aa, b[i]=bb;
        c[i]=a[i] + b[i];
        g+=c[i];
      }
      printf("sum: %d\n", g);
      return 0;
    }

On Thu, Aug 17, 2017 at 10:03 PM, Craig Topper <craig.topper at

2017 Aug 17

unable to emit vectorized code in LLVM IR

OK. I have managed to vectorize the second loop in the following code, but the first loop is still not vectorized. Why?

    int main(int argc, char** argv) {
      int a[1000], b[1000], c[1000];
      int g=0;
      int aa=atoi(argv[1]), bb=atoi(argv[2]);
      for (int i=0; i<1000; i++) {
        a[i]=aa+i, b[i]=bb+i;
      }
      for (int i=0; i<1000; i++) {
        c[i]=a[i] + b[i];
        g+=c[i];
      }
      printf("sum: %d\n", g);
      return 0;