search for: ir0

Displaying 20 results from an estimated 25 matches for "ir0".

2013 Oct 31
2
[LLVMdev] loop vectorizer
...eteness, here the code: > > void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) > { > const std::uint64_t inner = 4; > for (std::uint64_t i = start ; i < end ; i+=4 ) { > { > const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4; > const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4; > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; > } > { > const std::uint64_t ir0 =...
2013 Oct 31
0
[LLVMdev] loop vectorizer
...index_1 = 15 For completeness, here the code: void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; i+=4 ) { { const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4; const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } { const std::uint64_t ir0 = ( ((i+1)/inner)...
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
...include <cstdint> > #include <iostream> > > void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) > { > for ( std::uint64_t i = start ; i < end ; i += 4 ) { > { > const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4); > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > } > { > const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4); > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > } > { > const std::uint64_t ir0 = (i+2)%4 + 8*((i...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...on passes fail to optimize: #include <cstdint> #include <iostream> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { for ( std::uint64_t i = start ; i < end ; i += 4 ) { { const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4); c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; } { const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4); c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; } { const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4); c[ ir0 ]...
2013 Oct 31
5
[LLVMdev] loop vectorizer
On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote: > const std::uint64_t ir0 = (i+0)%4; // not working > I thought this would be the case when I saw the original expression. Maybe we need to teach modular arithmetic to SCEV? --renato
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
...> #include <cstdint> > #include <iostream> > > void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ > c, float * __restrict__ a, float * __restrict__ b) > { > for ( std::uint64_t i = start ; i < end ; i += 4 ) { > { > const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4); > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > } > { > const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4); > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > } > { > const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4); > c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; > } > { > const st...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...lude <iostream> >> >> void bar(std::uint64_t start, std::uint64_t end, float * >> __restrict__ c, float * __restrict__ a, float * __restrict__ b) >> { >> for ( std::uint64_t i = start ; i < end ; i += 4 ) { >> { >> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4); >> c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; >> } >> { >> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4); >> c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; >> } >> { >> const std...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...>> #include <iostream> >> >> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ >> c, float * __restrict__ a, float * __restrict__ b) >> { >> for ( std::uint64_t i = start ; i < end ; i += 4 ) { >> { >> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4); >> c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; >> } >> { >> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4); >> c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; >> } >> { >> const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4); >> c[ ir0 ] = a[ ir0 ] + b...
2013 Oct 31
0
[LLVMdev] loop vectorizer
I tried the following on the hand-unrolled loop:
const std::uint64_t ir0 = i*8+0; // working
const std::uint64_t ir0 = i%4+0; // working
const std::uint64_t ir0 = (i+0)%4; // not working
'+0' becomes +1, +2, +3 in the other unrolled iterations. 'Working' means the SLP vectorizer succeeded. Thus, when working 'towards' the correct index...
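One way to read the 'working' / 'not working' split: i%4+k and (i+k)%4 agree for every k in 0..3 only when i is a multiple of 4, which may be the kind of fact an analysis would need to establish before it can treat the four unrolled accesses as consecutive. The sketch below checks that identity by brute force. The helper name offsets_commute_with_mod4 is hypothetical, and the assumption that the loop starts at a multiple of 4 is not stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only.
// Returns true if (i + k) % 4 == i % 4 + k holds for every k in 0..3.
bool offsets_commute_with_mod4(std::uint64_t i) {
    // Precondition: i + 3 must not wrap around in 64-bit unsigned arithmetic.
    assert(i <= UINT64_MAX - 3);
    for (std::uint64_t k = 0; k < 4; ++k) {
        if ((i + k) % 4 != i % 4 + k) {
            return false;
        }
    }
    return true;
}
```

The identity holds for i = 0, 4, 8, ... and fails otherwise (for i = 5 and k = 3 the left side is 0 and the right side is 4), so rewriting (i+k)%4 as i%4+k is only valid with extra knowledge about i.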
2013 Oct 30
3
[LLVMdev] loop vectorizer
Hi Frank, > We are looking at a variety of target architectures. Ultimately we aim to run on BG/Q and Intel Xeon Phi (native). However, running on those architectures with the LLVM technology is planned for some point in the future. As a first step we would target vanilla x86 with SSE/AVX 128/256 as a proof-of-concept. Great! It should be easy to support these targets. When you said wide-vectors I assumed
2013 Nov 06
3
[LLVMdev] loop vectorizer
...3, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote: > >> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote: >> >> const std::uint64_t ir0 = (i+0)%4; // not working >> >> >> I thought this would be the case when I saw the original expression. >> Maybe we need to teach modular arithmetic to SCEV? > > I let this thread get stale, so here’s the background again: > > source: > > const std::...
2013 Nov 06
0
[LLVMdev] loop vectorizer
On Oct 30, 2013, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote: > const std::uint64_t ir0 = (i+0)%4; // not working > > I thought this would be the case when I saw the original expression. Maybe we need to teach modular arithmetic to SCEV? I let this thread get stale, so here’s the background again: source: const std::uint64_t ir0 = i%4 + 8*(i/4); c[ ir0 ]...
2013 Nov 06
0
[LLVMdev] loop vectorizer
...>> On 05/11/13 22:12, Andrew Trick wrote: >> >>> On Oct 30, 2013, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote: >>> >>> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote: >>>> const std::uint64_t ir0 = (i+0)%4; // not working >>> >>> I thought this would be the case when I saw the original expression. Maybe we need to teach modular arithmetic to SCEV? >> >> I let this thread get stale, so here’s the background again: >> >> source: >> >>...
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”. The problem is that the SCEV analysis is unable to detect that c[ir0] and c[ir1] are consecutive. Is this loop from an important benchmark? Thanks, Nadav On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote: > The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens)...
2013 Oct 30
0
[LLVMdev] loop vectorizer
Well, they are not directly consecutive. They are consecutive with a constant offset or stride: ir1 = ir0 + 4 If I rewrite the function in this form void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = ( (i/inne...
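The relation ir1 = ir0 + 4 can be checked by brute force. The following is a minimal sketch that uses the index formulas from the loop in this thread; the helper names index_ir0, index_ir1, and stride_is_inner are hypothetical and do not appear in the original messages.

```cpp
#include <cassert>
#include <cstdint>

// Index formulas copied from the loop discussed in the thread (inner = 4).
constexpr std::uint64_t inner = 4;

constexpr std::uint64_t index_ir0(std::uint64_t i) {
    return ((i / inner) * 2 + 0) * inner + i % 4;
}

constexpr std::uint64_t index_ir1(std::uint64_t i) {
    return ((i / inner) * 2 + 1) * inner + i % 4;
}

// Hypothetical helper: true if ir1 - ir0 == inner for every i in [start, end).
bool stride_is_inner(std::uint64_t start, std::uint64_t end) {
    assert(start <= end);
    for (std::uint64_t i = start; i < end; ++i) {
        if (index_ir1(i) - index_ir0(i) != inner) {
            return false;
        }
    }
    return true;
}
```

The offset between the two accesses is constant, but each index is contiguous only within a block of 4: index_ir0 maps i = 3 to 3 and i = 4 to 8.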
2013 Oct 30
2
[LLVMdev] loop vectorizer
...rizer seems to be not able to vectorize the following code: void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4; const std::uint64_t ir1 = ( (i/inner) * 2 + 1 ) * inner + i%4; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } } LV: Found a loop: for.body LV: Found an induction variable. LV: We need to do...
2013 Oct 30
0
[LLVMdev] loop vectorizer
...ally mean the current LLVM cannot vectorize the function?: void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start ; i < end ; ++i ) { const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4; const std::uint64_t ir1 = ( (i/inner) * 2 + 1 ) * inner + i%4; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } } I was trying the following: clang++ -emit-llvm -S loop.cc -std=c++11 (this wr...
2013 Oct 30
0
[LLVMdev] loop vectorizer
Hi Frank, The access pattern to arrays a and b is non-linear. Unrolled loops are usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all values of i? Thanks, Nadav On Oct 30, 2013, at 9:05 AM, Frank Winter <fwinter at jlab.org> wrote: > The loop vectorizer seems to be not able to vectorize the following code: > > void bar(std::uint64_t start, std::uint64_t end, float * __restrict__...
2013 Oct 30
3
[LLVMdev] loop vectorizer
... > > > On 30/10/13 13:28, Renato Golin wrote: > > > > > On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote: > > > The access pattern to arrays a and b is non-linear. Unrolled loops > are usually handled by the SLP-vectorizer. Are ir0 and ir1 > consecutive for all values of i? > > > Based on his list of values, it seems that the induction stride is > linear within each block of 4 iterations, but it's not a clear > relationship. > > > As you say, it should be possible to spot that once the loo...
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
...(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b) { const std::uint64_t inner = 4; for (std::uint64_t i = start/inner ; i < end/inner ; i++ ) { for (std::uint64_t q = 0 ; q < inner ; q++ ) { const std::uint64_t ir0 = ( i * 2 + 0 ) * inner + q; const std::uint64_t ir1 = ( i * 2 + 1 ) * inner + q; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } } } the loop vectorizer complains as well, but the produced code is vectorized: LV: Che...
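The nested form visits the same indices as the flat loop only if start and end are multiples of inner, because start/inner and end/inner round down. The sketch below compares the two index sequences. The function names flat_indices and nested_indices and the half parameter (0 selects ir0, 1 selects ir1) are hypothetical; the flat formula is the one quoted in the earlier "loop vectorizer" messages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint64_t inner = 4;

// Indices visited by the flat loop: half = 0 gives ir0, half = 1 gives ir1.
std::vector<std::uint64_t> flat_indices(std::uint64_t start, std::uint64_t end,
                                        std::uint64_t half) {
    assert(start <= end);
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = start; i < end; ++i) {
        out.push_back(((i / inner) * 2 + half) * inner + i % 4);
    }
    return out;
}

// Indices visited by the nested (outer i, inner q) form of the loop.
std::vector<std::uint64_t> nested_indices(std::uint64_t start, std::uint64_t end,
                                          std::uint64_t half) {
    assert(start <= end);
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = start / inner; i < end / inner; i++) {
        for (std::uint64_t q = 0; q < inner; q++) {
            out.push_back((i * 2 + half) * inner + q);
        }
    }
    return out;
}
```

For bounds such as (0, 16) or (8, 24) the two sequences are identical; for (0, 6) the flat loop runs 6 iterations while the nested form covers only the first 4, so the rewrite changes behavior unless the bounds are aligned.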