Displaying 20 results from an estimated 25 matches for "ir0".
2013 Oct 31
2
[LLVMdev] loop vectorizer
...eteness, here is the code:
>
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
> {
> const std::uint64_t inner = 4;
> for (std::uint64_t i = start ; i < end ; i+=4 ) {
> {
> const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner + (i+0)%4;
> const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner + (i+0)%4;
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
> }
> {
> const std::uint64_t ir0 =...
2013 Oct 31
0
[LLVMdev] loop vectorizer
...index_1 = 15
For completeness, here is the code:
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start ; i < end ; i+=4 ) {
{
const std::uint64_t ir0 = ( ((i+0)/inner) * 2 + 0 ) * inner +
(i+0)%4;
const std::uint64_t ir1 = ( ((i+0)/inner) * 2 + 1 ) * inner +
(i+0)%4;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
}
{
const std::uint64_t ir0 = ( ((i+1)/inner)...
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
...include <cstdint>
> #include <iostream>
>
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__ c, float * __restrict__ a, float * __restrict__ b)
> {
> for ( std::uint64_t i = start ; i < end ; i += 4 ) {
> {
> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> }
> {
> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4);
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> }
> {
> const std::uint64_t ir0 = (i+2)%4 + 8*((i...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...on
passes fail to optimize:
#include <cstdint>
#include <iostream>
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
for ( std::uint64_t i = start ; i < end ; i += 4 ) {
{
const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
}
{
const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4);
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
}
{
const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4);
c[ ir0 ]...
2013 Oct 31
5
[LLVMdev] loop vectorizer
On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote:
> const std::uint64_t ir0 = (i+0)%4; // not working
>
I thought this would be the case when I saw the original expression. Maybe
we need to teach modular arithmetic to SCEV?
--renato
2013 Oct 31
3
[LLVMdev] loop vectorizer misses opportunity, exploit
...> #include <cstdint>
> #include <iostream>
>
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
> c, float * __restrict__ a, float * __restrict__ b)
> {
> for ( std::uint64_t i = start ; i < end ; i += 4 ) {
> {
> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> }
> {
> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4);
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> }
> {
> const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4);
> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
> }
> {
> const st...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...lude <iostream>
>>
>> void bar(std::uint64_t start, std::uint64_t end, float *
>> __restrict__ c, float * __restrict__ a, float * __restrict__ b)
>> {
>> for ( std::uint64_t i = start ; i < end ; i += 4 ) {
>> {
>> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);
>> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>> }
>> {
>> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4);
>> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>> }
>> {
>> const std...
2013 Oct 31
0
[LLVMdev] loop vectorizer misses opportunity, exploit
...>> #include <iostream>
>>
>> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
>> c, float * __restrict__ a, float * __restrict__ b)
>> {
>> for ( std::uint64_t i = start ; i < end ; i += 4 ) {
>> {
>> const std::uint64_t ir0 = (i+0)%4 + 8*((i+0)/4);
>> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>> }
>> {
>> const std::uint64_t ir0 = (i+1)%4 + 8*((i+1)/4);
>> c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
>> }
>> {
>> const std::uint64_t ir0 = (i+2)%4 + 8*((i+2)/4);
>> c[ ir0 ] = a[ ir0 ] + b...
2013 Oct 31
0
[LLVMdev] loop vectorizer
I tried the following on the hand-unrolled loop:
const std::uint64_t ir0 = i*8+0; // working
const std::uint64_t ir0 = i%4+0; // working
const std::uint64_t ir0 = (i+0)%4; // not working
'+0' means +1,+2,+3 in the unrolled iterations.
'Working' means the SLP vectorizer succeeded.
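A minimal sketch of why the second and third forms are not interchangeable in general. It assumes that i is a multiple of 4 at the top of each unrolled iteration, which holds only if start is a multiple of 4; nothing in the source guarantees that. Under that assumption (i+k)%4 and i%4+k agree for k = 0..3, otherwise they diverge. The helper names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Form reported as "not working": the offset is added before the remainder.
constexpr std::uint64_t lane_mod_after_add(std::uint64_t i, std::uint64_t k) {
    return (i + k) % 4;
}

// Form reported as "working": the offset is added after the remainder.
constexpr std::uint64_t lane_add_after_mod(std::uint64_t i, std::uint64_t k) {
    return i % 4 + k;
}

// Hypothetical check: do both forms agree for the four unrolled offsets?
inline bool forms_agree(std::uint64_t i) {
    for (std::uint64_t k = 0; k < 4; ++k) {
        if (lane_mod_after_add(i, k) != lane_add_after_mod(i, k)) return false;
    }
    return true;
}

// Asserts agreement for every multiple of 4 below n (the assumed alignment).
inline void check_aligned(std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; i += 4) assert(forms_agree(i));
}
```

For instance, forms_agree(8) holds, while forms_agree(2) does not, because (2+2)%4 is 0 but 2%4+2 is 4. An analysis that cannot prove i%4 == 0 therefore cannot treat the two forms as equal.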
Thus, when working 'towards' the correct index...
2013 Oct 30
3
[LLVMdev] loop vectorizer
Hi Frank,
> We are looking at a variety of target architectures. Ultimately we aim to run on BG/Q and Intel Xeon Phi (native). However, running on those architectures with LLVM is planned for later. As a first step we would target vanilla x86 with SSE/AVX 128/256 as a proof of concept.
Great! It should be easy to support these targets. When you said wide-vectors I assumed
2013 Nov 06
3
[LLVMdev] loop vectorizer
...3, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote:
>
>> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote:
>>
>> const std::uint64_t ir0 = (i+0)%4; // not working
>>
>>
>> I thought this would be the case when I saw the original expression.
>> Maybe we need to teach modular arithmetic to SCEV?
>
> I let this thread get stale, so here’s the background again:
>
> source:
>
> const std::...
2013 Nov 06
0
[LLVMdev] loop vectorizer
On Oct 30, 2013, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote:
> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote:
> const std::uint64_t ir0 = (i+0)%4; // not working
>
> I thought this would be the case when I saw the original expression. Maybe we need to teach modular arithmetic to SCEV?
I let this thread get stale, so here’s the background again:
source:
const std::uint64_t ir0 = i%4 + 8*(i/4);
c[ ir0 ]...
2013 Nov 06
0
[LLVMdev] loop vectorizer
...>> On 05/11/13 22:12, Andrew Trick wrote:
>>
>>> On Oct 30, 2013, at 11:21 PM, Renato Golin <renato.golin at linaro.org> wrote:
>>>
>>> On 30 October 2013 18:40, Frank Winter <fwinter at jlab.org> wrote:
>>>> const std::uint64_t ir0 = (i+0)%4; // not working
>>>
>>> I thought this would be the case when I saw the original expression. Maybe we need to teach modular arithmetic to SCEV?
>>
>> I let this thread get stale, so here’s the background again:
>>
>> source:
>>
>>...
2013 Oct 30
2
[LLVMdev] loop vectorizer
The debug messages are misleading. They should read “trying to vectorize a list of …”. The problem is that the SCEV analysis is unable to detect that C[ir0] and C[ir1] are consecutive. Is this loop from an important benchmark?
Thanks,
Nadav
On Oct 30, 2013, at 11:13 AM, Frank Winter <fwinter at jlab.org> wrote:
> The SLP vectorizer apparently did something in the prologue of the function (where storing of arguments on the stack happens)...
2013 Oct 30
0
[LLVMdev] loop vectorizer
Well, they are not directly consecutive. They are consecutive with a
constant offset or stride:
ir1 = ir0 + 4
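The relation can be checked with a small sketch. The two index expressions are copied from the loop under discussion with inner = 4. The helpers stride_is_four and check_stride are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t inner = 4;

// Index expressions as written in the loop body.
constexpr std::uint64_t ir0(std::uint64_t i) {
    return ((i / inner) * 2 + 0) * inner + i % 4;
}

constexpr std::uint64_t ir1(std::uint64_t i) {
    return ((i / inner) * 2 + 1) * inner + i % 4;
}

// Hypothetical helper: true if ir1(i) == ir0(i) + 4 for every i in [0, n).
inline bool stride_is_four(std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
        if (ir1(i) != ir0(i) + inner) return false;
    }
    return true;
}

// Hypothetical helper: aborts (in a debug build) if the relation fails.
inline void check_stride(std::uint64_t n) {
    assert(stride_is_four(n));
}
```

Within one block of four iterations ir0 advances by 1 (i = 0..3 gives 0..3), then jumps to 8 at i = 4, so each array is touched in runs of four consecutive elements separated by gaps of four.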
If I rewrite the function in this form
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start ; i < end ; ++i ) {
const std::uint64_t ir0 = ( (i/inne...
2013 Oct 30
2
[LLVMdev] loop vectorizer
...rizer seems unable to vectorize the following code:
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start ; i < end ; ++i ) {
const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4;
const std::uint64_t ir1 = ( (i/inner) * 2 + 1 ) * inner + i%4;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
}
}
LV: Found a loop: for.body
LV: Found an induction variable.
LV: We need to do...
2013 Oct 30
0
[LLVMdev] loop vectorizer
...ally mean the current LLVM cannot vectorize the function?:
void bar(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start ; i < end ; ++i ) {
const std::uint64_t ir0 = ( (i/inner) * 2 + 0 ) * inner + i%4;
const std::uint64_t ir1 = ( (i/inner) * 2 + 1 ) * inner + i%4;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
}
}
I was trying the following:
clang++ -emit-llvm -S loop.cc -std=c++11
(this wr...
2013 Oct 30
0
[LLVMdev] loop vectorizer
Hi Frank,
The access pattern to arrays a and b is non-linear. Unrolled loops are usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all values of i?
Thanks,
Nadav
On Oct 30, 2013, at 9:05 AM, Frank Winter <fwinter at jlab.org> wrote:
> The loop vectorizer seems unable to vectorize the following code:
>
> void bar(std::uint64_t start, std::uint64_t end, float * __restrict__...
2013 Oct 30
3
[LLVMdev] loop vectorizer
...
> On 30/10/13 13:28, Renato Golin wrote:
>
> On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote:
>
> The access pattern to arrays a and b is non-linear. Unrolled loops
> are usually handled by the SLP-vectorizer. Are ir0 and ir1
> consecutive for all values of i?
>
>
> Based on his list of values, it seems that the induction stride is
> linear within each block of 4 iterations, but it's not a clear
> relationship.
>
>
> As you say, it should be possible to spot that once the loo...
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
...(std::uint64_t start, std::uint64_t end, float * __restrict__
c, float * __restrict__ a, float * __restrict__ b)
{
const std::uint64_t inner = 4;
for (std::uint64_t i = start/inner ; i < end/inner ; i++ ) {
for (std::uint64_t q = 0 ; q < inner ; q++ ) {
const std::uint64_t ir0 = ( i * 2 + 0 ) * inner + q;
const std::uint64_t ir1 = ( i * 2 + 1 ) * inner + q;
c[ ir0 ] = a[ ir0 ] + b[ ir0 ];
c[ ir1 ] = a[ ir1 ] + b[ ir1 ];
}
}
}
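As a rough check that this nested form visits the same ir0 indices, in the same order, as the single-loop form ((i/inner)*2+0)*inner + i%4 discussed elsewhere in the thread, the sketch below collects both sequences and compares them. It assumes start and end are multiples of 4; for other bounds the two forms differ. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ir0 sequence of the single-loop form: i runs over [start, end).
inline std::vector<std::uint64_t> flat_ir0(std::uint64_t start, std::uint64_t end) {
    const std::uint64_t inner = 4;
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = start; i < end; ++i) {
        out.push_back(((i / inner) * 2 + 0) * inner + i % 4);
    }
    return out;
}

// ir0 sequence of the nested form: outer loop over blocks, inner loop over lanes.
inline std::vector<std::uint64_t> nested_ir0(std::uint64_t start, std::uint64_t end) {
    const std::uint64_t inner = 4;
    std::vector<std::uint64_t> out;
    for (std::uint64_t i = start / inner; i < end / inner; i++) {
        for (std::uint64_t q = 0; q < inner; q++) {
            out.push_back((i * 2 + 0) * inner + q);
        }
    }
    return out;
}

// Hypothetical helper: asserts both forms produce the same index sequence.
inline void check_same_order(std::uint64_t start, std::uint64_t end) {
    assert(flat_ir0(start, end) == nested_ir0(start, end));
}
```

With aligned bounds such as check_same_order(0, 16) the sequences match. With start = 2 and end = 6 they do not, since the integer divisions in the nested bounds round both ends down to block boundaries.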
the loop vectorizer complains as well, but the produced code is vectorized:
LV: Che...