thr3ads.net - llvm dev - [llvm-dev] LLVM Vectorisation Bug [Aug 2017]

hameeza ahmed via llvm-dev

2017-Aug-05 17:55 UTC

[llvm-dev] LLVM Vectorisation Bug

I have matrix multiplication and stencil code. I vectorise it through the
following command.

opt  -S -O3 -force-vector-width=2048 stencil.ll -o stencil_o3.ll

In both examples (matrix multiplication and stencil) it vectorises fine when my
loop iteration count is greater than 2048. But if I set both the iteration count
and the vector width to 2048, it produces scalar IR instead of vectorised code.

Is this an LLVM bug?

Please help me.

Renato Golin via llvm-dev

2017-Aug-06 13:28 UTC

[llvm-dev] LLVM Vectorisation Bug

On 5 August 2017 at 14:55, hameeza ahmed via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I have matrix multiplication and stencil code. I vectorise it through the
> following command.
>
> opt  -S -O3 -force-vector-width=2048 stencil.ll -o stencil_o3.ll
>
> in both the examples of matrix mult and stencil it vectorises fine when my
> loop iterations >2048. but if i keep both iterations and vector width=2048.
> it produces scalar code IR not vectorizes it.

Hi Ahmed,

Can you show us your code?

I tried this example:

void foo(int *a, int *b, int *c) {
  for (int i=0; i<2048; i++)
    a[i] = b[i] + c[i];
}

Then ran Clang to produce IR and your opt line above and got a vectorised loop:

vector.body:                                      ; preds = %vector.body.preheader
  %0 = bitcast i32* %b to <2048 x i32>*
  %wide.load = load <2048 x i32>, <2048 x i32>* %0, align 4, !alias.scope !1
  %1 = bitcast i32* %c to <2048 x i32>*
  %wide.load17 = load <2048 x i32>, <2048 x i32>* %1, align 4, !alias.scope !4
  %2 = add nsw <2048 x i32> %wide.load17, %wide.load
  %3 = bitcast i32* %a to <2048 x i32>*
  store <2048 x i32> %2, <2048 x i32>* %3, align 4, !alias.scope !6, !noalias !8
  br label %for.end

So, this seems to be either a bug in your code (off-by-one, loop
dependencies, etc) or some missing optimisation in Clang, which we'll
only know when we can actually see the code.
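
To make the "loop dependencies" case concrete, here is a minimal sketch in C contrasting a loop whose iterations are independent with one that carries a dependence from one iteration to the next. The function names add_arrays and running_sum are made up for illustration and are not from the original poster's code; the restrict qualifiers are an assumption that the arrays do not overlap.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so the
 * elements can be computed in any order, or many at once.
 * The restrict qualifiers are an assumption that the arrays never overlap. */
void add_arrays(int *restrict a, const int *restrict b,
                const int *restrict c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs the value of a[i - 1] that was
 * written by the previous iteration, so the iterations cannot simply be
 * executed side by side. */
void running_sum(int *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

A loop shaped like add_arrays is the kind shown above to vectorise; a loop shaped like running_sum is one where a vectoriser would typically be expected to give up or need a different strategy.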

cheers,
--renato

Renato Golin via llvm-dev

2017-Aug-06 20:43 UTC

[llvm-dev] LLVM Vectorisation Bug

On 6 August 2017 at 10:49, hameeza ahmed <hahmed2305 at gmail.com> wrote:
> Thank You,
> Stencil code is attached here.

Right, that explains it: your tail loop count doesn't reach 2048 iterations:

#define N 2048
for (i = 1; i <= N-2; i++)
  for (j = 1; j <= N-2; j++)
    a[i][j] = b[i][j];

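That'll be 2046 iterations (i and j each run from 1 to N-2 = 2046 inclusive),
which is fewer than the forced vector width of 2048.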

Artificially playing with the ranges (N+1, etc) yields vector code, as expected.
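
The arithmetic behind this can be sketched in C. This is only an illustration of counting iterations against a vector width, not a description of how the LLVM vectoriser makes its decision internally; the helper names are made up for this example.

```c
#include <assert.h>

/* Trip count of `for (i = lo; i <= hi; i++)`; 0 when the range is empty. */
static long trip_count_inclusive(long lo, long hi) {
    return hi >= lo ? hi - lo + 1 : 0;
}

/* How many full vector iterations of the given width fit in the trip count. */
static long full_vector_iterations(long trip, long width) {
    assert(width > 0);
    return trip / width;
}

/* Iterations left over for a scalar remainder loop. */
static long scalar_remainder(long trip, long width) {
    assert(width > 0);
    return trip % width;
}
```

With N = 2048, trip_count_inclusive(1, N - 2) is 2046, so full_vector_iterations(2046, 2048) is 0 and all 2046 iterations are left to scalar code. For the `i = 0; i < 2048` loop the trip count is 2048, which gives exactly one full vector iteration and no remainder.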

Same for the main loop:

   float con = 0.2;
   for (k = 0; k < N; k++) {
       for (i = 1; i <= N-2; i++)
           for (j = 1; j <= N-2; j++)
              b[i][j] = con * (a[i][j] + a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1]);
   }

cheers,
--renato
