Displaying 20 results from an estimated 20000 matches similar to: "[LLVMdev] Why is the loop vectorizer not working on my function?"
2013 Oct 26
2
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Hal!
I am using the 'x86_64' target. Below the complete module dump and here 
the command line:
opt -march=x64-64 -loop-vectorize -debug-only=loop-vectorize -S test.ll
Frank
; ModuleID = 'test.ll'
target datalayout = 
"e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:12
2013 Oct 26
3
[LLVMdev] Why is the loop vectorizer not working on my function?
----- Original Message -----
> >>> LV: The Widest type: 32 bits.
> >>> LV: The Widest register is: 32 bits.
> 
> Yep, we don’t pick up the right TTI.
> 
> Try -march=x86-64 (or leave it out) you already have this info in the
> triple.
> 
> Then it should work (does for me with your example below).
That may depend on what CPU it picks by default; Frank,
2013 Oct 26
2
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Arnold,
adding '-debug-only=loop-vectorize' to the command gives:
LV: Checking a loop in "bar"
LV: Found a loop: L0
LV: Found an induction variable.
LV: Found an unidentified write ptr:   %7 = load float** %6
LV: Found an unidentified read ptr:   %10 = load float** %9
LV: Found an unidentified read ptr:   %13 = load float** %12
LV: We need to do 2 pointer comparisons.
LV: We
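The "pointer comparisons" in the log above suggest that the vectorizer cannot prove the three arrays are distinct, so it plans runtime overlap checks. As a rough C-level illustration only (not what Frank's builder-generated IR does), the loop from the thread can be written with `restrict`-qualified pointers, which clang typically lowers to `noalias` arguments. The name `bar_restrict` is hypothetical, and the sketch assumes callers never pass overlapping arrays:

```c
/* Same loop as bar() in the thread, with restrict-qualified pointers.
   Assumption (not from the thread): A, B and C never overlap, so the
   compiler may skip runtime overlap checks before vectorizing. */
void bar_restrict(int start, int end,
                  float *restrict A,
                  const float *restrict B,
                  const float *restrict C)
{
    for (int i = start; i < end; ++i)
        A[i] = B[i] * C[i];
}
```

Whether the checks actually disappear depends on the compiler version and target; the debug output is the place to confirm it.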
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
>>> LV: The Widest type: 32 bits.
>>> LV: The Widest register is: 32 bits.
Yep, we don’t pick up the right TTI.
Try -march=x86-64 (or leave it out) you already have this info in the triple.
Then it should work (does for me with your example below).
On Oct 26, 2013, at 2:16 PM, Frank Winter <fwinter at jlab.org> wrote:
> Hi Hal!
> 
> I am using the
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
----- Original Message -----
> Hi Arnold,
> 
> adding '-debug-only=loop-vectorize' to the command gives:
> 
> LV: Checking a loop in "bar"
> LV: Found a loop: L0
> LV: Found an induction variable.
> LV: Found an unidentified write ptr:   %7 = load float** %6
> LV: Found an unidentified read ptr:   %10 = load float** %9
> LV: Found an unidentified
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Frank,
> On Oct 26, 2013, at 10:03 AM, Frank Winter <fwinter at jlab.org> wrote:
> 
> My function implements a simple loop:
> 
> void bar( int start, int end, float* A, float* B, float* C)
> {
>    for (int i=start; i<end;++i)
>       A[i] = B[i] * C[i];
> }
> 
> This looks pretty much like the standard example. However, I built
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
I would need this to work when calling the vectorizer through
the function pass manager. Unfortunately I am having the same
problem there:
LV: The Widest type: 32 bits.
LV: The Widest register is: 32 bits.
It's not picking the target information, although I tried with and
without the target triple in the module.
Any idea what could be wrong?
Frank
On 26/10/13 15:54, Hal Finkel wrote:
2013 Oct 27
3
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Frank,
On Oct 26, 2013, at 6:29 PM, Frank Winter <fwinter at jlab.org> wrote:
> I would need this to work when calling the vectorizer through
> the function pass manager. Unfortunately I am having the same
> problem there:
I am not sure which function pass manager you are referring to here. I assume you create your own (you are not using opt but configure your own pass
2013 Oct 27
0
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Arnold,
thanks for the detailed setup. Still, I haven't figured out the right 
thing to do.
I would need only the native target since all generated code will 
execute on the JIT execution machine (right now, the old JIT interface). 
There is no need for other targets.
Maybe it would be good to ask specific questions:
How do I get the triple for the native target?
How do I set up the
2013 Oct 28
2
[LLVMdev] Loop vectorizer dosen't find loop bounds
Bingo! That works (when coming from C source)
Now, I have a serious problem. I am not coming from C but I build the 
function with the builder. I am also forced to change the signature and 
load the pointers a,b,c afterwards:
define void @bar([8 x i8]* nocapture readonly %arg_ptr) #0 {
entrypoint:
   %0 = bitcast [8 x i8]* %arg_ptr to i32*
   %1 = load i32* %0, align 4
   %2 = getelementptr [8 x
2013 Oct 29
2
[LLVMdev] Loop vectorizer dosen't find loop bounds
Thanks for the alternatives!
I am trying the 'extracting sub-function' approach. However, it seems I 
can't get the 'subfunction' to pass the verifier. This is my subfunction:
define void @main_extern([8 x i8]* %arg_ptr) {
entrypoint:
   %0 = getelementptr [8 x i8]* %arg_ptr, i32 0
   %1 = bitcast [8 x i8]* %0 to i64*
   %2 = load i64* %1
   %3 = getelementptr [8 x i8]*
2013 Oct 28
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> Bingo! That works (when coming from C source)
> 
> Now, I have a serious problem. I am not coming from C but I build the
> function with the builder. I am also forced to change the signature
> and
> load the pointers a,b,c afterwards:
> 
> define void @bar([8 x i8]* nocapture readonly %arg_ptr) #0 {
> entrypoint:
>    %0 = bitcast [8 x
2013 Oct 29
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> Thanks for the alternatives!
> 
> I am trying the 'extracting sub-function' approach. However, it seems
> I
> can't get the 'subfunction' to pass the verifier. This is my
> subfunction:
> 
> define void @main_extern([8 x i8]* %arg_ptr) {
> entrypoint:
>    %0 = getelementptr [8 x i8]* %arg_ptr, i32 0
>    %1 =
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements the following nested loop:
for (int i = start ; i < end ; ++i )
   for (int p = 0 ; p < 4 ; ++p )
     a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float* 
noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
   br i1 %arg2, label %L0, label %L1
L0:                                           
2013 Oct 28
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> I am trying to vectorize the function
> 
> void bar(float *c, float *a, float *b)
> {
>    const int width = 256;
>    for (int i = 0 ; i < 256 ; ++i ) {
>      c[ i ]         = a[ i ]         + b[ i ];
>      c[ width + i ] = a[ width + i ] + b[ width + i ];
>    }
> }
> 
> using the following commands
> 
> clang
2013 Oct 28
2
[LLVMdev] Loop vectorizer dosen't find loop bounds
I am trying to vectorize the function
void bar(float *c, float *a, float *b)
{
   const int width = 256;
   for (int i = 0 ; i < 256 ; ++i ) {
     c[ i ]         = a[ i ]         + b[ i ];
     c[ width + i ] = a[ width + i ] + b[ width + i ];
   }
}
using the following commands
clang -emit-llvm -S loop.c
opt loop.ll -O3 -debug-only=loop-vectorize -S -o -
LV: Checking a loop in
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
    // Add the various vectorization passes and relevant cleanup passes for
    // them since we are no longer in the middle of the main scalar pipeline.
    MPM.add(createLoopVectorizePass(DisableUnrollLoops));
    MPM.add(createInstructionCombiningPass());
   
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast 
(!!) etc. instructions left? The original loop is so clean, a textbook 
example I'd say. There is no need to shuffle anything. At least I don't 
see it.
Frank
vector.ph:                                        ; preds = %L5
   %broadcast.splatinsert1 = insertelement <4 x
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This 
avoids the 'rem' and 'div' instructions in the index calculation (which 
give the loop vectorizer a hard time).
However, with this setup the loop vectorizer complains about a too small 
loop.
LV: Checking a loop in "main"
LV: Found a loop: L3
LV: Found a loop with a very small trip count. This loop
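To make the rewrite described above concrete, here is a minimal C sketch of the two forms. The single-loop version is an assumption about what the original index calculation looked like (the message only says it used 'rem' and 'div'); the nested version follows the loop quoted elsewhere in these threads. The function names `add_flat` and `add_nested` are hypothetical:

```c
/* Assumed single-loop form: one induction variable k, with the
   (i, p) pair recovered by division and remainder. */
void add_flat(int start, int end, float *a, const float *b, const float *c)
{
    for (int k = start * 4; k < end * 4; ++k) {
        int i = k / 4;   /* outer index */
        int p = k % 4;   /* inner index, 0..3 */
        a[i * 4 + p] = b[i * 4 + p] + c[i * 4 + p];
    }
}

/* Two-loop form: touches the same elements without div/rem, but the
   inner loop now has a constant trip count of only 4. */
void add_nested(int start, int end, float *a, const float *b, const float *c)
{
    for (int i = start; i < end; ++i)
        for (int p = 0; p < 4; ++p)
            a[i * 4 + p] = b[i * 4 + p] + c[i * 4 + p];
}
```

Both functions write the same values for non-negative `start` and `end`. The trade-off is that the inner loop of the nested form is exactly the kind of short loop the "very small trip count" message refers to.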
2013 Nov 01
0
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
In the case when coming from C it was probably the loop unroller and SLP 
vectorizer which vectorized the code. Potentially I could do the same in 
the IR. However, the loop body that is generated in the IR can get very 
large. Thus, the loop unroller will refuse to unroll the loop in a large 
number of (important) cases.
Isn't there a way to convince the loop vectorizer that it should