Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] Loop vectorizer dosen't find loop bounds"
2013 Oct 28
2
[LLVMdev] Loop vectorizer dosen't find loop bounds
Bingo! That works (when coming from C source)
Now, I have a serious problem. I am not coming from C but I build the
function with the builder. I am also forced to change the signature and
load the pointers a,b,c afterwards:
define void @bar([8 x i8]* nocapture readonly %arg_ptr) #0 {
entrypoint:
%0 = bitcast [8 x i8]* %arg_ptr to i32*
%1 = load i32* %0, align 4
%2 = getelementptr [8 x
2013 Oct 28
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> I am trying to vectorize the function
>
> void bar(float *c, float *a, float *b)
> {
> const int width = 256;
> for (int i = 0 ; i < 256 ; ++i ) {
> c[ i ] = a[ i ] + b[ i ];
> c[ width + i ] = a[ width + i ] + b[ width + i ];
> }
> }
>
> using the following commands
>
> clang
2013 Oct 29
2
[LLVMdev] Loop vectorizer dosen't find loop bounds
Thanks for the alternatives!
I am trying the 'extracting sub-function' approach. However, it seems I
can't get the 'subfunction' to pass the verifier. This is my subfunction:
define void @main_extern([8 x i8]* %arg_ptr) {
entrypoint:
%0 = getelementptr [8 x i8]* %arg_ptr, i32 0
%1 = bitcast [8 x i8]* %0 to i64*
%2 = load i64* %1
%3 = getelementptr [8 x i8]*
2013 Oct 28
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> Bingo! That works (when coming from C source)
>
> Now, I have a serious problem. I am not coming from C but I build the
> function with the builder. I am also forced to change the signature
> and
> load the pointers a,b,c afterwards:
>
> define void @bar([8 x i8]* nocapture readonly %arg_ptr) #0 {
> entrypoint:
> %0 = bitcast [8 x
2013 Oct 29
0
[LLVMdev] Loop vectorizer dosen't find loop bounds
----- Original Message -----
> Thanks for the alternatives!
>
> I am trying the 'extracting sub-function' approach. However, it seems
> I
> can't get the 'subfunction' to pass the verifier. This is my
> subfunction:
>
> define void @main_extern([8 x i8]* %arg_ptr) {
> entrypoint:
> %0 = getelementptr [8 x i8]* %arg_ptr, i32 0
> %1 =
2013 Oct 28
2
[LLVMdev] loop vectorizer says Bad stride
Verifying function
running passes ...
LV: Checking a loop in "bar"
LV: Found a loop: L0
LV: Found an induction variable.
LV: We need to do 0 pointer comparisons.
LV: Checking memory dependencies
LV: Bad stride - Not an AddRecExpr pointer %13 = getelementptr float*
%arg2, i32 %1 SCEV: ((4 * (sext i32 {(256 + %arg0),+,1}<nw><%L0> to
i64)) + %arg2)
LV: Src Scev: {((4 * (sext
2013 Oct 28
0
[LLVMdev] loop vectorizer says Bad stride
Frank,
It looks like the loop vectorizer is unable to tell that the two stores in your code never overlap. This is probably because of the sign-extend in your code. Can you extend the indices to 64bit ?
Thanks,
Nadav
On Oct 28, 2013, at 1:38 PM, Frank Winter <fwinter at jlab.org> wrote:
> Verifying function
> running passes ...
> LV: Checking a loop in "bar"
> LV:
2017 Dec 19
4
MemorySSA question
Hi,
I am new to MemorySSA and wanted to understand its capabilities. Hence I
wrote the following program (test.c):
int N;
void test(int *restrict a, int *restrict b, int *restrict c, int *restrict
d, int *restrict e) {
int i;
for (i = 0; i < N; i = i + 5) {
a[i] = b[i] + c[i];
}
for (i = 0; i < N - 5; i = i + 5) {
e[i] = a[i] * d[i];
}
}
I compiled this program using
2017 Dec 19
2
MemorySSA question
On Tue, Dec 19, 2017 at 9:10 AM, Siddharth Bhat via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I could be entirely wrong, but from my understanding of memorySSA, each
> def defines an "abstract heap state" which has the coarsest possible
> definition - any write will be modelled as a "new heap state".
>
This is true for def-def relationships, but
2013 Oct 26
2
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Arnold,
adding '-debug-only=loop-vectorize' to the command gives:
LV: Checking a loop in "bar"
LV: Found a loop: L0
LV: Found an induction variable.
LV: Found an unidentified write ptr: %7 = load float** %6
LV: Found an unidentified read ptr: %10 = load float** %9
LV: Found an unidentified read ptr: %13 = load float** %12
LV: We need to do 2 pointer comparisons.
LV: We
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
----- Original Message -----
> Hi Arnold,
>
> adding '-debug-only=loop-vectorize' to the command gives:
>
> LV: Checking a loop in "bar"
> LV: Found a loop: L0
> LV: Found an induction variable.
> LV: Found an unidentified write ptr: %7 = load float** %6
> LV: Found an unidentified read ptr: %10 = load float** %9
> LV: Found an unidentified
2013 Oct 26
2
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Hal!
I am using the 'x86_64' target. Below the complete module dump and here
the command line:
opt -march=x64-64 -loop-vectorize -debug-only=loop-vectorize -S test.ll
Frank
; ModuleID = 'test.ll'
target datalayout =
"e-p:64:64:64-S128-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:12
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
>>> LV: The Widest type: 32 bits.
>>> LV: The Widest register is: 32 bits.
Yep, we don’t pick up the right TTI.
Try -march=x86-64 (or leave it out) you already have this info in the triple.
Then it should work (does for me with your example below).
On Oct 26, 2013, at 2:16 PM, Frank Winter <fwinter at jlab.org> wrote:
> Hi Hal!
>
> I am using the
2013 Oct 26
3
[LLVMdev] Why is the loop vectorizer not working on my function?
----- Original Message -----
> >>> LV: The Widest type: 32 bits.
> >>> LV: The Widest register is: 32 bits.
>
> Yep, we don’t pick up the right TTI.
>
> Try -march=x86-64 (or leave it out) you already have this info in the
> triple.
>
> Then it should work (does for me with your example below).
That may depend on what CPU is picks by default; Frank,
2013 Oct 26
0
[LLVMdev] Why is the loop vectorizer not working on my function?
I would need this to work when calling the vectorizer through
the function pass manager. Unfortunately I am having the same
problem there:
LV: The Widest type: 32 bits.
LV: The Widest register is: 32 bits.
It's not picking the target information, although I tried with and
without the target triple in the module.
Any idea what could be wrong?
Frank
On 26/10/13 15:54, Hal Finkel wrote:
2013 Oct 27
3
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Frank,
On Oct 26, 2013, at 6:29 PM, Frank Winter <fwinter at jlab.org> wrote:
> I would need this to work when calling the vectorizer through
> the function pass manager. Unfortunately I am having the same
> problem there:
I am not sure which function pass manager you are referring here. I assume you create your own (you are not using opt but configure your own pass
2013 Oct 27
0
[LLVMdev] Why is the loop vectorizer not working on my function?
Hi Arnold,
thanks for the detailed setup. Still, I haven't figured out the right
thing to do.
I would need only the native target since all generated code will
execute on the JIT execution machine (right now, the old JIT interface).
There is no need for other targets.
Maybe it would be good to ask specific questions:
How do I get the triple for the native target?
How do I setup the
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to using a 64 bit type for the loop index (this
eliminates 'sext' instructions in the IR)
Here the IR produced with clang -O0
define float @foo(i64 %start, i64 %end, float* %A) #0 {
entry:
%start.addr = alloca i64, align 8
%end.addr = alloca i64, align 8
%A.addr = alloca float*, align 8
%sum = alloca [4 x float], align 16
%i = alloca i64, align 8
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel:
float foo( int start , int end , float * A )
{
float sum[4] = {0.,0.,0.,0.};
for (int i = start ; i < end ; ++i ) {
for (int q = 0 ; q < 4 ; ++q )
sum[q] += A[i*4+q];
}
return sum[0]+sum[1]+sum[2]+sum[3];
}
LV: Checking a loop in "foo"
LV: Found a loop: for.cond1
LV: Found an induction variable.
LV: We
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote:
> LV: We don't allow storing to uniform addresses
>
This is triggering because it didn't recognize as a reduction variable
during the canVectorizeInstrs() but did recognize that sum[q] is loop
invariant in canVectorizeMemory().
I'm guessing the nested loop was unrolled because of the low trip-count,
and