thr3ads.net - llvm dev - [llvm-dev] Imprecise information by SCEV after loop unrolling and Vectorization. [Aug 2018]

If this information is useful, please help other people find it:
Share via:
Mani, Suresh via llvm-dev
2018-Aug-24 10:58 UTC
[llvm-dev] Imprecise information by SCEV after loop unrolling and Vectorization.

Hi,

I have a simple test case,

// 1.c
#include <stdio.h>

int b[1000];
main() {
       int a[1000];
       int sum = 0;

       for (int i = 0; i < 1000; i++) {
              a[i] = i + 10;
       }

       for (int i = 0; i < 1000; i += 1) {
              sum = sum + a[i];
       }

       printf("sum = %ld\n", sum);

}

Compilation:   clang -m64 -fuse-ld=lld -flto -S -emit-llvm -o 1.ll  1.c
SCEV DUMP:  opt -S  -loop-accesses -mem2reg -analyze -indvars -loop-simplify
-scalar-evolution 1.ll

Due to unrolling and Vectorization,  Scev  determines that some of the access
are outside the range [ 0 - 999 ]  --------------- [ 0 - 1020 ] as highlighted
below in red, not sure if this issue is a known issue in Scev ?

Printing analysis 'Loop Access Analysis' for function 'main':
  vector.body33:
    Memory dependences are safe
    Dependences:
    Run-time memory checks:
    Grouped accesses:

    Store to invariant address was not found in loop.
    SCEV assumptions:

    Expressions re-written:
  vector.body:
    Report: loop control flow is not understood by analyzer
    Dependences:
    Run-time memory checks:
    Grouped accesses:

    Store to invariant address was not found in loop.
    SCEV assumptions:

    Expressions re-written:
Printing analysis 'Promote Memory to Register' for function
'main':
Pass::print not implemented for pass: 'Promote Memory to Register'!
Printing analysis 'Induction Variable Simplification':
Pass::print not implemented for pass: 'Induction Variable
Simplification'!
Printing analysis 'Induction Variable Simplification':
Pass::print not implemented for pass: 'Induction Variable
Simplification'!
Printing analysis 'Canonicalize natural loops' for function
'main':
Pass::print not implemented for pass: 'Canonicalize natural loops'!
Printing analysis 'Scalar Evolution Analysis' for function
'main':
Classifying expressions for: @main
  %a = alloca [1000 x i32], align 16
  -->  %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)
  %0 = bitcast [1000 x i32]* %a to i8*
  -->  %a U: [0,-15) S: [-9223372036854775808,9223372036854775793)
  %index = phi i64 [ 0, %entry ], [ %index.next.3, %vector.body.1 ]
  -->  {0,+,32}<nuw><nsw><%vector.body> U: [0,993) S:
[0,993)           Exits: 992              LoopDispositions: { %vector.body:
Computable }
  %1 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index
  -->  {%a,+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)           Exits: (3968 + %a)         
LoopDispositions: { %vector.body: Computable }
  %4 = bitcast i32* %1 to <4 x i32>*
  -->  {%a,+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)           Exits: (3968 + %a)         
LoopDispositions: { %vector.body: Computable }
  %5 = getelementptr inbounds i32, i32* %1, i64 4
  -->  {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (3984 + %a)  
LoopDispositions: { %vector.body: Computable }
  %6 = bitcast i32* %5 to <4 x i32>*
  -->  {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (3984 + %a)  
LoopDispositions: { %vector.body: Computable }
  %index.next = or i64 %index, 8
  -->  {8,+,32}<nuw><nsw><%vector.body> U: [8,1001) S:
[8,1001)         Exits: 1000             LoopDispositions: { %vector.body:
Computable }
  %index37 = phi i64 [ %index.next38.4, %vector.body33 ], [ 0,
%vector.body33.preheader ]
  -->  {0,+,40}<nuw><nsw><%vector.body33> U: [0,961) S:
[0,961)         Exits: 960              LoopDispositions: { %vector.body33:
Computable }
  %8 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index37
  -->  {%a,+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3840 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %9 = bitcast i32* %8 to <4 x i32>*
  -->  {%a,+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3840 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %10 = getelementptr inbounds i32, i32* %8, i64 4
  -->  {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3856 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %11 = bitcast i32* %10 to <4 x i32>*
  -->  {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3856 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %index.next38 = add nuw nsw i64 %index37, 8
  -->  {8,+,40}<nuw><nsw><%vector.body33> U: [8,969) S:
[8,969)         Exits: 968              LoopDispositions: { %vector.body33:
Computable }
  %14 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next38
 -->  {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3872 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %15 = bitcast i32* %14 to <4 x i32>*
  -->  {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3872 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %16 = getelementptr inbounds i32, i32* %14, i64 4
  -->  {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)          Exits: (3888 + %a)          
LoopDispositions: { %vector.body33: Computable }
  %17 = bitcast i32* %16 to <4 x i32>*
  -->  {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)          Exits: (3888 + %a)          
LoopDispositions: { %vector.body33: Computable }
  %index.next38.1 = add nuw nsw i64 %index37, 16
  -->  {16,+,40}<nuw><nsw><%vector.body33> U: [16,977) S:
[16,977)              Exits: 976              LoopDispositions: {
%vector.body33: Computable }
  %20 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next38.1
  -->  {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3904 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %21 = bitcast i32* %20 to <4 x i32>*
  -->  {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3904 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %22 = getelementptr inbounds i32, i32* %20, i64 4
  -->  {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)          Exits: (3920 + %a)          
LoopDispositions: { %vector.body33: Computable }
  %23 = bitcast i32* %22 to <4 x i32>*
  -->  {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)          Exits: (3920 + %a)          
LoopDispositions: { %vector.body33: Computable }
  %index.next38.2 = add nuw nsw i64 %index37, 24
  -->  {24,+,40}<nuw><nsw><%vector.body33> U: [24,985) S:
[24,985)              Exits: 984              LoopDispositions: {
%vector.body33: Computable }
  %26 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next38.2
  -->  {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3936 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %27 = bitcast i32* %26 to <4 x i32>*
  -->  {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)             Exits: (3936 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %28 = getelementptr inbounds i32, i32* %26, i64 4
  -->  {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3952 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %29 = bitcast i32* %28 to <4 x i32>*
  -->  {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3952 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %index.next38.3 = add nuw nsw i64 %index37, 32
  -->  {32,+,40}<nuw><nsw><%vector.body33> U: [32,993) S:
[32,993)              Exits: 992              LoopDispositions: {
%vector.body33: Computable }
 %32 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next38.3
  -->  {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)            Exits: (3968 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %33 = bitcast i32* %32 to <4 x i32>*
  -->  {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U:
[0,-15) S: [-9223372036854775808,9223372036854775793)            Exits: (3968 +
%a)              LoopDispositions: { %vector.body33: Computable }
  %34 = getelementptr inbounds i32, i32* %32, i64 4
  -->  {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3984 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %35 = bitcast i32* %34 to <4 x i32>*
  -->  {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)         Exits: (3984 + %a)           
LoopDispositions: { %vector.body33: Computable }
  %index.next38.4 = add nuw nsw i64 %index37, 40
  -->  {40,+,40}<nuw><nsw><%vector.body33> U: [40,1001) S:
[40,1001)            Exits: 1000             LoopDispositions: { %vector.body33:
Computable }
  %39 = extractelement <4 x i32> %bin.rdx46, i32 0
  -->  %39 U: full-set S: full-set
  %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x
i8], [11 x i8]* @.str, i64 0, i64 0), i32 %39)
  -->  %call U: full-set S: full-set
  %40 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next
  -->  {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4000 + %a)  
LoopDispositions: { %vector.body: Computable }
  %43 = bitcast i32* %40 to <4 x i32>*
  -->  {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4000 + %a)  
LoopDispositions: { %vector.body: Computable }
  %44 = getelementptr inbounds i32, i32* %40, i64 4
  -->  {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)            Exits: (4016 + %a)        
LoopDispositions: { %vector.body: Computable }                                 
<-------------------------------------------OUTSIDE RANGE  4016/4 = 1004
  %45 = bitcast i32* %44 to <4 x i32>*
  -->  {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)            Exits: (4016 + %a)        
LoopDispositions: { %vector.body: Computable }
  %index.next.1 = or i64 %index, 16
  -->  {16,+,32}<nuw><nsw><%vector.body> U: [16,1009) S:
[16,1009)              Exits: 1008             LoopDispositions: { %vector.body:
Computable }
  %46 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next.1
  -->  {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4032 + %a)  
LoopDispositions: { %vector.body: Computable }                     
<-------------------------------------------OUTSIDE RANGE  4032/4 = 1008

  %49 = bitcast i32* %46 to <4 x i32>*
  -->  {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4032 + %a)  
LoopDispositions: { %vector.body: Computable }
  %50 = getelementptr inbounds i32, i32* %46, i64 4
  -->  {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)            Exits: (4048 + %a)        
LoopDispositions: { %vector.body: Computable }
  %51 = bitcast i32* %50 to <4 x i32>*
  -->  {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)            Exits: (4048 + %a)        
LoopDispositions: { %vector.body: Computable }
  %index.next.2 = or i64 %index, 24
  -->  {24,+,32}<nuw><nsw><%vector.body> U: [24,1017) S:
[24,1017)              Exits: 1016             LoopDispositions: { %vector.body:
Computable }
  %52 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64
%index.next.2
  -->  {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4064 + %a)  
LoopDispositions: { %vector.body: Computable }
  %55 = bitcast i32* %52 to <4 x i32>*
  -->  {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15)
S: [-9223372036854775808,9223372036854775793)               Exits: (4064 + %a)  
LoopDispositions: { %vector.body: Computable }
  %56 = getelementptr inbounds i32, i32* %52, i64 4
  -->  {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)           Exits: (4080 + %a)         
LoopDispositions: { %vector.body: Computable }
  %57 = bitcast i32* %56 to <4 x i32>*
  -->  {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S:
[-9223372036854775808,9223372036854775793)           Exits: (4080 + %a)         
LoopDispositions: { %vector.body: Computable }                                  
<-------------------------------------------OUTSIDE RANGE  4080/4 = 1020
  %index.next.3 = add nuw nsw i64 %index, 32
  -->  {32,+,32}<nuw><nsw><%vector.body> U: [32,1025) S:
[32,1025)              Exits: 1024             LoopDispositions: { %vector.body:
Computable }
Determining loop execution counts for: @main
Loop %vector.body33: backedge-taken count is 24
Loop %vector.body33: max backedge-taken count is 24
Loop %vector.body33: Predicated backedge-taken count is 24
Predicates:

Loop %vector.body33: Trip multiple is 25
Loop %vector.body: backedge-taken count is 31
Loop %vector.body: max backedge-taken count is 31
Loop %vector.body: Predicated backedge-taken count is 31
Predicates:

Loop %vector.body: Trip multiple is 32
$


Thanks
M Suresh

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/66210244/attachment-0001.html>
llvm dev - Aug 2018 - Imprecise information by SCEV after loop unrolling and Vectorization.

[llvm-dev] Imprecise information by SCEV after loop unrolling and Vectorization.