Mani, Suresh via llvm-dev
2018-Aug-24 10:58 UTC
[llvm-dev] Imprecise information by SCEV after loop unrolling and Vectorization.
Hi, I have a simple test case, // 1.c #include <stdio.h> int b[1000]; main() { int a[1000]; int sum = 0; for (int i = 0; i < 1000; i++) { a[i] = i + 10; } for (int i = 0; i < 1000; i += 1) { sum = sum + a[i]; } printf("sum = %ld\n", sum); } Compilation: clang -m64 -fuse-ld=lld -flto -S -emit-llvm -o 1.ll 1.c SCEV DUMP: opt -S -loop-accesses -mem2reg -analyze -indvars -loop-simplify -scalar-evolution 1.ll Due to unrolling and Vectorization, Scev determines that some of the access are outside the range [ 0 - 999 ] --------------- [ 0 - 1020 ] as highlighted below in red, not sure if this issue is a known issue in Scev ? Printing analysis 'Loop Access Analysis' for function 'main': vector.body33: Memory dependences are safe Dependences: Run-time memory checks: Grouped accesses: Store to invariant address was not found in loop. SCEV assumptions: Expressions re-written: vector.body: Report: loop control flow is not understood by analyzer Dependences: Run-time memory checks: Grouped accesses: Store to invariant address was not found in loop. SCEV assumptions: Expressions re-written: Printing analysis 'Promote Memory to Register' for function 'main': Pass::print not implemented for pass: 'Promote Memory to Register'! Printing analysis 'Induction Variable Simplification': Pass::print not implemented for pass: 'Induction Variable Simplification'! Printing analysis 'Induction Variable Simplification': Pass::print not implemented for pass: 'Induction Variable Simplification'! Printing analysis 'Canonicalize natural loops' for function 'main': Pass::print not implemented for pass: 'Canonicalize natural loops'! Printing analysis 'Scalar Evolution Analysis' for function 'main': Classifying expressions for: @main %a = alloca [1000 x i32], align 16 --> %a U: [0,-15) S: [-9223372036854775808,9223372036854775793) %0 = bitcast [1000 x i32]* %a to i8* --> %a U: [0,-15) S: [-9223372036854775808,9223372036854775793) %index = phi i64 [ 0, %entry ], [ %index.next.3, %vector.body.1 ] --> {0,+,32}<nuw><nsw><%vector.body> U: [0,993) S: [0,993) Exits: 992 LoopDispositions: { %vector.body: Computable } %1 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index --> {%a,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable } %4 = bitcast i32* %1 to <4 x i32>* --> {%a,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body: Computable } %5 = getelementptr inbounds i32, i32* %1, i64 4 --> {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable } %6 = bitcast i32* %5 to <4 x i32>* --> {(16 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body: Computable } %index.next = or i64 %index, 8 --> {8,+,32}<nuw><nsw><%vector.body> U: [8,1001) S: [8,1001) Exits: 1000 LoopDispositions: { %vector.body: Computable } %index37 = phi i64 [ %index.next38.4, %vector.body33 ], [ 0, %vector.body33.preheader ] --> {0,+,40}<nuw><nsw><%vector.body33> U: [0,961) S: [0,961) Exits: 960 LoopDispositions: { %vector.body33: Computable } %8 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index37 --> {%a,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable } %9 = bitcast i32* %8 to <4 x i32>* --> {%a,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3840 + %a) LoopDispositions: { %vector.body33: Computable } %10 = getelementptr inbounds i32, i32* %8, i64 4 --> {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable } %11 = bitcast i32* %10 to <4 x i32>* --> {(16 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3856 + %a) LoopDispositions: { %vector.body33: Computable } %index.next38 = add nuw nsw i64 %index37, 8 --> {8,+,40}<nuw><nsw><%vector.body33> U: [8,969) S: [8,969) Exits: 968 LoopDispositions: { %vector.body33: Computable } %14 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38 --> {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable } %15 = bitcast i32* %14 to <4 x i32>* --> {(32 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3872 + %a) LoopDispositions: { %vector.body33: Computable } %16 = getelementptr inbounds i32, i32* %14, i64 4 --> {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable } %17 = bitcast i32* %16 to <4 x i32>* --> {(48 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3888 + %a) LoopDispositions: { %vector.body33: Computable } %index.next38.1 = add nuw nsw i64 %index37, 16 --> {16,+,40}<nuw><nsw><%vector.body33> U: [16,977) S: [16,977) Exits: 976 LoopDispositions: { %vector.body33: Computable } %20 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.1 --> {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable } %21 = bitcast i32* %20 to <4 x i32>* --> {(64 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3904 + %a) LoopDispositions: { %vector.body33: Computable } %22 = getelementptr inbounds i32, i32* %20, i64 4 --> {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable } %23 = bitcast i32* %22 to <4 x i32>* --> {(80 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3920 + %a) LoopDispositions: { %vector.body33: Computable } %index.next38.2 = add nuw nsw i64 %index37, 24 --> {24,+,40}<nuw><nsw><%vector.body33> U: [24,985) S: [24,985) Exits: 984 LoopDispositions: { %vector.body33: Computable } %26 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.2 --> {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable } %27 = bitcast i32* %26 to <4 x i32>* --> {(96 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3936 + %a) LoopDispositions: { %vector.body33: Computable } %28 = getelementptr inbounds i32, i32* %26, i64 4 --> {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable } %29 = bitcast i32* %28 to <4 x i32>* --> {(112 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3952 + %a) LoopDispositions: { %vector.body33: Computable } %index.next38.3 = add nuw nsw i64 %index37, 32 --> {32,+,40}<nuw><nsw><%vector.body33> U: [32,993) S: [32,993) Exits: 992 LoopDispositions: { %vector.body33: Computable } %32 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next38.3 --> {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable } %33 = bitcast i32* %32 to <4 x i32>* --> {(128 + %a)<nsw>,+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3968 + %a) LoopDispositions: { %vector.body33: Computable } %34 = getelementptr inbounds i32, i32* %32, i64 4 --> {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable } %35 = bitcast i32* %34 to <4 x i32>* --> {(144 + %a),+,160}<nsw><%vector.body33> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (3984 + %a) LoopDispositions: { %vector.body33: Computable } %index.next38.4 = add nuw nsw i64 %index37, 40 --> {40,+,40}<nuw><nsw><%vector.body33> U: [40,1001) S: [40,1001) Exits: 1000 LoopDispositions: { %vector.body33: Computable } %39 = extractelement <4 x i32> %bin.rdx46, i32 0 --> %39 U: full-set S: full-set %call = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i32 %39) --> %call U: full-set S: full-set %40 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next --> {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable } %43 = bitcast i32* %40 to <4 x i32>* --> {(32 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4000 + %a) LoopDispositions: { %vector.body: Computable } %44 = getelementptr inbounds i32, i32* %40, i64 4 --> {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4016/4 = 1004 %45 = bitcast i32* %44 to <4 x i32>* --> {(48 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4016 + %a) LoopDispositions: { %vector.body: Computable } %index.next.1 = or i64 %index, 16 --> {16,+,32}<nuw><nsw><%vector.body> U: [16,1009) S: [16,1009) Exits: 1008 LoopDispositions: { %vector.body: Computable } %46 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.1 --> {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4032/4 = 1008 %49 = bitcast i32* %46 to <4 x i32>* --> {(64 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4032 + %a) LoopDispositions: { %vector.body: Computable } %50 = getelementptr inbounds i32, i32* %46, i64 4 --> {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable } %51 = bitcast i32* %50 to <4 x i32>* --> {(80 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4048 + %a) LoopDispositions: { %vector.body: Computable } %index.next.2 = or i64 %index, 24 --> {24,+,32}<nuw><nsw><%vector.body> U: [24,1017) S: [24,1017) Exits: 1016 LoopDispositions: { %vector.body: Computable } %52 = getelementptr inbounds [1000 x i32], [1000 x i32]* %a, i64 0, i64 %index.next.2 --> {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable } %55 = bitcast i32* %52 to <4 x i32>* --> {(96 + %a)<nsw>,+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4064 + %a) LoopDispositions: { %vector.body: Computable } %56 = getelementptr inbounds i32, i32* %52, i64 4 --> {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable } %57 = bitcast i32* %56 to <4 x i32>* --> {(112 + %a),+,128}<nsw><%vector.body> U: [0,-15) S: [-9223372036854775808,9223372036854775793) Exits: (4080 + %a) LoopDispositions: { %vector.body: Computable } <-------------------------------------------OUTSIDE RANGE 4080/4 = 1020 %index.next.3 = add nuw nsw i64 %index, 32 --> {32,+,32}<nuw><nsw><%vector.body> U: [32,1025) S: [32,1025) Exits: 1024 LoopDispositions: { %vector.body: Computable } Determining loop execution counts for: @main Loop %vector.body33: backedge-taken count is 24 Loop %vector.body33: max backedge-taken count is 24 Loop %vector.body33: Predicated backedge-taken count is 24 Predicates: Loop %vector.body33: Trip multiple is 25 Loop %vector.body: backedge-taken count is 31 Loop %vector.body: max backedge-taken count is 31 Loop %vector.body: Predicated backedge-taken count is 31 Predicates: Loop %vector.body: Trip multiple is 32 $ Thanks M Suresh -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/66210244/attachment-0001.html>