Anna Thomas via llvm-dev
2016-Jun-30 15:05 UTC
[llvm-dev] Optimizations hindered by GVN widening
Currently, the GVN optimization widens loads if the alignment permits it. There are some limitations that show up, as seen in example below: Initial IR: declare void @consume(i8) readonly define i8 @bar(i8* align 2 %a) { %1 = load i8, i8* %a %idx = getelementptr i8, i8* %a, i64 1 %2 = load i8, i8* %idx, align 1 call void @consume(i8 %1). ret i8 %2 } define i1 @foo(i8* %a) { entry: %0 = call i8 @bar(i8* %a) %1 = icmp eq i8 %0, 0 br i1 %1, label %cont.1, label %exit cont.1: store i8 0, i8* %a %2 = call i8 @bar(i8* %a) %3 = icmp eq i8 %2, 0 ret i1 %3 exit: ret i1 true } Since %a is 2 byte aligned, GVN widens the loads in bar, then bar() gets inlined into foo. The resulting final IR at O3: define i8 @bar(i8* align 2 %a) { %1 = bitcast i8* %a to i16* %2 = load i16, i16* %1, align 2 %3 = trunc i16 %2 to i8 %idx = getelementptr i8, i8* %a, i64 1 %4 = lshr i16 %2, 8 %5 = trunc i16 %4 to i8 call void @consume(i8 %3) ret i8 %5 } define i1 @foo(i8* %a) { entry: %0 = bitcast i8* %a to i16* %1 = load i16, i16* %0, align 2 <— widened load from bar() %2 = trunc i16 %1 to i8 call void @consume(i8 %2) %3 = icmp ult i16 %1, 256 br i1 %3, label %cont.1, label %exit cont.1: ; preds = %entry store i8 0, i8* %a, align 2 %4 = load i16, i16* %0, align 2 <— widened load from bar() %5 = trunc i16 %4 to i8 call void @consume(i8 %5) %6 = icmp ult i16 %4, 256 ret i1 %6 exit: ; preds = %entry ret i1 true } In the absence of GVN widening (we can see this when %a is 1 byte aligned in bar), bar is inlined into foo as-is. Final IR at O3: define i8 @bar(i8* align 1 %a) { %1 = load i8, i8* %a, align 1 <— align is 1, so GVN does not widen load %idx = getelementptr i8, i8* %a, i64 1 %2 = load i8, i8* %idx, align 1 call void @consume(i8 %1) ret i8 %2 } define i1 @foo(i8* %a) { entry: %0 = load i8, i8* %a, align 1 %idx.i = getelementptr i8, i8* %a, i64 1 %1 = load i8, i8* %idx.i, align 1 <— both loads exist non-widened from bar(). Used in GVN for removing the loads in cont.1 BB. call void @consume(i8 %0) %2 = icmp eq i8 %1, 0 br i1 %2, label %cont.1, label %exit cont.1: ; preds = %entry store i8 0, i8* %a, align 1 ; both the loads are removed. First load has the value fed from the store above. Second load is same as the one in entry BB (%1). call void @consume(i8 0) ret i1 true exit: ; preds = %entry ret i1 true } Here the 2 loads in the cont.1 basic block are removed by GVN, since the loads are not widened to a single load. Also, GVN and later optimizations does value forwarding of %a (0) and also the compare folds to true. This does not happen when GVN widens the load. Note that at the time GVN did the widening, it was a good transform since we have a single load instead of 2. However, inlining and later optimizations proved that leaving the load non-widened was more beneficial. Any thoughts on how we could mitigate these problems introduced by GVN widening? Thanks, Anna