On Sun, May 3, 2015 at 9:57 PM, David Majnemer <david.majnemer at gmail.com> wrote:

> On Sun, May 3, 2015 at 6:26 PM, Nicholas White <n.j.white at gmail.com>
> wrote:
>
>> Hi - I've got a question about what optimizations the "inbounds"
>> keyword of "getelementptr" allows you to use. In the code below, %five
>> is loaded from an inbounds offset of either a null pointer or %mem:
>>
>> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
>>
>> define i8 @func(i8* %mem) {
>>   %test = icmp eq i8* %mem, null
>>   br i1 %test, label %done, label %mem.is.valid
>> mem.is.valid:
>>   %1 = getelementptr inbounds i8, i8* %mem, i64 4
>>   store i8 5, i8* %1, align 4
>>   br label %done
>> done:
>>   %after.phi = phi i8* [ %mem, %mem.is.valid ], [ null, %0 ]
>>   %2 = getelementptr inbounds i8, i8* %after.phi, i64 4
>>   %five = load i8, i8* %2, align 4
>>   ret i8 %five
>> }
>>
>> According to the documentation, "the result value of the getelementptr
>> is a poison value if the base pointer is not an in bounds address of
>> an allocated object", so does this mean it's valid to optimise the
>> function to:
>>
>> define i8 @func(i8* %mem) {
>>   %test = icmp eq i8* %mem, null
>>   br i1 %test, label %done, label %mem.is.valid
>> mem.is.valid:
>>   ret i8 5
>> done:
>>   ret i8 undef
>> }
>>
>> Or even this:
>>
>> define i8 @func(i8* %mem) {
>>   ret i8 5
>> }
>>
>
> No, neither is semantics-preserving, because neither contains a store to
> '%mem'.
>
> Let's start by hoisting both sides of the branch into the entry block.
> This will leave you with:
> define i8 @func(i8* %mem) {
>   %test = icmp eq i8* %mem, null
>   %after.phi = select i1 %test, i8* null, i8* %mem
>   %1 = getelementptr inbounds i8, i8* %mem, i64 4
>   store i8 5, i8* %1, align 4
>   %2 = getelementptr inbounds i8, i8* %after.phi, i64 4
>   %five = load i8, i8* %2, align 4
>   ret i8 %five
> }
>
> The SSA value '%after.phi' can be trivially simplified to '%mem'; this
> leaves us with:
> define i8 @func(i8* %mem) {
>   %1 = getelementptr inbounds i8, i8* %mem, i64 4
>   store i8 5, i8* %1, align 4
>   %2 = getelementptr inbounds i8, i8* %mem, i64 4
>   %five = load i8, i8* %2, align 4
>   ret i8 %five
> }
>
> The SSA values '%1' and '%2' are equivalent; this leaves us with:
> define i8 @func(i8* %mem) {
>   %1 = getelementptr inbounds i8, i8* %mem, i64 4
>   store i8 5, i8* %1, align 4
>   ret i8 5
> }
>
> I do not believe further simplification is possible.
>

One final point that I forgot to mention: the transformations that I
performed did not rely on the presence of 'inbounds'.

>
>> ...? This is a reduced example of something I saw while running "opt"
>> on a test case that missed a null check. Thanks -
>>
>> Nick
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
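
For readers more comfortable with C than with IR, the point about the store can be sketched roughly as follows. This is only an illustrative analogue, not code from the thread: the names `func_original` and `func_simplified` are hypothetical, and `int8_t` stands in for `i8`. It assumes, as in C, that reading through a null pointer is undefined behaviour, which is what lets the null case be disregarded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of the original @func: the store is guarded by
   a null check, but the load that follows is not. */
int8_t func_original(int8_t *mem) {
    if (mem != NULL) {
        mem[4] = 5;
    }
    return mem[4]; /* undefined behaviour when mem == NULL */
}

/* Analogue of the final form in the message above: the store to mem[4]
   is kept, and only the reload is folded to the constant 5. */
int8_t func_simplified(int8_t *mem) {
    mem[4] = 5;
    return 5;
}
```

For any valid buffer both functions return 5 and leave 5 in `mem[4]`. A version that returned 5 without writing to the buffer would change what the caller can observe in memory, which is why neither of the proposed rewrites is acceptable.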
Thanks - that makes sense. It's interesting that at -O3 the optimizer
can't reduce the below though - I'll dig into it a bit and see if I
can make a patch that fixes it:

target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"

%struct.my_s = type { i32, i32, [0 x i8*] }

; Function Attrs: noreturn
declare void @__assert_rtn()

define void @func(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero

stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4
  br label %check.zero

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
  %6 = load i32, i32* %5, align 4
  %7 = icmp eq i32 %6, 0
  br i1 %7, label %success, label %check.first.array.element

check.first.array.element:
  %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
  %9 = load i8*, i8** %8, align 1
  %10 = icmp eq i8* %9, null
  br i1 %10, label %success, label %abort

abort:
  tail call void @__assert_rtn()
  unreachable

success:
  ret void
}

Thanks -

Nick
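
As a reading aid, the control flow of this test case might correspond to C along the following lines. This is a guess at the shape of the source, not something posted in the thread: the field names `first`, `second` and `ptrs` are made up, `abort()` stands in for `@__assert_rtn`, and the store is written through the struct field rather than through a byte offset of 4 as in the IR.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout matching %struct.my_s = { i32, i32, [0 x i8*] };
   the field names are hypothetical. */
struct my_s {
    int32_t first;
    int32_t second;   /* field 1, at byte offset 4 */
    char *ptrs[];     /* field 2, zero-length array of pointers */
};

void func(void *mem) {
    struct my_s *s = NULL;
    if (mem != NULL) {       /* stash.zero */
        s = mem;
        s->second = 0;       /* store i32 0 at offset 4 */
    }
    /* check.zero: s is still NULL here when mem was NULL */
    if (s->second == 0) {
        return;              /* success */
    }
    /* check.first.array.element */
    if (s->ptrs[0] == NULL) {
        return;              /* success */
    }
    abort();                 /* stands in for @__assert_rtn */
}
```

On the non-null path the value loaded in `check.zero` is the 0 that was just stored, so that path always reaches `success`; the only way to get past the first check is the path on which `mem` is null.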
On Mon, May 4, 2015 at 9:40 AM, Nicholas White <n.j.white at gmail.com> wrote:
> Thanks - that makes sense. It's interesting that at -O3 the optimizer
> can't reduce the below though - I'll dig into it a bit and see if I
> can make a patch that fixes it:

I'm unsure what you expect to happen below. It's not quite the same
testcase. GVN will PRE the loads, so you end up with one load. But I
can't see how you expect it to determine anything else.

Can you walk me through the below testcase and explain what you expect
to happen? If so, I can probably make it happen for you :)

>
> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
>
> %struct.my_s = type { i32, i32, [0 x i8*] }
>
> ; Function Attrs: noreturn
> declare void @__assert_rtn()
>
> define void @func(i8* %mem) {
>   %1 = icmp eq i8* %mem, null
>   br i1 %1, label %check.zero, label %stash.zero
>
> stash.zero:
>   %2 = bitcast i8* %mem to %struct.my_s*
>   %3 = getelementptr inbounds i8, i8* %mem, i64 4
>   %4 = bitcast i8* %3 to i32*
>   store i32 0, i32* %4, align 4
>   br label %check.zero
>
> check.zero:
>   %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
>   %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
>   %6 = load i32, i32* %5, align 4
>   %7 = icmp eq i32 %6, 0
>   br i1 %7, label %success, label %check.first.array.element
>
> check.first.array.element:
>   %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
>   %9 = load i8*, i8** %8, align 1
>   %10 = icmp eq i8* %9, null
>   br i1 %10, label %success, label %abort
>
> abort:
>   tail call void @__assert_rtn()
>   unreachable
>
> success:
>   ret void
> }
>
> Thanks -
>
> Nick
Possibly Parallel Threads
- [LLVMdev] Semantics of an Inbounds GetElementPtr
- [LLVMdev] InstCombine strips the inBounds attribute in GetElementPtr ConstantExpr
- RFC: inbounds on getelementptr indices for global splitting
- getelementptr inbounds with offset 0