Hi again.

I have a complex type system in my custom language that isn't easily
representable as LLVM IR types, so I figured I could mostly get along
with treating my types as i8* and doing the appropriate bitcasts and
inttoptr instructions, and doing pointer arithmetic myself (by casting
the pointers to ints, adding the appropriate byte offsets, and then
casting back to pointers).

However, I've found some oddities.  While I have no problems
generating IR, when I run it through the optimizer (opt -O3), it
generates what appears to be totally wrong code.

Here's my test case:

I have a global called *testObj*.  It would look like this in C:

  struct TestObjClass
  {
    int32 dummy1, dummy2;
    int32* m_array;
  };

  extern TestObjClass* testObj;

and I'm trying to access:

  testObj->m_array[1] = 10;

Theoretically, this should be a load to get the pointer to testObj
(since it's a global), then I should add 8 bytes, then do another load
(to load the address of the array), add 4 bytes to get the array[1]
element, and then store the number 10 at that pointer.

Here's my original output:

  @"compile-test::*testObj*" = external constant i8*    ; <i8**> [#uses=1]

  define void @"compile-test::__toplevel-main"() {
  entry:
    store i8* null, i8** @"compile-test::*testObj*"
    %1 = load i8** @"compile-test::*testObj*"    ; <i8*> [#uses=1]
    %2 = ptrtoint i8* %1 to i32                  ; <i32> [#uses=1]
    %3 = add i32 %2, 8                           ; <i32> [#uses=1]
    %4 = inttoptr i32 %3 to i8*                  ; <i8*> [#uses=1]
    %5 = load i8* %4                             ; <i8> [#uses=1]
    %6 = inttoptr i8 %5 to i8*                   ; <i8*> [#uses=1]
    %7 = ptrtoint i8* %6 to i32                  ; <i32> [#uses=1]
    %8 = add i32 %7, 4                           ; <i32> [#uses=1]
    %9 = inttoptr i32 %8 to i8*                  ; <i8*> [#uses=1]
    %10 = bitcast i8* %9 to i32*                 ; <i32*> [#uses=1]
    store i32 10, i32* %10
    ret void
  }

This seems right to me.  However, when I run it through opt -O3 and
then through llvm-dis:

  @"compile-test::*testObj*" = external constant i8*    ; <i8**> [#uses=1]

  define void @"compile-test::__toplevel-main"() {
  entry:
    %0 = load i8* inttoptr (i64 8 to i8*), align 8    ; <i8> [#uses=1]
    %1 = inttoptr i8 %0 to i8*                        ; <i8*> [#uses=1]
    %2 = ptrtoint i8* %1 to i32                       ; <i32> [#uses=1]
    %3 = add i32 %2, 4                                ; <i32> [#uses=1]
    %4 = inttoptr i32 %3 to i32*                      ; <i32*> [#uses=1]
    store i32 10, i32* %4
    ret void
  }

Notice how there's no mention of compile-test::*testObj* at all.
Instead, the first line is loading from (i64 8)!  What am I doing
wrong?

Thanks in advance,
Scott
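For reference, the access sequence described in the message above (load the object pointer, add 8 bytes, load the array pointer, add 4 bytes, store 10) might be sketched in C roughly as follows. This is only an illustration, not the poster's compiler output: the helper name `store_via_byte_offsets` is hypothetical, `int32_t` stands in for the post's `int32`, and the offsets 8 and 4 are assumptions about the target layout that the code checks with assertions. Note that the second load here reads a whole pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout taken from the C sketch in the post (int32 written as int32_t). */
struct TestObjClass {
    int32_t dummy1, dummy2;
    int32_t *m_array;
};

/* Hypothetical helper: performs obj->m_array[1] = 10 using only byte
   offsets from an untyped pointer, the way the post describes. */
static void store_via_byte_offsets(void *obj)
{
    /* The offsets 8 and 4 are assumptions about the target's layout. */
    assert(offsetof(struct TestObjClass, m_array) == 8);
    assert(sizeof(int32_t) == 4);

    /* Step 1: address of the m_array field = object address + 8 bytes. */
    unsigned char *field = (unsigned char *)obj + 8;

    /* Step 2: load the whole pointer stored in that field. */
    int32_t *array;
    memcpy(&array, field, sizeof array);

    /* Step 3: element 1 lives 4 bytes past the start of the array. */
    int32_t value = 10;
    memcpy((unsigned char *)array + 4, &value, sizeof value);
}
```

Calling `store_via_byte_offsets(&obj)` on an object whose `m_array` points at a three-element array should leave elements 0 and 2 untouched and set element 1 to 10.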
On Mon, Dec 14, 2009 at 12:27 PM, Scott Shumaker <sshumaker at gmail.com> wrote:
> define void @"compile-test::__toplevel-main"() {
> entry:
>   store i8* null, i8** @"compile-test::*testObj*"
>   %1 = load i8** @"compile-test::*testObj*"    ; <i8*> [#uses=1]

Here, %1 is guaranteed to be null.

>   %2 = ptrtoint i8* %1 to i32                  ; <i32> [#uses=1]
>   %3 = add i32 %2, 8                           ; <i32> [#uses=1]
>   %4 = inttoptr i32 %3 to i8*                  ; <i8*> [#uses=1]
>   %5 = load i8* %4                             ; <i8> [#uses=1]

So therefore, this load loads from null+8.

-Eli
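The reasoning in this reply can be followed step by step in C. The sketch below is an illustration only: `test_obj` is a hypothetical stand-in for the global, and it assumes a platform where a null pointer converts to the integer 0 (true for mainstream targets, but implementation-defined). It computes the address the next load would use without dereferencing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the global in the post. */
static void *test_obj;

/* Returns the address the following load would read from, without
   actually dereferencing it. */
static uintptr_t address_after_null_store(void)
{
    test_obj = NULL;              /* store i8* null, i8** @global          */
    void *loaded = test_obj;      /* load: must observe the null just stored */
    assert(loaded == NULL);
    return (uintptr_t)loaded + 8; /* ptrtoint, then add 8                  */
}
```

Because nothing can change the global between the store and the load in a single-threaded view, the result is the constant 8 under the stated assumption, which matches the `inttoptr (i64 8 to i8*)` in the optimized output.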
Never mind, I'm stupid.  I forgot that I was setting the global to NULL
immediately beforehand - so LLVM was optimizing this out.

Scott
On Mon, Dec 14, 2009 at 12:27 PM, Scott Shumaker <sshumaker at gmail.com> wrote:
> Here's my original output:
>
> @"compile-test::*testObj*" = external constant i8*    ; <i8**> [#uses=1]
>
> define void @"compile-test::__toplevel-main"() {
> entry:
>   store i8* null, i8** @"compile-test::*testObj*"

I'm surprised this store got optimized out, even though LLVM can
optimize away the subsequent load.  Writing to an external global
variable is a visible side-effect, and unless there's other undefined
behavior, LLVM shouldn't remove it.

>   %1 = load i8** @"compile-test::*testObj*"    ; <i8*> [#uses=1]
>   %2 = ptrtoint i8* %1 to i32                  ; <i32> [#uses=1]
>   %3 = add i32 %2, 8                           ; <i32> [#uses=1]
>   %4 = inttoptr i32 %3 to i8*                  ; <i8*> [#uses=1]

You may be able to save some instructions (and maybe give the
optimizers more information) by replacing the above with

  %4 = getelementptr i8* %1, i32 8

That'll be equivalent to inttoptr(ptrtoint(%1) + 8) on systems with
8-bit bytes.

>   %5 = load i8* %4                             ; <i8> [#uses=1]
>   %6 = inttoptr i8 %5 to i8*                   ; <i8*> [#uses=1]
>   %7 = ptrtoint i8* %6 to i32                  ; <i32> [#uses=1]

The last two lines above look odd to me.  Aren't they equivalent to

  %7 = zext i8 %5 to i32

?

>   %8 = add i32 %7, 4                           ; <i32> [#uses=1]
>   %9 = inttoptr i32 %8 to i8*                  ; <i8*> [#uses=1]
>   %10 = bitcast i8* %9 to i32*                 ; <i32*> [#uses=1]
>   store i32 10, i32* %10

And then here, if I'm not mistaken, you're converting %5, which is an
i8, that is 0<=%5<256, to a pointer, and then storing through it.
Unless you're on an embedded system, that's a guaranteed segfault,
right?

All that said, in the future you can produce a better bug report by
trying to find which pass is making the surprising transformation.
See http://llvm.org/docs/Bugpoint.html for instructions on using
bugpoint to automatically reduce the list of passes.

>   ret void
> }
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
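The difference between a byte-wide load and a pointer-wide load, which this reply points at, can be illustrated in C. This is a rough sketch with hypothetical function names, not code from the thread: the first function mimics `load i8*` followed by zero-extension, the second reads the full pointer value stored in the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads only the first byte of whatever is stored at `field` and
   zero-extends it, like `load i8* %4` followed by `zext i8 %5 to i32`. */
static uint32_t load_first_byte_only(const void *field)
{
    assert(field != NULL);
    uint8_t byte;
    memcpy(&byte, field, sizeof byte);
    return (uint32_t)byte;   /* always in the range 0..255 */
}

/* Reads the full pointer-sized value stored at `field`. */
static void *load_whole_pointer(const void *field)
{
    assert(field != NULL);
    void *p;
    memcpy(&p, field, sizeof p);
    return p;
}
```

Given `void *stored = some_array;`, `load_whole_pointer(&stored)` gives back `some_array`, while `load_first_byte_only(&stored)` can only ever produce a value below 256, which is not a usable address on typical hosted systems.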
On Dec 14, 2009, at 2:21 PM, Jeffrey Yasskin wrote:
>>
>> @"compile-test::*testObj*" = external constant i8*    ; <i8**> [#uses=1]
>>
>> define void @"compile-test::__toplevel-main"() {
>> entry:
>>   store i8* null, i8** @"compile-test::*testObj*"
>
> I'm surprised this store got optimized out, even though LLVM can
> optimize away the subsequent load.  Writing to an external global
> variable is a visible side-effect, and unless there's other undefined
> behavior, LLVM shouldn't remove it.

Sure it can: LLVM can delete any non-volatile redundant load, or any
non-volatile redundant store.  It doesn't matter whether it is to a
global or not; LLVM's memory model (as with many compilers) is for
single-threaded programs.  We do try to conform to the C++0x memory
model by not introducing memory accesses where they did not exist
before, but deleting non-volatile accesses is always fine.

-Chris
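The same distinction exists at the C level, which may make the rule easier to picture. The sketch below uses hypothetical globals and function names and is only an analogy to the IR case: a C compiler is allowed to drop the first plain store and forward the stored value to the load, while every access to the volatile object has to be performed as written.

```c
#include <assert.h>

/* Hypothetical globals, used only for this illustration. */
static int plain_global;
static volatile int volatile_global;

static int store_then_load_plain(void)
{
    plain_global = 1;      /* redundant store: overwritten before any read */
    plain_global = 2;
    return plain_global;   /* redundant load: the compiler may just use 2  */
}

static int store_then_load_volatile(void)
{
    volatile_global = 1;   /* volatile: this store must be performed       */
    volatile_global = 2;   /* ...and so must this one, in this order       */
    return volatile_global;/* ...and this load                             */
}

/* Both versions return the same value; they differ only in which memory
   accesses the compiler is required to keep. */
static void check_same_result(void)
{
    assert(store_then_load_plain() == store_then_load_volatile());
}
```

Comparing the optimized assembly of the two functions is one way to see the effect; the exact output depends on the compiler and flags.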