Martins Mozeiko
2010-Feb-15 14:49 UTC
[LLVMdev] Incorrect execution of global constructor with JIT on ARM
Hello, llvm developers!

I am running LLVM with JIT on ARM. For simple programs it runs OK, but for larger code I have stumbled upon some issues. See the following C++ code, to which I have reduced the problem:

#include <stdio.h>

struct Global {
  typedef unsigned char ArrayType[4];
  ArrayType value;
  Global(const ArrayType& arg) {
    for (int i = 0; i < 4; i++) this->value[i] = arg[i];
  }
};

static const unsigned char arr[] = { 1, 2, 3, 4 };
static const Global Constant(arr);

int main() {
  for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
}

I am compiling it with llvm-gcc (with -O3 or -O2 optimization), and am running it with LLVM version 2.6. Instead of printing out 1234, it prints out 4444. I verified the contents of Constant's memory with this code (reading only the 4 bytes of the array):

const llvm::GlobalValue* v = module->getGlobalVariable("_ZL8Constant", true);
void* addr = EE->getPointerToGlobal(v);
const unsigned char* ptr = (const unsigned char*)addr;
for (int i=0; i<4; i++) {
  outs() << (int)ptr[i] << ',';
}

I really do see that the memory is filled with 4,4,4,4, which is incorrect. When I put the global constant definition

const Global Constant(arr);

in the main function as a local variable, everything runs fine: the program prints out 1234.

Is this an issue with the LLVM JIT for ARM, or with LLVM in general? The same code runs fine on Windows with the same version of LLVM.

The global constructor code looks like this:

define internal void @_GLOBAL__I_main() nounwind {
entry:
  store i8 1, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 0), align 8
  store i8 2, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 1), align 1
  store i8 3, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 2), align 2
  store i8 4, i8* getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 3), align 1
  ret void
}

I don't see any problems with it. When I compile the same bitcode file with llc.exe -march=arm and use the generated assembly file on my ARM system, the code runs fine.
What else could I check in this situation to learn more about the problem?

-- Martins Mozeiko
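To check where each store in the constructor actually lands, it can help to write down the byte offsets those constant getelementptr expressions should resolve to. The following is a minimal C sketch, not LLVM code: `struct Global` here is a plain C mirror of %struct.Global, and `global_gep_offset` and `run_global_ctor` are hypothetical helpers written for illustration. Comparing these expected offsets with the addresses the JIT-generated constructor writes to is one possible way to narrow the problem down.

```c
#include <stddef.h>

/* Hypothetical C mirror of %struct.Global: a single unsigned char[4] member. */
struct Global {
    unsigned char value[4];
};

/* Byte offset that
     getelementptr inbounds (%struct.Global* @_ZL8Constant, i32 0, i32 0, i32 k)
   should add to the base address: the first index steps over whole structs,
   the second selects field 0 (`value`), the third indexes into the array. */
size_t global_gep_offset(size_t k) {
    return 0 * sizeof(struct Global)
         + offsetof(struct Global, value)
         + k * sizeof(unsigned char);
}

/* Emulates the four stores of @_GLOBAL__I_main through explicit byte offsets,
   so the result is {1, 2, 3, 4} only if every offset is applied. */
void run_global_ctor(struct Global *g) {
    unsigned char *base = (unsigned char *)g;
    for (size_t k = 0; k < 4; k++)
        base[global_gep_offset(k)] = (unsigned char)(k + 1);
}
```

On a correct lowering, the constructor stores should hit base+0, base+1, base+2 and base+3 in that order.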
Renato Golin
2010-Feb-17 10:23 UTC
[LLVMdev] Incorrect execution of global constructor with JIT on ARM
On 15 February 2010 14:49, Martins Mozeiko <49640f8a at gmail.com> wrote:
> #include <stdio.h>
> struct Global {
>   typedef unsigned char ArrayType[4];
>   ArrayType value;
>   Global(const ArrayType& arg) {
>     for (int i = 0; i < 4; i++) this->value[i] = arg[i];
>   }
> };
> static const unsigned char arr[] = { 1, 2, 3, 4 };
> static const Global Constant(arr);
> int main() {
>   for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
> }

Compiling with clang I got lots of errors, but they boil down to two problems:

> typedef unsigned char ArrayType[4];

const_array.cpp:3:2: error: type name does not allow storage class to be specified
  typedef unsigned char ArrayType[4];
  ^

As far as I can tell, it is confusing ArrayType[4] with a declaration of an unsigned char[4] type.

I've changed your code slightly to make it compile with clang, but I haven't been able to make it print 4444, not even with your own code, not even at -O3. There seems to be nothing wrong with the LLVM IR generated by your code either, even at -O3.

#include <stdio.h>
typedef unsigned char ArrayType;
struct Global {
  ArrayType value[4];
  Global(const ArrayType* arg) {
    for (int i = 0; i < 4; i++) this->value[i] = arg[i];
  }
};
static const unsigned char arr[] = { 1, 2, 3, 4 };
static const struct Global Constant(arr);
int main() {
  for (int i=0; i<4; i++) printf("%i", Constant.value[i]);
}

See if that helps. I think it has nothing to do with code generation, though.

cheers,
--renato

http://systemcall.org/

Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
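The two constructor signatures differ only in how the array argument is passed: the original takes a reference to the whole unsigned char[4], while Renato's version takes a pointer to its first element. Both copy the same four bytes, which is consistent with Martins's follow-up reporting identical LLVM IR for both versions. Below is a minimal C sketch of that difference; since C has no references, a pointer to the whole array stands in for `const ArrayType&`, and `copy_from_array_ptr` and `copy_from_elem_ptr` are hypothetical names used only for illustration.

```c
#include <stddef.h>

#define ARRAY_LEN 4

/* Stand-in for the original `Global(const ArrayType& arg)`: a pointer to the
   whole array, so the length 4 remains part of the parameter type. */
void copy_from_array_ptr(unsigned char dst[ARRAY_LEN],
                         const unsigned char (*arg)[ARRAY_LEN]) {
    for (size_t i = 0; i < ARRAY_LEN; i++)
        dst[i] = (*arg)[i];
}

/* Stand-in for Renato's `Global(const ArrayType* arg)`: the array decays to a
   pointer to its first element, so the length is no longer in the type. */
void copy_from_elem_ptr(unsigned char dst[ARRAY_LEN],
                        const unsigned char *arg) {
    for (size_t i = 0; i < ARRAY_LEN; i++)
        dst[i] = arg[i];
}
```

The first is called as `copy_from_array_ptr(dst, &arr)` and the second as `copy_from_elem_ptr(dst, arr)`; either way the copied bytes are the same.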
Martins Mozeiko
2010-Feb-17 18:42 UTC
[LLVMdev] Incorrect codegen of getelementptr for ARM with JIT
Thanks for the answer, Renato. But I still think that there is some issue with ARM codegen. When I ran the code you modified, I got exactly the same LLVM IR (verified by comparing the output of llvm-dis), and at runtime the program still produces the wrong result.

With some help from another developer, we managed to reduce the issue to the following, simpler C code:

#include <stdio.h>

void init(int* value, int val) {
  *value = val;
  printf("Values: %08x\n", *value);
}

int main() {
  static struct {
    int a;
    int b;
  } value;
  init(&value.b, 11);
  init(&value.a, 10);
  printf("%i\n", value.a);
  printf("%i\n", value.b);
}

The correct result would be the following output (I get this when running on ARM with the interpreter, or on Windows with the JIT):

Values: 0000000b
Values: 0000000a
10
11

But LLVM 2.6 with the JIT on ARM, when the code is compiled with llvm-gcc -O3, produces this:

Values: 0000000b
Values: 0000000a
10
10

Here is the LLVM IR of the main function:

define i32 @main() nounwind {
entry:
  store i32 11, i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4
  %0 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 11) nounwind ; <i32> [#uses=0]
  store i32 10, i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 0), align 8
  %1 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 10) nounwind ; <i32> [#uses=0]
  %2 = load i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 0), align 8 ; <i32> [#uses=1]
  %3 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str1, i32 0, i32 0), i32 %2) ; <i32> [#uses=0]
  %4 = load i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4 ; <i32> [#uses=1]
  %5 = tail call i32 (i8*, ...)* @printf(i8* noalias getelementptr inbounds ([4 x i8]* @.str1, i32 0, i32 0), i32 %4) ; <i32> [#uses=0]
  ret i32 0
}

It looks like the JIT compiler doesn't handle the following bitcode instructions correctly:

  store i32 11, i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4

and

  %4 = load i32* getelementptr inbounds (%struct..0._6* @_ZZ4mainE5value, i32 0, i32 1), align 4 ; <i32> [#uses=1]

It ignores the "i32 1" offset in the getelementptr expression.

Here is the ARM assembly generated for the main function, as displayed by GDB, with my comments:

(gdb) x/30i FPtr
0x40029010: sub sp, sp, #16 ; 0x10
0x40029014: str lr, [sp, #12]
0x40029018: str r11, [sp, #8]
0x4002901c: str r5, [sp, #4]
0x40029020: str r4, [sp]

Allocate four words (16 bytes) on the stack, and save the return address, r11, r5, and r4.

0x40029024: ldr r4, [pc, #88] ; 0x40029084

Load the address of the "value" variable into r4.

0x40029028: mov r1, #11 ; 0xb
0x4002902c: str r1, [r4]
0x40029030: ldr r5, [pc, #80] ; 0x40029088
0x40029034: mov r0, r5
0x40029038: bl 0x40009008

Inlined init function: store 11 at the address of the "value" variable, then call printf with the format string from r5. This is a bug: it should have stored at an offset of four (str r1, [r4, #4]).

0x4002903c: mov r1, #10 ; 0xa
0x40029040: str r1, [r4]
0x40029044: mov r0, r5
0x40029048: bl 0x40009008

Inlined init function: store 10 at the address of the "value" variable, then call printf with the format string from r5. This looks OK.

0x4002904c: ldr r1, [r4]
0x40029050: ldr r5, [pc, #52] ; 0x4002908c
0x40029054: mov r0, r5
0x40029058: bl 0x40009008

Load the first number from the structure and print its value.

0x4002905c: ldr r1, [r4]
0x40029060: mov r0, r5
0x40029064: bl 0x40009008

Load the first number from the structure again and print its value. This is also a bug: it should have loaded the second number (ldr r1, [r4, #4]).

0x40029068: mov r0, #0 ; 0x0
0x4002906c: ldr r4, [sp]
0x40029070: ldr r5, [sp, #4]
0x40029074: ldr r11, [sp, #8]
0x40029078: ldr lr, [sp, #12]
0x4002907c: add sp, sp, #16 ; 0x10
0x40029080: bx lr

At the end, restore the registers from the stack and return.

Can somebody confirm that this is a bug? Or am I missing something else here?
-- Martins Mozeiko
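The field offsets that the getelementptr expressions in the reduced test should produce can also be written down in C. This is a minimal sketch, assuming a target where `int` is 4 bytes with no padding between `a` and `b` (as on ARM EABI), so field index 1 corresponds to the #4 immediate offset expected in the disassembly. `struct pair`, `pair_field_offset` and `reduced_test` are hypothetical names for illustration, and the printf calls of the reduced test are replaced by a return code so the result can be checked without reading output.

```c
#include <stddef.h>

/* Hypothetical named mirror of the anonymous struct (%struct..0._6). */
struct pair {
    int a;
    int b;
};

/* Byte offset of getelementptr inbounds (%struct..0._6* @p, i32 0, i32 field):
   field 0 is `a`, field 1 is `b`. */
size_t pair_field_offset(int field) {
    return field == 0 ? offsetof(struct pair, a) : offsetof(struct pair, b);
}

/* Same as the reduced test's init(), minus the printf. */
static void init(int *value, int val) {
    *value = val;
}

/* Reduced test case returning 0 on success and 1 on failure. If the field-1
   offset were dropped on both the store and the load, as the disassembly
   above suggests, `b` would read back as 10 and this would return 1. */
int reduced_test(void) {
    static struct pair value;
    init(&value.b, 11);
    init(&value.a, 10);
    return (value.a == 10 && value.b == 11) ? 0 : 1;
}
```

A `main` that simply returns `reduced_test()` turns the reduced case into a pass/fail check whose result does not depend on parsing printed output.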