Marc de Kruijf
2008-May-02 18:22 UTC
[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes
The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing that it abstracts implementation details, in particular pointer size, from the compiler. While it does this fine for pointer addresses, it does not manage it for address offsets. Consider the following code:

$ cat test.c
int main() {
  int *x[2];
  int **y = &x[1];
  return (y - x);
}

$ llvm-gcc -O3 -c test.c -emit-llvm -o - | llvm-dis
; ModuleID = '<stdin>'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i686-pc-linux-gnu"

define i32 @main() nounwind {
entry:
  %x = alloca [2 x i32*]                               ; <[2 x i32*]*> [#uses=2]
  %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1   ; <i32**> [#uses=1]
  %tmp23 = ptrtoint i32** %tmp1 to i32                 ; <i32> [#uses=1]
  %x45 = ptrtoint [2 x i32*]* %x to i32                ; <i32> [#uses=1]
  %tmp6 = sub i32 %tmp23, %x45                         ; <i32> [#uses=1]
  %tmp7 = ashr i32 %tmp6, 2                            ; <i32> [#uses=1]
  ret i32 %tmp7
}

The return value is 1. The ashr exposes the pointer size by shifting the 4 byte distance over by 2.

For the analysis that I am doing, it would be nice to have an instruction that explicitly performs this distance calculation in a type-safe manner, irrespective of pointer size. Something like this:

define i32 @main() nounwind {
entry:
  %x = alloca [2 x i32*]                               ; <[2 x i32*]*> [#uses=2]
  %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1   ; <i32**> [#uses=1]
  %tmp2 = getdistance i32** %x, %tmp1                  ; <i32> [#uses=1]
  ret i32 %tmp2
}

I'm not really a compiler person, so I'm wondering if a need for such an instruction ever arises in more compiler-oriented situations such as optimization or wrt. portability? Does the fact that pointer size is never completely hidden ever cause problems? Is GetElementPtr generally "good enough"? It would be nice to have a complete solution though, wouldn't it?

Thoughts?

Marc
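To make the point about the shift concrete, here is a minimal C sketch of the same calculation done by hand: subtract the addresses as bytes, then shift right by log2 of the element size. The helper names (element_shift, distance_by_shift, test_c_distance) are made up for illustration and are not from the thread or from LLVM. The shift amount is 2 only when a pointer is 4 bytes wide, which is the dependency the ashr in the IR bakes in.

```c
#include <assert.h>
#include <stddef.h>

/* Right-shift amount that divides a byte count by elem_size.
   Hypothetical helper; only meaningful when elem_size is a power of two. */
static unsigned element_shift(size_t elem_size)
{
    unsigned shift = 0;
    assert(elem_size != 0 && (elem_size & (elem_size - 1)) == 0);
    while (((size_t)1 << shift) < elem_size)
        shift++;
    return shift;
}

/* Element distance from `from` to `to`, computed as a byte difference
   followed by a shift. Requires both pointers to be in the same array
   and to >= from, so the shifted value is non-negative. */
static ptrdiff_t distance_by_shift(int **from, int **to)
{
    ptrdiff_t bytes = (char *)to - (char *)from;
    assert(bytes >= 0);
    return bytes >> element_shift(sizeof(int *));
}

/* The test.c example recast as a function. */
static ptrdiff_t test_c_distance(void)
{
    int *x[2] = {0, 0};
    int **y = &x[1];
    return distance_by_shift(x, y);
}
```

With 4-byte pointers element_shift(sizeof(int *)) is 2, with 8-byte pointers it is 3; test_c_distance() gives the same element count either way because the shift is derived from sizeof rather than written as a literal.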
Marc de Kruijf
2008-May-02 18:33 UTC
[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes
I didn't realize this before, but perhaps the fact that llvm-gcc was unable to optimize out the offset calculation at -O3 is sufficient evidence for supporting such an instruction. :)

Marc
Andrew Lenharth
2008-May-02 20:29 UTC
[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes
On Fri, May 2, 2008 at 1:22 PM, Marc de Kruijf <dekruijf at cs.wisc.edu> wrote:
> The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing
> that it abstracts implementation details, in particular pointer size, from
> the compiler. While it does this fine for pointer addresses, it does not
> manage it for address offsets. Consider the following code:
>
> $ cat test.c
> int main() {
>   int *x[2];
>   int **y = &x[1];
>   return (y - x);
> }

define i32 @main() nounwind {
entry:
  %x = alloca [2 x i32*]                               ; <[2 x i32*]*> [#uses=2]
  %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1   ; <i32**> [#uses=1]
  %tmp23 = ptrtoint i32** %tmp1 to i32                 ; <i32> [#uses=1]
  %x45 = ptrtoint [2 x i32*]* %x to i32                ; <i32> [#uses=1]
  %tmp6 = sub i32 %tmp23, %x45                         ; <i32> [#uses=1]
  %size = getelementptr i32** null, i32 1              ; <i32**> [#uses=1]
  %sizeI = ptrtoint i32** %size to i32                 ; <i32> [#uses=1]
  %tmp7 = sdiv i32 %tmp6, %sizeI                       ; <i32> [#uses=1]
  ret i32 %tmp7
}

There, pointer size independent. The problem you see is that you are using a frontend targeting a specific platform, so the pointer size is known (see the target datalayout line).

Andrew
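For readers more at home in C than in IR, the idea can be sketched as follows: `getelementptr i32** null, i32 1` followed by ptrtoint is the IR way of asking for the element size, which C spells sizeof. A byte distance becomes an element count by dividing by that size, which also works for negative distances and for element sizes that are not powers of two. The names element_distance and struct triple are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* A 3-int struct, used only to show an element size that is typically
   not a power of two. Hypothetical type for this sketch. */
struct triple { int a, b, c; };

/* Element distance from `from` to `to`, where both point into the same
   array of elem_size-byte elements. The element size is passed in, so no
   particular pointer or element width is assumed. */
static ptrdiff_t element_distance(const void *from, const void *to,
                                  size_t elem_size)
{
    ptrdiff_t bytes = (const char *)to - (const char *)from;
    assert(elem_size != 0 && bytes % (ptrdiff_t)elem_size == 0);
    return bytes / (ptrdiff_t)elem_size;
}
```

A call such as element_distance(x, &x[1], sizeof x[0]) on the array from test.c yields the same value as the C expression y - x, whatever the target's pointer width.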
Andrew Lenharth
2008-May-02 20:40 UTC
[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes
I should say that a nice instruction for type safety would be

  %type x = getcontainerptr %pointer, %type, gep indexes

where %x is the beginning of the structure/array of which %pointer is the member at the offset that would be calculated by a gep.

  %x = alloca [2 x i32*]
  %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1
  %tmp2 = getcontainerptr i32** %tmp1, [2 x i32*]*, i32 0, i32 1

Then %x == %tmp2. Such an instruction would let you backtrack in a structure without casts and pointer arithmetic.

Andrew
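In C, this kind of backtracking is usually written by hand with offsetof and casts, which is exactly the untyped arithmetic a getcontainerptr-style instruction would hide. The sketch below shows both the struct case and the array case from the example; struct pair, slot_array, pair_from_second and array_from_slot are hypothetical names for illustration, not part of any proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container types for this sketch. */
struct pair { int first; int *second; };
typedef int *slot_array[2];   /* same shape as [2 x i32*] */

/* Recover the enclosing struct pair from a pointer to its `second`
   member by subtracting the member's byte offset. */
static struct pair *pair_from_second(int **member)
{
    return (struct pair *)((char *)member - offsetof(struct pair, second));
}

/* Recover the enclosing array from a pointer to the slot at `index`.
   The caller must pass the index that was used to reach the slot. */
static slot_array *array_from_slot(int **slot, size_t index)
{
    return (slot_array *)(slot - index);
}
```

Given slot_array x, array_from_slot(&x[1], 1) returns &x, which mirrors %tmp2 == %x above. Nothing checks that the index or member passed in matches how the pointer was actually derived, which is the type-safety gap the proposed instruction is aimed at.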
Chris Lattner
2008-May-02 22:24 UTC
[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes
On Fri, 2 May 2008, Marc de Kruijf wrote:
> The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing
> that it abstracts implementation details, in particular pointer size, from
> the compiler. While it does this fine for pointer addresses, it does not
> manage it for address offsets. Consider the following code:
>
> $ cat test.c
> int main() {
>   int *x[2];
>   int **y = &x[1];
>   return (y - x);
> }

> The return value is 1. The ashr exposes the pointer size by shifting the 4
> byte distance over by 2.

Right. A related issue is: http://llvm.org/bugs/show_bug.cgi?id=2247

> I didn't realize this before, but perhaps the fact that llvm-gcc was
> unable to optimize out the offset calculation at -O3 is sufficient
> evidence for supporting such an instruction. :)

Sure it does:

$ llvm-gcc t.c -S -o - -O3 -fomit-frame-pointer
_main:
        subl    $8, %esp
        movl    $1, %eax
        addl    $8, %esp
        ret

I agree that optimizing it before codegen time would be preferable, but adding a new instruction (by itself) doesn't handle this. It would be easy to add this to the current optimizer if we cared *shrug*.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/