thr3ads.net - llvm dev - [LLVMdev] Pointer sizes, GetElementPtr, and offset sizes [May 2008]

Marc de Kruijf

2008-May-02 18:22 UTC

[LLVMdev] Pointer sizes, GetElementPtr, and offset sizes

The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing
that it abstracts implementation details, in particular pointer size, from
the compiler.  While it does this fine for pointer addresses, it does not
manage it for address offsets.  Consider the following code:

$ cat test.c
int main() {
    int *x[2];
    int **y = &x[1];
    return (y - x);
}

$ llvm-gcc -O3 -c test.c -emit-llvm -o - | llvm-dis
; ModuleID = '<stdin>'
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32"
target triple = "i686-pc-linux-gnu"

define i32 @main() nounwind  {
entry:
        %x = alloca [2 x i32*]          ; <[2 x i32*]*> [#uses=2]
        %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1              ; <i32**> [#uses=1]
        %tmp23 = ptrtoint i32** %tmp1 to i32            ; <i32> [#uses=1]
        %x45 = ptrtoint [2 x i32*]* %x to i32           ; <i32> [#uses=1]
        %tmp6 = sub i32 %tmp23, %x45            ; <i32> [#uses=1]
        %tmp7 = ashr i32 %tmp6, 2               ; <i32> [#uses=1]
        ret i32 %tmp7
}
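
In C terms, the IR above spells out the pointer subtraction by hand: convert both pointers to integers, subtract, then shift by the base-2 logarithm of the element size. A minimal sketch of that sequence follows; `shift_for_size` and `distance_by_shift` are hypothetical names used only for illustration, and the sketch assumes a power-of-two element size and an arithmetic right shift for negative values (implementation-defined in C).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power-of-two element size.
 * For a 4-byte pointer this is 2, the constant seen in the ashr. */
static unsigned shift_for_size(size_t size) {
    unsigned shift = 0;
    while (((size_t)1 << shift) < size) {
        shift++;
    }
    return shift;
}

/* Hand-written analogue of the ptrtoint / sub / ashr sequence. */
static ptrdiff_t distance_by_shift(int **first, int **last) {
    intptr_t lo = (intptr_t)first;    /* ptrtoint */
    intptr_t hi = (intptr_t)last;     /* ptrtoint */
    intptr_t bytes = hi - lo;         /* sub: distance in bytes */
    /* ashr: the shift amount depends on sizeof(int *), which is
     * exactly the target detail that leaks into the IR. */
    return (ptrdiff_t)(bytes >> shift_for_size(sizeof(int *)));
}
```

Calling `distance_by_shift(x, &x[1])` on the array from test.c gives the same result as `&x[1] - x`; the only target-specific part is the shift amount.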


The return value is 1.  The ashr exposes the pointer size by shifting the
4-byte distance right by 2.

For the analysis that I am doing, it would be nice to have an instruction
that explicitly performs this distance calculation in a type-safe manner,
irrespective of pointer size.  Something like this:

define i32 @main() nounwind  {
entry:
        %x = alloca [2 x i32*]          ; <[2 x i32*]*> [#uses=2]
        %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1              ; <i32**> [#uses=1]
        %tmp2 = getdistance i32** %x, %tmp1            ; <i32> [#uses=1]
        ret i32 %tmp2
}

I'm not really a compiler person, so I'm wondering if a need for such an
instruction ever arises in more compiler-oriented situations such as
optimization or wrt. portability?  Does the fact that pointer size is never
completely hidden ever cause problems?  Is GetElementPtr generally "good
enough"?  It would be nice to have a complete solution though, wouldn't
it?
Thoughts?

Marc

Marc de Kruijf

2008-May-02 18:33 UTC

I didn't realize this before, but perhaps the fact that llvm-gcc was unable
to optimize out the offset calculation at -O3 is sufficient evidence for
supporting such an instruction. :)

Marc

Andrew Lenharth

2008-May-02 20:29 UTC

On Fri, May 2, 2008 at 1:22 PM, Marc de Kruijf <dekruijf at cs.wisc.edu> wrote:
> The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing
> that it abstracts implementation details, in particular pointer size, from
> the compiler.  While it does this fine for pointer addresses, it does not
> manage it for address offsets.  Consider the following code:
>
> $ cat test.c
> int main() {
>     int *x[2];
>     int **y = &x[1];
>     return (y - x);
> }

define i32 @main() nounwind  {
entry:
        %x = alloca [2 x i32*]          ; <[2 x i32*]*> [#uses=2]
        %tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1              ; <i32**> [#uses=1]
        %tmp23 = ptrtoint i32** %tmp1 to i32            ; <i32> [#uses=1]
        %x45 = ptrtoint [2 x i32*]* %x to i32           ; <i32> [#uses=1]
        %tmp6 = sub i32 %tmp23, %x45            ; <i32> [#uses=1]
        ; size of one i32* element, computed without naming the pointer width
        %size = getelementptr i32** null, i32 1         ; <i32**> [#uses=1]
        %sizeI = ptrtoint i32** %size to i32            ; <i32> [#uses=1]
        ; %sizeI is a byte count, not a shift amount, so divide rather than shift
        %tmp7 = sdiv i32 %tmp6, %sizeI          ; <i32> [#uses=1]
        ret i32 %tmp7
}

There, pointer size independent.  The problem you see is that you are using
a frontend targeting a specific platform, so the pointer size is known (see
the target datalayout line).
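
A C-level analogue of the same idea, as a rough sketch: take the element size from the type instead of hard-coding it. `sizeof(int *)` plays the role of the `getelementptr ... null, i32 1` plus `ptrtoint` pair, and `distance_by_size` is a hypothetical name. Note that a size obtained this way is a byte count, so the sketch divides by it; a shift would need its base-2 logarithm instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Element distance between two int* slots, with no constant that
 * depends on the target's pointer width. */
static ptrdiff_t distance_by_size(int **first, int **last) {
    /* Byte distance, as produced by ptrtoint + sub in the IR. */
    intptr_t bytes = (intptr_t)last - (intptr_t)first;
    /* Element size in bytes, resolved by the compiler for the target. */
    intptr_t elem_size = (intptr_t)sizeof(int *);
    return (ptrdiff_t)(bytes / elem_size);
}
```

The same source yields the right element count on 32-bit and 64-bit targets, because the divisor follows the target rather than the literal in the code.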

Andrew

Andrew Lenharth

2008-May-02 20:40 UTC

I should add that a nice instruction for type safety would be

%x = getcontainerptr %pointer, %type, <gep indexes>

where %x (of type %type) is the beginning of the structure/array of which
%pointer is the member at the offset that a gep with those indexes would
calculate.

%x = alloca [2 x i32*]
%tmp1 = getelementptr [2 x i32*]* %x, i32 0, i32 1
%tmp2 = getcontainerptr i32**  %tmp1, [2 x i32*]*, i32 0, i32 1

Then %x == %tmp2.

Such an instruction would let you backtrack in a structure without
casts and pointer arithmetic.
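
For comparison, C code usually does this backtracking with exactly the casts and pointer arithmetic the proposed instruction would hide. The sketch below uses the standard `offsetof` macro; `struct pair` and `pair_from_second` are hypothetical names chosen for illustration and do not appear in the thread.

```c
#include <stddef.h>

/* Hypothetical container type used only for this illustration. */
struct pair {
    int first;
    int *second;
};

/* Recover the enclosing struct from a pointer to its 'second' member:
 * step back by the member's byte offset, then cast to the container
 * type. This is the untyped arithmetic a getcontainerptr-style
 * instruction would replace. */
static struct pair *pair_from_second(int **member) {
    return (struct pair *)((char *)member - offsetof(struct pair, second));
}
```

Given `struct pair p;`, the call `pair_from_second(&p.second)` returns `&p`, but nothing in the type system checks that the argument really points into a `struct pair`.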

Andrew

Chris Lattner

2008-May-02 22:24 UTC

On Fri, 2 May 2008, Marc de Kruijf wrote:
> The LLVA and LLVM papers motivate the GetElementPtr instruction by arguing
> that it abstracts implementation details, in particular pointer size, from
> the compiler.  While it does this fine for pointer addresses, it does not
> manage it for address offsets.  Consider the following code:
>
> $ cat test.c
> int main() {
>    int *x[2];
>    int **y = &x[1];
>    return (y - x);
> }
>
> The return value is 1.  The ashr exposes the pointer size by shifting the 4
> byte distance over by 2.

Right.  A related issue is:
http://llvm.org/bugs/show_bug.cgi?id=2247

> I didn't realize this before, but perhaps the fact that llvm-gcc was
> unable to optimize out the offset calculation at -O3 is sufficient
> evidence for supporting such an instruction. :)

Sure it does:

$ llvm-gcc t.c -S -o - -O3 -fomit-frame-pointer
_main:
 	subl	$8, %esp
 	movl	$1, %eax
 	addl	$8, %esp
 	ret

I agree that optimizing it before codegen time would be preferable, but 
adding a new instruction (by itself) doesn't handle this.  It would be 
easy to add this to the current optimizer if we cared *shrug*.

-Chris

-- 
http://nondot.org/sabre/
http://llvm.org/
