thr3ads.net - llvm dev - [LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr [Sep 2014]

Philip Reames

2014-Sep-10 04:27 UTC

[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr

On 09/08/2014 04:22 PM, Dan Gohman wrote:
> An object can be allocated at virtual address 5 through extra-VM means 
> (eg. mmap), and then one can (creatively) interpret the return value 
> of @f as being associated with whatever %A was associated with *and* 
> 5. The return value of @g can only be associated with exactly the same 
> set that %A was associated with. Consequently, it's not always safe to 
> replace @f with @g.

Dan, I'm trying to follow your logic here and am not arriving at the 
same conclusion.  Can you point out the flaw in my reasoning?

define i8* @f(i8* %A) {
  ; %pti is not a pointer and is thus not based on anything
  %pti = ptrtoint i8* %A to i64
  ; %add is not a pointer and is thus not based on anything; it is
  ; "associated with" the memory pointed to by %A.
  ; In particular, "5" is NOT "an integer constant ... returned from a
  ; function not defined within LLVM".  It is not returned by a function.
  ; As a result the pointer value of 5 is not associated with any address
  ; range.
  %add = add i64 %pti, 5
  ; %itp is based on %pti only
  %itp = inttoptr i64 %add to i8*
  ret i8* %itp
}

I'm guessing the key difference in our reasoning is about the constant 
5.  :)  I'm also guessing that you have an example in mind which 
motivates the need for 5 to be considered associated with the address 
range.  Could you expand on why?

> It looks a little silly to say this in the case of the integer 
> constant 5, and there are some semantic gray areas around extra-VM 
> allocation, but the same thing happens if the add were adding a 
> dynamic integer value, and then it's difficult to find a way to 
> separate that case from the constant 5 case.
>
> In any case, the general advice is that people should prefer to use 
> getelementptr to begin with. LLVM's own optimizers were converted to 
> use getelementptr instead of ptrtoint+add+inttoptr even when they have 
> to do raw byte arithmetic.

It would be nice to be able to canonicalize ptrtoint+add+inttoptr to 
geps.  Having seemingly reasonable-looking legal IR that simply doesn't 
optimize is not the best introduction for new frontend authors.  :)
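
To make the dynamic-offset case from the quoted text concrete, here is a minimal sketch of the two forms with a run-time offset in place of the constant 5. The function names @f_dyn and @g_dyn and the parameter %n are made up for illustration and do not appear in the thread. The syntax is the typed-pointer form used elsewhere in this thread; newer LLVM releases spell getelementptr differently.

```llvm
; Hypothetical variant of @f: integer arithmetic on the pointer value.
define i8* @f_dyn(i8* %A, i64 %n) {
  %pti = ptrtoint i8* %A to i64
  %add = add i64 %pti, %n          ; %n is an arbitrary run-time integer
  %itp = inttoptr i64 %add to i8*  ; result formed from both %pti and %n
  ret i8* %itp
}

; Hypothetical variant of @g: the same byte offset expressed as a gep.
define i8* @g_dyn(i8* %A, i64 %n) {
  %gep = getelementptr i8* %A, i64 %n  ; result is based on %A only
  ret i8* %gep
}
```

Under the reading quoted above, the result of @f_dyn may be associated with whatever %n contributes as well as with %A, while the result of @g_dyn is based on %A alone, so the objection raised for the constant 5 would seem to apply here too.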
>
> On Sat, Aug 30, 2014 at 6:01 PM, David Majnemer 
> <david.majnemer at gmail.com> wrote:
>
>     Consider the two functions below:
>
>     define i8* @f(i8* %A) {
>       %pti = ptrtoint i8* %A to i64
>       %add = add i64 %pti, 5
>       %itp = inttoptr i64 %add to i8*
>       ret i8* %itp
>     }
>
>     define i8* @g(i8* %A) {
>       %gep = getelementptr i8* %A, i64 5
>       ret i8* %gep
>     }
>
>     What, if anything, prevents us from canonicalizing @f to @g? I've
>     heard that this might be in violation of
>     http://llvm.org/docs/LangRef.html#pointeraliasing but I don't see
>     how.

Kevin Modzelewski

2014-Sep-10 21:55 UTC

On Tue, Sep 9, 2014 at 9:27 PM, Philip Reames <listmail at philipreames.com>
wrote:
>
> I'm guessing the key difference in our reasoning is about the constant 5.
> :)  I'm also guessing that you have an example in mind which motivates the
> need for 5 to be considered associated with the address range.  Could you
> expand on why?
>

Can't speak for Dan, but in Pyston we certainly make use of these types of
constructs to embed JIT-time constants (say, an interned string, or a
reference to the current function object) into the function being compiled.
Heuristically, we can all see the difference in intent between "ptr + 5"
and "load (int*)0x2aaaaa0000", but it seems like it'd be difficult to come
up with reasonable rules that would separate them.

Philip Reames

2014-Sep-10 22:16 UTC

On 09/10/2014 02:55 PM, Kevin Modzelewski wrote:
>
> On Tue, Sep 9, 2014 at 9:27 PM, Philip Reames 
> <listmail at philipreames.com> wrote:
>
>
>     I'm guessing the key difference in our reasoning is about the
>     constant 5.  :)  I'm also guessing that you have an example in
>     mind which motivates the need for 5 to be considered associated
>     with the address range. Could you expand on why?
>
>
> Can't speak for Dan, but in Pyston we certainly make use of these 
> types of constructs to embed JIT-time constants (say, an interned 
> string, or a reference to the current function object) into the 
> function being compiled.  Heuristically, we can all see the difference 
> in intent between "ptr + 5" and "load (int*)0x2aaaaa0000", but it 
> seems like it'd be difficult to come up with reasonable rules that 
> would separate them.
>

All of the cases I've seen in JITed code can be dealt with differently.  
By emitting a global variable and then using the "link time" address 
resolution to map it to the right address, you get the same effect while 
remaining entirely within the well-defined part of the IR.  I don't see 
this case as being worth restricting an otherwise reasonable optimization.
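
As a rough sketch of the two approaches being contrasted, the IR below loads the same value once through a symbol and once through an integer literal. The names @jit_const, @load_symbolic and @load_literal are hypothetical, and how the symbol gets bound to an address (by a linker, or by the JIT's symbol resolution) is left as an assumption. The load syntax is the typed-pointer form of the LLVM releases discussed in this thread.

```llvm
; Assumption: @jit_const is a hypothetical symbol that the JIT or linker
; binds to the real address when the module is loaded.
@jit_const = external global i64

; Symbolic form: the address is resolved at "link time".
define i64 @load_symbolic() {
  %v = load i64* @jit_const
  ret i64 %v
}

; Integer-constant form: the address is baked in as a literal.
; 183251894272 is 0x2aaaaa0000, the example address from the thread.
define i64 @load_literal() {
  %p = inttoptr i64 183251894272 to i64*
  %v = load i64* %p
  ret i64 %v
}
```

The symbolic form keeps the pointer tied to a named global, at the cost of needing the symbol resolved before the code runs; the literal form needs no resolution step but relies on the inttoptr rules under discussion.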

One problem with Dan's interpretation of the current rules is that this 
otherwise legal transform becomes problematic:
%addr = inttoptr i64 0x2aaaaa0005 to i32*
===>
%tmp = add i64 0x2aaaaa0000, 5
%addr = inttoptr i64 %tmp to i32*

We probably wouldn't do this at the IR level, but we definitely do 
perform this transform in the backends.  There's no reason it 
*shouldn't* be valid at the IR level either.

Philip

Dan Gohman

2014-Sep-11 21:29 UTC

On Tue, Sep 9, 2014 at 9:27 PM, Philip Reames <listmail at philipreames.com>
wrote:
> On 09/08/2014 04:22 PM, Dan Gohman wrote:
>
> An object can be allocated at virtual address 5 through extra-VM means
> (eg. mmap), and then one can (creatively) interpret the return value of @f
> as being associated with whatever %A was associated with *and* 5. The
> return value of @g can only be associated with exactly the same set that %A
> was associated with. Consequently, it's not always safe to replace @f with
> @g.
>
> Dan, I'm trying to follow your logic here and am not arriving at the same
> conclusion.  Can you point out the flaw in my reasoning here?
>
> define i8* @f(i8* %A) {
>   ; %pti is not a pointer and is thus not based on anything
>   %pti = ptrtoint i8* %A to i64
>   ; %add is not a pointer and is thus not based on anything; it is
>   ; "associated with" the memory pointed to by %A.
>   ; In particular, "5" is NOT "an integer constant ... returned from a
>   ; function not defined within LLVM".  It is not returned by a function.
>   ; As a result the pointer value of 5 is not associated with any address
>   ; range.
>   %add = add i64 %pti, 5
>
I believe you misinterpreted the text here. 5 is "an integer constant other
than zero", so it "may be associated with address ranges allocated through
mechanisms other than those provided by LLVM".

>   %itp = inttoptr i64 %add to i8*  ; %itp is based on %pti only
>   ret i8* %itp
> }
>
> I'm guessing the key difference in our reasoning is about the constant 5.
> :)  I'm also guessing that you have an example in mind which motivates the
> need for 5 to be considered associated with the address range.  Could you
> expand on why?
>
LLVM is used in a wide variety of contexts. In some of them, objects are
statically allocated at known fixed addresses. In others, the JIT runs
after objects are allocated, so it knows the address of allocated objects.
In others, mmap is used to dynamically allocate objects at fixed addresses.
The current rules attempt to accommodate all of these use cases, and more.

To respond to your suggestion elsewhere about using symbolic addresses that
are resolved at link time, that's indeed a great technique, but not one
that LLVM can require all its front-ends to use, because the practice of
using integer constants is very widespread. It's even common enough at the
C/C++ level. Also, in a JIT context, using symbolic addresses could require
expensive and otherwise unnecessary relocation processing.

>
> It looks a little silly to say this in the case of the integer constant 5,
> and there are some semantic gray areas around extra-VM allocation, but the
> same thing happens if the add were adding a dynamic integer value, and then
> it's difficult to find a way to separate that case from the constant 5 case.
>
> In any case, the general advice is that people should prefer to use
> getelementptr to begin with. LLVM's own optimizers were converted to use
> getelementptr instead of ptrtoint+add+inttoptr even when they have to do
> raw byte arithmetic.
>
> It would be nice to be able to canonicalize ptrtoint+add+inttoptr to geps.
> Having seemingly reasonable-looking legal IR that simply doesn't optimize
> is not the best introduction for new frontend authors.  :)
>
I don't know if bitcast+getelementptr+bitcast is really worse than
ptrtoint+add+inttoptr here. It's also my own experience writing front-ends
that one most often gets into array and struct field accesses pretty
quickly, and raw byte offsets only after getting into it a ways, so
getelementptr shouldn't be that foreign.
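
For comparison, the two idioms named above might look like this when applying a 5-byte offset to an i32*. This is only an illustrative sketch: the function names @offset_gep and @offset_int are invented here, and the syntax is the typed-pointer form used in this thread.

```llvm
; Raw byte offset via bitcast + getelementptr + bitcast.
define i32* @offset_gep(i32* %p) {
  %raw = bitcast i32* %p to i8*         ; view the pointer as bytes
  %adj = getelementptr i8* %raw, i64 5  ; advance by 5 bytes
  %res = bitcast i8* %adj to i32*       ; back to the original type
  ret i32* %res
}

; The same arithmetic via ptrtoint + add + inttoptr.
define i32* @offset_int(i32* %p) {
  %pti = ptrtoint i32* %p to i64
  %add = add i64 %pti, 5
  %itp = inttoptr i64 %add to i32*
  ret i32* %itp
}
```

Both forms take three instructions, so the getelementptr version costs a front-end nothing in verbosity while keeping the result based on %p.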