thr3ads.net - llvm dev - [LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr [Aug 2014]


David Majnemer

2014-Aug-31 01:01 UTC

[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr

Consider the two functions below:

define i8* @f(i8* %A) {
  %pti = ptrtoint i8* %A to i64
  %add = add i64 %pti, 5
  %itp = inttoptr i64 %add to i8*
  ret i8* %itp
}

define i8* @g(i8* %A) {
  %gep = getelementptr i8* %A, i64 5
  ret i8* %gep
}

What, if anything, prevents us from canonicalizing @f to @g? I've heard that
this might be in violation of
http://llvm.org/docs/LangRef.html#pointeraliasing but I don't see how.
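For readers coming from C, the two functions correspond roughly to the source patterns below. This is a sketch that assumes a frontend such as Clang lowers an integer round trip through uintptr_t to ptrtoint/add/inttoptr and plain char pointer arithmetic to a getelementptr (likely marked inbounds); the exact IR depends on the compiler version and flags.

```c
#include <stdint.h>

/* Rough analogue of @f: round-trip the pointer through an integer. */
char *f(char *a) {
    uintptr_t pti = (uintptr_t)a;  /* ptrtoint i8* %A to i64 */
    uintptr_t add = pti + 5;       /* add i64 %pti, 5 */
    return (char *)add;            /* inttoptr i64 %add to i8* */
}

/* Rough analogue of @g: ordinary pointer arithmetic. */
char *g(char *a) {
    return a + 5;                  /* getelementptr i8* %A, i64 5 */
}
```

On mainstream targets both return the same address; the question in this thread is whether they also carry the same aliasing information.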

Dan Gohman

2014-Sep-08 23:22 UTC


An object can be allocated at virtual address 5 through extra-VM means (e.g.
mmap), and then one can (creatively) interpret the return value of @f as
being associated with whatever %A was associated with *and* 5. The return
value of @g can only be associated with exactly the same set that %A was
associated with. Consequently, it's not always safe to replace @f with @g.
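The argument can be made concrete with a toy provenance model. Everything here (the toy_ptr type, the bitmask of object ids, the function names) is hypothetical and only mirrors the reading described above: getelementptr keeps the base pointer's association set, while the integer route may also pick up whatever object the added integer is associated with, such as one mapped at that address by mmap.

```c
#include <stdint.h>

/* Hypothetical toy model: each bit in an assoc_t names one memory object. */
typedef uint32_t assoc_t;

typedef struct {
    uint64_t addr;   /* numeric address */
    assoc_t  assoc;  /* objects this pointer may be associated with */
} toy_ptr;

/* getelementptr: address moves, association set is unchanged. */
toy_ptr toy_gep(toy_ptr base, int64_t offset) {
    toy_ptr r = { base.addr + (uint64_t)offset, base.assoc };
    return r;
}

/* ptrtoint + add + inttoptr under the reading above: the integer operand
 * may itself be associated with objects (k_assoc), so the result may be
 * associated with the union of both sets. */
toy_ptr toy_int_add(toy_ptr base, uint64_t k, assoc_t k_assoc) {
    toy_ptr r = { base.addr + k, base.assoc | k_assoc };
    return r;
}

/* Rewriting @f as @g keeps the model's meaning only if the integer adds
 * no objects beyond those the base pointer already has. */
int gep_rewrite_is_safe(assoc_t base_assoc, assoc_t k_assoc) {
    return (k_assoc & ~base_assoc) == 0;
}
```

With %A associated with object 0 and the integer 5 possibly associated with an externally allocated object 1, both routes compute the same address, but the integer route carries the set {0, 1}; rewriting it as a GEP would silently drop object 1.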

It looks a little silly to say this in the case of the integer constant 5,
and there are some semantic gray areas around extra-VM allocation, but the
same thing happens if the add were adding a dynamic integer value, and then
it's difficult to find a way to separate that case from the constant 5 case.

In any case, the general advice is that people should prefer to use
getelementptr to begin with. LLVM's own optimizers were converted to use
getelementptr instead of ptrtoint+add+inttoptr even when they have to do
raw byte arithmetic.
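In C terms, the advice amounts to doing byte offsets on a char pointer rather than on an integer. A minimal sketch; the helper names are hypothetical, and the lowering mentioned in the comments is what a frontend typically emits, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Preferred: stay in pointer land. Byte-granular arithmetic through an
 * unsigned char pointer is typically lowered to a getelementptr on i8. */
void *byte_offset(void *p, ptrdiff_t off) {
    return (unsigned char *)p + off;
}

/* Discouraged: the same arithmetic via an integer round trip, which is
 * typically lowered to ptrtoint + add + inttoptr. */
void *byte_offset_via_int(void *p, ptrdiff_t off) {
    return (void *)((uintptr_t)p + (uintptr_t)off);
}
```

A typical use is reaching a field by its offsetof value, e.g. byte_offset(&s, offsetof(struct pair, b)) for a hypothetical struct pair, which yields &s.b without ever leaving pointer type.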


Reid Kleckner

2014-Sep-09 03:36 UTC


On Mon, Sep 8, 2014 at 4:22 PM, Dan Gohman <dan433584 at gmail.com> wrote:
> It looks a little silly to say this in the case of the integer constant 5,
> and there are some semantic gray areas around extra-VM allocation, but the
> same thing happens if the add were adding a dynamic integer value, and then
> it's difficult to find a way to separate that case from the constant 5 case.
>
Could we say that constant integers have no objects associated with them?
If so, we need a way to bless constant integers that *do* refer to real
objects, such as ASan's shadow memory base.

Then you should be able to take something like an add of a phi of constant
ints feeding an inttoptr and transform that into a GEP, without explicitly
calling out constant integers.
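A toy sketch of the proposed rule, using a hypothetical bitmask model of object associations: ordinary integer constants contribute no objects, one explicitly blessed constant (standing in for something like ASan's shadow memory base; the address and object id below are illustrative only) contributes one, and a phi of constants contributes the union of its incoming values. The GEP rewrite is then allowed exactly when the phi adds nothing beyond the base pointer's set.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t assoc_t;  /* hypothetical: one bit per memory object */

/* Hypothetical blessed constant; value and object id are illustrative. */
#define BLESSED_ADDR   0x20000000u
#define BLESSED_OBJECT (1u << 31)

/* Proposed rule: plain constants have no associated objects. */
assoc_t constant_assoc(uint64_t c) {
    return c == BLESSED_ADDR ? BLESSED_OBJECT : 0u;
}

/* A phi of constants may be associated with anything any incoming
 * constant is associated with. */
assoc_t phi_assoc(const uint64_t *incoming, size_t n) {
    assoc_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc |= constant_assoc(incoming[i]);
    return acc;
}

/* inttoptr(add(ptrtoint %A, phi)) may become a GEP on %A when the phi
 * contributes no objects beyond those of %A. */
int can_form_gep(assoc_t base_assoc, const uint64_t *incoming, size_t n) {
    return (phi_assoc(incoming, n) & ~base_assoc) == 0;
}
```

Under this rule a phi of ordinary offsets such as {4, 8, 16} always permits the rewrite, and only a blessed incoming value can block it.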

> In any case, the general advice is that people should prefer to use
> getelementptr to begin with. LLVM's own optimizers were converted to use
> getelementptr instead of ptrtoint+add+inttoptr even when they have to do
> raw byte arithmetic.
>
I'm guessing the IR comes from C++ code that subtracts pointers, so it'd be
good if we could figure this out.

Philip Reames

2014-Sep-10 04:27 UTC


On 09/08/2014 04:22 PM, Dan Gohman wrote:
> An object can be allocated at virtual address 5 through extra-VM means
> (e.g. mmap), and then one can (creatively) interpret the return value
> of @f as being associated with whatever %A was associated with *and*
> 5. The return value of @g can only be associated with exactly the same
> set that %A was associated with. Consequently, it's not always safe to
> replace @f with @g.

Dan, I'm trying to follow your logic here and am not arriving at the
same conclusion.  Can you point out the flaw in my reasoning here?

define i8* @f(i8* %A) {
  ; %pti is not a pointer and is thus not based on anything.
  %pti = ptrtoint i8* %A to i64
  ; %add is not a pointer and is thus not based on anything; it is
  ; "associated with" the memory pointed to by %A.
  ; In particular, "5" is NOT "an integer constant ... returned from a
  ; function not defined within LLVM". It is not returned by a function.
  ; As a result, the pointer value 5 is not associated with any address
  ; range.
  %add = add i64 %pti, 5
  ; %itp is based on %pti only.
  %itp = inttoptr i64 %add to i8*
  ret i8* %itp
}

I'm guessing the key difference in our reasoning is about the constant 
5.  :)  I'm also guessing that you have an example in mind which 
motivates the need for 5 to be considered associated with the address 
range.  Could you expand on why?

>
> It looks a little silly to say this in the case of the integer 
> constant 5, and there are some semantic gray areas around extra-VM 
> allocation, but the same thing happens if the add were adding a 
> dynamic integer value, and then it's difficult to find a way to 
> separate that case from the constant 5 case.
>
> In any case, the general advice is that people should prefer to use 
> getelementptr to begin with. LLVM's own optimizers were converted to 
> use getelementptr instead of ptrtoint+add+inttoptr even when they have 
> to do raw byte arithmetic.

It would be nice to be able to canonicalize ptrtoint+add+inttoptr to
geps.  Having seemingly reasonable-looking legal IR that simply doesn't
optimize is not the best introduction for new frontend authors. :)
