David Majnemer
2014-Aug-31 01:01 UTC
[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr
Consider the two functions below:

    define i8* @f(i8* %A) {
      %pti = ptrtoint i8* %A to i64
      %add = add i64 %pti, 5
      %itp = inttoptr i64 %add to i8*
      ret i8* %itp
    }

    define i8* @g(i8* %A) {
      %gep = getelementptr i8* %A, i64 5
      ret i8* %gep
    }

What, if anything, prevents us from canonicalizing @f to @g? I've heard that this might be in violation of llvm.org/docs/LangRef.html#pointeraliasing but I don't see how.
Dan Gohman
2014-Sep-08 23:22 UTC
[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr
An object can be allocated at virtual address 5 through extra-VM means (e.g. mmap), and then one can (creatively) interpret the return value of @f as being associated with whatever %A was associated with *and* 5. The return value of @g can only be associated with exactly the same set that %A was associated with. Consequently, it's not always safe to replace @f with @g.

It looks a little silly to say this in the case of the integer constant 5, and there are some semantic gray areas around extra-VM allocation, but the same thing happens if the add were adding a dynamic integer value, and then it's difficult to find a way to separate that case from the constant 5 case.

In any case, the general advice is that people should prefer to use getelementptr to begin with. LLVM's own optimizers were converted to use getelementptr instead of ptrtoint+add+inttoptr even when they have to do raw byte arithmetic.
Reid Kleckner
2014-Sep-09 03:36 UTC
[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr
On Mon, Sep 8, 2014 at 4:22 PM, Dan Gohman <dan433584 at gmail.com> wrote:
> It looks a little silly to say this in the case of the integer constant 5,
> and there are some semantic gray areas around extra-VM allocation, but the
> same thing happens if the add were adding a dynamic integer value, and then
> it's difficult to find a way to separate that case from the constant 5 case.

Could we say that constant integers have no objects associated with them? If so, we need a way to bless constant integers that *do* refer to real objects, such as ASan's shadow memory base. Then you should be able to take something like an add of a phi of constant ints that feeds an inttoptr and transform that to a GEP, without explicitly calling out constant integers.

> In any case, the general advice is that people should prefer to use
> getelementptr to begin with. LLVM's own optimizers were converted to use
> getelementptr instead of ptrtoint+add+inttoptr even when they have to do
> raw byte arithmetic.

I'm guessing the IR comes from C++ code that subtracts pointers, so it'd be good if we could figure this out.
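The thread does not confirm where the IR for @f came from. As a rough illustration only, the two functions below are hypothetical source-level analogues of @f and @g. The names add5_via_int and add5_via_ptr are made up for this sketch, and the statement that a front end lowers the integer round trip to ptrtoint/add/inttoptr describes typical behaviour, not a guarantee.

```cpp
#include <cstdint>

// Analogue of @f: round-trip the pointer through an integer and add 5.
// A C++ front end typically lowers the two casts to ptrtoint and inttoptr.
// The result of converting the adjusted integer back is implementation-defined.
char* add5_via_int(char* a) {
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(a);
    return reinterpret_cast<char*>(bits + 5);
}

// Analogue of @g: ordinary pointer arithmetic, which maps onto getelementptr.
// The caller must ensure that a + 5 stays within (or one past) the same object.
char* add5_via_ptr(char* a) {
    return a + 5;
}
```

On mainstream flat-address targets both functions return the same address for a pointer into a sufficiently large buffer. The question in the thread is not about the address but about which objects the optimizer may assume the result points into.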
Philip Reames
2014-Sep-10 04:27 UTC
[LLVMdev] Canonicalization of ptrtoint/inttoptr and getelementptr
On 09/08/2014 04:22 PM, Dan Gohman wrote:
> An object can be allocated at virtual address 5 through extra-VM means
> (eg. mmap), and then one can (creatively) interpret the return value
> of @f as being associated with whatever %A was associated with *and*
> 5. The return value of @g can only be associated with exactly the same
> set that %A was associated with. Consequently, it's not always safe to
> replace @f with @g.

Dan, I'm trying to follow your logic here and am not arriving at the same conclusion. Can you point out the flaw in my reasoning here?

    define i8* @f(i8* %A) {
      %pti = ptrtoint i8* %A to i64    <-- %pti is not a pointer and is thus not based on anything
      %add = add i64 %pti, 5           <-- %add is not a pointer and is thus not based on anything; it is "associated with" the memory pointed to by %A
      --- In particular, "5" is NOT "an integer constant ... returned from a function not defined within LLVM". It is not returned by a function. As a result, the pointer value of 5 is not associated with any address range.
      %itp = inttoptr i64 %add to i8*  <-- %itp is based on %pti only
      ret i8* %itp
    }

I'm guessing the key difference in our reasoning is about the constant 5. :) I'm also guessing that you have an example in mind which motivates the need for 5 to be considered associated with the address range. Could you expand on why?

> It looks a little silly to say this in the case of the integer
> constant 5, and there are some semantic gray areas around extra-VM
> allocation, but the same thing happens if the add were adding a
> dynamic integer value, and then it's difficult to find a way to
> separate that case from the constant 5 case.
>
> In any case, the general advice is that people should prefer to use
> getelementptr to begin with. LLVM's own optimizers were converted to
> use getelementptr instead of ptrtoint+add+inttoptr even when they have
> to do raw byte arithmetic.

It would be nice to be able to canonicalize ptrtoint+add+inttoptr to GEPs. Having seemingly reasonable-looking legal IR that simply doesn't optimize is not the best introduction for new frontend authors. :)
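The disagreement in this thread comes down to whether the constant 5 carries an association of its own. The sketch below is a toy model of that question, not LLVM code. Every name in it (Objects, ConstantReading, Value, and the helper functions) is hypothetical, and the propagation rules are a simplified paraphrase of the two readings given above, not a restatement of the LangRef.

```cpp
#include <set>
#include <string>

// Toy model: each value carries the set of objects it may be associated with.
using Objects = std::set<std::string>;

// Hypothetical knob naming the two readings of the constant 5 in the thread.
enum class ConstantReading { MayNameExtraVMObject, NamesNoObject };

struct Value { Objects assoc; };

// A pointer argument such as %A is associated with its own object.
Value pointerArg(const std::string& name) { return Value{Objects{name}}; }

// Under the first reading a nonzero constant may name an object allocated
// outside LLVM's view; under the second it names nothing.
Value intConstant(ConstantReading r) {
    if (r == ConstantReading::MayNameExtraVMObject) return Value{Objects{"extra-VM"}};
    return Value{Objects{}};
}

// ptrtoint and inttoptr pass associations through unchanged in this model.
Value ptrToInt(const Value& p) { return p; }
Value intToPtr(const Value& i) { return i; }

// An add is associated with everything either operand is associated with.
Value add(const Value& a, const Value& b) {
    Value r = a;
    r.assoc.insert(b.assoc.begin(), b.assoc.end());
    return r;
}

// A getelementptr result is associated only with what its base is.
Value gep(const Value& base) { return base; }

// Associations of the pointers returned by @f and @g for an argument %A.
Objects f(ConstantReading r) {
    return intToPtr(add(ptrToInt(pointerArg("A")), intConstant(r))).assoc;
}
Objects g() { return gep(pointerArg("A")).assoc; }

// The rewrite @f -> @g is treated as safe here only if nothing is lost.
bool rewriteKeepsAssociations(ConstantReading r) { return f(r) == g(); }
```

In this model the rewrite loses the "extra-VM" association under the first reading and loses nothing under the second, which matches where the two replies part ways. It does not settle which reading the LangRef intends.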