Ralf Jung via llvm-dev
2019-Feb-24 17:04 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi all,

What exactly are the rules for `getelementptr inbounds` with offset 0?

In Rust, we are relying on the fact that if we use, for example, `inttoptr` to turn `4` into a pointer, we can then do `getelementptr inbounds` with offset 0 on that without LLVM deducing that there actually is any dereferenceable memory at location 4. The argument is that we can think of there being a zero-sized allocation. Is that a reasonable assumption? Can something like this be documented in the LangRef?

Relatedly, how does the situation change if the pointer is not created "out of thin air" from a fixed integer, but is actually a dangling pointer obtained previously from `malloc` (or `alloca` or whatever)? Is `getelementptr inbounds` with offset 0 on such a pointer a NOP, or does it result in `poison`? And if that makes a difference, how does that square with the fact that, e.g., the integer `0x4000` could well be inside such an allocation, but doing `getelementptr inbounds` with offset 0 on that would fall under the first question above?

Kind regards,
Ralf
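A minimal Rust sketch of where such a pointer can come from and of the operation in question. An empty `Vec<u32>` does not allocate, so its buffer pointer is dangling. The sketch assumes, as implementation details rather than documented guarantees, that `NonNull::<u32>::dangling()` currently returns the address `align_of::<u32>()` (4 on common targets), that the `as` cast lowers to `inttoptr`, and that `pointer::add` lowers to `getelementptr inbounds`. `int_ptr` and `offset_by_zero` are hypothetical helper names. Whether `offset_by_zero` is sound on such a pointer is exactly what the message asks.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Address of the dangling pointer Rust uses in place of a zero-sized
/// allocation of `u32`s. In current std this equals `align_of::<u32>()`;
/// that is an implementation detail, not a documented guarantee.
fn dangling_u32_addr() -> usize {
    NonNull::<u32>::dangling().as_ptr() as usize
}

/// Hypothetical helper: make a pointer "out of thin air" from an integer
/// (assumed to lower to `inttoptr`).
fn int_ptr(addr: usize) -> *const u32 {
    addr as *const u32
}

/// Hypothetical helper: offset by zero elements. `add` is assumed to lower
/// to `getelementptr inbounds`.
fn offset_by_zero(p: *const u32) -> *const u32 {
    // SAFETY: relies on an inbounds offset of 0 being a no-op even when no
    // real allocation backs `p`, which is the property being asked about.
    unsafe { p.add(0) }
}

fn main() {
    let empty: Vec<u32> = Vec::new();
    println!("align_of::<u32>() = {}", align_of::<u32>());
    // An empty Vec never calls the allocator; this pointer is dangling.
    println!("empty Vec<u32> buffer at {:#x}", empty.as_ptr() as usize);

    let p = int_ptr(dangling_u32_addr());
    let q = offset_by_zero(p);
    println!("{:p} -> {:p}", p, q);
}
```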
Bruce Hoult via llvm-dev
2019-Feb-25 12:10 UTC
[llvm-dev] getelementptr inbounds with offset 0
LLVM has no idea whether the address computed by GEP is actually within a legal object. The "inbounds" keyword is just you, the programmer, promising LLVM that you know it's ok and that you don't care what happens if it is actually out of bounds.

https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
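In Rust the choice of whether to make this promise is visible in the pointer API. A sketch, assuming (as rustc does in practice, though it is an implementation detail) that `pointer::add` lowers to `getelementptr inbounds` and `pointer::wrapping_add` to a plain `getelementptr`. `round_trip_wrapping` and `last_elem` are hypothetical helpers.

```rust
/// No inbounds promise: the intermediate pointer may lie far outside `p`'s
/// allocation, and wrapping back by the same amount yields `p` again.
fn round_trip_wrapping(p: *const u8, n: usize) -> *const u8 {
    p.wrapping_add(n).wrapping_sub(n)
}

/// Inbounds promise: `add` is only used where the result provably stays
/// inside `buf`, so the promise made to LLVM actually holds.
fn last_elem(buf: &[u8]) -> Option<u8> {
    if buf.is_empty() {
        return None;
    }
    let p = buf.as_ptr();
    // SAFETY: `buf.len() - 1` is a valid index into the non-empty `buf`.
    Some(unsafe { *p.add(buf.len() - 1) })
}

fn main() {
    let buf = [10u8, 20, 30];
    let p = buf.as_ptr();
    assert_eq!(round_trip_wrapping(p, 1 << 20), p);
    println!("{:?}", last_elem(&buf));
}
```

The wrapping form is what a frontend can fall back to when it cannot uphold the inbounds promise; the cost is giving up the optimizations that the promise enables.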
Robin Kruppe via llvm-dev
2019-Feb-25 12:50 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Bruce,

it's not true in general that LLVM has no idea about (or doesn't care about) object sizes. It can infer object size and other things from allocas, global variables, and calls to built-in functions such as malloc(). In the case of Rust we even have an out-of-tree patch to teach LLVM the same for Rust's (global) heap allocation functions. You can see this information being computed in lib/Analysis/MemoryBuiltins.cpp.

More importantly, the question is *what* actually is being promised to LLVM; more specifically, what the terms "out of bounds" and "object" mean in this context. It is easy enough to answer intuitively in many specific cases whether a GEP should be considered "out of bounds", but in the cases Ralf described, where offsets and "object sizes" are equal to 0, it is not so clear-cut and depends on tricky matters such as whether zero-sized allocations exist. We (Rust developers) very much care what happens in those cases (it should be a NOP), so it's important to check whether that is compatible with the Rust compiler emitting inbounds GEPs. It is true that in practice LLVM often won't be able to determine conclusively whether an object exists and what its bounds are, but that doesn't answer the question.

Cheers,
Robin
Ralf Jung via llvm-dev
2019-Feb-25 14:58 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Bruce,

The LangRef says I get a poison value when I am violating the bounds. What I am asking is what exactly this means when the offset is 0 -- what *are* the conditions under which an offset-by-0 is "out of bounds" and hence yields poison? Of course LLVM cannot always statically determine this, but it relies on (dynamically, on the "LLVM abstract machine") such things not happening, and I am asking what exactly these dynamic conditions are.

Kind regards,
Ralf
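One way to make "the dynamic conditions" concrete is a toy abstract machine. The sketch below is hypothetical and is not LLVM's definition. Pointers are bare addresses (provenance is ignored, which a real model could not do), memory is a list of allocations, and `gep_inbounds` returns `None` for poison under two candidate rules. Under `RequireLiveAllocation`, `inttoptr 4` and a pointer left dangling by `free` both yield poison at offset 0. Meanwhile `inttoptr 0x4000` is fine only because something happens to be allocated there, which is the asymmetry raised in the first message.

```rust
/// Candidate rules for `getelementptr inbounds` (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Rule {
    /// Offset 0 is always a no-op, even with no live allocation.
    ZeroIsNop,
    /// Both `addr` and `addr + off` must lie in one live allocation
    /// (one-past-the-end allowed), even when `off == 0`.
    RequireLiveAllocation,
}

/// One allocation in the toy memory: `[base, base + size)`.
#[derive(Clone, Copy, Debug)]
struct Allocation {
    base: u64,
    size: u64,
    live: bool,
}

/// Returns `Some(new_addr)`, or `None` to stand for poison.
fn gep_inbounds(mem: &[Allocation], rule: Rule, addr: u64, off: i64) -> Option<u64> {
    // Address-space wrap-around is poison under both rules.
    let target = addr.checked_add_signed(off)?;
    if rule == Rule::ZeroIsNop && off == 0 {
        return Some(addr);
    }
    // Inclusive upper bound: one-past-the-end is still "in bounds".
    let in_bounds = |a: &Allocation, x: u64| x >= a.base && x <= a.base + a.size;
    mem.iter()
        .any(|a| a.live && in_bounds(a, addr) && in_bounds(a, target))
        .then_some(target)
}

fn main() {
    let mem = [
        // A live heap block that happens to cover 0x4000.
        Allocation { base: 0x3ff0, size: 0x100, live: true },
        // A block that has since been freed; 0x8000 now dangles.
        Allocation { base: 0x8000, size: 0x10, live: false },
    ];
    for rule in [Rule::ZeroIsNop, Rule::RequireLiveAllocation] {
        for addr in [4u64, 0x8000, 0x4000] {
            let r = gep_inbounds(&mem, rule, addr, 0);
            println!("{:?}: gep inbounds {:#x}, 0 -> {:?}", rule, addr, r);
        }
    }
}
```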