Ralf Jung via llvm-dev
2019-Feb-25 14:58 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Bruce,

On 25.02.19 13:10, Bruce Hoult wrote:
> LLVM has no idea whether the address computed by GEP is actually
> within a legal object. The "inbounds" keyword is just you, the
> programmer, promising LLVM that you know it's ok and that you don't
> care what happens if it is actually out of bounds.
>
> https://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds

The LangRef says I get a poison value when I am violating the bounds. What I am
asking is what exactly this means when the offset is 0 -- what *are* the
conditions under which an offset-by-0 is "out of bounds" and hence yields poison?
Of course LLVM cannot always statically determine this, but it relies on
(dynamically, on the "LLVM abstract machine") such things not happening, and I
am asking what exactly these dynamic conditions are.

Kind regards,
Ralf

>
> On Sun, Feb 24, 2019 at 9:05 AM Ralf Jung via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi all,
>>
>> What exactly are the rules for `getelementptr inbounds` with offset 0?
>>
>> In Rust, we are relying on the fact that if we use, for example, `inttoptr` to
>> turn `4` into a pointer, we can then do `getelementptr inbounds` with offset 0
>> on that without LLVM deducing that there actually is any dereferenceable memory
>> at location 4. The argument is that we can think of there being a zero-sized
>> allocation. Is that a reasonable assumption? Can something like this be
>> documented in the LangRef?
>>
>> Relatedly, how does the situation change if the pointer is not created "out of
>> thin air" from a fixed integer, but is actually a dangling pointer obtained
>> previously from `malloc` (or `alloca` or whatever)? Is `getelementptr inbounds`
>> with offset 0 on such a pointer a NOP, or does it result in `poison`? And if
>> that makes a difference, how does that square with the fact that, e.g., the
>> integer `0x4000` could well be inside such an allocation, but doing
>> `getelementptr inbounds` with offset 0 on that would fall under the first
>> question above?
>>
>> Kind regards,
>> Ralf
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
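To make the Rust side of the question concrete, here is a minimal sketch (not part of the thread) of where such pointers come from. It assumes two rustc implementation details rather than documented guarantees: that `<*const T>::offset` is lowered to `getelementptr inbounds`, and that `NonNull::dangling()` uses the type's alignment as its address (4 for `u32`). That this is the origin of the `int2ptr 4` in the question is likewise an assumption. An empty `Vec` does not allocate, so its buffer pointer dangles in the same way, and the slice iterator typically derives its end pointer by adding the length, here 0. The helper names are made up for illustration.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Hypothetical helper: offsets a dangling-but-aligned pointer by zero with
/// `offset` (assumed to lower to `getelementptr inbounds`).
fn zero_offset_on_dangling() -> (usize, usize) {
    // No allocation backs this pointer; it is only non-null and aligned.
    let p: *const u32 = NonNull::<u32>::dangling().as_ptr();
    // Whether this zero offset is permitted on a pointer with no backing
    // allocation is exactly what the thread asks.
    let q = unsafe { p.offset(0) };
    (p as usize, q as usize)
}

/// Hypothetical helper: iterates an empty Vec, whose buffer pointer is
/// dangling because an empty Vec does not allocate.
fn empty_vec_len_via_iter() -> usize {
    let v: Vec<u32> = Vec::new();
    // The iterator's end pointer is start + len, i.e. start + 0 here.
    v.iter().count()
}

fn main() {
    let (p, q) = zero_offset_on_dangling();
    assert_eq!(p, q);
    assert_eq!(p % align_of::<u32>(), 0);
    println!("dangling address = {p}, after offset(0) = {q}");
    println!("elements seen in empty Vec = {}", empty_vec_len_via_iter());
}
```

At run time the zero offset leaves the address unchanged; what the thread debates is whether the optimizer may treat the result as poison.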
Doerfert, Johannes via llvm-dev
2019-Mar-07 17:52 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Ralf,

I wanted to restart this discussion as it is important for my IPO attribute
deduction work as well. Let me share my take on the situation, no guarantees!

From the Lang-Ref statement

  "With the inbounds keyword, the result value of the GEP is undefined
   if the address is outside the actual underlying allocated object and
   not the address one-past-the-end."

I'd argue that the actual offset value (here 0) is irrelevant. The GEP
value is undefined if inbounds is present and the resulting pointer does
not point into, or one past the end of, an allocated object. This
object, in my understanding, has to be the same one the base pointer of
the GEP points into, or one past the end of, or you again get an undefined
result.

That being said, your initial "gep inbounds (int2ptr 4) 0" might cause an
undefined value if 4 is not part of a valid allocation, or one past the end
of one.

Now, whether that might cause any problems, e.g., whether LLVM is able to act on
this fact, depends on various factors including what you do with the
GEP. Your initial problem seemed to be that LLVM "might be able to
deduce dereferenceable memory at location 4" but that should never be the
case if you only form the aforementioned GEP, with or without the
inbounds actually. Forming a pointer that has an undefined value is just
that, a pointer with an undefined value. A side-effect based on the GEP
will however __locally__ introduce a dereferenceability assumption (in
my opinion at least). Let's say the code looks like this:

  %G = gep inbounds (int2ptr 4) 0
  ; We don't know anything about the dereferenceability of
  ; the memory at address 4 here.
  br %cnd, %BB0, %BB1

BB0:
  ; We don't know anything about the dereferenceability of
  ; the memory at address 4 here.
  load %G
  ; We know the memory at address 4 is dereferenceable here.
  ; Though, that is due to the load and not the inbounds.
  ...
  br %BB1

BB1:
  ; We don't know anything about the dereferenceability of
  ; the memory at address 4 here.
It is a different story if you start to use the GEP in other operations,
e.g., to alter control flow. Then the (potential) undefined value can
propagate.

Any thought on this? Did I at least get your problem description right?

Cheers,
  Johannes

P.S. Sorry if this breaks the thread and apologies that I had to remove
     Bruce from the CC. It turns out replying to an email you did not
     receive is complicated and getting on the LLVM-Dev list is nowadays
     as well...

--

Johannes Doerfert
Researcher
Argonne National Laboratory
Lemont, IL 60439, USA
jdoerfert at anl.gov
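The strict reading described above can be made concrete with a toy model. This is a hypothetical sketch, not LLVM's implementation: `Allocation`, `gep_inbounds_strict`, `icmp_eq`, and `br` are invented names, `None` stands for poison, and treating a branch on poison as undefined behavior is this model's assumption. It illustrates three points from the message: the zero offset from address 4 is poison when no allocation covers 4, the same zero offset is fine at an address that an allocation does cover (the `0x4000` case from the original question), and the poison only matters once it reaches control flow.

```rust
/// A live allocation covering addresses [base, base + size).
/// Hypothetical type for this model only.
#[derive(Clone, Copy)]
struct Allocation {
    base: usize,
    size: usize,
}

/// Strict reading of `gep inbounds p, off`: the result is a pointer (`Some`)
/// only if both `p` and `p + off` lie within, or one past the end of, the
/// same live allocation. `None` stands for poison.
fn gep_inbounds_strict(live: &[Allocation], p: usize, off: isize) -> Option<usize> {
    // Address arithmetic that wraps around is also treated as poison.
    let q = p.checked_add_signed(off)?;
    let covers = |a: &Allocation, x: usize| a.base <= x && x <= a.base + a.size;
    live.iter().any(|a| covers(a, p) && covers(a, q)).then_some(q)
}

/// Comparing against poison yields poison: poison in, poison out.
fn icmp_eq(a: Option<usize>, b: Option<usize>) -> Option<bool> {
    Some(a? == b?)
}

/// In this model, branching on poison is undefined behavior (`Err`).
fn br(cond: Option<bool>) -> Result<bool, &'static str> {
    cond.ok_or("undefined behavior: branch on poison")
}

fn main() {
    // One allocation of 12 bytes at 0x4000; nothing covers address 4.
    let live = [Allocation { base: 0x4000, size: 12 }];

    // "gep inbounds (int2ptr 4) 0": poison, but merely forming it is harmless.
    let g = gep_inbounds_strict(&live, 4, 0);
    assert_eq!(g, None);

    // The same zero offset at 0x4000 is fine, because an allocation covers it.
    assert_eq!(gep_inbounds_strict(&live, 0x4000, 0), Some(0x4000));

    // One past the end is in bounds, even though it is not dereferenceable.
    assert_eq!(gep_inbounds_strict(&live, 0x4000, 12), Some(0x400c));

    // Only once the poison reaches control flow does it become UB.
    assert!(br(icmp_eq(g, Some(4))).is_err());
    println!("ok");
}
```

Under this reading the outcome of a zero offset depends on the run-time allocation layout, which is why the same instruction can be poison at address 4 and well-defined at `0x4000`.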
Doerfert, Johannes via llvm-dev
2019-Mar-07 23:07 UTC
[llvm-dev] getelementptr inbounds with offset 0
After I read the message again, I think the BB0 comments were wrong. It should
have been:

BB0:
  ; We know the memory at address 4 is dereferenceable here.
  ; Though, that is due to the load and not the inbounds.
  load %G
  ; We know the memory at address 4 is dereferenceable here.
  ; Though, that is due to the load and not the inbounds.
  ...
  br %BB1
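As we read it, the corrected comments rely on BB0's load being certain to execute once BB0 is entered, so the fact it implies already holds at the top of the block, while the entry block and BB1 (reachable without passing through BB0) learn nothing. A minimal sketch of that reasoning over a single block follows. It assumes every instruction except a hypothetical `MayNotReturn` always passes control to the next one, and that nothing in the block frees memory; `Inst` and `deref_known` are invented for illustration and are not LLVM's API.

```rust
/// A drastically simplified instruction: a load through a named pointer,
/// a call that may not return, or anything else.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Inst<'a> {
    Load(&'a str),
    MayNotReturn,
    Other,
}

/// For each program point in one basic block (before each instruction and
/// after the last), is `ptr` known to be dereferenceable?
/// Assumes nothing in the block frees memory.
fn deref_known(block: &[Inst], ptr: &str) -> Vec<bool> {
    let n = block.len();
    let mut known = vec![false; n + 1];

    // Backward pass: a load that is certain to execute later justifies the
    // fact now, unless an instruction in between may not return.
    let mut load_ahead = false;
    for i in (0..n).rev() {
        match block[i] {
            Inst::Load(p) if p == ptr => load_ahead = true,
            Inst::MayNotReturn => load_ahead = false,
            _ => {}
        }
        known[i] = load_ahead;
    }

    // Forward pass: after the load has executed, the fact keeps holding.
    let mut load_done = false;
    for i in 0..n {
        if block[i] == Inst::Load(ptr) {
            load_done = true;
        }
        known[i + 1] |= load_done;
    }
    known
}

fn main() {
    // entry: %G = gep ...; br %cnd, %BB0, %BB1 (no load of %G)
    let entry = [Inst::Other, Inst::Other];
    // BB0: load %G; ...; br %BB1
    let bb0 = [Inst::Load("%G"), Inst::Other, Inst::Other];
    // BB1: no load of %G
    let bb1 = [Inst::Other];

    println!("entry: {:?}", deref_known(&entry, "%G"));
    println!("BB0:   {:?}", deref_known(&bb0, "%G"));
    println!("BB1:   {:?}", deref_known(&bb1, "%G"));
}
```

For `entry` and `BB1` every point is `false`, and for `BB0` every point, including the one before the load, is `true`, matching the corrected comments. Placing an `Inst::MayNotReturn` before the load would make the first point `false`, since the load is then no longer guaranteed to run.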
Ralf Jung via llvm-dev
2019-Mar-15 16:09 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Johannes,

> From the Lang-Ref statement
>
> "With the inbounds keyword, the result value of the GEP is undefined
> if the address is outside the actual underlying allocated object and
> not the address one-past-the-end."
>
> I'd argue that the actual offset value (here 0) is irrelevant. The GEP
> value is undefined if inbounds is present and the resulting pointer does
> not point into, or one past the end of, an allocated object. This
> object, in my understanding, has to be the same one the base pointer of
> the GEP points into, or one past the end of, or you again get an undefined
> result.

Yes, I agree with that reading. However, the notion of "allocated object" here
is not entirely clear. LLVM has to operate under the assumption that there are
allocations and allocators it does not know anything about. Just imagine some
embedded project writing to the well-known address 0xDeadCafe because there is
a hardware register there. So, the thinking here is: LLVM cannot exclude the
possibility of an object of size 0 existing at any given address. The pointer
returned by "GEPi p 0" would then be one-past-the-end of such a 0-sized object.
Thus, "GEPi p 0" is the identity function for any p; it will not return poison.

> Now, whether that might cause any problems, e.g., whether LLVM is able to act on
> this fact, depends on various factors including what you do with the
> GEP. Your initial problem seemed to be that LLVM "might be able to
> deduce dereferenceable memory at location 4" but that should never be the
> case if you only form the aforementioned GEP, with or without the
> inbounds actually. Forming a pointer that has an undefined value is just
> that, a pointer with an undefined value.

Ah, good point. First of all, I was indeed unclear; the case I am worried about
here is GEPi returning poison. (These values might be used in further
computations and eventually surface as UB.)

But also, clearly a "GEPi 0" alone cannot introduce any dereferenceability
assumption because of the "one-past-the-end" case: that pointer is inbounds but
cannot be dereferenced.

So, for the sake of a more concrete example (and please excuse my butchering
LLVM syntax, I usually deal with this in terms of C or Rust syntax): Can %G in
the following programs be poison? If yes, what is the analysis that would be
weakened or the optimization that could no longer happen if "GEPi %P 0" was
instead defined to always return %P?

# example1
%P = int2ptr 4
%G = gep inbounds %P 0

# example2
%P = call noalias i8* @malloc(i64 12)
call void @free(i8* %P)
%G = gep inbounds %P 0

The first happens in Rust all the time, and we rely on not getting poison. The
second doesn't occur in Rust (to my knowledge), but it seems somewhat
inconsistent to return poison in one case and not the other.

Kind regards,
Ralf
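The two examples can be run against a toy model of both readings discussed in this thread: the strict "same live allocation" reading and the "a zero-sized object may exist at any address" reading. This is a hypothetical sketch rather than LLVM's semantics: `Machine`, `Reading`, `example1`, `example2`, and the fixed allocation address `0x1000` are invented for illustration, and `None` stands for poison.

```rust
/// Which reading of a zero-offset `gep inbounds` the model uses.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Reading {
    /// Offset 0 is in bounds only within, or one past, a live allocation.
    Strict,
    /// A zero-sized object may sit at any address, so offset 0 is the identity.
    ZeroSizedAnywhere,
}

/// A toy abstract machine: live allocations as (base, size) pairs.
struct Machine {
    live: Vec<(usize, usize)>,
}

impl Machine {
    /// Records an allocation at a caller-chosen address (a simplification).
    fn malloc(&mut self, base: usize, size: usize) -> usize {
        self.live.push((base, size));
        base
    }

    /// Ends the allocation starting at `base`; pointers to it now dangle.
    fn free(&mut self, base: usize) {
        self.live.retain(|&(b, _)| b != base);
    }

    /// `gep inbounds p, off` under the given reading; `None` models poison.
    fn gep_inbounds(&self, reading: Reading, p: usize, off: isize) -> Option<usize> {
        if off == 0 && reading == Reading::ZeroSizedAnywhere {
            return Some(p);
        }
        let q = p.checked_add_signed(off)?;
        let covers = |&(b, s): &(usize, usize), x: usize| b <= x && x <= b + s;
        self.live.iter().any(|a| covers(a, p) && covers(a, q)).then_some(q)
    }
}

/// example1: %P = int2ptr 4; %G = gep inbounds %P 0
fn example1(reading: Reading) -> Option<usize> {
    let m = Machine { live: Vec::new() };
    m.gep_inbounds(reading, 4, 0)
}

/// example2: malloc 12 bytes, free them, then gep inbounds %P 0.
/// The address 0x1000 is an arbitrary choice for the model.
fn example2(reading: Reading) -> Option<usize> {
    let mut m = Machine { live: Vec::new() };
    let p = m.malloc(0x1000, 12);
    m.free(p);
    m.gep_inbounds(reading, p, 0)
}

fn main() {
    for r in [Reading::Strict, Reading::ZeroSizedAnywhere] {
        println!("{:?}: example1 = {:?}, example2 = {:?}", r, example1(r), example2(r));
    }
}
```

Running it prints `None` for both examples under `Strict` and `Some(4)` / `Some(4096)` under `ZeroSizedAnywhere`. Each reading treats the two examples alike, which is the consistency the message asks for; the model says nothing about which reading LLVM actually adopts.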