Ralf Jung via llvm-dev
2019-Apr-10 15:14 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi,

>>> I see. Is there a quick answer to the question why you need inbounds
>>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
>>> might not have a valid base ptr and "optimize" it to inbounds once that
>>> is proven?
>>
>> You mean on the Rust side? We emit GEPi for field accesses and array indexing.
>> We cannot always statically determine if this is happening for a ZST or not.
>> At the same time, given that no memory access ever happens for a ZST, allocating
>> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
>> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
>
> OK, but why not emit non-inbounds GEPs instead? They do not come with
> the problems you have now, or maybe I misunderstand.

The problem is statically figuring out whether it should be inbounds or
non-inbounds. When we have code like `&x[n]`, this might be an offset-by-0 in
an empty slice and hence fall into the scope of my question, or it might be a
"normal" array access where we definitely want inbounds.

>> Sure, UB is definitely *defined* in a runtime-value dependent way. The problem
>> here is that it is not defined in a precise way -- something where one could
>> write an interpreter that tracks all the extra state that is needed (like
>> poison/undef and where allocations lie) and then says precisely under which
>> conditions we have UB and under which we do not.
>> What I am asking here for is the exact definition of GEPi if, *at run-time*, the
>> offset is 0, and the base pointer is (a) an integer, or (b) dangling.
>
> That last part is given by the lang-ref (imo):
> "If the inbounds keyword is present, the result value of the
> getelementptr is a poison value if the base pointer is not an in
> bounds address of an allocated object"
>
> I read this as: If you have a GEPi, you get poison if the base pointer
> is not an allocated object. That is, a dangling pointer (b) causes the
> GEPi to be poison, and a pointer from integer (a) may, if the address
> denoted by the integer is not inside, or one past, an allocated object.
> Now any offset except 0 will add more possible ways to generate a poison
> value.

Thanks. That makes sense from reading the docs (though I am not convinced that
it actually helps with optimizations to be this strict here).

For the (a) case, the question about "0-sized objects" remains, but it doesn't
seem like the answer could affect what LLVM does.

It would be really nice to have a reference interpreter for LLVM IR that can
explicitly check for all the UB. Maybe, one day... ;)

Kind regards,
Ralf
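For readers less familiar with Rust, the ZST situation described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the type `Marker` and the helpers `boxed_zst_addr` and `zst_elem_addr` are hypothetical names. It shows that a boxed ZST still yields a non-null, aligned pointer, and that indexing into a slice of ZSTs is always an offset by 0 bytes from the base pointer.

```rust
use std::mem::{align_of, size_of};

/// A zero-sized type (ZST) with a non-trivial alignment requirement.
/// (Hypothetical type, for illustration only.)
#[repr(align(8))]
struct Marker;

/// Returns the address stored in a `Box` that holds a ZST.
/// The pointer is non-null and aligned, but no memory access ever
/// happens through it because the pointee has size 0.
fn boxed_zst_addr() -> usize {
    let b = Box::new(Marker);
    &*b as *const Marker as usize
}

/// Returns the address of element `n` of a slice of ZSTs.
/// The index is still bounds-checked against the slice length, but the
/// byte offset from the base pointer is `n * 0 == 0`.
fn zst_elem_addr(xs: &[Marker], n: usize) -> usize {
    &xs[n] as *const Marker as usize
}

fn main() {
    assert_eq!(size_of::<Marker>(), 0);

    let addr = boxed_zst_addr();
    assert_ne!(addr, 0);
    assert_eq!(addr % align_of::<Marker>(), 0);

    // Every element of a ZST slice has the same address as the base.
    let v = vec![Marker, Marker, Marker];
    assert_eq!(zst_elem_addr(&v, 0), zst_elem_addr(&v, 2));
}
```

The same `&xs[n]` expression, applied to a slice of non-zero-sized elements, is the "normal" array access where inbounds is wanted, which is why the two cases are hard to tell apart statically.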
Doerfert, Johannes via llvm-dev
2019-Apr-12 19:44 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi Ralf,

On 04/10, Ralf Jung wrote:
> >>> I see. Is there a quick answer to the question why you need inbounds
> >>> GEPs in that case? Can't you just use non-inbounds GEPs if you know you
> >>> might not have a valid base ptr and "optimize" it to inbounds once that
> >>> is proven?
> >>
> >> You mean on the Rust side? We emit GEPi for field accesses and array indexing.
> >> We cannot always statically determine if this is happening for a ZST or not.
> >> At the same time, given that no memory access ever happens for a ZST, allocating
> >> a ZST (Box::new in Rust, think of it like new in C++) does not actually allocate
> >> any memory, it just returns an integer (sufficiently aligned) cast to a pointer.
> >
> > OK, but why not emit non-inbounds GEPs instead? They do not come with
> > the problems you have now, or maybe I misunderstand.
>
> The problem is statically figuring out whether it should be inbounds or
> non-inbounds. When we have code like `&x[n]`, this might be an offset-by-0 in
> an empty slice and hence fall into the scope of my question, or it might be a
> "normal" array access where we definitely want inbounds.

I'd argue, after all this discussion at least, use non-inbounds if you do
not know you have a valid object (and want to avoid undef and all that it
entails). This might cause performance regressions; if you try it, it
would be interesting to know how much. We could even look into an
"inbounds" detection in the "Attributor framework" [0] to get some of the
performance back.

[0] https://reviews.llvm.org/D59919 (but see also the "Stack" tab that
    shows related commits)

> >> Sure, UB is definitely *defined* in a runtime-value dependent way. The problem
> >> here is that it is not defined in a precise way -- something where one could
> >> write an interpreter that tracks all the extra state that is needed (like
> >> poison/undef and where allocations lie) and then says precisely under which
> >> conditions we have UB and under which we do not.
> >> What I am asking here for is the exact definition of GEPi if, *at run-time*, the
> >> offset is 0, and the base pointer is (a) an integer, or (b) dangling.
> >
> > That last part is given by the lang-ref (imo):
> > "If the inbounds keyword is present, the result value of the
> > getelementptr is a poison value if the base pointer is not an in
> > bounds address of an allocated object"
> >
> > I read this as: If you have a GEPi, you get poison if the base pointer
> > is not an allocated object. That is, a dangling pointer (b) causes the
> > GEPi to be poison, and a pointer from integer (a) may, if the address
> > denoted by the integer is not inside, or one past, an allocated object.
> > Now any offset except 0 will add more possible ways to generate a poison
> > value.
>
> Thanks. That makes sense from reading the docs (though I am not convinced that
> it actually helps with optimizations to be this strict here).

I never argued it does "make sense" ;)

> For the (a) case, the question about "0-sized objects" remains, but it doesn't
> seem like the answer could affect what LLVM does.

I think I now see (maybe part of) your point.
Something like:

  x = malloc(0);
  // ... anything except free(x) or equivalent
  y = gep inbounds x, 0
  // ... anything except free(x) or equivalent
  use_but_not_dereference(y);

should be OK (= no undef/poison appears). Does that at least go in the
right direction? I think this should be OK from the IR definition or
something is broken. Obviously, there is always the possibility, or
better the certainty, that the implementation is broken somewhere ;)

> It would be really nice to have a reference interpreter for LLVM IR that can
> explicitly check for all the UB. Maybe, one day... ;)

Let me know once you start working on one, I'd be quite interested ;)

Cheers,
  Johannes

--
Johannes Doerfert
Researcher
Argonne National Laboratory
Lemont, IL 60439, USA
jdoerfert at anl.gov
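The suggestion to fall back to non-inbounds GEPs when the base pointer may not point to a valid object has a rough source-level analogue in Rust's raw-pointer API. The sketch below is not from the thread. It assumes, as commonly understood but not stated here, that `pointer::add` is lowered to `getelementptr inbounds` and `pointer::wrapping_add` to a plain `getelementptr`. The helper names `offset_inbounds` and `offset_any` are hypothetical.

```rust
use std::ptr::NonNull;

/// Offsets `base` by `n` elements without claiming the result is in bounds.
/// Assumption: this corresponds to a non-inbounds GEP, so it is safe to
/// call even when `base` is dangling or was created from an integer.
fn offset_any<T>(base: *const T, n: usize) -> *const T {
    base.wrapping_add(n)
}

/// Offsets `base` by `n` elements, promising in-bounds-ness to the compiler.
///
/// # Safety
/// `base` and the resulting pointer must lie within (or one past the end
/// of) the same allocated object.
unsafe fn offset_inbounds<T>(base: *const T, n: usize) -> *const T {
    unsafe { base.add(n) }
}

fn main() {
    // "Normal" array access: both forms compute the same address.
    let xs = [10u32, 20, 30];
    let p = xs.as_ptr();
    let a = unsafe { offset_inbounds(p, 2) };
    let b = offset_any(p, 2);
    assert_eq!(a, b);
    assert_eq!(unsafe { *a }, 30);

    // Dangling base pointer to a ZST with offset 0: the non-inbounds
    // form is used, and the result is never dereferenced.
    let d: *const () = NonNull::<()>::dangling().as_ptr();
    assert_eq!(offset_any(d, 0), d);
}
```

The trade-off is the one raised above: the non-inbounds form gives the optimizer less information, so using it everywhere might cost performance.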
Ralf Jung via llvm-dev
2019-Apr-15 09:06 UTC
[llvm-dev] getelementptr inbounds with offset 0
Hi,

>> For the (a) case, the question about "0-sized objects" remains, but it doesn't
>> seem like the answer could affect what LLVM does.
>
> I think I now see (maybe part of) your point.
> Something like:
>
>   x = malloc(0);
>   // ... anything except free(x) or equivalent
>   y = gep inbounds x, 0
>   // ... anything except free(x) or equivalent
>   use_but_not_dereference(y);
>
> should be OK (= no undef/poison appears). Does that at least go in the
> right direction? I think this should be OK from the IR definition or
> something is broken. Obviously, there is always the possibility, or
> better the certainty, that the implementation is broken somewhere ;)

I guess that is a way to look at it -- though malloc can return NULL, and that
may (or may not) change the rules here. But yes, this is the closest you can
get in C when trying to mirror what we do in Rust. C does not have 0-sized
types, Rust does, so there is no more direct equivalent.

Kind regards,
Ralf