Can we go back a little?

> 1) Add a new transformation to InstCombine that will replace
> 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when
> <ty> has the same size as a pointer for the target architecture.

What's the actual problem with this approach? I personally find it the most compelling - it is well-defined (well, somewhat), front-end agnostic (and I assume some front ends may find this kind of pointer arithmetic to be well-defined) and predictable.
I would even extend it to allow offsets of different types to be used, with additional zero-extension when applicable.

Cheers,
Marcin
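As a rough illustration in C terms (a sketch, not LLVM code): the proposed rewrite treats the offset as the integer value of the resulting pointer, and the suggested extension would zero-extend a narrower offset to pointer width first. The helper name `offset_to_ptr_u32` is hypothetical, and the integer-to-pointer conversion it relies on is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a 32-bit offset fits in a pointer-sized integer. */
static_assert(sizeof(uintptr_t) >= sizeof(uint32_t),
              "offset must fit in a pointer-sized integer");

/* Hypothetical helper: models 'inttoptr' applied to a 32-bit offset that is
   zero-extended to pointer width first (the extension suggested above). */
static char *offset_to_ptr_u32(uint32_t n) {
    uintptr_t wide = (uintptr_t)n; /* unsigned widening is zero-extension */
    return (char *)wide;           /* implementation-defined conversion */
}
```

Because the widening is unsigned, an offset with its top bit set stays a small positive address on a 64-bit target instead of being sign-extended.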
On Sun, Jul 9, 2017 at 1:10 PM, Marcin Słowik via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Can we go back a little?
>
>> 1) Add a new transformation to InstCombine that will replace
>> 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when
>> <ty> has the same size as a pointer for the target architecture.
>
> What's the actual problem with this approach? I personally find it the
> most compelling - it is well-defined (well, somewhat), front-end agnostic
> (and assume some front ends may find this kind of pointer arithmetic to be
> well-defined) and predictable.
> I would even extend it to allow offsets of different types to be used,
> with additional zero-extension when applicable.

This would make the correctness of a program dependent on running a particular optimization pass, which is not sound from a semantics point of view (what if another pass sees the gep of null before InstCombine does, etc.).

LLVM IR has semantics: properties which we are supposed to use to reason about what a particular piece of IR does. The proposed transformation, while legal, is not mandatory. Making it mandatory means more than adding one particular change to InstCombine: it means a change to the semantics of LLVM IR. That would require all passes, analyses, etc. to treat gep null in an appropriate way.

There are many reasons why such a semantic shift would be undesirable:
- It opens up a Pandora's box with regard to the semantics of transformations on GEPs when commuted and combined with other GEPs.
- It results in less expressivity: frontends should emit IR that matches the semantics of their source language. Constraining GEP semantics would constrain it for frontends which do not want or need this semantic shift.

If the goal is to make (char*)0 + n work in clang, clang should be the bearer of that burden. It is not difficult to implement this in AST->IR lowering, and doing so has several benefits:
- No change to GEP semantics, which means that existing optimizations remain sound.
- It is easy to explain why, when and how clang's behavior shifts with regard to particular source expressions and their lowering.
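A minimal sketch of the front-end idea, expressed in C rather than in clang's actual lowering code: when the source says (char*)0 + n, the front end could produce the pointer through an integer-to-pointer conversion instead of pointer arithmetic on null. The function name `null_plus_n` is hypothetical, the size check mirrors the "same size as a pointer" condition from the original proposal, and the conversion is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the offset type is exactly pointer-sized. */
static_assert(sizeof(uintptr_t) == sizeof(char *),
              "offset type must match pointer width");

/* Hypothetical stand-in for how '(char *)0 + n' could be lowered:
   convert the integer to a pointer directly, with no arithmetic on null. */
static char *null_plus_n(uintptr_t n) {
    return (char *)n; /* corresponds to 'inttoptr' rather than a GEP */
}
```

The point of doing this at the source-expression level is that the choice is made once, in the front end, so no later pass ever sees a GEP with a null base for this idiom.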
Chandler Carruth via llvm-dev
2017-Jul-10 05:45 UTC
[llvm-dev] GEP with a null pointer base
On Sun, Jul 9, 2017 at 9:24 PM David Majnemer via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> On Sun, Jul 9, 2017 at 1:10 PM, Marcin Słowik via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>> Can we go back a little?
>>
>>> 1) Add a new transformation to InstCombine that will replace
>>> 'getelementptr i8, i8* null, <ty> %n' with 'inttoptr <ty> %n to i8*' when
>>> <ty> has the same size as a pointer for the target architecture.
>>
>> What's the actual problem with this approach? I personally find it the
>> most compelling - it is well-defined (well, somewhat), front-end agnostic
>> (and assume some front ends may find this kind of pointer arithmetic to be
>> well-defined) and predictable.
>> I would even extend it to allow offsets of different types to be used,
>> with additional zero-extension when applicable.
>
> This would make correctness of a program dependent on running a particular
> optimization pass, something which is not sound from a semantics point of
> view (what if another pass sees the gep of null before InstCombine does,
> etc.).
>
> LLVM IR has semantics, properties which we are supposed to use to reason
> about what a particular piece of IR does. This proposed transformation,
> while legal, is not mandatory. Making it mandatory means more than adding
> one particular change to InstCombine: it means a change to the semantics of
> LLVM IR. This way we require that all passes, analyses, etc. treat gep null
> in an appropriate way.
>
> There are many reasons why such a semantic shift would be undesirable:
> - It opens up a Pandora's box with regard to the semantics of
> transformations on GEPs when commuted and combined with other GEPs
> - It results in less expressivity: frontends should emit the IR that matches
> the semantics of their source language. Constraining GEP semantics would
> constrain it for frontends which do not want or need this semantic shift.
>
> If the goal is to make (char*)0 + n work in clang, clang should be the
> bearer of that burden. It is not difficult to implement this in AST->IR
> lowering and has several benefits:
> - No change to GEP semantics which means that existing optimizations are
> sound.
> - Easy to explain why, when and how clang's behavior shifts with regard
> to particular source expressions and their lowering.

Just wanted to say that I emphatically agree with all of this.

(And with making the above craziness work in Clang as a pragmatic way to support real code in the wild, even if it is undesirable code in the wild.)