Daniel Berlin via llvm-dev
2017-May-14 17:49 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Sun, May 14, 2017 at 10:20 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 05/14/2017 11:06 AM, Daniel Berlin wrote: > > > > On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: >> >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing >> our current TBAA tree for union generation is definitely irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go far enough. >> But truthfully, I would rather see us come closer to a representation we >> know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a single short >> node a child of every union/struct it needs to be (lack of multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, special >> casing union-type nodes somehow >> >> >> Now that I've spent a bunch of time looking at this, I'd like to voice >> support for Steven's original proposal. In the context of what we have, it >> makes sense, seems sound, and should fix the representational mapping >> problem we currently have. >> > > Except you can't easily differentiate it from the current one, and if we > are going to have to upgrade/break compatibility, why not just do it once > right, a way we know works, instead of risk screwing it up again, and > playing with a representation we aren't actually sure we can make efficient > for this case? > > > I don't see why need to break backward compatibility. Do you have > something specific in mind? Once we extend the TBAA implementation to treat > repeated offsets as disjunction, then we'll extend Clang to emit metadata > for unions in this form. Old IR will work exactly as it does now. >Except the Old IR is irretrievably broken. and in this method, you can't tell whether you are dealing with correct TBAA or not. Earlier, the discussion was basically "detect old IR, disable TBAA since it's broken, make new IR work". If we do that, we need a way to distinguish new vs old IR. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/c2ff7514/attachment.html>
On 05/14/2017 12:49 PM, Daniel Berlin wrote:> > > On Sun, May 14, 2017 at 10:20 AM, Hal Finkel <hfinkel at anl.gov > <mailto:hfinkel at anl.gov>> wrote: > > > On 05/14/2017 11:06 AM, Daniel Berlin wrote: >> >> >> On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov >> <mailto:hfinkel at anl.gov>> wrote: >> >> >> On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: >>> So, https://bugs.llvm.org/show_bug.cgi?id=32056 >>> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example >>> showing our current TBAA tree for union generation is >>> definitely irretrievably broken. >>> I'll be honest here. I'm pretty sure your proposal doesn't >>> go far enough. >>> But truthfully, I would rather see us come closer to a >>> representation we know works, which is GCC's. >>> Let me try to simplify what you are suggesting, and what we >>> have. >>> Our current representation is basically inverted from GCC, >>> but we don't allow things that would enable it to work. >>> >>> Given >>> union {int a, short b}; >>> >>> GCC's will be: >>> >>> union >>> / \ >>> short int >>> >>> >>> Everything is implicitly a subset of alias set 0 (which for >>> C/C++ is char) just to avoid representing it. >>> >>> Our metadata has no child links, and only a single parent link. >>> >>> You can't represent the above form because you can't make a >>> single short node a child of every union/struct it needs to >>> be (lack of multiple parents means you can't just frob them >>> all to offset zero). >>> >>> Your suggestion is to invert this as a struct >>> >>> short int >>> | / >>> union >>> >>> We don't allow multiple parents, however. >>> Because of that, you suggest we follow all nodes for unions, >>> special casing union-type nodes somehow >> >> Now that I've spent a bunch of time looking at this, I'd like >> to voice support for Steven's original proposal. In the >> context of what we have, it makes sense, seems sound, and >> should fix the representational mapping problem we currently >> have. >> >> >> Except you can't easily differentiate it from the current one, >> and if we are going to have to upgrade/break compatibility, why >> not just do it once right, a way we know works, instead of risk >> screwing it up again, and playing with a representation we aren't >> actually sure we can make efficient for this case? > > I don't see why need to break backward compatibility. Do you have > something specific in mind? Once we extend the TBAA implementation > to treat repeated offsets as disjunction, then we'll extend Clang > to emit metadata for unions in this form. Old IR will work exactly > as it does now. > > Except the Old IR is irretrievably broken. and in this method, you > can't tell whether you are dealing with correct TBAA or not. > > Earlier, the discussion was basically "detect old IR, disable TBAA > since it's broken, make new IR work". > > If we do that, we need a way to distinguish new vs old IR.Ah, okay. I don't think that's desirable in general. There are frontends emitting perfectly self-consistent TBAA metadata, and there's no reason they should change. Clang's metadata is broken because the mapping between language rules and TBAA is broken. If we'd like, we can blacklist TBAA metadata coming from root nodes named "Simple C++ TBAA" and "Simple C/C++ TBAA" (or provide an option to do that because I'm not convinced it is worth trying to retroactively "fix" old IR by default given the associated performance penalties). After the fix, Clang can emit root nodes with different names (they're arbitrary). Changing the root-node names will also give the conservatively-correct behavior when linking old IR with new IR. Thanks again, Hal -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/7e0b34ed/attachment-0001.html>
Daniel Berlin via llvm-dev
2017-May-14 21:39 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Sun, May 14, 2017 at 11:01 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 05/14/2017 12:49 PM, Daniel Berlin wrote: > > > > On Sun, May 14, 2017 at 10:20 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> On 05/14/2017 11:06 AM, Daniel Berlin wrote: >> >> >> >> On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov> wrote: >> >>> >>> On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: >>> >>> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing >>> our current TBAA tree for union generation is definitely irretrievably >>> broken. >>> I'll be honest here. I'm pretty sure your proposal doesn't go far enough. >>> But truthfully, I would rather see us come closer to a representation >>> we know works, which is GCC's. >>> Let me try to simplify what you are suggesting, and what we have. >>> Our current representation is basically inverted from GCC, but we don't >>> allow things that would enable it to work. >>> >>> Given >>> union {int a, short b}; >>> >>> GCC's will be: >>> >>> union >>> / \ >>> short int >>> >>> >>> Everything is implicitly a subset of alias set 0 (which for C/C++ is >>> char) just to avoid representing it. >>> >>> Our metadata has no child links, and only a single parent link. >>> >>> You can't represent the above form because you can't make a single short >>> node a child of every union/struct it needs to be (lack of multiple >>> parents means you can't just frob them all to offset zero). >>> >>> Your suggestion is to invert this as a struct >>> >>> short int >>> | / >>> union >>> >>> We don't allow multiple parents, however. >>> Because of that, you suggest we follow all nodes for unions, special >>> casing union-type nodes somehow >>> >>> >>> Now that I've spent a bunch of time looking at this, I'd like to voice >>> support for Steven's original proposal. In the context of what we have, it >>> makes sense, seems sound, and should fix the representational mapping >>> problem we currently have. >>> >> >> Except you can't easily differentiate it from the current one, and if we >> are going to have to upgrade/break compatibility, why not just do it once >> right, a way we know works, instead of risk screwing it up again, and >> playing with a representation we aren't actually sure we can make efficient >> for this case? >> >> >> I don't see why need to break backward compatibility. Do you have >> something specific in mind? Once we extend the TBAA implementation to treat >> repeated offsets as disjunction, then we'll extend Clang to emit metadata >> for unions in this form. Old IR will work exactly as it does now. >> > > Except the Old IR is irretrievably broken. and in this method, you can't > tell whether you are dealing with correct TBAA or not. > > Earlier, the discussion was basically "detect old IR, disable TBAA since > it's broken, make new IR work". > > If we do that, we need a way to distinguish new vs old IR. > > > > Ah, okay. I don't think that's desirable in general. There are frontends > emitting perfectly self-consistent TBAA metadata, and there's no reason > they should change. Clang's metadata is broken because the mapping between > language rules and TBAA is broken. If we'd like, we can blacklist TBAA > metadata coming from root nodes named "Simple C++ TBAA" and "Simple C/C++ > TBAA" (or provide an option to do that because I'm not convinced it is > worth trying to retroactively "fix" old IR by default given the associated > performance penalties). After the fix, Clang can emit root nodes with > different names (they're arbitrary). Changing the root-node names will also > give the conservatively-correct behavior when linking old IR with new IR. > >I still continue to believe trying to patch the existing mechanism this way is not a great plan, and likely to end us in a place where we still have either correctness or performance issues. But i'm not doing the work, so if that's what you guys want to do, go for it. --Dan -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/20219e57/attachment.html>