Daniel Berlin via llvm-dev
2017-May-14 16:06 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: > > So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > > Now that I've spent a bunch of time looking at this, I'd like to voice > support for Steven's original proposal. In the context of what we have, it > makes sense, seems sound, and should fix the representational mapping > problem we currently have. >Except you can't easily differentiate it from the current one, and if we are going to have to upgrade/break compatibility, why not just do it once right, a way we know works, instead of risk screwing it up again, and playing with a representation we aren't actually sure we can make efficient for this case?> Something completely different (e.g. closer to what GCC uses) can work > too, but this seems unnecessary (the proposed extension to the current > semantics seem equivalently expressive). >Yes, they are equivalently expressive.> > What you call "special casing of union-type nodes" does not actually seem > all that special. The rule seems quite simple: if you some across > duplicate offsets, then search all of them. >Which means you have to know they exist, for starters, which means keeping it in some sorted order and checking, or some other mechanism. This should not be difficult to implement or use, and seems no less> efficient than any other way of encoding the concept of disjunction in the > hierarchy. > > It is, in actuality, less efficient.For starters, in the inverted representation, you don't have to explicitly represent the things that are children of alias set zero (char in C++'s case), which is quite common. It's also trivial to generate transitive closures of parts of the graph in the inverted representation, and be able to use them, if you need to. It turns out to be much trickier the other way. Those are just trivial examples. Now, how often this matters, don't know. But i'm suggesting what i believe to be the most practical route: Given a situation where our representation has been broken for years, take an approach that is battle tested and we know works and is efficient, instead of trying to patch our representation and hope we've thought it through well enough :) Does this mean the original proposal won't work? Nope. It may in fact work. But i'd still do the thing i knew already worked well. Because like i said, you are going to break compatibility anyway, so ... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/5dce33e9/attachment.html>
On 05/14/2017 11:06 AM, Daniel Berlin wrote:> > > On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov > <mailto:hfinkel at anl.gov>> wrote: > > > On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 >> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example >> showing our current TBAA tree for union generation is definitely >> irretrievably broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go far >> enough. >> But truthfully, I would rather see us come closer to a >> representation we know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but we >> don't allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ >> is char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a >> single short node a child of every union/struct it needs to be >> (lack of multiple parents means you can't just frob them all to >> offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, >> special casing union-type nodes somehow > > Now that I've spent a bunch of time looking at this, I'd like to > voice support for Steven's original proposal. In the context of > what we have, it makes sense, seems sound, and should fix the > representational mapping problem we currently have. > > > Except you can't easily differentiate it from the current one, and if > we are going to have to upgrade/break compatibility, why not just do > it once right, a way we know works, instead of risk screwing it up > again, and playing with a representation we aren't actually sure we > can make efficient for this case?I don't see why need to break backward compatibility. Do you have something specific in mind? Once we extend the TBAA implementation to treat repeated offsets as disjunction, then we'll extend Clang to emit metadata for unions in this form. Old IR will work exactly as it does now.> Something completely different (e.g. closer to what GCC uses) can > work too, but this seems unnecessary (the proposed extension to > the current semantics seem equivalently expressive). > > > Yes, they are equivalently expressive. > > > What you call "special casing of union-type nodes" does not > actually seem all that special. The rule seems quite simple: if > you some across duplicate offsets, then search all of them. > > > Which means you have to know they exist, for starters, which means > keeping it in some sorted order and checking, or some other mechanism.We already keep the offsets in order (this is a current requirement), so finding duplicate offsets can be done with no added complexity.> > > This should not be difficult to implement or use, and seems no > less efficient than any other way of encoding the concept of > disjunction in the hierarchy. > > It is, in actuality, less efficient. > For starters, in the inverted representation, you don't have to > explicitly represent the things that are children of alias set zero > (char in C++'s case), which is quite common.So, I think this is an orthogonal concern. It is not clear to me that overhead here is significant, and even if it is, it seems like an even-more efficient choice would be for the frontend just not to emit TBAA at all for universally-aliasing accesses. We can also reduce the overhead by merging the universally-aliasing-type nodes with the root node of the type system. We don't do that now, however, because we treat "omnipotent char" and "vtable pointer" as peer children to the root of the type hierarchy. That seems desirable (i.e. even char* does not alias with vtable pointers). If we wish to keep this, then I think we're doing about as well as we can.> > It's also trivial to generate transitive closures of parts of the > graph in the inverted representation, and be able to use them, if you > need to. > It turns out to be much trickier the other way.I don't see how this differs significantly. We can still cache the types in the transitive closure up to the root for any type and use that as an O(1) filter on queries. As you point out, TBAA hierarchies tend to be shallow, so I'm not sure how much this matters. If it does, it is just a matter of walking from the type to the root (covering all paths if there are disjunctions).> > Those are just trivial examples. > > Now, how often this matters, don't know. > > But i'm suggesting what i believe to be the most practical route: > Given a situation where our representation has been broken for years, > take an approach that is battle tested and we know works and is > efficient, instead of trying to patch our representation and hope > we've thought it through well enough :)I'm obviously all for learning from what others have done. Also, obviously, for everything we do it might turn out we haven't thought it through well enough. However, the fundamental problem here seems easy to express: we currently lack a concept of disjunction. Disjunction is the underlying concept required to represent unions.> > Does this mean the original proposal won't work? > Nope. It may in fact work. But i'd still do the thing i knew already > worked well. Because like i said, you are going to break compatibility > anyway, so ...As I said above, I don't think we need to break backward compatibility. If we do, that may change my opinion. ;) Thanks again, Hal -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/f2b92ce3/attachment.html>
Daniel Berlin via llvm-dev
2017-May-14 17:49 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Sun, May 14, 2017 at 10:20 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 05/14/2017 11:06 AM, Daniel Berlin wrote: > > > > On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> >> On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: >> >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing >> our current TBAA tree for union generation is definitely irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go far enough. >> But truthfully, I would rather see us come closer to a representation we >> know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a single short >> node a child of every union/struct it needs to be (lack of multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, special >> casing union-type nodes somehow >> >> >> Now that I've spent a bunch of time looking at this, I'd like to voice >> support for Steven's original proposal. In the context of what we have, it >> makes sense, seems sound, and should fix the representational mapping >> problem we currently have. >> > > Except you can't easily differentiate it from the current one, and if we > are going to have to upgrade/break compatibility, why not just do it once > right, a way we know works, instead of risk screwing it up again, and > playing with a representation we aren't actually sure we can make efficient > for this case? > > > I don't see why need to break backward compatibility. Do you have > something specific in mind? Once we extend the TBAA implementation to treat > repeated offsets as disjunction, then we'll extend Clang to emit metadata > for unions in this form. Old IR will work exactly as it does now. >Except the Old IR is irretrievably broken. and in this method, you can't tell whether you are dealing with correct TBAA or not. Earlier, the discussion was basically "detect old IR, disable TBAA since it's broken, make new IR work". If we do that, we need a way to distinguish new vs old IR. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/c2ff7514/attachment.html>