Daniel Berlin via llvm-dev
2017-Mar-01 23:30 UTC
[llvm-dev] RFC: Representing unions in TBAA
So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our current TBAA tree for union generation is definitely irretrievably broken. I'll be honest here. I'm pretty sure your proposal doesn't go far enough. But truthfully, I would rather see us come closer to a representation we know works, which is GCC's. Let me try to simplify what you are suggesting, and what we have. Our current representation is basically inverted from GCC, but we don't allow things that would enable it to work. Given union {int a, short b}; GCC's will be: union / \ short int Everything is implicitly a subset of alias set 0 (which for C/C++ is char) just to avoid representing it. Our metadata has no child links, and only a single parent link. You can't represent the above form because you can't make a single short node a child of every union/struct it needs to be (lack of multiple parents means you can't just frob them all to offset zero). Your suggestion is to invert this as a struct short int | / union We don't allow multiple parents, however. Because of that, you suggest we follow all nodes for unions, special casing union-type nodes somehow Let me suggest something different: We know the current structure fails us in a number of ways. Instead of trying to shoehorn this into our current structure, I suggest: we stop messing around and just have a GCC style tree, and represent the children instead of the parents. We make contained types descendants instead of ancestors. We allow multiple children at each offset and for scalar type nodes.x` We could do that without changing to children, but honestly, i strongly believe nobody who ever looks at this really understands it right now. (If someone really does like the ancestor form, i'd love to understand why :P) So if we are going to change it, we should change it to something that is easier to understand. something like: scalar type node -> {name, children nodes} struct node -> {name, array of {offset, child node} } Paths are separate from the tbaa tree itself, and are just: path node -> {struct node, offset} or whatever. unions are just scalar type nodes with multiple children, not struct nodes with special-cased offset zero. This also has a well-defined upgrade path. For "old-style" DAGs, like exist now, we know we have to regen the metadata, and that it is wrong (So we could, for example, simply disable it for correctness reasons) . Anything with a "new-style" DAG, we know is usable. In this representation, the most generic tbaa node is just the one that contains the other. If neither contains the other, you can just make one that has both as children and use that. (instead of now, where you'd have to have multiple parents again). (The above also may be better for other languages) --Dan On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote:> Seems like the comments have stopped. I'll try to get a patch together. > Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing the TBAA > information, it should be the responsibility to make sure the aliasing is > changed in the correct way. One function related to this has already been > mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org>, llvm-dev < > llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com* > <perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and a > reference that is not through the union? > > My solution was to look for each member of the union overlaps the given > offset, and see if any of those members aliased the other reference. If no > member aliases the other reference, then the answer is no alias. Otherwise > the answer is may alias. The run time for this would be proportional to > "distance to the root" * "number of overlapping members". This could be > slow if there are unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would mean that all > references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the union > may be lost because of LLVM transformations. I am not sure how, in the face > of that, TBAA could indicate NoAlias safely (without the risk of > functional-correctness issues in correct programs) between types which > overlap within any union (within some portion of the program). > > As for the standards, it is definitely not true that all references to > unions must be explicitly through the union. However, if you are trying to > perform union-based type punning (under C11), then it appears that it is > intended that the read must be through the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we make the > parent of the TBAA node for the union to be "omnipotent char". This might > be similar to treating the union TBAA node more like a scalar node instead > of a struct-path. Then the traversal of the TBAA nodes will be quick. > I'll have to work this out a bit more, but, if this is good enough to meet > the requirements of the standard, I can try to think this through a little > more. I'll need Hubert and Daniel to comment on that since I am no expert > on the C and C++ standards. > > The third option is to be pessimistic and say "may alias" all of the time > (conservatively correct), and rely on other alias analysis to improve it. > This will have good compile time, but could hinder optimization. > Personally I do not like this option. Most of the time it will not have a > negative effect, but there will be a reasonable number of programs where > this will hurt optimization more that it needs to. > > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170301/d57aabb9/attachment.html>
Pardon my exclamation, are you saying TBAA will return NoAlias between union members under current implementation? If so, this is definitely a concern. Kevin On Wed, Mar 1, 2017 at 3:30 PM, Daniel Berlin via llvm-dev < llvm-dev at lists.llvm.org> wrote:> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I suggest: > we stop messing around and just have a GCC style tree, and represent the > children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand why > :P) > > So if we are going to change it, we should change it to something that is > easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct nodes > with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable it > for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote: > >> Seems like the comments have stopped. I'll try to get a patch together. >> Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing the TBAA >> information, it should be the responsibility to make sure the aliasing is >> changed in the correct way. One function related to this has already been >> mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org>, llvm-dev < >> llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA >> ------------------------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com* >> <perrons at ca.ibm.com>> wrote: >> 3) How should we handle a reference directly through a union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union overlaps the given >> offset, and see if any of those members aliased the other reference. If no >> member aliases the other reference, then the answer is no alias. Otherwise >> the answer is may alias. The run time for this would be proportional to >> "distance to the root" * "number of overlapping members". This could be >> slow if there are unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would mean that >> all references to unions must be explicitly through the union. >> From what I gather from the thread so far, the access through the union >> may be lost because of LLVM transformations. I am not sure how, in the face >> of that, TBAA could indicate NoAlias safely (without the risk of >> functional-correctness issues in correct programs) between types which >> overlap within any union (within some portion of the program). >> >> As for the standards, it is definitely not true that all references to >> unions must be explicitly through the union. However, if you are trying to >> perform union-based type punning (under C11), then it appears that it is >> intended that the read must be through the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe we make the >> parent of the TBAA node for the union to be "omnipotent char". This might >> be similar to treating the union TBAA node more like a scalar node instead >> of a struct-path. Then the traversal of the TBAA nodes will be quick. >> I'll have to work this out a bit more, but, if this is good enough to meet >> the requirements of the standard, I can try to think this through a little >> more. I'll need Hubert and Daniel to comment on that since I am no expert >> on the C and C++ standards. >> >> The third option is to be pessimistic and say "may alias" all of the time >> (conservatively correct), and rely on other alias analysis to improve it. >> This will have good compile time, but could hinder optimization. >> Personally I do not like this option. Most of the time it will not have a >> negative effect, but there will be a reasonable number of programs where >> this will hurt optimization more that it needs to. >> >> >> >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170301/46cdbd43/attachment.html>
Daniel Berlin via llvm-dev
2017-Mar-02 00:36 UTC
[llvm-dev] RFC: Representing unions in TBAA
Yes. Clang generates tbaa trees that are not conservatively correct in the face of unions, for any definition of correct you have :) We mostly rely on basicaa to say things are must-alias, but if it says may-alias, in the case of unions, tbaa will often say "noalias". On Wed, Mar 1, 2017, 4:00 PM Flamedoge <code.kchoi at gmail.com> wrote:> Pardon my exclamation, are you saying TBAA will return NoAlias between > union members under current implementation? > If so, this is definitely a concern. > > Kevin > > On Wed, Mar 1, 2017 at 3:30 PM, Daniel Berlin via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > > So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I suggest: > we stop messing around and just have a GCC style tree, and represent the > children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand why > :P) > > So if we are going to change it, we should change it to something that is > easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct nodes > with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable it > for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote: > > Seems like the comments have stopped. I'll try to get a patch together. > Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing the TBAA > information, it should be the responsibility to make sure the aliasing is > changed in the correct way. One function related to this has already been > mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org>, llvm-dev < > llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com* > <perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and a > reference that is not through the union? > > My solution was to look for each member of the union overlaps the given > offset, and see if any of those members aliased the other reference. If no > member aliases the other reference, then the answer is no alias. Otherwise > the answer is may alias. The run time for this would be proportional to > "distance to the root" * "number of overlapping members". This could be > slow if there are unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would mean that all > references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the union > may be lost because of LLVM transformations. I am not sure how, in the face > of that, TBAA could indicate NoAlias safely (without the risk of > functional-correctness issues in correct programs) between types which > overlap within any union (within some portion of the program). > > As for the standards, it is definitely not true that all references to > unions must be explicitly through the union. However, if you are trying to > perform union-based type punning (under C11), then it appears that it is > intended that the read must be through the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we make the > parent of the TBAA node for the union to be "omnipotent char". This might > be similar to treating the union TBAA node more like a scalar node instead > of a struct-path. Then the traversal of the TBAA nodes will be quick. > I'll have to work this out a bit more, but, if this is good enough to meet > the requirements of the standard, I can try to think this through a little > more. I'll need Hubert and Daniel to comment on that since I am no expert > on the C and C++ standards. > > The third option is to be pessimistic and say "may alias" all of the time > (conservatively correct), and rely on other alias analysis to improve it. > This will have good compile time, but could hinder optimization. > Personally I do not like this option. Most of the time it will not have a > negative effect, but there will be a reasonable number of programs where > this will hurt optimization more that it needs to. > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170301/e93bd273/attachment-0001.html>
Daniel Berlin via llvm-dev
2017-Mar-02 01:17 UTC
[llvm-dev] RFC: Representing unions in TBAA
And, fwiw, the path walking issues are easy to fix. What we did in GCC, and could be done here, is to generate the transitive closure of the children (or parents) as we saw them. The height of the TBAA tree is not that large. For scalar type nodes, this answers queries in O(1) time. Even in offset land, you can use this, because if none of your children (parents) have that type, it can't alias, no matter what. So it will cut off the query at the first point it is possible to discover that your path does not contain the type. The worst case is that the exact path does contain the type, in which case you just made a few wasted queries. On Wed, Mar 1, 2017 at 3:30 PM, Daniel Berlin <dberlin at dberlin.org> wrote:> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I suggest: > we stop messing around and just have a GCC style tree, and represent the > children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand why > :P) > > So if we are going to change it, we should change it to something that is > easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct nodes > with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable it > for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote: > >> Seems like the comments have stopped. I'll try to get a patch together. >> Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing the TBAA >> information, it should be the responsibility to make sure the aliasing is >> changed in the correct way. One function related to this has already been >> mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org>, llvm-dev < >> llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA >> ------------------------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com* >> <perrons at ca.ibm.com>> wrote: >> 3) How should we handle a reference directly through a union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union overlaps the given >> offset, and see if any of those members aliased the other reference. If no >> member aliases the other reference, then the answer is no alias. Otherwise >> the answer is may alias. The run time for this would be proportional to >> "distance to the root" * "number of overlapping members". This could be >> slow if there are unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would mean that >> all references to unions must be explicitly through the union. >> From what I gather from the thread so far, the access through the union >> may be lost because of LLVM transformations. I am not sure how, in the face >> of that, TBAA could indicate NoAlias safely (without the risk of >> functional-correctness issues in correct programs) between types which >> overlap within any union (within some portion of the program). >> >> As for the standards, it is definitely not true that all references to >> unions must be explicitly through the union. However, if you are trying to >> perform union-based type punning (under C11), then it appears that it is >> intended that the read must be through the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe we make the >> parent of the TBAA node for the union to be "omnipotent char". This might >> be similar to treating the union TBAA node more like a scalar node instead >> of a struct-path. Then the traversal of the TBAA nodes will be quick. >> I'll have to work this out a bit more, but, if this is good enough to meet >> the requirements of the standard, I can try to think this through a little >> more. I'll need Hubert and Daniel to comment on that since I am no expert >> on the C and C++ standards. >> >> The third option is to be pessimistic and say "may alias" all of the time >> (conservatively correct), and rely on other alias analysis to improve it. >> This will have good compile time, but could hinder optimization. >> Personally I do not like this option. Most of the time it will not have a >> negative effect, but there will be a reasonable number of programs where >> this will hurt optimization more that it needs to. >> >> >> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170301/b688134b/attachment.html>
On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote:> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing > our current TBAA tree for union generation is definitely irretrievably > broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation > we know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we > don't allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is > char) just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single > short node a child of every union/struct it needs to be (lack of > multiple parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I > suggest: we stop messing around and just have a GCC style tree, and > represent the children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i > strongly believe nobody who ever looks at this really understands it > right now. > (If someone really does like the ancestor form, i'd love to understand > why :P) > > So if we are going to change it, we should change it to something that > is easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct > nodes with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply > disable it for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one > that contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again).You mean making a 'scalar type node', because those represent 'or'? -Hal> > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com > <mailto:perrons at ca.ibm.com>> wrote: > > Seems like the comments have stopped. I'll try to get a patch > together. Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing the > TBAA information, it should be the responsibility to make sure the > aliasing is changed in the correct way. One function related to > this has already been mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com > <mailto:hubert.reinterpretcast at gmail.com>> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org > <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org > <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das > <sanjoy at playingwithpointers.com > <mailto:sanjoy at playingwithpointers.com>> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------------------------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron > <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and > a reference that is not through the union? > > My solution was to look for each member of the union overlaps the > given offset, and see if any of those members aliased the other > reference. If no member aliases the other reference, then the > answer is no alias. Otherwise the answer is may alias. The run > time for this would be proportional to "distance to the root" * > "number of overlapping members". This could be slow if there are > unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would mean > that all references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the > union may be lost because of LLVM transformations. I am not sure > how, in the face of that, TBAA could indicate NoAlias safely > (without the risk of functional-correctness issues in correct > programs) between types which overlap within any union (within > some portion of the program). > > As for the standards, it is definitely not true that all > references to unions must be explicitly through the union. > However, if you are trying to perform union-based type punning > (under C11), then it appears that it is intended that the read > must be through the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we > make the parent of the TBAA node for the union to be "omnipotent > char". This might be similar to treating the union TBAA node more > like a scalar node instead of a struct-path. Then the traversal > of the TBAA nodes will be quick. I'll have to work this out a bit > more, but, if this is good enough to meet the requirements of the > standard, I can try to think this through a little more. I'll > need Hubert and Daniel to comment on that since I am no expert on > the C and C++ standards. > > The third option is to be pessimistic and say "may alias" all of > the time (conservatively correct), and rely on other alias > analysis to improve it. This will have good compile time, but > could hinder optimization. Personally I do not like this option. > Most of the time it will not have a negative effect, but there > will be a reasonable number of programs where this will hurt > optimization more that it needs to. > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170309/9d4848b8/attachment.html>
Steven Perron via llvm-dev
2017-Mar-09 18:21 UTC
[llvm-dev] RFC: Representing unions in TBAA
<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10.5pt" ><div dir="ltr" >I could make something like Daniel's suggestion work. The main question I still have is how we tell the difference between the "old-style" DAG and the "new-style" DAG? I don't know if there is some standard way of doing this. Do we just base it on the version of llvm-ir?</div> <div dir="ltr" > </div> <div dir="ltr" >Would there be a need to maintain different code for both type of DAG? If we change clang to generate the new-style DAG, things will be fine for C/C++. However, will this force other components that generate llvm-ir to change?</div> <div dir="ltr" > </div> <div dir="ltr" >In general, what kind of backwards compatibility do we need to keep?</div> <div dir="ltr" > </div> <div dir="ltr" >Later,</div> <div dir="ltr" >Steven Perron</div></div><BR>
As far as my limited understanding goes, this should be a bugfix (correctness) and it should trump backward compatibility requirement. It makes no sense to me to continue backward compatible incorrectness. Kevin On Thu, Mar 9, 2017 at 10:21 AM, Steven Perron via llvm-dev < llvm-dev at lists.llvm.org> wrote:> I could make something like Daniel's suggestion work. The main question I > still have is how we tell the difference between the "old-style" DAG and > the "new-style" DAG? I don't know if there is some standard way of doing > this. Do we just base it on the version of llvm-ir? > > Would there be a need to maintain different code for both type of DAG? If > we change clang to generate the new-style DAG, things will be fine for > C/C++. However, will this force other components that generate llvm-ir to > change? > > In general, what kind of backwards compatibility do we need to keep? > > Later, > Steven Perron > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170309/5c20eab3/attachment-0001.html>
On 03/09/2017 12:21 PM, Steven Perron wrote:> I could make something like Daniel's suggestion work. The main > question I still have is how we tell the difference between the > "old-style" DAG and the "new-style" DAG? I don't know if there is > some standard way of doing this. Do we just base it on the version of > llvm-ir?We don't generally use IR version like that. Either make the format incompatible so that you can tell them apart, or we'll add a separate tag type (!tbaa2 or whatever).> Would there be a need to maintain different code for both type of > DAG? If we change clang to generate the new-style DAG, things will be > fine for C/C++. However, will this force other components that > generate llvm-ir to change? > In general, what kind of backwards compatibility do we need to keep?There are other frontends that generate TBAA, and there might not be a clear reason to make them change. The existing format is well defined, it just does not match C/C++ semantics well enough for our needs. On the other hand, if the new format subsumes the old one, we might just expect people to migrate. Let's make that decision separately. Assume for now that we'll have both at the same time and we'll change what Clang does. -Hal> Later, > Steven Perron >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170309/89ba00e7/attachment.html>
Daniel Berlin via llvm-dev
2017-Mar-09 19:15 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Thu, Mar 9, 2017 at 9:57 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: > > So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I suggest: > we stop messing around and just have a GCC style tree, and represent the > children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand why > :P) > > So if we are going to change it, we should change it to something that is > easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct nodes > with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable it > for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > > You mean making a 'scalar type node', because those represent 'or'? >Yes. I would probably stop calling them scalar type nodes and struct type nodes too, while we are bike-shedding a bit :) The terminology seems to be based on what we think a type system that generates those kinds of nodes look like. But uh 1. That's wrong a lot of the time, because each frontend may want to use different types of nodes to be conservatively correct, and because different things can be represented, correctly, in multiple ways (IE, as a stupid example, you could represent any type as a struct node where all pieces go to the same place). In fact, it's not even true for the few things that do generate TBAA trees right now. 2. It gives you no idea what they do in LLVM, which is, IMHO, what anyone reading those docs cares about. I'd probably call them based on something related to their actual semantic in llvm's tbaa tree is. IE i'd call scalar type nodes "or-type nodes" or "any of" nodes or something that gives people an idea that it means the aliasing result is may-alias if any path from that node is may-alias. I'd call struct type nodes "offset-based nodes", becuase that's really what they are. (access-path nodes are actually access paths, so that seems find)> -Hal > > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com> wrote: > >> Seems like the comments have stopped. I'll try to get a patch together. >> Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing the TBAA >> information, it should be the responsibility to make sure the aliasing is >> changed in the correct way. One function related to this has already been >> mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org>, llvm-dev < >> llvm-dev at lists.llvm.org>, Sanjoy Das <sanjoy at playingwithpointers.com> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA >> ------------------------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron <*perrons at ca.ibm.com* >> <perrons at ca.ibm.com>> wrote: >> 3) How should we handle a reference directly through a union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union overlaps the given >> offset, and see if any of those members aliased the other reference. If no >> member aliases the other reference, then the answer is no alias. Otherwise >> the answer is may alias. The run time for this would be proportional to >> "distance to the root" * "number of overlapping members". This could be >> slow if there are unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would mean that >> all references to unions must be explicitly through the union. >> From what I gather from the thread so far, the access through the union >> may be lost because of LLVM transformations. I am not sure how, in the face >> of that, TBAA could indicate NoAlias safely (without the risk of >> functional-correctness issues in correct programs) between types which >> overlap within any union (within some portion of the program). >> >> As for the standards, it is definitely not true that all references to >> unions must be explicitly through the union. However, if you are trying to >> perform union-based type punning (under C11), then it appears that it is >> intended that the read must be through the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe we make the >> parent of the TBAA node for the union to be "omnipotent char". This might >> be similar to treating the union TBAA node more like a scalar node instead >> of a struct-path. Then the traversal of the TBAA nodes will be quick. >> I'll have to work this out a bit more, but, if this is good enough to meet >> the requirements of the standard, I can try to think this through a little >> more. I'll need Hubert and Daniel to comment on that since I am no expert >> on the C and C++ standards. >> >> The third option is to be pessimistic and say "may alias" all of the time >> (conservatively correct), and rely on other alias analysis to improve it. >> This will have good compile time, but could hinder optimization. >> Personally I do not like this option. Most of the time it will not have a >> negative effect, but there will be a reasonable number of programs where >> this will hurt optimization more that it needs to. >> >> >> >> > > > _______________________________________________ > LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > > -- > Hal Finkel > Lead, Compiler Technology and Programming Languages > Leadership Computing Facility > Argonne National Laboratory > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170309/cfa46220/attachment.html>
Krzysztof Parzyszek via llvm-dev
2017-Apr-07 19:09 UTC
[llvm-dev] RFC: Representing unions in TBAA
Can we turn off TBAA info for union member accesses in clang before this gets fixed? -Krzysztof On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote:> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing > our current TBAA tree for union generation is definitely irretrievably > broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation > we know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is > char) just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I > suggest: we stop messing around and just have a GCC style tree, and > represent the children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i strongly > believe nobody who ever looks at this really understands it right now. > (If someone really does like the ancestor form, i'd love to understand > why :P) > > So if we are going to change it, we should change it to something that > is easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct > nodes with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply disable > it for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one that > contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com > <mailto:perrons at ca.ibm.com>> wrote: > > Seems like the comments have stopped. I'll try to get a patch > together. Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing the TBAA > information, it should be the responsibility to make sure the > aliasing is changed in the correct way. One function related to > this has already been mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com > <mailto:hubert.reinterpretcast at gmail.com>> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org > <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org > <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das > <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpointers.com>> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------------------------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron > <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and a > reference that is not through the union? > > My solution was to look for each member of the union overlaps the > given offset, and see if any of those members aliased the other > reference. If no member aliases the other reference, then the > answer is no alias. Otherwise the answer is may alias. The run > time for this would be proportional to "distance to the root" * > "number of overlapping members". This could be slow if there are > unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would mean > that all references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the > union may be lost because of LLVM transformations. I am not sure > how, in the face of that, TBAA could indicate NoAlias safely > (without the risk of functional-correctness issues in correct > programs) between types which overlap within any union (within some > portion of the program). > > As for the standards, it is definitely not true that all references > to unions must be explicitly through the union. However, if you are > trying to perform union-based type punning (under C11), then it > appears that it is intended that the read must be through the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we > make the parent of the TBAA node for the union to be "omnipotent > char". This might be similar to treating the union TBAA node more > like a scalar node instead of a struct-path. Then the traversal of > the TBAA nodes will be quick. I'll have to work this out a bit > more, but, if this is good enough to meet the requirements of the > standard, I can try to think this through a little more. I'll need > Hubert and Daniel to comment on that since I am no expert on the C > and C++ standards. > > The third option is to be pessimistic and say "may alias" all of the > time (conservatively correct), and rely on other alias analysis to > improve it. This will have good compile time, but could hinder > optimization. Personally I do not like this option. Most of the > time it will not have a negative effect, but there will be a > reasonable number of programs where this will hurt optimization more > that it needs to. > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Daniel Berlin via llvm-dev
2017-Apr-07 20:25 UTC
[llvm-dev] RFC: Representing unions in TBAA
Not familiar with clang enough to know. Staring at it, it looks a bit annoying to do. You'd basically just want to stop decorating loads/stores with tbaa info if it's a union. On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Can we turn off TBAA info for union member accesses in clang before this > gets fixed? > > -Krzysztof > > > On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: > >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing >> our current TBAA tree for union generation is definitely irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go far enough. >> But truthfully, I would rather see us come closer to a representation >> we know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a single short >> node a child of every union/struct it needs to be (lack of multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, special >> casing union-type nodes somehow >> >> Let me suggest something different: >> >> We know the current structure fails us in a number of ways. >> Instead of trying to shoehorn this into our current structure, I >> suggest: we stop messing around and just have a GCC style tree, and >> represent the children instead of the parents. >> We make contained types descendants instead of ancestors. >> We allow multiple children at each offset and for scalar type nodes.x` >> >> We could do that without changing to children, but honestly, i strongly >> believe nobody who ever looks at this really understands it right now. >> (If someone really does like the ancestor form, i'd love to understand >> why :P) >> >> So if we are going to change it, we should change it to something that >> is easier to understand. >> >> something like: >> >> scalar type node -> {name, children nodes} >> struct node -> {name, array of {offset, child node} } >> >> Paths are separate from the tbaa tree itself, and are just: >> path node -> {struct node, offset} or whatever. >> >> unions are just scalar type nodes with multiple children, not struct >> nodes with special-cased offset zero. >> >> This also has a well-defined upgrade path. >> >> For "old-style" DAGs, like exist now, we know we have to regen the >> metadata, and that it is wrong (So we could, for example, simply disable >> it for correctness reasons) >> . >> Anything with a "new-style" DAG, we know is usable. >> >> In this representation, the most generic tbaa node is just the one that >> contains the other. >> If neither contains the other, you can just make one that has both as >> children and use that. >> (instead of now, where you'd have to have multiple parents again). >> >> (The above also may be better for other languages) >> >> --Dan >> >> >> >> >> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com >> <mailto:perrons at ca.ibm.com>> wrote: >> >> Seems like the comments have stopped. I'll try to get a patch >> together. Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing the TBAA >> information, it should be the responsibility to make sure the >> aliasing is changed in the correct way. One function related to >> this has already been mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com>> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org >> <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das >> <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpoin >> ters.com>> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA >> ------------------------------------------------------------ >> ------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron >> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote: >> 3) How should we handle a reference directly through a union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union overlaps the >> given offset, and see if any of those members aliased the other >> reference. If no member aliases the other reference, then the >> answer is no alias. Otherwise the answer is may alias. The run >> time for this would be proportional to "distance to the root" * >> "number of overlapping members". This could be slow if there are >> unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would mean >> that all references to unions must be explicitly through the union. >> From what I gather from the thread so far, the access through the >> union may be lost because of LLVM transformations. I am not sure >> how, in the face of that, TBAA could indicate NoAlias safely >> (without the risk of functional-correctness issues in correct >> programs) between types which overlap within any union (within some >> portion of the program). >> >> As for the standards, it is definitely not true that all references >> to unions must be explicitly through the union. However, if you are >> trying to perform union-based type punning (under C11), then it >> appears that it is intended that the read must be through the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe we >> make the parent of the TBAA node for the union to be "omnipotent >> char". This might be similar to treating the union TBAA node more >> like a scalar node instead of a struct-path. Then the traversal of >> the TBAA nodes will be quick. I'll have to work this out a bit >> more, but, if this is good enough to meet the requirements of the >> standard, I can try to think this through a little more. I'll need >> Hubert and Daniel to comment on that since I am no expert on the C >> and C++ standards. >> >> The third option is to be pessimistic and say "may alias" all of the >> time (conservatively correct), and rely on other alias analysis to >> improve it. This will have good compile time, but could hinder >> optimization. Personally I do not like this option. Most of the >> time it will not have a negative effect, but there will be a >> reasonable number of programs where this will hurt optimization more >> that it needs to. >> >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/bc7f872f/attachment.html>
On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote:> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing > our current TBAA tree for union generation is definitely irretrievably > broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation > we know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we > don't allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is > char) just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single > short node a child of every union/struct it needs to be (lack of > multiple parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehowNow that I've spent a bunch of time looking at this, I'd like to voice support for Steven's original proposal. In the context of what we have, it makes sense, seems sound, and should fix the representational mapping problem we currently have. Something completely different (e.g. closer to what GCC uses) can work too, but this seems unnecessary (the proposed extension to the current semantics seem equivalently expressive). What you call "special casing of union-type nodes" does not actually seem all that special. The rule seems quite simple: if you some across duplicate offsets, then search all of them. This should not be difficult to implement or use, and seems no less efficient than any other way of encoding the concept of disjunction in the hierarchy. -Hal> > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I > suggest: we stop messing around and just have a GCC style tree, and > represent the children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type nodes.x` > > We could do that without changing to children, but honestly, i > strongly believe nobody who ever looks at this really understands it > right now. > (If someone really does like the ancestor form, i'd love to understand > why :P) > > So if we are going to change it, we should change it to something that > is easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct > nodes with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply > disable it for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the one > that contains the other. > If neither contains the other, you can just make one that has both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com > <mailto:perrons at ca.ibm.com>> wrote: > > Seems like the comments have stopped. I'll try to get a patch > together. Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing the > TBAA information, it should be the responsibility to make sure the > aliasing is changed in the correct way. One function related to > this has already been mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com > <mailto:hubert.reinterpretcast at gmail.com>> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org > <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org > <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das > <sanjoy at playingwithpointers.com > <mailto:sanjoy at playingwithpointers.com>> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > ------------------------------------------------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron > <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote: > 3) How should we handle a reference directly through a union, and > a reference that is not through the union? > > My solution was to look for each member of the union overlaps the > given offset, and see if any of those members aliased the other > reference. If no member aliases the other reference, then the > answer is no alias. Otherwise the answer is may alias. The run > time for this would be proportional to "distance to the root" * > "number of overlapping members". This could be slow if there are > unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would mean > that all references to unions must be explicitly through the union. > From what I gather from the thread so far, the access through the > union may be lost because of LLVM transformations. I am not sure > how, in the face of that, TBAA could indicate NoAlias safely > (without the risk of functional-correctness issues in correct > programs) between types which overlap within any union (within > some portion of the program). > > As for the standards, it is definitely not true that all > references to unions must be explicitly through the union. > However, if you are trying to perform union-based type punning > (under C11), then it appears that it is intended that the read > must be through the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we > make the parent of the TBAA node for the union to be "omnipotent > char". This might be similar to treating the union TBAA node more > like a scalar node instead of a struct-path. Then the traversal > of the TBAA nodes will be quick. I'll have to work this out a bit > more, but, if this is good enough to meet the requirements of the > standard, I can try to think this through a little more. I'll > need Hubert and Daniel to comment on that since I am no expert on > the C and C++ standards. > > The third option is to be pessimistic and say "may alias" all of > the time (conservatively correct), and rely on other alias > analysis to improve it. This will have good compile time, but > could hinder optimization. Personally I do not like this option. > Most of the time it will not have a negative effect, but there > will be a reasonable number of programs where this will hurt > optimization more that it needs to. > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/e265919b/attachment.html>
Daniel Berlin via llvm-dev
2017-May-14 16:06 UTC
[llvm-dev] RFC: Representing unions in TBAA
On Sun, May 14, 2017 at 8:37 AM, Hal Finkel <hfinkel at anl.gov> wrote:> > On 03/01/2017 05:30 PM, Daniel Berlin via llvm-dev wrote: > > So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our > current TBAA tree for union generation is definitely irretrievably broken. > I'll be honest here. I'm pretty sure your proposal doesn't go far enough. > But truthfully, I would rather see us come closer to a representation we > know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is char) > just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > > Now that I've spent a bunch of time looking at this, I'd like to voice > support for Steven's original proposal. In the context of what we have, it > makes sense, seems sound, and should fix the representational mapping > problem we currently have. >Except you can't easily differentiate it from the current one, and if we are going to have to upgrade/break compatibility, why not just do it once right, a way we know works, instead of risk screwing it up again, and playing with a representation we aren't actually sure we can make efficient for this case?> Something completely different (e.g. closer to what GCC uses) can work > too, but this seems unnecessary (the proposed extension to the current > semantics seem equivalently expressive). >Yes, they are equivalently expressive.> > What you call "special casing of union-type nodes" does not actually seem > all that special. The rule seems quite simple: if you some across > duplicate offsets, then search all of them. >Which means you have to know they exist, for starters, which means keeping it in some sorted order and checking, or some other mechanism. This should not be difficult to implement or use, and seems no less> efficient than any other way of encoding the concept of disjunction in the > hierarchy. > > It is, in actuality, less efficient.For starters, in the inverted representation, you don't have to explicitly represent the things that are children of alias set zero (char in C++'s case), which is quite common. It's also trivial to generate transitive closures of parts of the graph in the inverted representation, and be able to use them, if you need to. It turns out to be much trickier the other way. Those are just trivial examples. Now, how often this matters, don't know. But i'm suggesting what i believe to be the most practical route: Given a situation where our representation has been broken for years, take an approach that is battle tested and we know works and is efficient, instead of trying to patch our representation and hope we've thought it through well enough :) Does this mean the original proposal won't work? Nope. It may in fact work. But i'd still do the thing i knew already worked well. Because like i said, you are going to break compatibility anyway, so ... -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170514/5dce33e9/attachment.html>