Daniel Berlin via llvm-dev
2017-Apr-07 20:25 UTC
[llvm-dev] RFC: Representing unions in TBAA
Not familiar with clang enough to know. Staring at it, it looks a bit annoying to do. You'd basically just want to stop decorating loads/stores with tbaa info if it's a union. On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Can we turn off TBAA info for union member accesses in clang before this > gets fixed? > > -Krzysztof > > > On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: > >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing >> our current TBAA tree for union generation is definitely irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go far enough. >> But truthfully, I would rather see us come closer to a representation >> we know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a single short >> node a child of every union/struct it needs to be (lack of multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, special >> casing union-type nodes somehow >> >> Let me suggest something different: >> >> We know the current structure fails us in a number of ways. >> Instead of trying to shoehorn this into our current structure, I >> suggest: we stop messing around and just have a GCC style tree, and >> represent the children instead of the parents. >> We make contained types descendants instead of ancestors. >> We allow multiple children at each offset and for scalar type nodes.x` >> >> We could do that without changing to children, but honestly, i strongly >> believe nobody who ever looks at this really understands it right now. >> (If someone really does like the ancestor form, i'd love to understand >> why :P) >> >> So if we are going to change it, we should change it to something that >> is easier to understand. >> >> something like: >> >> scalar type node -> {name, children nodes} >> struct node -> {name, array of {offset, child node} } >> >> Paths are separate from the tbaa tree itself, and are just: >> path node -> {struct node, offset} or whatever. >> >> unions are just scalar type nodes with multiple children, not struct >> nodes with special-cased offset zero. >> >> This also has a well-defined upgrade path. >> >> For "old-style" DAGs, like exist now, we know we have to regen the >> metadata, and that it is wrong (So we could, for example, simply disable >> it for correctness reasons) >> . >> Anything with a "new-style" DAG, we know is usable. >> >> In this representation, the most generic tbaa node is just the one that >> contains the other. >> If neither contains the other, you can just make one that has both as >> children and use that. >> (instead of now, where you'd have to have multiple parents again). >> >> (The above also may be better for other languages) >> >> --Dan >> >> >> >> >> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron <perrons at ca.ibm.com >> <mailto:perrons at ca.ibm.com>> wrote: >> >> Seems like the comments have stopped. I'll try to get a patch >> together. Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing the TBAA >> information, it should be the responsibility to make sure the >> aliasing is changed in the correct way. One function related to >> this has already been mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com>> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org >> <mailto:dberlin at dberlin.org>>, llvm-dev <llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org>>, Sanjoy Das >> <sanjoy at playingwithpointers.com <mailto:sanjoy at playingwithpoin >> ters.com>> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in TBAA >> ------------------------------------------------------------ >> ------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron >> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com>> wrote: >> 3) How should we handle a reference directly through a union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union overlaps the >> given offset, and see if any of those members aliased the other >> reference. If no member aliases the other reference, then the >> answer is no alias. Otherwise the answer is may alias. The run >> time for this would be proportional to "distance to the root" * >> "number of overlapping members". This could be slow if there are >> unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would mean >> that all references to unions must be explicitly through the union. >> From what I gather from the thread so far, the access through the >> union may be lost because of LLVM transformations. I am not sure >> how, in the face of that, TBAA could indicate NoAlias safely >> (without the risk of functional-correctness issues in correct >> programs) between types which overlap within any union (within some >> portion of the program). >> >> As for the standards, it is definitely not true that all references >> to unions must be explicitly through the union. However, if you are >> trying to perform union-based type punning (under C11), then it >> appears that it is intended that the read must be through the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe we >> make the parent of the TBAA node for the union to be "omnipotent >> char". This might be similar to treating the union TBAA node more >> like a scalar node instead of a struct-path. Then the traversal of >> the TBAA nodes will be quick. I'll have to work this out a bit >> more, but, if this is good enough to meet the requirements of the >> standard, I can try to think this through a little more. I'll need >> Hubert and Daniel to comment on that since I am no expert on the C >> and C++ standards. >> >> The third option is to be pessimistic and say "may alias" all of the >> time (conservatively correct), and rely on other alias analysis to >> improve it. This will have good compile time, but could hinder >> optimization. Personally I do not like this option. Most of the >> time it will not have a negative effect, but there will be a >> reasonable number of programs where this will hurt optimization more >> that it needs to. >> >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/bc7f872f/attachment.html>
Krzysztof Parzyszek via llvm-dev
2017-Apr-07 20:28 UTC
[llvm-dev] RFC: Representing unions in TBAA
I'm asking if people are ok with it. :) We've had customer reports that can be tracked down to this issue, so this is something we'd really like to working (at least in terms of correctness). I can come up with a patch. -Krzysztof On 4/7/2017 3:25 PM, Daniel Berlin wrote:> Not familiar with clang enough to know. > Staring at it, it looks a bit annoying to do. > You'd basically just want to stop decorating loads/stores with tbaa info > if it's a union. > > > On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > > Can we turn off TBAA info for union member accesses in clang before > this gets fixed? > > -Krzysztof > > > On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: > > So, https://bugs.llvm.org/show_bug.cgi?id=32056 > <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example showing > our current TBAA tree for union generation is definitely > irretrievably > broken. > I'll be honest here. I'm pretty sure your proposal doesn't go > far enough. > But truthfully, I would rather see us come closer to a > representation > we know works, which is GCC's. > Let me try to simplify what you are suggesting, and what we have. > Our current representation is basically inverted from GCC, but > we don't > allow things that would enable it to work. > > Given > union {int a, short b}; > > GCC's will be: > > union > / \ > short int > > > Everything is implicitly a subset of alias set 0 (which for C/C++ is > char) just to avoid representing it. > > Our metadata has no child links, and only a single parent link. > > You can't represent the above form because you can't make a > single short > node a child of every union/struct it needs to be (lack of multiple > parents means you can't just frob them all to offset zero). > > Your suggestion is to invert this as a struct > > short int > | / > union > > We don't allow multiple parents, however. > Because of that, you suggest we follow all nodes for unions, special > casing union-type nodes somehow > > Let me suggest something different: > > We know the current structure fails us in a number of ways. > Instead of trying to shoehorn this into our current structure, I > suggest: we stop messing around and just have a GCC style tree, and > represent the children instead of the parents. > We make contained types descendants instead of ancestors. > We allow multiple children at each offset and for scalar type > nodes.x` > > We could do that without changing to children, but honestly, i > strongly > believe nobody who ever looks at this really understands it > right now. > (If someone really does like the ancestor form, i'd love to > understand > why :P) > > So if we are going to change it, we should change it to > something that > is easier to understand. > > something like: > > scalar type node -> {name, children nodes} > struct node -> {name, array of {offset, child node} } > > Paths are separate from the tbaa tree itself, and are just: > path node -> {struct node, offset} or whatever. > > unions are just scalar type nodes with multiple children, not struct > nodes with special-cased offset zero. > > This also has a well-defined upgrade path. > > For "old-style" DAGs, like exist now, we know we have to regen the > metadata, and that it is wrong (So we could, for example, simply > disable > it for correctness reasons) > . > Anything with a "new-style" DAG, we know is usable. > > In this representation, the most generic tbaa node is just the > one that > contains the other. > If neither contains the other, you can just make one that has > both as > children and use that. > (instead of now, where you'd have to have multiple parents again). > > (The above also may be better for other languages) > > --Dan > > > > > On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron > <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com> > <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote: > > Seems like the comments have stopped. I'll try to get a patch > together. Then we can continue the discussion from there. > > Hubert, as for the issue with the llvm optimizations losing > the TBAA > information, it should be the responsibility to make sure the > aliasing is changed in the correct way. One function related to > this has already been mentioned: getMostGenericTBAA. > > Later, > Steven Perron > > > > From: Hubert Tong <hubert.reinterpretcast at gmail.com > <mailto:hubert.reinterpretcast at gmail.com> > <mailto:hubert.reinterpretcast at gmail.com > <mailto:hubert.reinterpretcast at gmail.com>>> > To: Steven Perron/Toronto/IBM at IBMCA > Cc: Daniel Berlin <dberlin at dberlin.org > <mailto:dberlin at dberlin.org> > <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>, > llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > <mailto:llvm-dev at lists.llvm.org > <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das > <sanjoy at playingwithpointers.com > <mailto:sanjoy at playingwithpointers.com> > <mailto:sanjoy at playingwithpointers.com > <mailto:sanjoy at playingwithpointers.com>>> > Date: 2017/02/15 07:44 AM > Subject: Re: [llvm-dev] RFC: Representing unions in TBAA > > ------------------------------------------------------------------------ > > > > On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron > <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com > <mailto:perrons at ca.ibm.com>>> wrote: > 3) How should we handle a reference directly through a > union, and a > reference that is not through the union? > > My solution was to look for each member of the union > overlaps the > given offset, and see if any of those members aliased the other > reference. If no member aliases the other reference, then the > answer is no alias. Otherwise the answer is may alias. The run > time for this would be proportional to "distance to the root" * > "number of overlapping members". This could be slow if > there are > unions with many members or many unions of unions. > > Another option is to say that they do not alias. This would > mean > that all references to unions must be explicitly through the > union. > From what I gather from the thread so far, the access > through the > union may be lost because of LLVM transformations. I am not sure > how, in the face of that, TBAA could indicate NoAlias safely > (without the risk of functional-correctness issues in correct > programs) between types which overlap within any union > (within some > portion of the program). > > As for the standards, it is definitely not true that all > references > to unions must be explicitly through the union. However, if > you are > trying to perform union-based type punning (under C11), then it > appears that it is intended that the read must be through > the union. > > This would be the least restrictive aliasing allowing the most > optimization. The implementation would be simple. I believe we > make the parent of the TBAA node for the union to be "omnipotent > char". This might be similar to treating the union TBAA > node more > like a scalar node instead of a struct-path. Then the > traversal of > the TBAA nodes will be quick. I'll have to work this out a bit > more, but, if this is good enough to meet the requirements > of the > standard, I can try to think this through a little more. > I'll need > Hubert and Daniel to comment on that since I am no expert on > the C > and C++ standards. > > The third option is to be pessimistic and say "may alias" > all of the > time (conservatively correct), and rely on other alias > analysis to > improve it. This will have good compile time, but could hinder > optimization. Personally I do not like this option. Most > of the > time it will not have a negative effect, but there will be a > reasonable number of programs where this will hurt > optimization more > that it needs to. > > > > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > hosted by The Linux Foundation > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> > >-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Daniel Berlin via llvm-dev
2017-Apr-07 20:29 UTC
[llvm-dev] RFC: Representing unions in TBAA
Ah. IMHO, yes we should disable it until it's correct. On Fri, Apr 7, 2017 at 1:28 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:> I'm asking if people are ok with it. :) > We've had customer reports that can be tracked down to this issue, so this > is something we'd really like to working (at least in terms of correctness). > > I can come up with a patch. > > -Krzysztof > > > On 4/7/2017 3:25 PM, Daniel Berlin wrote: > >> Not familiar with clang enough to know. >> Staring at it, it looks a bit annoying to do. >> You'd basically just want to stop decorating loads/stores with tbaa info >> if it's a union. >> >> >> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Can we turn off TBAA info for union member accesses in clang before >> this gets fixed? >> >> -Krzysztof >> >> >> On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: >> >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 >> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example >> showing >> our current TBAA tree for union generation is definitely >> irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go >> far enough. >> But truthfully, I would rather see us come closer to a >> representation >> we know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but >> we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ >> is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a >> single short >> node a child of every union/struct it needs to be (lack of >> multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, >> special >> casing union-type nodes somehow >> >> Let me suggest something different: >> >> We know the current structure fails us in a number of ways. >> Instead of trying to shoehorn this into our current structure, I >> suggest: we stop messing around and just have a GCC style tree, >> and >> represent the children instead of the parents. >> We make contained types descendants instead of ancestors. >> We allow multiple children at each offset and for scalar type >> nodes.x` >> >> We could do that without changing to children, but honestly, i >> strongly >> believe nobody who ever looks at this really understands it >> right now. >> (If someone really does like the ancestor form, i'd love to >> understand >> why :P) >> >> So if we are going to change it, we should change it to >> something that >> is easier to understand. >> >> something like: >> >> scalar type node -> {name, children nodes} >> struct node -> {name, array of {offset, child node} } >> >> Paths are separate from the tbaa tree itself, and are just: >> path node -> {struct node, offset} or whatever. >> >> unions are just scalar type nodes with multiple children, not >> struct >> nodes with special-cased offset zero. >> >> This also has a well-defined upgrade path. >> >> For "old-style" DAGs, like exist now, we know we have to regen the >> metadata, and that it is wrong (So we could, for example, simply >> disable >> it for correctness reasons) >> . >> Anything with a "new-style" DAG, we know is usable. >> >> In this representation, the most generic tbaa node is just the >> one that >> contains the other. >> If neither contains the other, you can just make one that has >> both as >> children and use that. >> (instead of now, where you'd have to have multiple parents again). >> >> (The above also may be better for other languages) >> >> --Dan >> >> >> >> >> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron >> <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com> >> <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote: >> >> Seems like the comments have stopped. I'll try to get a patch >> together. Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing >> the TBAA >> information, it should be the responsibility to make sure the >> aliasing is changed in the correct way. One function related >> to >> this has already been mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com> >> <mailto:hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com>>> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org >> <mailto:dberlin at dberlin.org> >> <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>, >> llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org >> > >> <mailto:llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das >> <sanjoy at playingwithpointers.com >> <mailto:sanjoy at playingwithpointers.com> >> <mailto:sanjoy at playingwithpointers.com >> >> <mailto:sanjoy at playingwithpointers.com>>> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in >> TBAA >> >> ------------------------------------------------------------ >> ------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron >> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com >> <mailto:perrons at ca.ibm.com>>> wrote: >> 3) How should we handle a reference directly through a >> union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union >> overlaps the >> given offset, and see if any of those members aliased the >> other >> reference. If no member aliases the other reference, then the >> answer is no alias. Otherwise the answer is may alias. The >> run >> time for this would be proportional to "distance to the >> root" * >> "number of overlapping members". This could be slow if >> there are >> unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would >> mean >> that all references to unions must be explicitly through the >> union. >> From what I gather from the thread so far, the access >> through the >> union may be lost because of LLVM transformations. I am not >> sure >> how, in the face of that, TBAA could indicate NoAlias safely >> (without the risk of functional-correctness issues in correct >> programs) between types which overlap within any union >> (within some >> portion of the program). >> >> As for the standards, it is definitely not true that all >> references >> to unions must be explicitly through the union. However, if >> you are >> trying to perform union-based type punning (under C11), then >> it >> appears that it is intended that the read must be through >> the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe >> we >> make the parent of the TBAA node for the union to be >> "omnipotent >> char". This might be similar to treating the union TBAA >> node more >> like a scalar node instead of a struct-path. Then the >> traversal of >> the TBAA nodes will be quick. I'll have to work this out a >> bit >> more, but, if this is good enough to meet the requirements >> of the >> standard, I can try to think this through a little more. >> I'll need >> Hubert and Daniel to comment on that since I am no expert on >> the C >> and C++ standards. >> >> The third option is to be pessimistic and say "may alias" >> all of the >> time (conservatively correct), and rely on other alias >> analysis to >> improve it. This will have good compile time, but could >> hinder >> optimization. Personally I do not like this option. Most >> of the >> time it will not have a negative effect, but there will be a >> reasonable number of programs where this will hurt >> optimization more >> that it needs to. >> >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >> >> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >> hosted by The Linux Foundation >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >> >> >> > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/215e72b3/attachment.html>