Daniel Berlin via llvm-dev
2017-Apr-07 20:29 UTC
[llvm-dev] RFC: Representing unions in TBAA
Ah. IMHO, yes we should disable it until it's correct. On Fri, Apr 7, 2017 at 1:28 PM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote:> I'm asking if people are ok with it. :) > We've had customer reports that can be tracked down to this issue, so this > is something we'd really like to working (at least in terms of correctness). > > I can come up with a patch. > > -Krzysztof > > > On 4/7/2017 3:25 PM, Daniel Berlin wrote: > >> Not familiar with clang enough to know. >> Staring at it, it looks a bit annoying to do. >> You'd basically just want to stop decorating loads/stores with tbaa info >> if it's a union. >> >> >> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Can we turn off TBAA info for union member accesses in clang before >> this gets fixed? >> >> -Krzysztof >> >> >> On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: >> >> So, https://bugs.llvm.org/show_bug.cgi?id=32056 >> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example >> showing >> our current TBAA tree for union generation is definitely >> irretrievably >> broken. >> I'll be honest here. I'm pretty sure your proposal doesn't go >> far enough. >> But truthfully, I would rather see us come closer to a >> representation >> we know works, which is GCC's. >> Let me try to simplify what you are suggesting, and what we have. >> Our current representation is basically inverted from GCC, but >> we don't >> allow things that would enable it to work. >> >> Given >> union {int a, short b}; >> >> GCC's will be: >> >> union >> / \ >> short int >> >> >> Everything is implicitly a subset of alias set 0 (which for C/C++ >> is >> char) just to avoid representing it. >> >> Our metadata has no child links, and only a single parent link. >> >> You can't represent the above form because you can't make a >> single short >> node a child of every union/struct it needs to be (lack of >> multiple >> parents means you can't just frob them all to offset zero). >> >> Your suggestion is to invert this as a struct >> >> short int >> | / >> union >> >> We don't allow multiple parents, however. >> Because of that, you suggest we follow all nodes for unions, >> special >> casing union-type nodes somehow >> >> Let me suggest something different: >> >> We know the current structure fails us in a number of ways. >> Instead of trying to shoehorn this into our current structure, I >> suggest: we stop messing around and just have a GCC style tree, >> and >> represent the children instead of the parents. >> We make contained types descendants instead of ancestors. >> We allow multiple children at each offset and for scalar type >> nodes.x` >> >> We could do that without changing to children, but honestly, i >> strongly >> believe nobody who ever looks at this really understands it >> right now. >> (If someone really does like the ancestor form, i'd love to >> understand >> why :P) >> >> So if we are going to change it, we should change it to >> something that >> is easier to understand. >> >> something like: >> >> scalar type node -> {name, children nodes} >> struct node -> {name, array of {offset, child node} } >> >> Paths are separate from the tbaa tree itself, and are just: >> path node -> {struct node, offset} or whatever. >> >> unions are just scalar type nodes with multiple children, not >> struct >> nodes with special-cased offset zero. >> >> This also has a well-defined upgrade path. >> >> For "old-style" DAGs, like exist now, we know we have to regen the >> metadata, and that it is wrong (So we could, for example, simply >> disable >> it for correctness reasons) >> . >> Anything with a "new-style" DAG, we know is usable. >> >> In this representation, the most generic tbaa node is just the >> one that >> contains the other. >> If neither contains the other, you can just make one that has >> both as >> children and use that. >> (instead of now, where you'd have to have multiple parents again). >> >> (The above also may be better for other languages) >> >> --Dan >> >> >> >> >> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron >> <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com> >> <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote: >> >> Seems like the comments have stopped. I'll try to get a patch >> together. Then we can continue the discussion from there. >> >> Hubert, as for the issue with the llvm optimizations losing >> the TBAA >> information, it should be the responsibility to make sure the >> aliasing is changed in the correct way. One function related >> to >> this has already been mentioned: getMostGenericTBAA. >> >> Later, >> Steven Perron >> >> >> >> From: Hubert Tong <hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com> >> <mailto:hubert.reinterpretcast at gmail.com >> <mailto:hubert.reinterpretcast at gmail.com>>> >> To: Steven Perron/Toronto/IBM at IBMCA >> Cc: Daniel Berlin <dberlin at dberlin.org >> <mailto:dberlin at dberlin.org> >> <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>, >> llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org >> > >> <mailto:llvm-dev at lists.llvm.org >> <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das >> <sanjoy at playingwithpointers.com >> <mailto:sanjoy at playingwithpointers.com> >> <mailto:sanjoy at playingwithpointers.com >> >> <mailto:sanjoy at playingwithpointers.com>>> >> Date: 2017/02/15 07:44 AM >> Subject: Re: [llvm-dev] RFC: Representing unions in >> TBAA >> >> ------------------------------------------------------------ >> ------------ >> >> >> >> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron >> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com >> <mailto:perrons at ca.ibm.com>>> wrote: >> 3) How should we handle a reference directly through a >> union, and a >> reference that is not through the union? >> >> My solution was to look for each member of the union >> overlaps the >> given offset, and see if any of those members aliased the >> other >> reference. If no member aliases the other reference, then the >> answer is no alias. Otherwise the answer is may alias. The >> run >> time for this would be proportional to "distance to the >> root" * >> "number of overlapping members". This could be slow if >> there are >> unions with many members or many unions of unions. >> >> Another option is to say that they do not alias. This would >> mean >> that all references to unions must be explicitly through the >> union. >> From what I gather from the thread so far, the access >> through the >> union may be lost because of LLVM transformations. I am not >> sure >> how, in the face of that, TBAA could indicate NoAlias safely >> (without the risk of functional-correctness issues in correct >> programs) between types which overlap within any union >> (within some >> portion of the program). >> >> As for the standards, it is definitely not true that all >> references >> to unions must be explicitly through the union. However, if >> you are >> trying to perform union-based type punning (under C11), then >> it >> appears that it is intended that the read must be through >> the union. >> >> This would be the least restrictive aliasing allowing the most >> optimization. The implementation would be simple. I believe >> we >> make the parent of the TBAA node for the union to be >> "omnipotent >> char". This might be similar to treating the union TBAA >> node more >> like a scalar node instead of a struct-path. Then the >> traversal of >> the TBAA nodes will be quick. I'll have to work this out a >> bit >> more, but, if this is good enough to meet the requirements >> of the >> standard, I can try to think this through a little more. >> I'll need >> Hubert and Daniel to comment on that since I am no expert on >> the C >> and C++ standards. >> >> The third option is to be pessimistic and say "may alias" >> all of the >> time (conservatively correct), and rely on other alias >> analysis to >> improve it. This will have good compile time, but could >> hinder >> optimization. Personally I do not like this option. Most >> of the >> time it will not have a negative effect, but there will be a >> reasonable number of programs where this will hurt >> optimization more >> that it needs to. >> >> >> >> >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >> >> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >> hosted by The Linux Foundation >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >> >> >> > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170407/215e72b3/attachment.html>
I would like to see it disabled as well until fixed, we have problem compiling HHVM with clang due to this union+TBAA problem. -Xin -- Software Engineer Employee of Facebook Inc. On Fri, Apr 7, 2017 at 1:29 PM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Ah. > IMHO, yes we should disable it until it's correct. > > > On Fri, Apr 7, 2017 at 1:28 PM, Krzysztof Parzyszek > <kparzysz at codeaurora.org> wrote: >> >> I'm asking if people are ok with it. :) >> We've had customer reports that can be tracked down to this issue, so this >> is something we'd really like to working (at least in terms of correctness). >> >> I can come up with a patch. >> >> -Krzysztof >> >> >> On 4/7/2017 3:25 PM, Daniel Berlin wrote: >>> >>> Not familiar with clang enough to know. >>> Staring at it, it looks a bit annoying to do. >>> You'd basically just want to stop decorating loads/stores with tbaa info >>> if it's a union. >>> >>> >>> On Fri, Apr 7, 2017 at 12:09 PM, Krzysztof Parzyszek via llvm-dev >>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >>> >>> Can we turn off TBAA info for union member accesses in clang before >>> this gets fixed? >>> >>> -Krzysztof >>> >>> >>> On 3/1/2017 5:30 PM, Daniel Berlin via llvm-dev wrote: >>> >>> So, https://bugs.llvm.org/show_bug.cgi?id=32056 >>> <https://bugs.llvm.org/show_bug.cgi?id=32056> is an example >>> showing >>> our current TBAA tree for union generation is definitely >>> irretrievably >>> broken. >>> I'll be honest here. I'm pretty sure your proposal doesn't go >>> far enough. >>> But truthfully, I would rather see us come closer to a >>> representation >>> we know works, which is GCC's. >>> Let me try to simplify what you are suggesting, and what we have. >>> Our current representation is basically inverted from GCC, but >>> we don't >>> allow things that would enable it to work. >>> >>> Given >>> union {int a, short b}; >>> >>> GCC's will be: >>> >>> union >>> / \ >>> short int >>> >>> >>> Everything is implicitly a subset of alias set 0 (which for C/C++ >>> is >>> char) just to avoid representing it. >>> >>> Our metadata has no child links, and only a single parent link. >>> >>> You can't represent the above form because you can't make a >>> single short >>> node a child of every union/struct it needs to be (lack of >>> multiple >>> parents means you can't just frob them all to offset zero). >>> >>> Your suggestion is to invert this as a struct >>> >>> short int >>> | / >>> union >>> >>> We don't allow multiple parents, however. >>> Because of that, you suggest we follow all nodes for unions, >>> special >>> casing union-type nodes somehow >>> >>> Let me suggest something different: >>> >>> We know the current structure fails us in a number of ways. >>> Instead of trying to shoehorn this into our current structure, I >>> suggest: we stop messing around and just have a GCC style tree, >>> and >>> represent the children instead of the parents. >>> We make contained types descendants instead of ancestors. >>> We allow multiple children at each offset and for scalar type >>> nodes.x` >>> >>> We could do that without changing to children, but honestly, i >>> strongly >>> believe nobody who ever looks at this really understands it >>> right now. >>> (If someone really does like the ancestor form, i'd love to >>> understand >>> why :P) >>> >>> So if we are going to change it, we should change it to >>> something that >>> is easier to understand. >>> >>> something like: >>> >>> scalar type node -> {name, children nodes} >>> struct node -> {name, array of {offset, child node} } >>> >>> Paths are separate from the tbaa tree itself, and are just: >>> path node -> {struct node, offset} or whatever. >>> >>> unions are just scalar type nodes with multiple children, not >>> struct >>> nodes with special-cased offset zero. >>> >>> This also has a well-defined upgrade path. >>> >>> For "old-style" DAGs, like exist now, we know we have to regen >>> the >>> metadata, and that it is wrong (So we could, for example, simply >>> disable >>> it for correctness reasons) >>> . >>> Anything with a "new-style" DAG, we know is usable. >>> >>> In this representation, the most generic tbaa node is just the >>> one that >>> contains the other. >>> If neither contains the other, you can just make one that has >>> both as >>> children and use that. >>> (instead of now, where you'd have to have multiple parents >>> again). >>> >>> (The above also may be better for other languages) >>> >>> --Dan >>> >>> >>> >>> >>> On Tue, Feb 28, 2017 at 4:44 PM, Steven Perron >>> <perrons at ca.ibm.com <mailto:perrons at ca.ibm.com> >>> <mailto:perrons at ca.ibm.com <mailto:perrons at ca.ibm.com>>> wrote: >>> >>> Seems like the comments have stopped. I'll try to get a >>> patch >>> together. Then we can continue the discussion from there. >>> >>> Hubert, as for the issue with the llvm optimizations losing >>> the TBAA >>> information, it should be the responsibility to make sure the >>> aliasing is changed in the correct way. One function related >>> to >>> this has already been mentioned: getMostGenericTBAA. >>> >>> Later, >>> Steven Perron >>> >>> >>> >>> From: Hubert Tong <hubert.reinterpretcast at gmail.com >>> <mailto:hubert.reinterpretcast at gmail.com> >>> <mailto:hubert.reinterpretcast at gmail.com >>> <mailto:hubert.reinterpretcast at gmail.com>>> >>> To: Steven Perron/Toronto/IBM at IBMCA >>> Cc: Daniel Berlin <dberlin at dberlin.org >>> <mailto:dberlin at dberlin.org> >>> <mailto:dberlin at dberlin.org <mailto:dberlin at dberlin.org>>>, >>> llvm-dev <llvm-dev at lists.llvm.org >>> <mailto:llvm-dev at lists.llvm.org> >>> <mailto:llvm-dev at lists.llvm.org >>> <mailto:llvm-dev at lists.llvm.org>>>, Sanjoy Das >>> <sanjoy at playingwithpointers.com >>> <mailto:sanjoy at playingwithpointers.com> >>> <mailto:sanjoy at playingwithpointers.com >>> >>> <mailto:sanjoy at playingwithpointers.com>>> >>> Date: 2017/02/15 07:44 AM >>> Subject: Re: [llvm-dev] RFC: Representing unions in >>> TBAA >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> On Tue, Feb 14, 2017 at 11:22 PM, Steven Perron >>> <_perrons at ca.ibm.com_ <mailto:perrons at ca.ibm.com >>> <mailto:perrons at ca.ibm.com>>> wrote: >>> 3) How should we handle a reference directly through a >>> union, and a >>> reference that is not through the union? >>> >>> My solution was to look for each member of the union >>> overlaps the >>> given offset, and see if any of those members aliased the >>> other >>> reference. If no member aliases the other reference, then >>> the >>> answer is no alias. Otherwise the answer is may alias. The >>> run >>> time for this would be proportional to "distance to the >>> root" * >>> "number of overlapping members". This could be slow if >>> there are >>> unions with many members or many unions of unions. >>> >>> Another option is to say that they do not alias. This would >>> mean >>> that all references to unions must be explicitly through the >>> union. >>> From what I gather from the thread so far, the access >>> through the >>> union may be lost because of LLVM transformations. I am not >>> sure >>> how, in the face of that, TBAA could indicate NoAlias safely >>> (without the risk of functional-correctness issues in correct >>> programs) between types which overlap within any union >>> (within some >>> portion of the program). >>> >>> As for the standards, it is definitely not true that all >>> references >>> to unions must be explicitly through the union. However, if >>> you are >>> trying to perform union-based type punning (under C11), then >>> it >>> appears that it is intended that the read must be through >>> the union. >>> >>> This would be the least restrictive aliasing allowing the >>> most >>> optimization. The implementation would be simple. I believe >>> we >>> make the parent of the TBAA node for the union to be >>> "omnipotent >>> char". This might be similar to treating the union TBAA >>> node more >>> like a scalar node instead of a struct-path. Then the >>> traversal of >>> the TBAA nodes will be quick. I'll have to work this out a >>> bit >>> more, but, if this is good enough to meet the requirements >>> of the >>> standard, I can try to think this through a little more. >>> I'll need >>> Hubert and Daniel to comment on that since I am no expert on >>> the C >>> and C++ standards. >>> >>> The third option is to be pessimistic and say "may alias" >>> all of the >>> time (conservatively correct), and rely on other alias >>> analysis to >>> improve it. This will have good compile time, but could >>> hinder >>> optimization. Personally I do not like this option. Most >>> of the >>> time it will not have a negative effect, but there will be a >>> reasonable number of programs where this will hurt >>> optimization more >>> that it needs to. >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>> >>> >>> -- >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, >>> hosted by The Linux Foundation >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev> >>> >>> >> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted >> by The Linux Foundation > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Krzysztof Parzyszek via llvm-dev
2017-Apr-10 15:09 UTC
[llvm-dev] RFC: Representing unions in TBAA
On 4/9/2017 8:40 PM, Xin Tong wrote:> I would like to see it disabled as well until fixed, we have problem > compiling HHVM with clang due to this union+TBAA problem.I posted a patch at https://reviews.llvm.org/D31885. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation