I was trying to add some stronger assertions in the verifier around !tbaa metadata when I ran into an ambiguity: I think the encoding of the metadata nodes are such that a given node can be interpreted as either a struct type node or a scalar tbaa node. I'd like a sanity check before I try to fix or work around this. Consider some IR that I got from running clang over a small C++ program: ``` define void @foo() { ... load ..., !tbaa !2 load ..., !tbaa !7 load ..., !tbaa !10 ... } !2 = !{!3, !5, i64 0} !3 = !{!"T0", !4, i64 0} !4 = !{!"T1", !5, i64 0} !5 = !{!"T2", !6, i64 0} !6 = !{!"Root"} !7 = !{!8, !9, i64 0} !8 = !{!"T3", !9, i64 0} !9 = !{!"T4", !5, i64 0} !10 = !{!9, !9, i64 0} ``` I've erased the actual string names to make the ambiguity more obvious. Here !2 and !7 are both struct tag nodes. This means that !5 and !9 are both scalar type nodes and !3 and !8 are struct type nodes[1]. However, once we get to the first field of !3, !4 at offset 0, things become murkier. !4 could either be a read-write scalar node with a !5 as a parent, or be a struct type node containing !5 at offset 0. I don't see a way to tell the two possibilities apart. The ambiguity shown above is "fine" since (I think, but I'm not sure) that containing an object at offset 0 should be equivalent to "subclassing" it in the TBAA type system. It still makes writing verifier Assertions more difficult than it should be, though. Things get a bit more problematic once we allow for setting the "constant" tag on scalar TBAA. If !4 was !{!"T1", !5, i64 1} then there we'd have a "real" ambiguity between it being a scalar node describing constant memory or a struct type node containing !5 at offset 1. Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that implies scalar TBAA was slated for removal: "After all testing cases are upgraded to use struct-path aware TBAA and we can auto-upgrade existing bc files, the support for scalar TBAA can be dropped." Does anyone have some context for what the motivations were / why the work was stopped? If all the use cases for scalar TBAA can be simulated using struct tbaa then that may be the easiest way to remove the ambiguity. [1]: I've assumed that the base type has to be a struct type node iff it is different from access type node; without it things are even more ambiguous. This isn't explicitly stated today, though. For completeness, here is the C++ source that was used to generate the IR above: struct A { char f; }; struct B { A a; }; struct C { int f; }; int f(B *b, C *c, int *i) { return b->a.f + c->f + *i; } and the metadata was: !2 = !{!3, !5, i64 0} !3 = !{!"_ZTS1B", !4, i64 0} !4 = !{!"_ZTS1A", !5, i64 0} !5 = !{!"omnipotent char", !6, i64 0} !6 = !{!"Simple C++ TBAA"} !7 = !{!8, !9, i64 0} !8 = !{!"_ZTS1C", !9, i64 0} !9 = !{!"int", !5, i64 0} !10 = !{!9, !9, i64 0} -- Sanjoy
> On Nov 1, 2016, at 10:52 AM, Sanjoy Das via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > I was trying to add some stronger assertions in the verifier around > !tbaa metadata when I ran into an ambiguity: I think the encoding of > the metadata nodes are such that a given node can be interpreted as > either a struct type node or a scalar tbaa node. I'd like a sanity > check before I try to fix or work around this. > > Consider some IR that I got from running clang over a small C++ > program: > > ``` > define void @foo() { > ... > load ..., !tbaa !2 > load ..., !tbaa !7 > load ..., !tbaa !10 > ... > } > > !2 = !{!3, !5, i64 0} > !3 = !{!"T0", !4, i64 0} > !4 = !{!"T1", !5, i64 0} > !5 = !{!"T2", !6, i64 0} > !6 = !{!"Root"} > !7 = !{!8, !9, i64 0} > !8 = !{!"T3", !9, i64 0} > !9 = !{!"T4", !5, i64 0} > !10 = !{!9, !9, i64 0} > ``` > > I've erased the actual string names to make the ambiguity more > obvious. > > Here !2 and !7 are both struct tag nodes. This means that !5 and !9 > are both scalar type nodes and !3 and !8 are struct type nodes[1].For struct-path-aware TBAA, the scalar type node has name, parent node, and offset (optional). This simplifies implementation by making formats of struct type nodes and scalar type nodes uniform. Sorry that the comments are confusing. For old scalar TBAA, the scalar node has name, parent node, and “constant”. Why do you need to tell whether it is a struct type node or a scalar type node?> > However, once we get to the first field of !3, !4 at offset 0, things > become murkier. !4 could either be a read-write scalar node with a !5 > as a parent, or be a struct type node containing !5 at offset 0. I > don't see a way to tell the two possibilities apart. > > The ambiguity shown above is "fine" since (I think, but I'm not sure) > that containing an object at offset 0 should be equivalent to > "subclassing" it in the TBAA type system. It still makes writing > verifier Assertions more difficult than it should be, though. > > Things get a bit more problematic once we allow for setting the > "constant" tag on scalar TBAA. If !4 was !{!"T1", !5, i64 1} then > there we'd have a "real" ambiguity between it being a scalar node > describing constant memory or a struct type node containing !5 at > offset 1. > > > Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that > implies scalar TBAA was slated for removal: > > "After all testing cases are upgraded to use struct-path aware TBAA > and we can auto-upgrade existing bc files, the support for scalar TBAA > can be dropped." > > Does anyone have some context for what the motivations were / why the > work was stopped? If all the use cases for scalar TBAA can be > simulated using struct tbaa then that may be the easiest way to remove > the ambiguity.We auto-upgrade from scalar TBAA to struct-path-aware TBAA. I vaguely recall that we keep scalar TBAA around because of outside usage. That is why we still have “createTBAANode” even though clang frontend does not use it. Manman> > [1]: I've assumed that the base type has to be a struct type node iff > it is different from access type node; without it things are even > more ambiguous. This isn't explicitly stated today, though. > > > For completeness, here is the C++ source that was used to generate the > IR above: > > struct A { char f; }; > struct B { A a; }; > struct C { int f; }; > > int f(B *b, C *c, int *i) { > return b->a.f + c->f + *i; > } > > and the metadata was: > > !2 = !{!3, !5, i64 0} > !3 = !{!"_ZTS1B", !4, i64 0} > !4 = !{!"_ZTS1A", !5, i64 0} > !5 = !{!"omnipotent char", !6, i64 0} > !6 = !{!"Simple C++ TBAA"} > !7 = !{!8, !9, i64 0} > !8 = !{!"_ZTS1C", !9, i64 0} > !9 = !{!"int", !5, i64 0} > !10 = !{!9, !9, i64 0} > > > -- Sanjoy > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Hi Manman, Thanks for the quick reply! Manman wrote: >> On Nov 1, 2016, at 10:52 AM, Sanjoy Das via llvm-dev<llvm-dev at lists.llvm.org> wrote: >> >> I was trying to add some stronger assertions in the verifier around >> !tbaa metadata when I ran into an ambiguity: I think the encoding of >> the metadata nodes are such that a given node can be interpreted as >> either a struct type node or a scalar tbaa node. I'd like a sanity >> check before I try to fix or work around this. >> >> Consider some IR that I got from running clang over a small C++ >> program: >> >> ``` >> define void @foo() { >> ... >> load ..., !tbaa !2 >> load ..., !tbaa !7 >> load ..., !tbaa !10 >> ... >> } >> >> !2 = !{!3, !5, i64 0} >> !3 = !{!"T0", !4, i64 0} >> !4 = !{!"T1", !5, i64 0} >> !5 = !{!"T2", !6, i64 0} >> !6 = !{!"Root"} >> !7 = !{!8, !9, i64 0} >> !8 = !{!"T3", !9, i64 0} >> !9 = !{!"T4", !5, i64 0} >> !10 = !{!9, !9, i64 0} >> ``` >> >> I've erased the actual string names to make the ambiguity more >> obvious. >> >> Here !2 and !7 are both struct tag nodes. This means that !5 and !9 >> are both scalar type nodes and !3 and !8 are struct type nodes[1]. > > For struct-path-aware TBAA, the scalar type node has name, parent node, and offset (optional). > This simplifies implementation by making formats of struct type nodes and scalar type nodes uniform. So this answers one of my key questions -- the similarity is not accidental. > Sorry that the comments are confusing. > For old scalar TBAA, the scalar node has name, parent node, and “constant”. > > Why do you need to tell whether it is a struct type node or a scalar type node? Say !4 above is !{!"T1", !5, i64 1}, with the interpretation that it is a scalar type node with the "constant" bit set. Then an access through !4 with the offset 0 may alias with an access through !5. However, if LLVM interprets !4 = !{!"T1", !5, i64 1} as a struct type TBAA node that contains !5 at an offset of 1 byte, then an access through !4 with the offset 0 is no alias with an access through !5. Thus if the compiler thinks the latter and the frontend that generated the TBAA metadata was thinking the former, then we may have a miscompile. The example is somewhat flimsy though, since ideally we'd force struct type nodes to start with a 0 offset. However this isn't part of the specification today. The back-story here is that I think there are some unverified invariants that our TBAA implementation uses which would be nice to enforce in the verifier (in particular, we had a subtle bug in our codebase that would've been caught by a sufficiently smart verifier). The "struct type nodes must start with offset 0" is one. Another one I want to verify is (this is the motivating cause): the final access into the access type in a struct path has to be at offset 0. That is, the following is illegal: ... load ..., !tbaa !2 ... !0 = !{"struct S", !1, 0, !1, 4, !1, 8} !1 = < scalar int > !2 = !{!0, !1, 6} since if the above is allowed, then the implementation of getMostGenericTBAA looks suspect. >> However, once we get to the first field of !3, !4 at offset 0, things >> become murkier. !4 could either be a read-write scalar node with a !5 >> as a parent, or be a struct type node containing !5 at offset 0. I >> don't see a way to tell the two possibilities apart. >> >> The ambiguity shown above is "fine" since (I think, but I'm not sure) >> that containing an object at offset 0 should be equivalent to >> "subclassing" it in the TBAA type system. It still makes writing >> verifier Assertions more difficult than it should be, though. >> >> Things get a bit more problematic once we allow for setting the >> "constant" tag on scalar TBAA. If !4 was !{!"T1", !5, i64 1} then >> there we'd have a "real" ambiguity between it being a scalar node >> describing constant memory or a struct type node containing !5 at >> offset 1. >> >> >> Finally: we have a comment from 2013 in TypeBasedAliasAnalysis that >> implies scalar TBAA was slated for removal: >> >> "After all testing cases are upgraded to use struct-path aware TBAA >> and we can auto-upgrade existing bc files, the support for scalar TBAA >> can be dropped." >> >> Does anyone have some context for what the motivations were / why the >> work was stopped? If all the use cases for scalar TBAA can be >> simulated using struct tbaa then that may be the easiest way to remove >> the ambiguity. > > We auto-upgrade from scalar TBAA to struct-path-aware TBAA. I vaguely recall that we keep scalar TBAA around because of outside usage. > That is why we still have “createTBAANode” even though clang frontend does not use it. Yes, I see that now. Unless you have objections, I'd like to propose dropping support for scalar TBAA metadata, since that lets us get rid of a non-trivial amount of logic from TypeBasedAliasAnalysis. I'll start an RFC on the mailing list. -- Sanjoy