David Blaikie via llvm-dev
2021-Oct-15 19:06 UTC
[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
(came across this while looking for other things) This is still an issue: $ cat a.cpp #include "a.h" void f3() { f2(); } $ cat b.cpp #include "a.h" void f3(); void f1() { } int main() { f3(); f2(); } $ cat a.h void f1(); struct t1 { int x; }; inline __attribute__((always_inline)) void f2(t1 = t1()) { f1(); } $ $ clang++-tot -fuse-ld=lld -flto a.cpp b.cpp -g && llvm-dwarfdump-tot a.out -debug-info | grep "DW_AT_name\|DW_TAG_\|DW_AT_type" | sed -e "s/^............//" DW_TAG_compile_unit DW_AT_name ("a.cpp") DW_TAG_subprogram DW_AT_name ("f2") DW_TAG_formal_parameter DW_AT_type (0x0000003e "t1") DW_TAG_structure_type DW_AT_name ("t1") DW_TAG_member DW_AT_name ("x") DW_AT_type (0x00000054 "int") ... DW_TAG_compile_unit DW_AT_name ("b.cpp") DW_TAG_subprogram DW_AT_name ("f2") DW_TAG_formal_parameter DW_AT_type (0x000000000000003e "t1") ... The type (t1) gets deduplicated/shared across CUs (the DW_AT_type in b.cpp uses FORM_ref_addr) but the f2 subprogram is duplicated - though b.cpp's use of it could use ref_addr, and would if the function hadn't been inlined away. eg: if we move 'f2' to be defined in a.cpp but attributed always_inline, so it does get cross-unit inlined due to LTO: DW_TAG_compile_unit DW_AT_name ("a.cpp") DW_TAG_subprogram DW_AT_name ("f2") DW_TAG_formal_parameter DW_AT_type (0x0000003e "t1") DW_TAG_structure_type DW_AT_name ("t1") DW_TAG_member DW_AT_name ("x") DW_AT_type (0x00000054 "int") DW_TAG_base_type DW_AT_name ("int") DW_TAG_subprogram DW_AT_name ("f3") DW_TAG_inlined_subroutine DW_AT_abstract_origin (0x0000002a "_Z2f22t1") DW_TAG_formal_parameter DW_AT_abstract_origin (0x00000036) DW_TAG_compile_unit DW_AT_name ("b.cpp") DW_TAG_subprogram DW_AT_name ("f1") DW_TAG_subprogram DW_AT_name ("main") DW_AT_type (0x0000000000000054 "int") DW_TAG_inlined_subroutine DW_AT_abstract_origin (0x000000000000002a "_Z2f22t1") So now because the cross-CU inlining happened during LTO after module merging - the merging happens when the function's not inlined, module merge picks one llvm::Function, dropping the other one and dropping the other DISubprogram - and then the inlining kicks in, referencing just that single DIsubprogram. Not sure if it's of interest to anyone/a priority. (side note: Anyone interested in making ThinLTO home type definitions? That'd be great to reduce type duplication - would mean ThinLTO could turn off type units and/or that .o/.dwo/etc files would just generally be smaller anyway) On Sat, Dec 24, 2016 at 10:18 AM David Blaikie <dblaikie at gmail.com> wrote:> > > On Fri, Dec 23, 2016 at 7:25 PM Duncan Exon Smith <dexonsmith at apple.com> > wrote: > >> >> >> On Dec 23, 2016, at 18:36, David Blaikie <dblaikie at gmail.com> wrote: >> >> >> >> On Fri, Dec 23, 2016 at 11:47 AM Duncan P. N. Exon Smith < >> dexonsmith at apple.com> wrote: >> >>> A few disjoint thoughts; sorry they're so delayed (I skimmed the >>> responses below, and I think these are still relevant/not covered >>> elsewhere). >>> >>> Firstly, why *should* DISubprogram definitions be distinct? There were >>> two reasons this was valuable (this was from before there was a cu: link). >>> >>> - It helped to fix long-standing bugs in the IRLinker, where >>> uniqued-DISubprograms in different compile units would coalesce. >>> >> >> Any idea why that ^ was a problem/bugs? >> >> >>> Now that the Function <-> DISubprogram connection has been reversed, >>> I'm not entirely sure it's still a necessary precaution. >>> >>> - It broke a several uniquing cycles involving DISubprogram. The >>> existence of the uniquing cycles meant that affected DISubprograms weren't >>> really uniqued, anyway (since we don't bother to solve graph isomorphism); >>> and breaking the cycles sped up the IRLinker and reduced memory waste. >>> >> >> Ah, fair point - my trivial example didn't have any local variables, so >> didn't hit this problem. They do indeed form cycles and I haven't tested, >> but seems totally reasonable/likely that that would break my simple example >> because cycles break uniquing and all that. >> >> >>> Making DISubprogram uniqued again would either have no real effect >>> (because distinct nodes they reference aren't uniqued anymore either) or >>> bring back uniquing cycles (which would have effects... but not the desired >>> one). >>> >>> But there ought to be a way to get coalescing in the right places. We >>> just need to ensure that the uniqued/coalesced parts of the graph: >>> - are acyclic, and >>> - reference only coalesced nodes (note that leaves can happily include >>> the strangely-coalesced distinct DICompositeType beast I created). >>> >> >> Perhaps before I dive into your suggestion (which, it seems, is to >> separate out the definition but let the declaration/locals form cycles >> >> >> No (new) cycles; in fact this breaks some. >> - the Function itself references the defn and the locals; >> - the locals and the defn reference (transitively) the decl; and >> - the decl doesn't reference any of those. >> >> The only cycle is the decl <-> composite type one that already exists >> (and that odr magic (mostly) fixes). Maybe that one could be removed >> somehow too, but it isn't directly relevant to the problem you're looking >> at. >> > > Oh, yeah - I've an aside there too, maybe for another thread. (punchy > summary: anonymous enums in headers used for constants are technically ODR > violations (well, if they're ever used in an ODR entity - like an inline > function) - they're not public types, so we don't put the mangled name on > them & don't deduplicate them in DWARF type units - and probably similarly > don't deduplicate them in the IR... - not sure what the right answer is > there) > > >> (Note that this is somewhat inspired by Reid's point in his recent debug >> info talk, that CodeView avoids type cycles by breaking types into two >> records.) >> > > Ah, thanks, that helps with me picture it! > > >> >> - that would at least be dropped because the definition would only be >> kept due to it being referenced from the llvm::Function), what about >> something simpler: >> >> What if scope chains didn't go all the way to the top (the subprogram) >> but stopped short of that? >> >> Now, granted - this was part of the great assertion I added several years >> back that caught all kinds of silliness - mostly around inlined call sites >> not having debugloc and leading to scope chains that lead to distinct >> roots. But the alternative to having and maintaining an invariant - is just >> to make things true by construction. All debug info scope chains within a >> function with an associated DISubprogram have an implicit last element of >> that DISubprogram? >> >> >> That seems reasonable, I think, now that the function points directly at >> its subprogram. >> >> I'm not sure it's simpler to get right though. Dropping the scope runs >> the risk of coalescing things incorrectly. I imagine the backend relies on >> pointer distinctness of two local variables called "a" that are in (or are >> parameters of) unrelated subprograms. I think in some situations they >> could be coalesced incorrectly if they had no scope. >> > > Just no top level scope - so it'd only happen if you had two variables > with the same name in the same scope. It might break in debug-having inline > functions inlined into non-debug-having functions (which then might in turn > be inlined into a debug-having function) - would have to check. > > Also, not sure if inlined situations might introduce circularity. Yep. The > scope for inlinedAt DILocations point to the DISubprogram - so that'd > create a cycle. Ah well. > > >> >> That would mean it'd be harder to catch the bugs I found >> >> >> A shame if so, but if we can find another way to catch it or design it >> away, as you mention, it doesn't much matter. >> >> - but I think most of them came from that specific inlining situation (I >> can go back and do some archaeology and see if there's other exposure where >> we could sure up some defenses as well) which can be caught more directly >> (I forget if the inliner now catches it itself, or if we enhanced the debug >> info verifier to catch it after the inliner runs - if it's the verifier, >> then that wouldn't suffice if we made the change I'm suggesting - and we'd >> have to hook it up in the inliner itself) >> >> Thoughts? >> >> >> I've been thinking idly whether we can/should break DISubprogram apart >>> into uniqued (coalesce-able) and distinct parts. Still being someone >>> ignorant of DWARF itself (and CodeView, for that matter), I'm not entirely >>> sure. But... >>> >>> Here's my understanding of the current uses of DISubprogram (where I'm >>> using DISubprogramDecl for the usually uniqued declarations, and >>> DISubprogramDefn for the always distinct definitions): >>> >>> // !dbg >>> Function -> DISubprogram >>> >>> // Declaration (optional, only for member functions). >>> DISubprogramDefn -> DISubprogramDecl >>> >>> // Variables: >>> DISubprogramDefn -> MDNode -> DILocalVariable >>> >>> // Terminator for scope chains: >>> DILocalVariable [-> DILocalScope [-> ...]] -> DISubprogramDefn >>> DILocation [-> DILocalScope [-> ...]] -> DISubprogramDefn >>> DICompositeType -> DISubprogramDefn // function-local type >>> >>> // Elements (member functions): >>> DICompositeType -> MDNode -> DISubprogramDecl >>> DISubprogramDecl -> DICompositeType >>> DISubprogramDefn -> DICompositeType >>> >>> >>> And here are the changes I would consider: >>> >>> 1. Split DISubprogramDecl and DISubprogramDefn into different types. >>> >>> 2. Remove definition-specific fields from DISubprogramDecl. >>> - variables, a "dangerous" source of uniquing cycles. >>> - declaration, which it will never have. >>> - scopeLine, which it will never have. >>> - unit, which the Defn will have when it's relevant. >>> >>> 3. Make a DISubprogramDecl mandatory for DISubprogramDefn. >>> >>> 4. Remove redundant fields from DISubprogramDefn. >>> - name and linkageName are on the declaration. >>> - type is redundant. >>> - flags is likely redundant? >>> >>> Nothing so far has made any real changes, I think. DISubprogramDecl >>> should coalesce nicely between modules though. >>> >>> 5. Migrate links to point at DISubprogramDecl instead of >>> DISubprogramDefn. >>> - DILocalVariable scope chain. >>> - DILocation scope chain. >>> - DICompositeType scope. >>> >>> I don't think this will create any uniquing cycles. DISubprogramDecl >>> won't point at any of these. Also, DILocalVariable should start coalescing >>> with each other. >>> >>> Since DISubprogramDefn is only referenced from the Function !dbg >>> attachment, only one should survive function-coalescing. >>> >>> One open question, since I don't know the backend code, is: can you emit >>> the "correct" DWARF after step (5)? I'm hoping you only need the Defn of >>> the Function that has instructions referencing the >>> DILocalVariable/DILocation. In which case, we suddenly get a lot of >>> coalescing. >>> >>> On 2016-Dec-15, at 10:54, David Blaikie <dblaikie at gmail.com> wrote: >>> >>> Branching off from a discussion of improvements to DIGlobalVariable >>> representations that Adrian's working on - got me thinking about related >>> changes that have already been made to DISubprogram. >>> >>> To reduce duplicate debug info when things like linkonce_odr functions >>> were deduplicated in LTO linking, the relationship between a CU and >>> DISubprogram was inverted (instead of a CU maintaining a list >>> of subprograms, subprograms specify which CU they come from - and the >>> llvm::Function references the DISubprogram, so if the llvm::Function goes >>> away, so does the associated DISubprogram) >>> >>> I'm not sure if this caused a regression, but at least seems to miss a >>> possible improvement: >>> >>> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram >>> definitions (& their CU link, even if they weren't marked 'distinct', the >>> CU link would cause them to effectively be so) remain separate - this means >>> that inlined versions in one CU don't refer to an existing subprogram >>> definition in another CU. >>> >>> To demonstrate: >>> inl.h: >>> void f1(); >>> inline __attribute__((always_inline)) void f2() { >>> f1(); >>> } >>> inl1.cpp: >>> #include "inl.h" >>> void c1() { >>> f2(); >>> } >>> inl2.cpp: >>> #include "inl.h" >>> void c2() { >>> f2(); >>> } >>> >>> Compile to IR, llvm-link the result. The DWARF you get is basically the >>> same as the DWARF you'd get without linking: >>> >>> DW_TAG_compile_unit >>> DW_AT_name "inl1.cpp" >>> DW_TAG_subprogram #0 >>> DW_AT_name "f2" >>> DW_TAG_subprogram >>> DW_AT_name "c1" >>> DW_TAG_inlined_subroutine >>> DW_TAG_abstract_origin #0 "f2" >>> DW_TAG_compile_unit >>> DW_AT_name "inl2.cpp" >>> DW_TAG_subprogram #1 >>> DW_AT_name "f2" >>> DW_TAG_subprogram >>> DW_AT_name "c2" >>> DW_TAG_inlined_subroutine >>> DW_TAG_abstract_origin #1 "f2" >>> >>> Instead of something more like this: >>> >>> DW_TAG_compile_unit >>> DW_AT_name "inl1.cpp" >>> DW_TAG_subprogram #0 >>> DW_AT_name "f2" >>> DW_TAG_subprogram >>> DW_AT_name "c1" >>> DW_TAG_inlined_subroutine >>> DW_TAG_abstract_origin #0 "f2" >>> DW_TAG_compile_unit >>> DW_AT_name "inl2.cpp" >>> DW_TAG_subprogram >>> DW_AT_name "c2" >>> DW_TAG_inlined_subroutine >>> DW_TAG_abstract_origin #0 "f2" >>> >>> (note that only one abstract definition of f2 is produced here) >>> >>> Any thoughts? I imagine this is probably worth a reasonable amount of >>> savings in an optimized build. Not huge, but not nothing. (probably not the >>> top of anyone's list though, I realize) >>> >>> Should we remove the CU link from a non-internal linkage subprogram (& >>> this may have an effect on the GV representation issue originally being >>> discussed) and just emit it into whichever CU happens to need it first? >>> >>> This might be slightly sub-optimal, due to, say, the namespace being >>> foreign to that CU. But it's how we do types currently, I think? So at >>> least it'd be consistent and probably cheap enough/better. >>> >>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211015/97d747cc/attachment.html>
Reid Kleckner via llvm-dev
2021-Oct-15 19:25 UTC
[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
Sorry for derailing into your aside, but... On Fri, Oct 15, 2021 at 12:07 PM David Blaikie <dblaikie at gmail.com> wrote:> (side note: Anyone interested in making ThinLTO home type definitions? > That'd be great to reduce type duplication - would mean ThinLTO could turn > off type units and/or that .o/.dwo/etc files would just generally be > smaller anyway) >Hey, I think that is a great idea! ThinLTO tends to be used in release build configurations, which tend to have debug info enabled. The size of symbols in release build config matters a lot because it is typically archived for a long time. COFF & MachO platforms have other ways to deduplicate types (PDBs & dsymutil), but less duplicate stuff always makes things faster. For ELF users already using type units, this optimization will only allow us to recover the overhead of type units, which I recall is significant. In our evaluation of the feature for Chrome, we found it increased the final binary size significantly: crbug.com/1031936#c4 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211015/7d6d7d15/attachment.html>