David Blaikie via llvm-dev
2016-Dec-16 00:20 UTC
[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
On Thu, Dec 15, 2016 at 4:17 PM Teresa Johnson <tejohnson at google.com> wrote:> On Thu, Dec 15, 2016 at 2:08 PM, David Blaikie <dblaikie at gmail.com> wrote: > > On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com> > wrote: > > On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com> > wrote: > > Trying to wrap my brain around this, so a few questions below. =) > > > Sure thing - sorry, did assume a bit too much arcane context here. > > > > On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com> > wrote: > > Branching off from a discussion of improvements to DIGlobalVariable > representations that Adrian's working on - got me thinking about related > changes that have already been made to DISubprogram. > > To reduce duplicate debug info when things like linkonce_odr functions > were deduplicated in LTO linking, the relationship between a CU and > DISubprogram was inverted (instead of a CU maintaining a list of > subprograms, subprograms specify which CU they come from - and the > llvm::Function references the DISubprogram, so if the llvm::Function goes > away, so does the associated DISubprogram) > > I'm not sure if this caused a regression, but at least seems to miss a > possible improvement: > > During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram > definitions (& their CU link, even if they weren't marked 'distinct', the > CU link would cause them to effectively be so) remain separate - this means > that inlined versions in one CU don't refer to an existing subprogram > definition in another CU. > > To demonstrate: > inl.h: > void f1(); > inline __attribute__((always_inline)) void f2() { > f1(); > } > inl1.cpp: > #include "inl.h" > void c1() { > f2(); > } > inl2.cpp: > #include "inl.h" > void c2() { > f2(); > } > > Compile to IR, llvm-link the result. The DWARF you get is basically the > same as the DWARF you'd get without linking: > > DW_TAG_compile_unit > DW_AT_name "inl1.cpp" > DW_TAG_subprogram #0 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c1" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > DW_TAG_compile_unit > DW_AT_name "inl2.cpp" > DW_TAG_subprogram #1 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c2" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #1 "f2" > > Instead of something more like this: > > DW_TAG_compile_unit > DW_AT_name "inl1.cpp" > DW_TAG_subprogram #0 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c1" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > DW_TAG_compile_unit > DW_AT_name "inl2.cpp" > DW_TAG_subprogram > DW_AT_name "c2" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > > (note that only one abstract definition of f2 is produced here) > > > I think I understand what you are saying. Essentially, having the SP->CU > link allows the SP to be deduplicated when multiple *outline* copies of the > corresponding function are deduplicated. But not when the multiple copies > are inlined, as it looks like we need all the copies, right? > > > Not quite - having the SP->CU link (well,h onestly, marking the SP as > "distinct" does this, but even if we didn't do that, the SP->CU link would > still do it) causes SPs /not/ to be deduplicated on IR linking. > > Each SP is distinct/not considered duplicate with any other. (if we didn't > mark it 'distinct', the fact that each SP refers to its corresponding CU > would produce the same effect - they wouldn't be deduplicated because they > aren't identical - they refer to different CUs) > > For non-inlined cases, this is fine. > > Before we inverted the SP<>CU link, what would happen is that all copies > of the llvm::Function would be dropped, but their SPs would be left around. > So two CUs that both used the same linkonce_odr function (let's say no > inlining actually occurred though) would both have a SP description in the > DWARF - but one would actual have a proper definition (with a high/low PC, > etc) the other would be missing those features, as though the function had > been optimized away (which it sort of had) > > So by reversing the link, we got rid of those extra SP descriptions in the > DWARF (and the extra SP descriptions in the metadata - I think they were > duplicate back then because they still had a scope chain leading back to > their CU (maybe we had gotten rid of that chain - if we had, then adding it > back in may've actually caused more metadata, but less DWARF)) > > > I almost followed all of this, until I got to this last bit. I understood > from above that with the SP->CU link (and distinct SPs), prevented > deduplication. But this last bit sounds like we are in fact removing the > duplicates in the DWARF and possibly also in the metadata. > > > Ah, right - I can see how it reads that way, sorry. > > Old old way: > > first.ll: > CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } > @fn1 ... > @inl ... > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > > second.ll: > CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... } > @inl ... > @fn2 ... > Resulting DWARF: > compile_unit CU2 > subprogram inl > high/low pc, etc > subprogram fn2 > high/low pc, etc > > link first.ll + second.ll: > CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } > CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... } > @fn1 ... > @inl ... > @fn2 ... > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > compile_unit CU2 > subprogram inl > name, but no high/low pc - this is unnecessary > subprogram fn2 > high/low pc, etc > > New way: > CU1 > @fn1 -> fn1_SP -> CU1 > @inl -> inl_SP -> CU1 > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > > second.ll: > CU2 > @inl -> inl_SP -> CU2 > @fn2 -> fn2_SP -> CU2 > Resulting DWARF: > compile_unit CU2 > subprogram inl > high/low pc, etc > subprogram fn2 > high/low pc, etc > > link first.ll + second.ll (we pick @inl from first.ll in this example): > CU1 > CU2 > @fn1 -> fn1_SP -> CU1 > @inl -> inl_SP -> CU1 > @fn2 -> fn2_SP -> CU2 > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > compile_unit CU2 > subprogram fn2 > high/low pc, etc > > So inverting the links causes us to completely drop the redundant > description of 'inl' that appeared in C2 when the function was not inlined. > > But if the function /is/ inlined, then the inlined location descriptions > that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2) > still point to that original (CU2) version of @inl - causing it to to be > emitted into CU2. > > Whereas for type descriptions we don't do this - the type has no CU link, > so they all get deduplicated and even if @fn2 has a parameter of the same > type as @fn1 - we emit the type into CU1 when we first encounter it (when > emitting @fn1) and then reference it whenever we need it, even when > emitting @fn2 in the other CU. > > > Ok, great, thanks for the detailed explanation! I was missing the > distinction between the location descriptions and the type descriptions. > > > > > > > > > > > Any thoughts? I imagine this is probably worth a reasonable amount of > savings in an optimized build. Not huge, but not nothing. (probably not the > top of anyone's list though, I realize) > > Should we remove the CU link from a non-internal linkage subprogram (& > this may have an effect on the GV representation issue originally being > discussed) and just emit it into whichever CU happens to need it first? > > > I can see how this would be done in LTO where the compiler has full > visibility. For ThinLTO presumably we would need to do some index-based > marking? Can we at least do something when we import an inlined SP and drop > it since we know it is defined elsewhere? > > > Complete visibility isn't required to benefit here - and unfortunately > there's nothing fancier (that I know of) that we can do to avoid emitting > one definition of each used inline function in each thinlto object file we > produce (we can't say "oh, the name of the function, its mangled name, the > names and types of its parameters are over in that other object > file/somewhere else" - but we can avoid emitting those descriptions in each > /CU/ that uses the inlined function within a single ThinLTO object) > > I can provide some more thorough examples if that'd be helpful :) > > > Ok, I think I understand. This is only emitting once per object file, > which with ThinLTO can contain multiple CUs due to importing. But then with > full LTO it sounds like we would be in even better shape, since it has a > single module with all the CUs? > > > Right - currently we emit once per CU, but with a change in format we > could emit once per object file - which hurts ThinLTO over non-LTO (because > ThinLTO produces more CUs (due to imports) per object file) and is neutral > for full LTO (since it produces the same number of CUs, just in one object > file). > > Improving this representation to produce once per object would help get > ThinLTO back what it's currently paying - and improve full LTO further than > its current position in this regard. > > But the gains might not be major/significant - I've done nothing to assess > that, just observing that it is suboptimal. > > Thanks for asking/helping me explain it further, hopefully this is more > descriptive. > > > Great, thanks for pointing out this opportunity! When I get a chance > probably tomorrow I will give your prototype patch a try and see how much > difference it makes for Chromium. >Ah - the patch fixes the backend/LLVM to cope with this representational change (at least for the one tiny example) but doesn't fix Clang to produce the new representation (I just hand hacked the IR in a simple example to see if it worked). I could work with you/work up a change that might do that too, if that's handy/worth testing. - Dave> > Teresa > > > > - Dave > > > > Teresa > > > > > Thanks, > Teresa > > > > This might be slightly sub-optimal, due to, say, the namespace being > foreign to that CU. But it's how we do types currently, I think? So at > least it'd be consistent and probably cheap enough/better. > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161216/7492f493/attachment.html>
Teresa Johnson via llvm-dev
2016-Dec-16 00:26 UTC
[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
On Thu, Dec 15, 2016 at 4:20 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > On Thu, Dec 15, 2016 at 4:17 PM Teresa Johnson <tejohnson at google.com> > wrote: > >> On Thu, Dec 15, 2016 at 2:08 PM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >> On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com> >> wrote: >> >> On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >> >> >> On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com> >> wrote: >> >> Trying to wrap my brain around this, so a few questions below. =) >> >> >> Sure thing - sorry, did assume a bit too much arcane context here. >> >> >> >> On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >> Branching off from a discussion of improvements to DIGlobalVariable >> representations that Adrian's working on - got me thinking about related >> changes that have already been made to DISubprogram. >> >> To reduce duplicate debug info when things like linkonce_odr functions >> were deduplicated in LTO linking, the relationship between a CU and >> DISubprogram was inverted (instead of a CU maintaining a list of >> subprograms, subprograms specify which CU they come from - and the >> llvm::Function references the DISubprogram, so if the llvm::Function goes >> away, so does the associated DISubprogram) >> >> I'm not sure if this caused a regression, but at least seems to miss a >> possible improvement: >> >> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram >> definitions (& their CU link, even if they weren't marked 'distinct', the >> CU link would cause them to effectively be so) remain separate - this means >> that inlined versions in one CU don't refer to an existing subprogram >> definition in another CU. >> >> To demonstrate: >> inl.h: >> void f1(); >> inline __attribute__((always_inline)) void f2() { >> f1(); >> } >> inl1.cpp: >> #include "inl.h" >> void c1() { >> f2(); >> } >> inl2.cpp: >> #include "inl.h" >> void c2() { >> f2(); >> } >> >> Compile to IR, llvm-link the result. The DWARF you get is basically the >> same as the DWARF you'd get without linking: >> >> DW_TAG_compile_unit >> DW_AT_name "inl1.cpp" >> DW_TAG_subprogram #0 >> DW_AT_name "f2" >> DW_TAG_subprogram >> DW_AT_name "c1" >> DW_TAG_inlined_subroutine >> DW_TAG_abstract_origin #0 "f2" >> DW_TAG_compile_unit >> DW_AT_name "inl2.cpp" >> DW_TAG_subprogram #1 >> DW_AT_name "f2" >> DW_TAG_subprogram >> DW_AT_name "c2" >> DW_TAG_inlined_subroutine >> DW_TAG_abstract_origin #1 "f2" >> >> Instead of something more like this: >> >> DW_TAG_compile_unit >> DW_AT_name "inl1.cpp" >> DW_TAG_subprogram #0 >> DW_AT_name "f2" >> DW_TAG_subprogram >> DW_AT_name "c1" >> DW_TAG_inlined_subroutine >> DW_TAG_abstract_origin #0 "f2" >> DW_TAG_compile_unit >> DW_AT_name "inl2.cpp" >> DW_TAG_subprogram >> DW_AT_name "c2" >> DW_TAG_inlined_subroutine >> DW_TAG_abstract_origin #0 "f2" >> >> (note that only one abstract definition of f2 is produced here) >> >> >> I think I understand what you are saying. Essentially, having the SP->CU >> link allows the SP to be deduplicated when multiple *outline* copies of the >> corresponding function are deduplicated. But not when the multiple copies >> are inlined, as it looks like we need all the copies, right? >> >> >> Not quite - having the SP->CU link (well,h onestly, marking the SP as >> "distinct" does this, but even if we didn't do that, the SP->CU link would >> still do it) causes SPs /not/ to be deduplicated on IR linking. >> >> Each SP is distinct/not considered duplicate with any other. (if we >> didn't mark it 'distinct', the fact that each SP refers to its >> corresponding CU would produce the same effect - they wouldn't be >> deduplicated because they aren't identical - they refer to different CUs) >> >> For non-inlined cases, this is fine. >> >> Before we inverted the SP<>CU link, what would happen is that all copies >> of the llvm::Function would be dropped, but their SPs would be left around. >> So two CUs that both used the same linkonce_odr function (let's say no >> inlining actually occurred though) would both have a SP description in the >> DWARF - but one would actual have a proper definition (with a high/low PC, >> etc) the other would be missing those features, as though the function had >> been optimized away (which it sort of had) >> >> So by reversing the link, we got rid of those extra SP descriptions in >> the DWARF (and the extra SP descriptions in the metadata - I think they >> were duplicate back then because they still had a scope chain leading back >> to their CU (maybe we had gotten rid of that chain - if we had, then adding >> it back in may've actually caused more metadata, but less DWARF)) >> >> >> I almost followed all of this, until I got to this last bit. I understood >> from above that with the SP->CU link (and distinct SPs), prevented >> deduplication. But this last bit sounds like we are in fact removing the >> duplicates in the DWARF and possibly also in the metadata. >> >> >> Ah, right - I can see how it reads that way, sorry. >> >> Old old way: >> >> first.ll: >> CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } >> @fn1 ... >> @inl ... >> Resulting DWARF: >> compile_unit CU1 >> subprogram fn1 >> high/low pc, etc >> subprogram inl >> high/low pc, etc >> >> second.ll: >> CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... } >> @inl ... >> @fn2 ... >> Resulting DWARF: >> compile_unit CU2 >> subprogram inl >> high/low pc, etc >> subprogram fn2 >> high/low pc, etc >> >> link first.ll + second.ll: >> CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } >> CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... } >> @fn1 ... >> @inl ... >> @fn2 ... >> Resulting DWARF: >> compile_unit CU1 >> subprogram fn1 >> high/low pc, etc >> subprogram inl >> high/low pc, etc >> compile_unit CU2 >> subprogram inl >> name, but no high/low pc - this is unnecessary >> subprogram fn2 >> high/low pc, etc >> >> New way: >> CU1 >> @fn1 -> fn1_SP -> CU1 >> @inl -> inl_SP -> CU1 >> Resulting DWARF: >> compile_unit CU1 >> subprogram fn1 >> high/low pc, etc >> subprogram inl >> high/low pc, etc >> >> second.ll: >> CU2 >> @inl -> inl_SP -> CU2 >> @fn2 -> fn2_SP -> CU2 >> Resulting DWARF: >> compile_unit CU2 >> subprogram inl >> high/low pc, etc >> subprogram fn2 >> high/low pc, etc >> >> link first.ll + second.ll (we pick @inl from first.ll in this example): >> CU1 >> CU2 >> @fn1 -> fn1_SP -> CU1 >> @inl -> inl_SP -> CU1 >> @fn2 -> fn2_SP -> CU2 >> Resulting DWARF: >> compile_unit CU1 >> subprogram fn1 >> high/low pc, etc >> subprogram inl >> high/low pc, etc >> compile_unit CU2 >> subprogram fn2 >> high/low pc, etc >> >> So inverting the links causes us to completely drop the redundant >> description of 'inl' that appeared in C2 when the function was not inlined. >> >> But if the function /is/ inlined, then the inlined location descriptions >> that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2) >> still point to that original (CU2) version of @inl - causing it to to be >> emitted into CU2. >> >> Whereas for type descriptions we don't do this - the type has no CU link, >> so they all get deduplicated and even if @fn2 has a parameter of the same >> type as @fn1 - we emit the type into CU1 when we first encounter it (when >> emitting @fn1) and then reference it whenever we need it, even when >> emitting @fn2 in the other CU. >> >> >> Ok, great, thanks for the detailed explanation! I was missing the >> distinction between the location descriptions and the type descriptions. >> >> >> >> >> >> >> >> >> >> >> Any thoughts? I imagine this is probably worth a reasonable amount of >> savings in an optimized build. Not huge, but not nothing. (probably not the >> top of anyone's list though, I realize) >> >> Should we remove the CU link from a non-internal linkage subprogram (& >> this may have an effect on the GV representation issue originally being >> discussed) and just emit it into whichever CU happens to need it first? >> >> >> I can see how this would be done in LTO where the compiler has full >> visibility. For ThinLTO presumably we would need to do some index-based >> marking? Can we at least do something when we import an inlined SP and drop >> it since we know it is defined elsewhere? >> >> >> Complete visibility isn't required to benefit here - and unfortunately >> there's nothing fancier (that I know of) that we can do to avoid emitting >> one definition of each used inline function in each thinlto object file we >> produce (we can't say "oh, the name of the function, its mangled name, the >> names and types of its parameters are over in that other object >> file/somewhere else" - but we can avoid emitting those descriptions in each >> /CU/ that uses the inlined function within a single ThinLTO object) >> >> I can provide some more thorough examples if that'd be helpful :) >> >> >> Ok, I think I understand. This is only emitting once per object file, >> which with ThinLTO can contain multiple CUs due to importing. But then with >> full LTO it sounds like we would be in even better shape, since it has a >> single module with all the CUs? >> >> >> Right - currently we emit once per CU, but with a change in format we >> could emit once per object file - which hurts ThinLTO over non-LTO (because >> ThinLTO produces more CUs (due to imports) per object file) and is neutral >> for full LTO (since it produces the same number of CUs, just in one object >> file). >> >> Improving this representation to produce once per object would help get >> ThinLTO back what it's currently paying - and improve full LTO further than >> its current position in this regard. >> >> But the gains might not be major/significant - I've done nothing to >> assess that, just observing that it is suboptimal. >> >> Thanks for asking/helping me explain it further, hopefully this is more >> descriptive. >> >> >> Great, thanks for pointing out this opportunity! When I get a chance >> probably tomorrow I will give your prototype patch a try and see how much >> difference it makes for Chromium. >> > > Ah - the patch fixes the backend/LLVM to cope with this representational > change (at least for the one tiny example) but doesn't fix Clang to produce > the new representation (I just hand hacked the IR in a simple example to > see if it worked). I could work with you/work up a change that might do > that too, if that's handy/worth testing. >Oh, right. Maybe I can get an idea of how much this would save. Is there a decent heuristic to measure the redundancy? Teresa> > - Dave > > >> >> Teresa >> >> >> >> - Dave >> >> >> >> Teresa >> >> >> >> >> Thanks, >> Teresa >> >> >> >> This might be slightly sub-optimal, due to, say, the namespace being >> foreign to that CU. But it's how we do types currently, I think? So at >> least it'd be consistent and probably cheap enough/better. >> >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | >> 408-460-2413 <(408)%20460-2413> >> >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | >> 408-460-2413 <(408)%20460-2413> >> >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohnson at google.com | >> 408-460-2413 <(408)%20460-2413> >> >-- Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161215/bda6af40/attachment-0001.html>
David Blaikie via llvm-dev
2016-Dec-16 00:30 UTC
[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions
The Clang changes might not take much longer (O(minutes)) than the LLVM changes. But I wouldn't be totally confident in any of it scaling to a big codebase - might be a bunch of small issues/cases I wouldn't've handled. Might be worth spending a little (~half an hour, an hour) on all up, I guess. I can't think of a great heuristic for this, really. Measuring debug info size impact isn't too easy, really, short of prototyping/experimenting. (at least at this level of "how many of this kind of thing do we produce/how big are they generally") On Thu, Dec 15, 2016 at 4:26 PM Teresa Johnson <tejohnson at google.com> wrote:> On Thu, Dec 15, 2016 at 4:20 PM, David Blaikie <dblaikie at gmail.com> wrote: > > > > On Thu, Dec 15, 2016 at 4:17 PM Teresa Johnson <tejohnson at google.com> > wrote: > > On Thu, Dec 15, 2016 at 2:08 PM, David Blaikie <dblaikie at gmail.com> wrote: > > On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com> > wrote: > > On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com> > wrote: > > > > On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com> > wrote: > > Trying to wrap my brain around this, so a few questions below. =) > > > Sure thing - sorry, did assume a bit too much arcane context here. > > > > On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com> > wrote: > > Branching off from a discussion of improvements to DIGlobalVariable > representations that Adrian's working on - got me thinking about related > changes that have already been made to DISubprogram. > > To reduce duplicate debug info when things like linkonce_odr functions > were deduplicated in LTO linking, the relationship between a CU and > DISubprogram was inverted (instead of a CU maintaining a list of > subprograms, subprograms specify which CU they come from - and the > llvm::Function references the DISubprogram, so if the llvm::Function goes > away, so does the associated DISubprogram) > > I'm not sure if this caused a regression, but at least seems to miss a > possible improvement: > > During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram > definitions (& their CU link, even if they weren't marked 'distinct', the > CU link would cause them to effectively be so) remain separate - this means > that inlined versions in one CU don't refer to an existing subprogram > definition in another CU. > > To demonstrate: > inl.h: > void f1(); > inline __attribute__((always_inline)) void f2() { > f1(); > } > inl1.cpp: > #include "inl.h" > void c1() { > f2(); > } > inl2.cpp: > #include "inl.h" > void c2() { > f2(); > } > > Compile to IR, llvm-link the result. The DWARF you get is basically the > same as the DWARF you'd get without linking: > > DW_TAG_compile_unit > DW_AT_name "inl1.cpp" > DW_TAG_subprogram #0 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c1" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > DW_TAG_compile_unit > DW_AT_name "inl2.cpp" > DW_TAG_subprogram #1 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c2" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #1 "f2" > > Instead of something more like this: > > DW_TAG_compile_unit > DW_AT_name "inl1.cpp" > DW_TAG_subprogram #0 > DW_AT_name "f2" > DW_TAG_subprogram > DW_AT_name "c1" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > DW_TAG_compile_unit > DW_AT_name "inl2.cpp" > DW_TAG_subprogram > DW_AT_name "c2" > DW_TAG_inlined_subroutine > DW_TAG_abstract_origin #0 "f2" > > (note that only one abstract definition of f2 is produced here) > > > I think I understand what you are saying. Essentially, having the SP->CU > link allows the SP to be deduplicated when multiple *outline* copies of the > corresponding function are deduplicated. But not when the multiple copies > are inlined, as it looks like we need all the copies, right? > > > Not quite - having the SP->CU link (well,h onestly, marking the SP as > "distinct" does this, but even if we didn't do that, the SP->CU link would > still do it) causes SPs /not/ to be deduplicated on IR linking. > > Each SP is distinct/not considered duplicate with any other. (if we didn't > mark it 'distinct', the fact that each SP refers to its corresponding CU > would produce the same effect - they wouldn't be deduplicated because they > aren't identical - they refer to different CUs) > > For non-inlined cases, this is fine. > > Before we inverted the SP<>CU link, what would happen is that all copies > of the llvm::Function would be dropped, but their SPs would be left around. > So two CUs that both used the same linkonce_odr function (let's say no > inlining actually occurred though) would both have a SP description in the > DWARF - but one would actual have a proper definition (with a high/low PC, > etc) the other would be missing those features, as though the function had > been optimized away (which it sort of had) > > So by reversing the link, we got rid of those extra SP descriptions in the > DWARF (and the extra SP descriptions in the metadata - I think they were > duplicate back then because they still had a scope chain leading back to > their CU (maybe we had gotten rid of that chain - if we had, then adding it > back in may've actually caused more metadata, but less DWARF)) > > > I almost followed all of this, until I got to this last bit. I understood > from above that with the SP->CU link (and distinct SPs), prevented > deduplication. But this last bit sounds like we are in fact removing the > duplicates in the DWARF and possibly also in the metadata. > > > Ah, right - I can see how it reads that way, sorry. > > Old old way: > > first.ll: > CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } > @fn1 ... > @inl ... > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > > second.ll: > CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... } > @inl ... > @fn2 ... > Resulting DWARF: > compile_unit CU2 > subprogram inl > high/low pc, etc > subprogram fn2 > high/low pc, etc > > link first.ll + second.ll: > CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... } > CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... } > @fn1 ... > @inl ... > @fn2 ... > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > compile_unit CU2 > subprogram inl > name, but no high/low pc - this is unnecessary > subprogram fn2 > high/low pc, etc > > New way: > CU1 > @fn1 -> fn1_SP -> CU1 > @inl -> inl_SP -> CU1 > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > > second.ll: > CU2 > @inl -> inl_SP -> CU2 > @fn2 -> fn2_SP -> CU2 > Resulting DWARF: > compile_unit CU2 > subprogram inl > high/low pc, etc > subprogram fn2 > high/low pc, etc > > link first.ll + second.ll (we pick @inl from first.ll in this example): > CU1 > CU2 > @fn1 -> fn1_SP -> CU1 > @inl -> inl_SP -> CU1 > @fn2 -> fn2_SP -> CU2 > Resulting DWARF: > compile_unit CU1 > subprogram fn1 > high/low pc, etc > subprogram inl > high/low pc, etc > compile_unit CU2 > subprogram fn2 > high/low pc, etc > > So inverting the links causes us to completely drop the redundant > description of 'inl' that appeared in C2 when the function was not inlined. > > But if the function /is/ inlined, then the inlined location descriptions > that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2) > still point to that original (CU2) version of @inl - causing it to to be > emitted into CU2. > > Whereas for type descriptions we don't do this - the type has no CU link, > so they all get deduplicated and even if @fn2 has a parameter of the same > type as @fn1 - we emit the type into CU1 when we first encounter it (when > emitting @fn1) and then reference it whenever we need it, even when > emitting @fn2 in the other CU. > > > Ok, great, thanks for the detailed explanation! I was missing the > distinction between the location descriptions and the type descriptions. > > > > > > > > > > > Any thoughts? I imagine this is probably worth a reasonable amount of > savings in an optimized build. Not huge, but not nothing. (probably not the > top of anyone's list though, I realize) > > Should we remove the CU link from a non-internal linkage subprogram (& > this may have an effect on the GV representation issue originally being > discussed) and just emit it into whichever CU happens to need it first? > > > I can see how this would be done in LTO where the compiler has full > visibility. For ThinLTO presumably we would need to do some index-based > marking? Can we at least do something when we import an inlined SP and drop > it since we know it is defined elsewhere? > > > Complete visibility isn't required to benefit here - and unfortunately > there's nothing fancier (that I know of) that we can do to avoid emitting > one definition of each used inline function in each thinlto object file we > produce (we can't say "oh, the name of the function, its mangled name, the > names and types of its parameters are over in that other object > file/somewhere else" - but we can avoid emitting those descriptions in each > /CU/ that uses the inlined function within a single ThinLTO object) > > I can provide some more thorough examples if that'd be helpful :) > > > Ok, I think I understand. This is only emitting once per object file, > which with ThinLTO can contain multiple CUs due to importing. But then with > full LTO it sounds like we would be in even better shape, since it has a > single module with all the CUs? > > > Right - currently we emit once per CU, but with a change in format we > could emit once per object file - which hurts ThinLTO over non-LTO (because > ThinLTO produces more CUs (due to imports) per object file) and is neutral > for full LTO (since it produces the same number of CUs, just in one object > file). > > Improving this representation to produce once per object would help get > ThinLTO back what it's currently paying - and improve full LTO further than > its current position in this regard. > > But the gains might not be major/significant - I've done nothing to assess > that, just observing that it is suboptimal. > > Thanks for asking/helping me explain it further, hopefully this is more > descriptive. > > > Great, thanks for pointing out this opportunity! When I get a chance > probably tomorrow I will give your prototype patch a try and see how much > difference it makes for Chromium. > > > Ah - the patch fixes the backend/LLVM to cope with this representational > change (at least for the one tiny example) but doesn't fix Clang to produce > the new representation (I just hand hacked the IR in a simple example to > see if it worked). I could work with you/work up a change that might do > that too, if that's handy/worth testing. > > > Oh, right. Maybe I can get an idea of how much this would save. Is there a > decent heuristic to measure the redundancy? > Teresa > > > > - Dave > > > > Teresa > > > > - Dave > > > > Teresa > > > > > Thanks, > Teresa > > > > This might be slightly sub-optimal, due to, say, the namespace being > foreign to that CU. But it's how we do types currently, I think? So at > least it'd be consistent and probably cheap enough/better. > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> > > > > > -- > Teresa Johnson | Software Engineer | tejohnson at google.com | > 408-460-2413 <(408)%20460-2413> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161216/7099d3f9/attachment-0001.html>
Possibly Parallel Threads
- distinct DISubprograms hindering sharing inlined subprogram descriptions
- distinct DISubprograms hindering sharing inlined subprogram descriptions
- distinct DISubprograms hindering sharing inlined subprogram descriptions
- distinct DISubprograms hindering sharing inlined subprogram descriptions
- sampling from data.frame