thr3ads.net - llvm dev - [llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions [Dec 2016]

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2016-Dec-15 19:38 UTC

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com>
wrote:
> Trying to wrap my brain around this, so a few questions below. =)
>
Sure thing - sorry, did assume a bit too much arcane context here.

>
> On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at
gmail.com>
> wrote:
>
> Branching off from a discussion of improvements to DIGlobalVariable
> representations that Adrian's working on - got me thinking about
related
> changes that have already been made to DISubprogram.
>
> To reduce duplicate debug info when things like linkonce_odr functions
> were deduplicated in LTO linking, the relationship between a CU and
> DISubprogram was inverted (instead of a CU maintaining a list of
> subprograms, subprograms specify which CU they come from - and the
> llvm::Function references the DISubprogram, so if the llvm::Function goes
> away, so does the associated DISubprogram)
>
> I'm not sure if this caused a regression, but at least seems to miss a
> possible improvement:
>
> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
> definitions (& their CU link, even if they weren't marked
'distinct', the
> CU link would cause them to effectively be so) remain separate - this means
> that inlined versions in one CU don't refer to an existing subprogram
> definition in another CU.
>
> To demonstrate:
> inl.h:
> void f1();
> inline __attribute__((always_inline)) void f2() {
>   f1();
> }
> inl1.cpp:
> #include "inl.h"
> void c1() {
>   f2();
> }
> inl2.cpp:
> #include "inl.h"
> void c2() {
>   f2();
> }
>
> Compile to IR, llvm-link the result. The DWARF you get is basically the
> same as the DWARF you'd get without linking:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram #1
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #1 "f2"
>
> Instead of something more like this:
>
> DW_TAG_compile_unit
>   DW_AT_name "inl1.cpp"
>   DW_TAG_subprogram #0
>     DW_AT_name "f2"
>   DW_TAG_subprogram
>     DW_AT_name "c1"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
> DW_TAG_compile_unit
>   DW_AT_name "inl2.cpp"
>   DW_TAG_subprogram
>     DW_AT_name "c2"
>     DW_TAG_inlined_subroutine
>       DW_TAG_abstract_origin #0 "f2"
>
> (note that only one abstract definition of f2 is produced here)
>
>
> I think I understand what you are saying. Essentially, having the SP->CU
> link allows the SP to be deduplicated when multiple *outline* copies of the
> corresponding function are deduplicated. But not when the multiple copies
> are inlined, as it looks like we need all the copies, right?
>
Not quite - having the SP->CU link (well,h onestly, marking the SP as
"distinct" does this, but even if we didn't do that, the SP->CU
link would
still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we didn't
mark it 'distinct', the fact that each SP refers to its corresponding CU
would produce the same effect - they wouldn't be deduplicated because they
aren't identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies
of
the llvm::Function would be dropped, but their SPs would be left around. So
two CUs that both used the same linkonce_odr function (let's say no
inlining actually occurred though) would both have a SP description in the
DWARF - but one would actual have a proper definition (with a high/low PC,
etc) the other would be missing those features, as though the function had
been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in the
DWARF (and the extra SP descriptions in the metadata - I think they were
duplicate back then because they still had a scope chain leading back to
their CU (maybe we had gotten rid of that chain - if we had, then adding it
back in may've actually caused more metadata, but less DWARF))

>
>
>
> Any thoughts? I imagine this is probably worth a reasonable amount of
> savings in an optimized build. Not huge, but not nothing. (probably not the
> top of anyone's list though, I realize)
>
> Should we remove the CU link from a non-internal linkage subprogram (&
> this may have an effect on the GV representation issue originally being
> discussed) and just emit it into whichever CU happens to need it first?
>
>
> I can see how this would be done in LTO where the compiler has full
> visibility. For ThinLTO presumably we would need to do some index-based
> marking? Can we at least do something when we import an inlined SP and drop
> it since we know it is defined elsewhere?
>
Complete visibility isn't required to benefit here - and unfortunately
there's nothing fancier (that I know of) that we can do to avoid emitting
one definition of each used inline function in each thinlto object file we
produce (we can't say "oh, the name of the function, its mangled name,
the
names and types of its parameters are over in that other object
file/somewhere else" - but we can avoid emitting those descriptions in each
/CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that'd be helpful :)

>
> Thanks,
> Teresa
>
>
>
> This might be slightly sub-optimal, due to, say, the namespace being
> foreign to that CU. But it's how we do types currently, I think? So at
> least it'd be consistent and probably cheap enough/better.
>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161215/408d74c5/attachment.html>

Teresa Johnson via llvm-dev

2016-Dec-15 21:30 UTC

head link

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
> On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at
google.com>
> wrote:
>
>> Trying to wrap my brain around this, so a few questions below. =)
>>
>
> Sure thing - sorry, did assume a bit too much arcane context here.
>
>
>>
>> On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at
gmail.com>
>> wrote:
>>
>> Branching off from a discussion of improvements to DIGlobalVariable
>> representations that Adrian's working on - got me thinking about
related
>> changes that have already been made to DISubprogram.
>>
>> To reduce duplicate debug info when things like linkonce_odr functions
>> were deduplicated in LTO linking, the relationship between a CU and
>> DISubprogram was inverted (instead of a CU maintaining a list of
>> subprograms, subprograms specify which CU they come from - and the
>> llvm::Function references the DISubprogram, so if the llvm::Function
goes
>> away, so does the associated DISubprogram)
>>
>> I'm not sure if this caused a regression, but at least seems to
miss a
>> possible improvement:
>>
>> During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
>> definitions (& their CU link, even if they weren't marked
'distinct', the
>> CU link would cause them to effectively be so) remain separate - this
means
>> that inlined versions in one CU don't refer to an existing
subprogram
>> definition in another CU.
>>
>> To demonstrate:
>> inl.h:
>> void f1();
>> inline __attribute__((always_inline)) void f2() {
>>   f1();
>> }
>> inl1.cpp:
>> #include "inl.h"
>> void c1() {
>>   f2();
>> }
>> inl2.cpp:
>> #include "inl.h"
>> void c2() {
>>   f2();
>> }
>>
>> Compile to IR, llvm-link the result. The DWARF you get is basically the
>> same as the DWARF you'd get without linking:
>>
>> DW_TAG_compile_unit
>>   DW_AT_name "inl1.cpp"
>>   DW_TAG_subprogram #0
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c1"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>> DW_TAG_compile_unit
>>   DW_AT_name "inl2.cpp"
>>   DW_TAG_subprogram #1
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c2"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #1 "f2"
>>
>> Instead of something more like this:
>>
>> DW_TAG_compile_unit
>>   DW_AT_name "inl1.cpp"
>>   DW_TAG_subprogram #0
>>     DW_AT_name "f2"
>>   DW_TAG_subprogram
>>     DW_AT_name "c1"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>> DW_TAG_compile_unit
>>   DW_AT_name "inl2.cpp"
>>   DW_TAG_subprogram
>>     DW_AT_name "c2"
>>     DW_TAG_inlined_subroutine
>>       DW_TAG_abstract_origin #0 "f2"
>>
>> (note that only one abstract definition of f2 is produced here)
>>
>>
>> I think I understand what you are saying. Essentially, having the
SP->CU
>> link allows the SP to be deduplicated when multiple *outline* copies of
the
>> corresponding function are deduplicated. But not when the multiple
copies
>> are inlined, as it looks like we need all the copies, right?
>>
>
> Not quite - having the SP->CU link (well,h onestly, marking the SP as
> "distinct" does this, but even if we didn't do that, the
SP->CU link would
> still do it) causes SPs /not/ to be deduplicated on IR linking.
>
> Each SP is distinct/not considered duplicate with any other. (if we
didn't
> mark it 'distinct', the fact that each SP refers to its
corresponding CU
> would produce the same effect - they wouldn't be deduplicated because
they
> aren't identical - they refer to different CUs)
>
> For non-inlined cases, this is fine.
>
> Before we inverted the SP<>CU link, what would happen is that all
copies
> of the llvm::Function would be dropped, but their SPs would be left around.
> So two CUs that both used the same linkonce_odr function (let's say no
> inlining actually occurred though) would both have a SP description in the
> DWARF - but one would actual have a proper definition (with a high/low PC,
> etc) the other would be missing those features, as though the function had
> been optimized away (which it sort of had)
>
> So by reversing the link, we got rid of those extra SP descriptions in the
> DWARF (and the extra SP descriptions in the metadata - I think they were
> duplicate back then because they still had a scope chain leading back to
> their CU (maybe we had gotten rid of that chain - if we had, then adding it
> back in may've actually caused more metadata, but less DWARF))
>
I almost followed all of this, until I got to this last bit. I understood
from above that with the SP->CU link (and distinct SPs), prevented
deduplication. But this last bit sounds like we are in fact removing the
duplicates in the DWARF and possibly also in the metadata.

>
>
>>
>>
>>
>> Any thoughts? I imagine this is probably worth a reasonable amount of
>> savings in an optimized build. Not huge, but not nothing. (probably not
the
>> top of anyone's list though, I realize)
>>
>> Should we remove the CU link from a non-internal linkage subprogram
(&
>> this may have an effect on the GV representation issue originally being
>> discussed) and just emit it into whichever CU happens to need it first?
>>
>>
>> I can see how this would be done in LTO where the compiler has full
>> visibility. For ThinLTO presumably we would need to do some index-based
>> marking? Can we at least do something when we import an inlined SP and
drop
>> it since we know it is defined elsewhere?
>>
>
> Complete visibility isn't required to benefit here - and unfortunately
> there's nothing fancier (that I know of) that we can do to avoid
emitting
> one definition of each used inline function in each thinlto object file we
> produce (we can't say "oh, the name of the function, its mangled
name, the
> names and types of its parameters are over in that other object
> file/somewhere else" - but we can avoid emitting those descriptions in
each
> /CU/ that uses the inlined function within a single ThinLTO object)
>
> I can provide some more thorough examples if that'd be helpful :)
>
Ok, I think I understand. This is only emitting once per object file, which
with ThinLTO can contain multiple CUs due to importing. But then with full
LTO it sounds like we would be in even better shape, since it has a single
module with all the CUs?

Teresa

>
>>
>> Thanks,
>> Teresa
>>
>>
>>
>> This might be slightly sub-optimal, due to, say, the namespace being
>> foreign to that CU. But it's how we do types currently, I think? So
at
>> least it'd be consistent and probably cheap enough/better.
>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161215/ae395881/attachment.html>

David Blaikie via llvm-dev

2016-Dec-15 22:08 UTC

head link

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

On Thu, Dec 15, 2016 at 1:30 PM Teresa Johnson <tejohnson at google.com>
wrote:

On Thu, Dec 15, 2016 at 11:38 AM, David Blaikie <dblaikie at gmail.com>
wrote:



On Thu, Dec 15, 2016 at 11:26 AM Teresa Johnson <tejohnson at google.com>
wrote:

Trying to wrap my brain around this, so a few questions below. =)


Sure thing - sorry, did assume a bit too much arcane context here.



On Thu, Dec 15, 2016 at 10:54 AM, David Blaikie <dblaikie at gmail.com>
wrote:

Branching off from a discussion of improvements to DIGlobalVariable
representations that Adrian's working on - got me thinking about related
changes that have already been made to DISubprogram.

To reduce duplicate debug info when things like linkonce_odr functions were
deduplicated in LTO linking, the relationship between a CU and DISubprogram
was inverted (instead of a CU maintaining a list of subprograms,
subprograms specify which CU they come from - and the llvm::Function
references the DISubprogram, so if the llvm::Function goes away, so does
the associated DISubprogram)

I'm not sure if this caused a regression, but at least seems to miss a
possible improvement:

During IR linking (for LTO, ThinLTO, etc) these distinct DISubprogram
definitions (& their CU link, even if they weren't marked
'distinct', the
CU link would cause them to effectively be so) remain separate - this means
that inlined versions in one CU don't refer to an existing subprogram
definition in another CU.

To demonstrate:
inl.h:
void f1();
inline __attribute__((always_inline)) void f2() {
  f1();
}
inl1.cpp:
#include "inl.h"
void c1() {
  f2();
}
inl2.cpp:
#include "inl.h"
void c2() {
  f2();
}

Compile to IR, llvm-link the result. The DWARF you get is basically the
same as the DWARF you'd get without linking:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram #1
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #1 "f2"

Instead of something more like this:

DW_TAG_compile_unit
  DW_AT_name "inl1.cpp"
  DW_TAG_subprogram #0
    DW_AT_name "f2"
  DW_TAG_subprogram
    DW_AT_name "c1"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"
DW_TAG_compile_unit
  DW_AT_name "inl2.cpp"
  DW_TAG_subprogram
    DW_AT_name "c2"
    DW_TAG_inlined_subroutine
      DW_TAG_abstract_origin #0 "f2"

(note that only one abstract definition of f2 is produced here)


I think I understand what you are saying. Essentially, having the SP->CU
link allows the SP to be deduplicated when multiple *outline* copies of the
corresponding function are deduplicated. But not when the multiple copies
are inlined, as it looks like we need all the copies, right?


Not quite - having the SP->CU link (well,h onestly, marking the SP as
"distinct" does this, but even if we didn't do that, the SP->CU
link would
still do it) causes SPs /not/ to be deduplicated on IR linking.

Each SP is distinct/not considered duplicate with any other. (if we didn't
mark it 'distinct', the fact that each SP refers to its corresponding CU
would produce the same effect - they wouldn't be deduplicated because they
aren't identical - they refer to different CUs)

For non-inlined cases, this is fine.

Before we inverted the SP<>CU link, what would happen is that all copies
of
the llvm::Function would be dropped, but their SPs would be left around. So
two CUs that both used the same linkonce_odr function (let's say no
inlining actually occurred though) would both have a SP description in the
DWARF - but one would actual have a proper definition (with a high/low PC,
etc) the other would be missing those features, as though the function had
been optimized away (which it sort of had)

So by reversing the link, we got rid of those extra SP descriptions in the
DWARF (and the extra SP descriptions in the metadata - I think they were
duplicate back then because they still had a scope chain leading back to
their CU (maybe we had gotten rid of that chain - if we had, then adding it
back in may've actually caused more metadata, but less DWARF))


I almost followed all of this, until I got to this last bit. I understood
from above that with the SP->CU link (and distinct SPs), prevented
deduplication. But this last bit sounds like we are in fact removing the
duplicates in the DWARF and possibly also in the metadata.


Ah, right - I can see how it reads that way, sorry.

Old old way:

first.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  @fn1 ...
  @inl ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2 -> {inl_SP2 -> @inl, SP2 -> @fn2, ... }
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll:
  CU1 -> {fn1_SP -> @fn1, inl_SP -> @inl, ... }
  CU2 -> {inl_SP2 -> null, SP2 -> @fn2, ... }
  @fn1 ...
  @inl ...
  @fn2 ...
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram inl
        name, but no high/low pc - this is unnecessary
      subprogram fn2
        high/low pc, etc

New way:
  CU1
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc

second.ll:
  CU2
  @inl -> inl_SP -> CU2
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU2
      subprogram inl
        high/low pc, etc
      subprogram fn2
        high/low pc, etc

link first.ll + second.ll (we pick @inl from first.ll in this example):
  CU1
  CU2
  @fn1 -> fn1_SP -> CU1
  @inl -> inl_SP -> CU1
  @fn2 -> fn2_SP -> CU2
  Resulting DWARF:
    compile_unit CU1
      subprogram fn1
        high/low pc, etc
      subprogram inl
        high/low pc, etc
    compile_unit CU2
      subprogram fn2
        high/low pc, etc

So inverting the links causes us to completely drop the redundant
description of 'inl' that appeared in C2 when the function was not
inlined.

But if the function /is/ inlined, then the inlined location descriptions
that remain in @fn2 (assuming there was a call to @inl in @fn1 and @fn2)
still point to that original (CU2) version of @inl - causing it to to be
emitted into CU2.

Whereas for type descriptions we don't do this - the type has no CU link,
so they all get deduplicated and even if @fn2 has a parameter of the same
type as @fn1 - we emit the type into CU1 when we first encounter it (when
emitting @fn1) and then reference it whenever we need it, even when
emitting @fn2 in the other CU.









Any thoughts? I imagine this is probably worth a reasonable amount of
savings in an optimized build. Not huge, but not nothing. (probably not the
top of anyone's list though, I realize)

Should we remove the CU link from a non-internal linkage subprogram (& this
may have an effect on the GV representation issue originally being
discussed) and just emit it into whichever CU happens to need it first?


I can see how this would be done in LTO where the compiler has full
visibility. For ThinLTO presumably we would need to do some index-based
marking? Can we at least do something when we import an inlined SP and drop
it since we know it is defined elsewhere?


Complete visibility isn't required to benefit here - and unfortunately
there's nothing fancier (that I know of) that we can do to avoid emitting
one definition of each used inline function in each thinlto object file we
produce (we can't say "oh, the name of the function, its mangled name,
the
names and types of its parameters are over in that other object
file/somewhere else" - but we can avoid emitting those descriptions in each
/CU/ that uses the inlined function within a single ThinLTO object)

I can provide some more thorough examples if that'd be helpful :)


Ok, I think I understand. This is only emitting once per object file, which
with ThinLTO can contain multiple CUs due to importing. But then with full
LTO it sounds like we would be in even better shape, since it has a single
module with all the CUs?


Right - currently we emit once per CU, but with a change in format we could
emit once per object file - which hurts ThinLTO over non-LTO (because
ThinLTO produces more CUs (due to imports) per object file) and is neutral
for full LTO (since it produces the same number of CUs, just in one object
file).

Improving this representation to produce once per object would help get
ThinLTO back what it's currently paying - and improve full LTO further than
its current position in this regard.

But the gains might not be major/significant - I've done nothing to assess
that, just observing that it is suboptimal.

Thanks for asking/helping me explain it further, hopefully this is more
descriptive.

- Dave



Teresa




Thanks,
Teresa



This might be slightly sub-optimal, due to, say, the namespace being
foreign to that CU. But it's how we do types currently, I think? So at
least it'd be consistent and probably cheap enough/better.




-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
<(408)%20460-2413>




-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413
<(408)%20460-2413>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161215/50e0dc03/attachment.html>

Possibly Parallel Threads

Search for more seemingly similar threads

llvm dev - Dec 2016 - distinct DISubprograms hindering sharing inlined subprogram descriptions

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

[llvm-dev] distinct DISubprograms hindering sharing inlined subprogram descriptions

Possibly Parallel Threads