thr3ads.net - llvm dev - [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld. [May 2020]

Alexey Lapshin via llvm-dev

2020-May-08 13:18 UTC

[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Folks, we are working on optimizing binary size and improving debug info
quality.
To reduce the size of the binary, we use -ffunction-sections so that unused code
is garbage collected.
When the linker does garbage collection, a lot of abandoned debug info is left
behind.
Besides inflated debug info size, we end up with overlapping address ranges
and no way to tell valid ranges from garbage ones (D59553).
To resolve these two problems, we use an implementation extracted from dsymutil:
https://reviews.llvm.org/D74169.
It adds a --gc-debuginfo command line option to the linker to remove obsolete
debug info.
Currently, it has the following limitations: it does not support DWARF5,
modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info
sections, split DWARF, or ThinLTO.
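
To make the overlapping-ranges problem concrete, here is a minimal Python
sketch. It assumes (this is not stated above) that address ranges of
garbage-collected functions end up resolved to a low address such as 0, so
they collide with each other. The function names and addresses are made up
for illustration; this is not code from D74169.

```python
from typing import List, Tuple

# (function name, low_pc, high_pc); high_pc is exclusive.
Range = Tuple[str, int, int]


def find_overlaps(ranges: List[Range]) -> List[Tuple[str, str]]:
    """Return pairs of function names whose [low_pc, high_pc) ranges overlap."""
    overlaps = []
    ordered = sorted(ranges, key=lambda r: (r[1], r[2]))
    for i, (name_a, _low_a, high_a) in enumerate(ordered):
        for name_b, low_b, _high_b in ordered[i + 1:]:
            if low_b >= high_a:
                break  # sorted by low_pc: no later range can overlap name_a
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical layout: two live functions plus two discarded ones whose
# addresses are assumed to have been resolved to 0 after --gc-sections.
ranges = [
    ("live_main", 0x1000, 0x1040),
    ("live_helper", 0x1040, 0x1080),
    ("dead_foo", 0x0, 0x30),
    ("dead_bar", 0x0, 0x20),
]
print(find_overlaps(ranges))
```

In this toy layout only the two discarded functions collide; a consumer
looking at the addresses alone cannot tell which of the two ranges (if any)
describes real code.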

Following are size/performance results for the D74169:

A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section
D: -ffunction-sections --gc-sections -gsplit-dwarf
E: -ffunction-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    |
--------------------------------------------------------------
|    A    |    54min(100%)  |   19.0G(100%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    B    |    65min(120%)  |    9.7G( 51%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    C    |    53min( 98%)  |   12.0G( 63%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    D    |    52min( 96%)  |   12.0G( 63%) |   8.2G( 55.0%) |
--------------------------------------------------------------
|    E    |    64min(118%)  |    5.3G( 28%) |  12.0G( 80.0%) |
--------------------------------------------------------------


Clang binary:
-------------------------------------------------------------
| Options |      size      |     link time  |  used memory  |
-------------------------------------------------------------
|    A    |    1.50G(100%) |    9sec(100%)  |  9307MB(100%) |
-------------------------------------------------------------
|    B    |    0.76G( 50%) |   68sec(755%)  | 15055MB(161%) |
-------------------------------------------------------------
|    C    |    0.82G( 54%) |    8sec( 89%)  |  8402MB( 90%) |
-------------------------------------------------------------
|    D    |    0.96G( 64%) |    6sec( 67%)  |  4273MB( 46%) |
-------------------------------------------------------------
|    E    |    0.43G( 29%) |   77sec(855%)  | 15000MB(161%) |
-------------------------------------------------------------


lldb loading time:
--------------------------------------------
| Options |      time     |   used memory  |
--------------------------------------------
|    A    |  6.4sec(100%) |  1495MB(100%)  |
--------------------------------------------
|    B    |  4.0sec( 63%) |   826MB( 55%)  |
--------------------------------------------
|    C    |  3.7sec( 58%) |   877MB( 59%)  |
--------------------------------------------
|    D    |  4.3sec( 67%) |  1023MB( 69%)  |
--------------------------------------------
|    E    |  2.1sec( 33%) |   478MB( 32%)  |
--------------------------------------------

I want to discuss the results and decide whether it is worth integrating
D74169:

improvements:

1. Reduces the size of debug info(50%).
2. Resolves overlapping of address ranges(D59553).
3. Reduced size of debug info allows tools to work faster and to require less
memory.

drawbacks and not implemented features:

1. Linking time increases to 755% of the baseline (9 sec -> 68 sec for the
clang binary).

  The --gc-debuginfo option is off by default, so it would affect only those who
need it and explicitly specify it.

  I think the current DWARFLinker code could be optimized further to improve
performance results.

2. Support of type units.

  That could be implemented later.

3. DWARF5.

   The current DWARFEmitter/DWARFStreamer implementation of DWARF generation
does not support DWARF5 (only the debug_names table). At the same time, there
is already code in CodeGen/AsmPrinter/DwarfDebug.h which implements most of
DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile, though I am not sure whether it would be easy to reuse
DwarfDebug/DwarfFile. It would probably be necessary to separate out some
intermediate level of DwarfDebug/DwarfFile.

4. split DWARF support.

   This solution does not currently work with split DWARF. But it could be
useful for split DWARF in two ways:

   a) The generation of the skeleton file could be changed so that address
ranges pointing to garbage-collected code are replaced with lowpc=0, highpc=0.
That would solve the problem of overlapping address ranges (D59553).

   b) An approach similar to the dsymutil implementation could be used to
generate monolithic debug info from .dwo files. That suggestion is from
https://reviews.llvm.org/D74169#1888386.
      I.e., DWARFLinker could be taught to generate the same output as D74169
but with split DWARF as the source.

5. -fmodules-debuginfo

   That problem was described in this review:
https://reviews.llvm.org/D54747#1505462. Currently, DWARFLinker/dsymutil has
the same problem. It could be solved using the fact that DWARFLinker analyzes
debug info: it could recognize debug info generated for a module and keep it
(compile units containing debug info for modules do not have low_pc, high_pc).
6. -flto=thin

   That problem was described in this review:
https://reviews.llvm.org/D54747#1503720. It also exists in the current
DWARFLinker/dsymutil implementation. I think that problem should be discussed
more: it could probably be fixed by avoiding generation of such incomplete
declarations during ThinLTO, or, alternatively, DWARFLinker could recognize such
a situation and copy the missing type declaration.
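
Point 4a above (replacing address ranges of garbage-collected code with
lowpc=0, highpc=0) can be illustrated with a small Python sketch. This is only
a toy model under stated assumptions: `AddressRange`, `zero_dead_ranges`, and
the set of live function names are hypothetical, and nothing here is taken
from the lld or DWARFLinker sources.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class AddressRange:
    function: str
    low_pc: int
    high_pc: int  # exclusive


def zero_dead_ranges(ranges: List[AddressRange], live: Set[str]) -> List[AddressRange]:
    """Keep ranges of live functions; replace the others with an empty 0..0 range."""
    return [r if r.function in live else AddressRange(r.function, 0, 0) for r in ranges]


def non_empty(ranges: List[AddressRange]) -> List[AddressRange]:
    """What a consumer could do afterwards: skip ranges with low_pc == high_pc."""
    return [r for r in ranges if r.low_pc != r.high_pc]


# Hypothetical skeleton-unit ranges; "unused" was removed by --gc-sections.
ranges = [
    AddressRange("main", 0x1000, 0x1040),
    AddressRange("unused", 0x0, 0x30),
]
patched = zero_dead_ranges(ranges, live={"main"})
print(patched)
print(non_empty(patched))
```

An empty range covers no address, so it can no longer overlap anything, which
is the property point 4a relies on.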

======================================================================================
Debuginfo, Linker folks, what do you think about the current results and future
directions?

It introduces quite a significant linking time increase (6x-8x). But it would
affect only those who use the feature.

Thus users will be able to decide whether that linking time increase is
acceptable or not.
Resolving all of points 1-6 is a significant amount of work. But, as a result,
the debug info becomes more correct and compact.

Do you think it would be good to integrate it and start working on
improving it?

Thank you, Alexey.





James Henderson via llvm-dev

2020-May-11 08:25 UTC

Hi Alexey,

Regarding the link performance timings, have you tried profiling to see if
there are any obvious performance improvements that could be made? A slow
down of 7x seems like an awfully large amount given what this should be
doing after all. Also, do you have an idea whether the slow down is
exponential for the size/linear etc?

The problem is that if it is opt-in, but the link time cost is so high, it
may put people off ever enabling it, which would be a shame, as the
debugger load time improvements seem worthwhile having.

James


David Blaikie via llvm-dev

2020-May-11 20:11 UTC

Broad question: Do you have any specific motivation/users/etc in
implementing this (if you can speak about it)? - it might help motivate the
work, understand what tradeoffs might be suitable for you/your users, etc.

In general, in the current state, I don't have strong feelings either way
about this going in as-is with the intent to improve it to make it more
viable - or some of that work being done out-of-tree until it's a more
viable performance tradeoff. Mostly happy to leave that up to folks more
involved with lld.

A couple of minor points...

On Fri, May 8, 2020 at 6:18 AM Alexey Lapshin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Folks, we work on optimization of binary size and improvement of debug
> info quality.
> To reduce the size of the binary we use -ffunction-sections so that unused
> code would be garbage collected.
> When the linker does garbage collection, a lot of abandoned debug info is
> left behind.
> Besides inflated debug info size, we ended up with overlapping address
> ranges and no way to say valid vs garbage ranges(D59553).
> To resolve these two problems, we use implementation extracted from
> dsymutil https://reviews.llvm.org/D74169.
> It adds --gc-debuginfo command line option to the linker to remove
> obsolete debug info.
> Currently, it has the following limitations: does not support DWARF5,
> modules, -fdebug-types-section, type units, .debug_types,
>
These last 3 ^ are all the same thing, FWIW. (well, in DWARFv5 they go in
debug_info, but it's the same feature)

> multiple .debug_info sections, split DWARF, thin lto.
>
> Following are size/performance results for the D74169:
>
> A: --function-sections --gc-sections
> B: --function-sections --gc-sections --gc-debuginfo
> C: --function-sections --gc-sections --fdebug-types-section

^ not sure of the point of testing/showing comparisons with a situation
that's currently unsupported

> D: --function-sections --gc-sections --gsplit-dwarf
> E: --function-sections --gc-sections --gc-debuginfo
> --compress-debug-sections=zlib
>
> I want to discuss the results and to decide whether it is worth to
> integrate of D74169:
>
> improvements:
>
> 1. Reduces the size of debug info(50%).
> 2. Resolves overlapping of address ranges(D59553).
> 3. Reduced size of debug info allows tools to work faster and to require
> less memory.
>
> drawbacks and not implemented features:
>
> 1. linking time is increased(755%).
>
>   The --gc-debuginfo option is off by default. So it would affect only
> those who need it and explicitly specified it.
>
>   I think the current DWARFLinker code could be optimized more to improve
> performance results.
>
> 2. Support of type units.
>
>   That could be implemented further.

Enabling type units increases object size to make it easier to deduplicate
at link time by a DWARF-unaware linker. With a DWARF-aware linker it'd be
generally desirable not to have to add that object size overhead to get the
linking improvements.

>
> 3. DWARF5.
>
>    Current DWARFEmitter/DWARFStreamer has an implementation for DWARF
> generation, which does not support
> DWARF5(only debug_names table). At the same time, there already exists
> code in CodeGen/AsmPrinter/DwarfDebug.h,
> which implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer
> should be rewritten using
> DwarfDebug/DwarfFile. Though I am not sure whether it would be easy to
> re-use DwarfDebug/DwarfFile.
> It would probably be necessary to separate some intermediate level of
> DwarfDebug/DwarfFile.
>
> 4. split DWARF support.
>
>    This solution does not work with split DWARF currently. But it could be
> useful for the split dwarf in two ways:
>
>    a) The generation of skeleton file could be changed in such a way that
> address ranges pointing to garbage
> collected code would be replaced with lowpc=0, highpc=0. That would solve
> the problem of overlapping address
> ranges(D59553).

This wouldn't/couldn't completely address the issue - because some address
ranges would be in the .dwo files the linker can't see - and they'd still
end up with the interesting address ranges.

>
>    b) The approach similar to dsymutil implementation could be used to
> generate monolithic debuginfo created
> from .dwo files. That suggestion is from -
> https://reviews.llvm.org/D74169#1888386.
>       i.e., DWARFLinker could be taught to generate the same output as
> D74169 but for split DWARF as the source.
>
> 5. -fmodules-debuginfo
>
>    That problem was described in this review -
> https://reviews.llvm.org/D54747#1505462 . Currently, DWARFLinker/dsymutil
> has the same problem. It could be solved using the fact that DWARFLinker
> analyzes debuginfo. It could recognize debug info generated for the module
> and keep it(compile units containing debug info for modules do not have
> low_pc, high_pc).
>
> 6. -flto=thin
>
>    That problem was described in this review
> https://reviews.llvm.org/D54747#1503720. It also exists in current
> DWARFLinker/dsymutil implementation. I think that problem should be
> discussed more: it could probably be fixed by avoiding generation of such
> incomplete declaration during thinlto,

That would be costly to produce extra/redundant debug info in ThinLTO -
actually ThinLTO could be doing more to reduce that redundancy early on
(actually removing definitions from some llvm Modules if the type
definition is known to exist in another Module, etc)

I don't know if it's a problem since that patch was reverted.

> or, alternatively, DWARFLinker could recognize such situation and copy
> missed type declaration.

Alexey Lapshin via llvm-dev

2020-May-11 21:06 UTC

> Hi Alexey,

Hi James, thank you for your comments. Please find my answers below:

> Regarding the link performance timings, have you tried profiling to see if
> there are any obvious performance improvements that could be made? A slow
> down of 7x seems like an awfully large amount given what this should be
> doing after all.

I do not see "easy to fix" alternatives. But there are some
possibilities to improve performance:

1. ~10% improvement could probably be achieved by optimizing string pools
   (NonRelocatableStringpool/DwarfStringPool).

   Measurements show that ~10 sec is spent in
   llvm::StringMapImpl::LookupBucketFor(). The problem is that the same
   strings are added to the string pool again and again. Two attributes
   having the same string value are each analyzed (hash calculated) and
   searched for inside the string pool, even if these strings are already in
   a string table (DW_FORM_strp, DW_FORM_strx). The process could be
   optimized for string tables: if a string from the string table has been
   accessed previously, a reference into the string pool would be kept for
   it. This would eliminate a lot of string pool searches.

2. ~20-30% improvement by processing each object file in parallel.

   Currently, all object files are analyzed sequentially and cloned
   sequentially; cloning is started in parallel with analyzing. That scheme
   could be changed: analyzing and cloning could be done in parallel for each
   object file. That requires refactoring DWARFLinker and making the string
   pools and DeclContextTree thread-safe.

3. ~10-20% improvement by supporting type units.

   Currently, dsymutil/DWARFLinker does not support type units. If type units
   were supported, the "analyzing" step could be skipped for a significant
   part of the debug info data. This would save time.

4. ~2-3% improvement could probably be achieved by optimizing DWARF parser
   classes. Following is a list of ideas:

   https://reviews.llvm.org/D78672#inline-720056
   https://reviews.llvm.org/D78672#2000012
   https://reviews.llvm.org/D78672#2000363
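
Idea 1 above (keeping a reference into the string pool for strings already
seen in a string table) can be sketched in Python. This is a toy model, not
DWARFLinker code: `StringPool`, `CachedStringTable`, and the offsets are
hypothetical names and values chosen for illustration. The counter shows how
often a string actually has to be hashed and searched in the pool.

```python
class StringPool:
    """Toy output string pool: maps a string to its offset in the output table."""

    def __init__(self):
        self._offsets = {}
        self._next_offset = 0
        self.hash_lookups = 0  # how often a string was hashed and searched

    def get_offset(self, s: str) -> int:
        self.hash_lookups += 1
        if s not in self._offsets:
            self._offsets[s] = self._next_offset
            self._next_offset += len(s.encode()) + 1  # +1 for the NUL terminator
        return self._offsets[s]


class CachedStringTable:
    """Input string table (offset -> string) that remembers, per input offset,
    the pool offset found the first time that string was accessed."""

    def __init__(self, table, pool):
        self._table = table
        self._pool = pool
        self._cache = {}  # input offset -> output pool offset

    def pool_offset(self, input_offset: int) -> int:
        cached = self._cache.get(input_offset)
        if cached is None:
            # First access: this is the only time the string itself is hashed.
            cached = self._pool.get_offset(self._table[input_offset])
            self._cache[input_offset] = cached
        return cached


pool = StringPool()
table = CachedStringTable({0: "int", 4: "main", 9: "argc"}, pool)
# DW_FORM_strp-style attribute values that refer to the same offsets repeatedly.
attribute_offsets = [0, 4, 0, 0, 9, 4, 0]
out = [table.pool_offset(off) for off in attribute_offsets]
print(out, pool.hash_lookups)
```

Seven attribute values cause only three string lookups here. The cache is
keyed by a small integer, so in a C++ implementation it could be an array or
dense map indexed by string-table offset rather than another string hash.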

> Also, do you have an idea whether the slow down is exponential for the
> size/linear etc?

It is linear. Following is the data for different runs (output size is the size
of the overall binary):

---------------------------------------
| linking time, sec | Output size, MB |
---------------------------------------
|         4         |        64       |
|         5         |        79       |
|        18         |       211       |
|        25         |       308       |
|        29         |       356       |
|        51         |       526       |
|        72         |       788       |
---------------------------------------
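
The linearity claim can be checked against the table above. The following
Python sketch fits a least-squares line to the seven data points and computes
the Pearson correlation coefficient; it uses only the numbers from the table,
and the helper name `linear_fit` is just for this example.

```python
import math

sizes_mb = [64, 79, 211, 308, 356, 526, 788]
times_sec = [4, 5, 18, 25, 29, 51, 72]


def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = linear_fit(sizes_mb, times_sec)
print(f"slope = {slope:.4f} sec/MB, intercept = {intercept:.2f} sec, r = {r:.4f}")
```

For this data the correlation coefficient is above 0.99 and the slope is a
little under 0.1 sec per MB of output, which is consistent with linear growth
over the measured range.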

> The problem is that if it is opt-in, but the link time cost is so high, it
> may put people off ever enabling it, which would be a shame, as the debugger
> load time improvements seem worthwhile having.

On the other hand, integrating D74169 allows us to do things iteratively.
Doing the above performance optimizations would require significant time.
Implementing support for DWARF5 would probably require significant time too. It
would take much longer to implement the whole thing at once. Also, if D74169
were integrated, additional people could probably join the work. I think the
LLVM developer policy encourages splitting work into smaller pieces and
integrating them iteratively.

Thank you, Alexey.
>James
On Fri, 8 May 2020 at 14:18, Alexey Lapshin via llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>> wrote:

Folks, we work on optimization of binary size and improvement of debug info
quality.
To reduce the size of the binary we use -ffunction-sections so that unused code
would be garbage collected.
When the linker does garbage collection, a lot of abandoned debug info is left
behind.
Besides inflated debug info size, we ended up with overlapping address ranges
and no way to say valid vs garbage ranges(D59553).
To resolve these two problems, we use implementation extracted from dsymutil
https://reviews.llvm.org/D74169.
It adds --gc-debuginfo command line option to the linker to remove obsolete
debug info.
Currently, it has the following limitations: does not support DWARF5, modules,
-fdebug-types-section, type units, .debug_types, multiple .debug_info sections,
split DWARF, thin lto.

Following are size/performance results for the D74169:

A: --function-sections --gc-sections
B: --function-sections --gc-sections --gc-debuginfo
C: --function-sections --gc-sections --fdebug-types-section
D: --function-sections --gc-sections --gsplit-dwarf
E: --function-sections --gc-sections --gc-debuginfo
--compress-debug-sections=zlib

LLVM code base:
--------------------------------------------------------------
| Options |    build time   |    bin size   |    lib size    |
--------------------------------------------------------------
|    A    |    54min(100%)  |   19.0G(100%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    B    |    65min(120%)  |    9.7G( 51%) |  12.0G( 80.0%) |
--------------------------------------------------------------
|    C    |    53min( 98%)  |   12.0G( 63%) |  15.0G(100.0%) |
--------------------------------------------------------------
|    D    |    52min( 96%)  |   12.0G( 63%) |   8.2G( 55.0%) |
--------------------------------------------------------------
|    E    |    64min(118%)  |    5.3G( 28%) |  12.0G( 80.0%) |
--------------------------------------------------------------

Clang binary:
-------------------------------------------------------------
| Options |      size      |     link time  |  used memory  |
-------------------------------------------------------------
|    A    |    1.50G(100%) |    9sec(100%)  |  9307MB(100%) |
-------------------------------------------------------------
|    B    |    0.76G( 50%) |   68sec(755%)  | 15055MB(161%) |
-------------------------------------------------------------
|    C    |    0.82G( 54%) |    8sec( 89%)  |  8402MB( 90%) |
-------------------------------------------------------------
|    D    |    0.96G( 64%) |    6sec( 67%)  |  4273MB( 46%) |
-------------------------------------------------------------
|    E    |    0.43G( 29%) |   77sec(855%)  | 15000MB(161%) |
-------------------------------------------------------------

lldb loading time:
--------------------------------------------
| Options |      time     |   used memory  |
--------------------------------------------
|    A    |  6.4sec(100%) |  1495MB(100%)  |
--------------------------------------------
|    B    |  4.0sec( 63%) |   826MB( 55%)  |
--------------------------------------------
|    C    |  3.7sec( 58%) |   877MB( 59%)  |
--------------------------------------------
|    D    |  4.3sec( 67%) |  1023MB( 69%)  |
--------------------------------------------
|    E    |  2.1sec( 33%) |   478MB( 32%)  |
--------------------------------------------

I want to discuss the results and to decide whether it is worth to integrate of
D74169:

improvements:

1. Reduces the size of debug info(50%).
2. Resolves overlapping of address ranges(D59553).
3. Reduced size of debug info allows tools to work faster and to require less
memory.

drawbacks and not implemented features:

1. linking time is increased(755%).

  The --gc-debuginfo option is off by default. So it would affect only those who
need it and explicitly specified it.

  I think the current DWARFLinker code could be optimized more to improve
performance results.

2. Support of type units.

  That could be implemented further.

3. DWARF5.

   The current DWARFEmitter/DWARFStreamer implementation of DWARF generation
does not support DWARF5(only the debug_names table). At the same time, code
already exists in CodeGen/AsmPrinter/DwarfDebug.h that implements most of
DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile, though I am not sure whether DwarfDebug/DwarfFile would be
easy to re-use. It would probably be necessary to separate out some intermediate
level of DwarfDebug/DwarfFile.

4. split DWARF support.

   This solution does not work with split DWARF currently. But it could be
useful for the split dwarf in two ways:

   a) The generation of skeleton file could be changed in such a way that
address ranges pointing to garbage collected code would be replaced with
lowpc=0, highpc=0. That would solve the problem of overlapping address
ranges(D59553).

   b) An approach similar to the dsymutil implementation could be used to
generate monolithic debuginfo from .dwo files. That suggestion is from
https://reviews.llvm.org/D74169#1888386. I.e., DWARFLinker could be taught to
generate the same output as D74169 but for split DWARF as the source.
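
A minimal Python sketch of option a) may help. It is not lld or DWARFLinker code: the `Entry` type, the function names and the addresses are invented for illustration, and liveness is simply given as a flag. It shows why an empty range (lowpc=0, highpc=0) no longer competes with live code during an address lookup.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str     # hypothetical function name
    low_pc: int
    high_pc: int  # exclusive upper bound
    live: bool    # False if the linker discarded the code


def blank_dead(entries: List[Entry]) -> List[Entry]:
    """Give every discarded entry the empty range low_pc=0, high_pc=0."""
    return [e if e.live else e._replace(low_pc=0, high_pc=0) for e in entries]


def lookup(entries: List[Entry], addr: int) -> List[str]:
    """Names of all entries whose [low_pc, high_pc) range contains addr."""
    return [e.name for e in entries if e.low_pc <= addr < e.high_pc]


entries = [
    Entry("main", 0x00, 0x40, live=True),
    Entry("unused", 0x00, 0x19, live=False),  # stale range left at address 0
]

print(lookup(entries, 0x10))              # ambiguous: two candidates
print(lookup(blank_dead(entries), 0x10))  # only the live function remains
```

An empty range contains no address, so a consumer that walks the ranges simply never matches it.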

5. -fmodules-debuginfo

   That problem was described in this review -
https://reviews.llvm.org/D54747#1505462 . Currently, DWARFLinker/dsymutil has
the same problem. It could be solved using the fact that DWARFLinker analyzes
debuginfo. It could recognize debug info generated for the module and keep
it(compile units containing debug info for modules do not have low_pc, high_pc).
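
The rule suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CompileUnit`, `should_keep` and the addresses are hypothetical, and the absence of low_pc/high_pc is used as the sole signal that a unit describes a module. A real implementation would likely need a stricter check, since other units can also lack an address range.

```python
from typing import List, NamedTuple, Optional, Tuple


class CompileUnit(NamedTuple):
    name: str                      # hypothetical label for the unit
    low_pc: Optional[int] = None   # None models a missing DW_AT_low_pc
    high_pc: Optional[int] = None  # None models a missing DW_AT_high_pc


def should_keep(cu: CompileUnit, live_ranges: List[Tuple[int, int]]) -> bool:
    """Decide whether debug info GC keeps a unit (illustrative rule only)."""
    # No address range at all: assume the unit describes a module (or other
    # code-free content) and keep it instead of treating it as dead.
    if cu.low_pc is None or cu.high_pc is None:
        return True
    # Otherwise keep it only if its range intersects some live code range.
    return any(cu.low_pc < hi and lo < cu.high_pc for lo, hi in live_ranges)


live = [(0x201700, 0x201719)]
units = [
    CompileUnit("module_types"),                   # no low_pc/high_pc
    CompileUnit("live_code", 0x201700, 0x201719),
    CompileUnit("dead_code", 0x300000, 0x300019),
]
print([cu.name for cu in units if should_keep(cu, live)])
```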

6. -flto=thin

   That problem was described in this review
https://reviews.llvm.org/D54747#1503720. It also exists in current
DWARFLinker/dsymutil implementation. I think that problem should be discussed
more: it could probably be fixed by avoiding generation of such incomplete
declarations during thinlto, or, alternatively, DWARFLinker could recognize such
a situation and copy the missing type declaration.

======================================================================================
Debuginfo, Linker folks, What do you think about current results and future
directions?

It introduces quite a significant linking time increase(6x-8x). But it would
affect only those who use that feature.

Thus the users will be able to decide whether that linking time increase is
acceptable or not.
Resolving all of points 1-6 is quite a significant amount of work. But, as a
result, the debug info is more correct and compact.

Do you think that it would be good to integrate it and start working on
improving it?

Thank you, Alexey.

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org<mailto:llvm-dev at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Alexey Lapshin via llvm-dev

2020-May-13 19:36 UTC


[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Hi David, Excuse me for the delayed answer. It took some time to prepare. Please
find the answers below...

>Broad question: Do you have any specific motivation/users/etc in
>implementing this (if you can speak about it)?
> - it might help motivate the work, understand what tradeoffs might be
>suitable for you/your users, etc.
There are two general requirements:
 1) Remove (or clean) invalid debug info.
 2) Optimize the DWARF size.

The specifics which our users have:
 - embedded platform which uses 0 as start of .text section.
 - custom toolset which does not support all features yet(e.g. split dwarf).
 - tolerant of the link-time increase.
 - need a useful way to share debug builds.

For the first point: we have a problem "Overlapping address ranges starting
from 0"(D59553).
We use a custom solution, but a general solution like D74169 would be better
here.
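
To make the overlap problem concrete, here is a small Python sketch. It assumes that the address range of a garbage-collected function is left resolved as if the function started at address 0, which is what the "starting from 0" wording suggests. The function names and sizes are invented for illustration.

```python
from typing import List, NamedTuple, Tuple


class Range(NamedTuple):
    name: str     # hypothetical function name
    low_pc: int   # first address covered
    high_pc: int  # one past the last address covered
    live: bool    # False if the code was garbage collected


def overlapping(ranges: List[Range]) -> List[Tuple[str, str]]:
    """Return pairs of names whose [low_pc, high_pc) ranges intersect."""
    pairs = []
    for i, a in enumerate(ranges):
        for b in ranges[i + 1:]:
            if a.low_pc < b.high_pc and b.low_pc < a.high_pc:
                pairs.append((a.name, b.name))
    return pairs


# .text starts at 0, so a live function legitimately sits at address 0.
ranges = [
    Range("main", 0x00, 0x40, live=True),
    Range("helper", 0x40, 0x60, live=True),
    # Assumed: the discarded function's range is left resolved against 0.
    Range("unused", 0x00, 0x19, live=False),
]

print(overlapping(ranges))
```

On a platform where valid code never starts at 0, a consumer could treat such ranges as garbage. With .text at 0 there is no way to tell the stale range from the valid one by address alone.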

For the second point: split dwarf could be a good alternative to have debug info
with minimal size.
Still, it has drawbacks (not supported by tools currently, does not solve the
"Overlapping address ranges"
problem, not very convenient to share(even using .dwp)).

Thus, in the long term, D74169 looks like a good solution for us: it resolves
the "Overlapping address ranges" problem, produces a binary with minimal size,
is supported by current tools, and makes it easy to share a debug build(single
binary with minimal size).

> In general, in the current state, I don't have strong feelings either
>way about this going in as-is with the intent to improve it to make it more
>viable - or some of that work being done out-of-tree until it's a more
>viable performance tradeoff. Mostly happy to leave that up to folks more
>involved with lld.
>
>A couple of minor points...
>> C: --function-sections --gc-sections --fdebug-types-section
> ^ not sure of the point of testing/showing comparisons with a situation
>that's currently unsupported

That situation is currently supported(--gc-debuginfo is not used in this
measurement).
"--fdebug-types-section" is supported functionality.
The purpose of these data is to compare results for
"--fdebug-types-section" and "--gc-debuginfo".


>>2. Support of type units.
>>  That could be implemented further.
>Enabling type units increases object size to make it easier to deduplicate
>at link time by a DWARF-unaware linker. With a DWARF aware linker it'd be
>generally desirable not to have to add that object size overhead to get the
>linking improvements.
But DWARFLinker should work adequately with type units, since they are already
implemented. If someone uses --fdebug-types-section, then it should work
adequately together with --gc-debuginfo(if --gc-debuginfo is accepted).
Right?

Another thing is that the idea behind type units has the potential to help a
DWARF-aware linker work faster. Currently, DWARFLinker analyzes context to
understand whether types are the same or not. But the context is known when
types are generated, so there is no need to spend time analyzing it. If types
could be compared without analyzing context, then a DWARF-aware linker would
work faster.

That is just an idea(not for immediate implementation): if types were stored in
some "type table" (instead of a COMDAT section group) and could be accessed
through a hash-id(like type units), then that would be a solution requiring
fewer bits to store while allowing types to be compared by hash-id(without
analyzing context). In this case, the size increase would be small, and
processing could be faster.

This is just an idea and could be discussed separately from the problem of
integrating D74169.
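
The "type table" idea could be sketched as follows. This is a hedged Python illustration, not a proposed format: `type_id`, `TypeTable`, the use of SHA-256 and the way a type is described by strings are all assumptions made for the example. In the actual idea the producer would emit the id and the linker would only read it. The 64-bit width matches the size of a type-unit signature.

```python
import hashlib
from typing import Dict, Tuple


def type_id(qualified_name: str, members: Tuple[str, ...]) -> str:
    """Hypothetical producer-side id: a 64-bit hash of the type description."""
    digest = hashlib.sha256()
    digest.update(qualified_name.encode())
    for member in members:
        digest.update(b"\0" + member.encode())
    return digest.hexdigest()[:16]  # 16 hex digits = 64 bits


class TypeTable:
    """Linker-side table: one entry per distinct id, no context analysis."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Tuple[str, ...]]] = {}

    def add(self, qualified_name: str, members: Tuple[str, ...]) -> str:
        tid = type_id(qualified_name, members)
        # A plain dictionary lookup replaces the structural comparison.
        self._entries.setdefault(tid, (qualified_name, members))
        return tid

    def __len__(self) -> int:
        return len(self._entries)


table = TypeTable()
a = table.add("ns::Foo", ("int x", "char y"))  # from object file 1
b = table.add("ns::Foo", ("int x", "char y"))  # same type from object file 2
c = table.add("ns::Bar", ("double z",))        # a different type
print(a == b, a == c, len(table))
```

The second copy of ns::Foo is recognized by its id alone, so the table ends up with one entry per distinct type.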


>>4. split DWARF support.
>>   This solution does not work with split DWARF currently. But it could
>>   be useful for the split dwarf in two ways:
>>   a) The generation of skeleton file could be changed in such a way
>>   that address ranges pointing to garbage collected code would be
>>   replaced with lowpc=0, highpc=0. That would solve the problem of
>>   overlapping address ranges(D59553).
>This wouldn't/couldn't completely address the issue - because some
>address ranges would be in the .dwo files the linker can't see - and
>they'd still end up with the interesting address ranges.
I see, thank you. Thus it would not be a complete solution.


>> 6. -flto=thin
>>    That problem was described in this review
>>    https://reviews.llvm.org/D54747#1503720. It also exists in current
>>    DWARFLinker/dsymutil implementation. I think that problem should be
>>    discussed more: it could probably be fixed by avoiding generation of
>>    such incomplete declaration during thinlto,
>That would be costly to produce extra/redundant debug info in ThinLTO -
>actually ThinLTO could be doing more to reduce that redundancy early on
>(actually removing definitions from some llvm Modules if the type
>definition is known to exist in another Module, etc)
>I don't know if it's a problem since that patch was reverted.
Yes. That patch was reverted, but this patch(D74169) has the same problem.
If D74169 were applied and --gc-debuginfo used, then the structure type
definition would be removed.

DWARFLinker could handle that case - "removing definitions from some llvm
Modules if the type
definition is known to exist in another Module".
i.e. DWARFLinker could replace the declaration with the definition.

But that problem could be more easily resolved when debug info is
generated(probably without
significant increase of debug info size):

Let's check the example:

0x0000000b: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000201700)
              DW_AT_high_pc     (0x0000000000201719)

0x0000002a:   DW_TAG_subprogram
0x00000043:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000000000000086 "_Z1fv")
                  DW_AT_low_pc  (0x0000000000201700)
                  DW_AT_high_pc (0x0000000000201718)

0x00000057:       DW_TAG_variable
                    DW_AT_abstract_origin       (0x0000000000000096 "var")
0x00000065:       NULL

0x00000073: DW_TAG_compile_unit
              DW_AT_stmt_list   (0x00000080)

0x00000086:   DW_TAG_subprogram
                DW_AT_name      ("f")
                DW_AT_inline    (DW_INL_inlined)

0x00000096:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x000000a9 "volatile Foo")
0x000000a1:     NULL

0x000000a9:   DW_TAG_volatile_type
                DW_AT_type      (0x000000ae "Foo")

0x000000ae:   DW_TAG_structure_type
                DW_AT_name      ("Foo")
                DW_AT_declaration       (true)

0x000000c1: DW_TAG_compile_unit
              DW_AT_low_pc      (0x0000000000000000)
              DW_AT_high_pc     (0x0000000000000019)

0x000000e0:   DW_TAG_subprogram
                DW_AT_low_pc    (0x0000000000000000)
                DW_AT_high_pc   (0x0000000000000019)
                DW_AT_name      ("f")

0x000000fd:     DW_TAG_variable
                  DW_AT_name    ("var")
                  DW_AT_type    (0x00000119 "volatile Foo")

0x00000119:   DW_TAG_volatile_type
                DW_AT_type      (0x0000011e "Foo")

0x0000011e:   DW_TAG_structure_type
                DW_AT_name      ("Foo")
                DW_AT_decl_line (1)

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for
function "f".
DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root
for function "f".
DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f"
definition.

The code for function "f" was deleted. --gc-debuginfo deletes compile unit
DW_TAG_compile_unit(0x000000c1), which contains the "f" definition (since there
is no corresponding code). But that unit also holds the structure "Foo"
definition DW_TAG_structure_type(0x0000011e), which
DW_TAG_compile_unit(0x00000073) refers to through the declaration
DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where
the definition was removed by thinlto and replaced with a declaration.
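
The situation can be modelled with a short Python sketch. It is an illustration only: `TypeDie` and `orphaned_declarations` are hypothetical, the offsets are taken from the dump above, and types are matched by name alone, which is a simplification of how a real DWARF linker would match them.

```python
from typing import List, NamedTuple, Set


class TypeDie(NamedTuple):
    cu: int               # offset of the owning compile unit
    offset: int           # offset of the type DIE itself
    name: str
    is_declaration: bool  # True when DW_AT_declaration is present


def orphaned_declarations(types: List[TypeDie], removed_cus: Set[int]) -> List[TypeDie]:
    """Declarations that have no definition left once removed_cus are dropped."""
    kept = [t for t in types if t.cu not in removed_cus]
    defined = {t.name for t in kept if not t.is_declaration}
    return [t for t in kept if t.is_declaration and t.name not in defined]


types = [
    TypeDie(cu=0x73, offset=0xAE, name="Foo", is_declaration=True),
    TypeDie(cu=0xC1, offset=0x11E, name="Foo", is_declaration=False),
]

# Nothing removed: the declaration still has a definition elsewhere.
print(orphaned_declarations(types, removed_cus=set()))
# Unit 0xc1 removed with the dead code: the declaration at 0xae is orphaned.
print([hex(t.offset) for t in orphaned_declarations(types, removed_cus={0xC1})])
```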

Would it cost too much if the type definition were not replaced with a
declaration for the "abstract instance root"? The number of concrete instances
is bigger than the number of abstract instance roots, so it would probably not
be too costly to leave the definition in the abstract instance root.

Alternatively, would it cost too much if the type definition were not replaced
with a declaration when the declaration references a type from an unused
function? (lto could understand that the concrete function is not used.)


Thank you, Alexey.





