thr3ads.net - llvm dev - [llvm-dev] Fragmented DWARF [Oct 2020]


James Henderson via llvm-dev

2020-Oct-19 08:50 UTC

[llvm-dev] Fragmented DWARF

Great, thanks Alexey! I'll try to take a look at this in the near future,
and will report my results back here. I imagine our clang results will
differ, purely because we probably used different toolchains to build the
input in the first place.

On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com>
wrote:
>
> On 13.10.2020 10:20, James Henderson wrote:
>
> The script included in the patch can be used to convert an object
> containing normal DWARF into an object using fragmented DWARF. It does this
> by using llvm-dwarfdump to dump the various sections, parsing the output to
> identify where it should split (using the offsets of the various entries),
> and then writing new section headers accordingly; you can see roughly what
> it's doing if you get a chance to watch the talk recording. The additional
> section headers are appended to the end of the ELF section header table,
> whilst the original DWARF is left in the same place it was before (making
> use of the fact that section headers don't have to appear in offset order).
> The script also parses and fragments the relocation sections targeting the
> DWARF sections so that they match up with the fragmented DWARF sections.
> This is clearly all suboptimal: in practice the compiler should be
> modified to do the fragmenting upfront, to save having to parse a tool's
> stdout, but that was just the simplest thing I could come up with to
> quickly write the script. Full details of the script usage are included in
> the patch description, if you want to play around with it.
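The splitting step described above can be sketched in Python (the language of the D89229 script), assuming the split offsets have already been recovered from the llvm-dwarfdump output. This is a hypothetical illustration, not the script itself: `Fragment` and `fragment_section` are made-up names, and actually writing ELF headers is omitted. Each fragment is an (offset, size) pair that a new section header could point at, and each relocation is moved to the fragment that contains it, with its offset rebased.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Fragment:
    # File offset and size that one new section header would describe.
    offset: int
    size: int
    # Relocations as (offset within this fragment, symbol) pairs.
    relocs: list = field(default_factory=list)


def fragment_section(sec_offset, sec_size, split_points, relocs):
    """Split one DWARF section into fragments at the given split points.

    sec_offset: file offset of the original section (its bytes stay in place).
    split_points: offsets, relative to the section start, where a new entry
        begins (e.g. each unit header seen in the llvm-dwarfdump output).
    relocs: (offset within section, symbol) pairs from the matching
        relocation section; assumed to lie within [0, sec_size).
    """
    starts = sorted({0, *(p for p in split_points if 0 < p < sec_size)})
    ends = starts[1:] + [sec_size]
    frags = [Fragment(sec_offset + s, e - s) for s, e in zip(starts, ends)]
    for r_off, sym in relocs:
        # The owning fragment is the last one starting at or before r_off.
        idx = bisect_right(starts, r_off) - 1
        # Rebase the relocation so it is relative to its fragment.
        frags[idx].relocs.append((r_off - starts[idx], sym))
    return frags


frags = fragment_section(0x1000, 0x60, [0x20, 0x48],
                         [(0x08, "foo"), (0x28, "bar"), (0x50, "baz")])
for f in frags:
    print(hex(f.offset), hex(f.size), f.relocs)
# 0x1000 0x20 [(8, 'foo')]
# 0x1020 0x28 [(8, 'bar')]
# 0x1048 0x18 [(8, 'baz')]
```

Because the original bytes stay where they are, only the extra headers and the split relocation sections need to be produced, which is why appending headers to the end of the table is enough.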
>
> If Alexey could point me at the latest version of his patch, I'd be happy
> to run that through either or both of the packages I used to see what
> happens. Equally, I'd be happy if Alexey is able to run my script to
> fragment and measure the performance of a couple of projects he's been
> working with. Based purely on the two packages I've tried this with, I can
> tell already that the results can vary wildly. My expectation is that
> Alexey's approach will be slower (at least in its current form, but
> probably more generally), but produce smaller output, though to what scale
> I have no idea.
>
> James, I updated the patch - https://reviews.llvm.org/D74169.
>
> To make it work, it is necessary to build the example with
> -ffunction-sections and to specify the following options to the linker:
>
> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
> For the clang binary I got the following results:
>
> 1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>
> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x link-time
> performance decrease, Debug Info size 542M
>
> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size 1.3G,
> 16x link-time performance decrease, Debug Info size 1G
>
> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
> I added the option --gc-debuginfo-no-odr so that the size reduction can be
> compared fairly. Without that option, D74169 performs type deduplication,
> so it would not be correct to compare the resulting size with the
> "Fragmented DWARF" solution, which does not deduplicate types.
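For a rough sense of scale, the relative reductions implied by these clang figures can be computed directly against the --gc-sections-only baseline. This treats "G" as 1000 M (an assumption; the message does not say which unit convention was used), and `reduction_pct` is just an illustrative helper.

```python
# Reported clang figures, in MB ("G" values assumed to be x1000).
baseline = {"binary": 1500, "debug_info": 1200}  # --gc-sections only
variants = {
    "--gc-debuginfo": {"binary": 840, "debug_info": 542},
    "--gc-debuginfo --gc-debuginfo-no-odr": {"binary": 1300, "debug_info": 1000},
}


def reduction_pct(before, after):
    """Percentage size reduction relative to `before`."""
    return 100.0 * (before - after) / before


for name, sizes in variants.items():
    for key in ("binary", "debug_info"):
        pct = reduction_pct(baseline[key], sizes[key])
        print(f"{name:40} {key:10} -{pct:.1f}%")
```

This gives about 44% (binary) and 55% (debug info) with type deduplication, versus about 13% and 17% without it. Only the no-ODR row is comparable with Fragmented DWARF; note also that this debug-info total includes .debug_str, whereas the DWARF* figures in the original Fragmented DWARF post include .debug_aranges but not .debug_str.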
>
> Also, I will look at your D89229 <https://reviews.llvm.org/D89229> and
> share results some time later.
>
> Thank you, Alexey.
>
>
> I think linkers parse .eh_frame partly because they have no other choice.
> That being said, I think its format is not too complex, so similarly the
> parser isn't too complex. You can see LLD's ELF implementation in
> ELF/EhFrame.cpp, how it is used in ELF/InputSection.cpp (see the bits to do
> with EhInputSection) and EhFrameSection in ELF/SyntheticSections.h (plus
> various usages of these two throughout the LLD code). I think the key to
> any structural changes in the DWARF format to make them more amenable to
> link-time parsing is being able to read a minimal amount without needing to
> parse the payload (e.g. a length field, some sort of type, and then using
> the relocations to associate it accordingly).
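To illustrate the "length field plus type" idea, here is a minimal Python sketch that splits .eh_frame into CIE and FDE pieces by reading only each record's length and its CIE id/pointer, following the .eh_frame layout in the Linux Standard Base specification (a 4-byte length, where 0xffffffff introduces a 64-bit extended length; then a 4-byte field that is 0 for a CIE and a CIE pointer for an FDE; a zero length terminates the section). It is not LLD's implementation; `split_eh_frame` is a made-up name, input is assumed well-formed, and relocation handling is omitted.

```python
import struct


def split_eh_frame(data, little_endian=True):
    """Split .eh_frame contents into (offset, size, kind) pieces.

    Only the length field and the CIE id/pointer are read; the payload of
    each record is skipped entirely.
    """
    end = "<" if little_endian else ">"
    pieces = []
    off = 0
    while off + 4 <= len(data):
        (length,) = struct.unpack_from(end + "I", data, off)
        if length == 0:
            # Zero terminator: a 4-byte record with no payload.
            pieces.append((off, 4, "terminator"))
            break
        hdr = 4
        if length == 0xFFFFFFFF:
            # Extended length: the real length follows as an 8-byte value.
            (length,) = struct.unpack_from(end + "Q", data, off + 4)
            hdr = 12
        # 0 means this record is a CIE; anything else points back to a CIE.
        (cie_id,) = struct.unpack_from(end + "I", data, off + hdr)
        size = hdr + length
        pieces.append((off, size, "CIE" if cie_id == 0 else "FDE"))
        off += size
    return pieces


# One CIE (length 8), one FDE (length 12), then the terminator.
data = (struct.pack("<II", 8, 0) + b"\x01\x02\x03\x04"
        + struct.pack("<II", 12, 16) + bytes(8)
        + struct.pack("<I", 0))
print(split_eh_frame(data))
# [(0, 12, 'CIE'), (12, 16, 'FDE'), (28, 4, 'terminator')]
```

A linker can then associate each FDE piece with the code section its relocation targets and drop it if that section is dead, without ever decoding the CFI instructions.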
>
> James
>
> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>> Awesome! Sorry I missed the lightning talk, but really interested to see
>> this sort of thing (though it's not directly/immediately applicable to the
>> use case I work with - Split DWARF, something similar could be used there
>> with further work)
>>
>> Though it looks like the patch has mostly linker changes - where/how do
>> you generate the fragmented DWARF to begin with? Via the Python script? Run
>> over assembly? I'd be surprised if it was achievable that way - curious to
>> know more.
>>
>> Got a rough sense/are you able to run apples-to-apples comparisons with
>> Alexey's linker-based patches to compare linker time/memory overhead versus
>> resulting output size gains?
>>
>> (& yeah, I'm a bit curious about how the linkers do eh_frame rewriting,
>> if the format is especially amenable to a lightweight parsing/rewriting and
>> how we could make the DWARF more amenable to that too)
>>
>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>
>>> Hi all,
>>>
>>> At the recent LLVM developers' meeting, I presented a lightning talk on
>>> an approach to reduce the amount of dead debug data left in an executable
>>> following operations such as --gc-sections and duplicate COMDAT removal.
>>> In that presentation, I presented some figures based on linking a game
>>> that had been built by our downstream clang port and fragmented using the
>>> described approach. Since recording the presentation, I ran the same
>>> experiment on a clang package (this time built with a GCC version). The
>>> comparable figures are below:
>>>
>>> Link time (s):
>>>
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
>>> | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>> | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>> +--------------------+-------+---------------+------+------+------+------+------+
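To put the link-time table in relative terms, the ratio of fragmented to plain link time can be computed per column. The numbers are copied from the table; `slowdown` is only an illustrative helper.

```python
# Link times in seconds from the table: No GC, then GC 1 .. GC 6.
columns = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]
link_times = {
    "Game": ([4.5, 4.9, 4.2, 3.6, 3.4, 3.3, 3.2],
             [11.1, 11.8, 9.7, 8.6, 7.9, 7.7, 7.5]),
    "Clang": ([13.9, 17.9, 17.0, 16.7, 16.3, 16.2, 16.1],
              [18.6, 22.8, 21.6, 21.1, 20.8, 20.5, 20.2]),
}


def slowdown(plain, fragmented):
    """Ratio of fragmented to plain link time for each column."""
    return [f / p for p, f in zip(plain, fragmented)]


for package, (plain, frag) in link_times.items():
    ratios = slowdown(plain, frag)
    print(package, ", ".join(f"{c}: {r:.2f}x" for c, r in zip(columns, ratios)))
```

For these inputs the fragmented link works out to roughly 2.3 to 2.5 times slower for the game and 1.25 to 1.34 times slower for clang, consistent with the point that the cost depends heavily on the input package.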
>>>
>>> Output size - Game package (MB):
>>>
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>> | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>> | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>> | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>> | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>> | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>>
>>> Output size - Clang (MB):
>>>
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>> | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>> | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>> +---------------------+-------+------+------+------+------+------+------+
>>>
>>> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges +
>>> .debug_aranges + .debug_loc
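The output-size tables can likewise be reduced to a percentage saving of fragmented over plain output per column. The totals are copied from the two tables; `saving_pct` is an illustrative helper.

```python
# Total output sizes in MB: No GC, then GC 1 .. GC 6.
sizes = {
    "Game": {
        "plain": [1149, 1121, 1017, 965, 938, 930, 928],
        "fragmented": [1044, 940, 556, 373, 287, 263, 255],
    },
    "Clang": {
        "plain": [2596, 2546, 2406, 2332, 2293, 2273, 2251],
        "fragmented": [2397, 2346, 2164, 2069, 2017, 1990, 1963],
    },
}


def saving_pct(plain, fragmented):
    """Percentage by which the fragmented output is smaller than the plain one."""
    return [100.0 * (p - f) / p for p, f in zip(plain, fragmented)]


for package, s in sizes.items():
    pcts = saving_pct(s["plain"], s["fragmented"])
    print(package, " ".join(f"{x:.1f}%" for x in pcts))
```

The "other" rows are identical for plain and fragmented output, so the whole difference comes from the DWARF sections. For the game the saving grows from about 16% at GC 1 to about 72% at GC 6, while for clang it stays between roughly 8% and 13%.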
>>>
>>> Additionally, I have posted https://reviews.llvm.org/D89229, which
>>> provides the Python script and linker patches used to reproduce the above
>>> results on my machine. The GC 1/2/3/4/5/6 columns correspond to the
>>> --mark-live-pc linker option added in that patch, with values
>>> 1/0.8/0.6/0.4/0.2/0 respectively.
>>>
>>> During the conference, the question was asked what the memory usage and
>>> input size impact was. I've summarised these below:
>>>
>>> Input file size total (GB):
>>> +--------------------+------------+
>>> | Package variant    | Total Size |
>>> +--------------------+------------+
>>> | Game (plain)       |     2.9    |
>>> | Game (fragmented)  |     4.2    |
>>> | Clang (plain)      |    10.9    |
>>> | Clang (fragmented) |    12.3    |
>>> +--------------------+------------+
>>>
>>> Peak Working Set Memory usage (GB):
>>> +--------------------+-------+------+
>>> | Package variant    | No GC | GC 1 |
>>> +--------------------+-------+------+
>>> | Game (plain)       |  4.3  |  4.7 |
>>> | Game (fragmented)  |  8.9  |  8.6 |
>>> | Clang (plain)      | 15.7  | 15.6 |
>>> | Clang (fragmented) | 19.4  | 19.2 |
>>> +--------------------+-------+------+
>>>
>>> I'm keen to hear what people's feedback is, and also interested to see
>>> what results others might see by running this experiment on other input
>>> packages. Also, if anybody has any alternative ideas that meet the goals
>>> listed below, I'd love to hear them!
>>>
>>> To reiterate some key goals of fragmented DWARF, similar to what I said
>>> in the presentation:
>>> 1) Devise a scheme that gives significant size savings without being too
>>> costly. It's clear from just the two packages I've tried this on that there
>>> is a fairly hefty link-time performance cost, although the exact cost
>>> depends on the nature of the input package. On the other hand, depending on
>>> the nature of the input package, there can also be some big gains.
>>> 2) Devise a scheme that doesn't require any linker knowledge of DWARF.
>>> The current approach doesn't quite achieve this properly due to the slight
>>> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
>>> group sections should solve this problem.
>>> 3) Provide some kind of halfway house between simply writing tombstone
>>> values into dead DWARF and fully parsing the DWARF to reoptimise it or
>>> discard its dead bits.
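The retention rule that goal 2 aims for can be modelled without any DWARF knowledge: each debug fragment is tied to one code section, and the linker keeps or drops it purely on that section's liveness after --gc-sections or COMDAT deduplication. This is a hypothetical model of the kind of rule SHF_LINK_ORDER or section groups can express, not the D89229 linker patch; `DebugFragment` and the section names are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugFragment:
    name: str
    # Code section this fragment describes; None marks data that is always
    # kept (for example a header that no single function owns).
    associated: Optional[str]


def retained_fragments(fragments, live_sections):
    """Keep a fragment iff it has no association or its code section is live.

    live_sections is whatever survived --gc-sections and COMDAT
    deduplication; the rule never looks inside the DWARF bytes.
    """
    return [f.name for f in fragments
            if f.associated is None or f.associated in live_sections]


frags = [
    DebugFragment(".debug_info.hdr", None),
    DebugFragment(".debug_info.foo", ".text.foo"),
    DebugFragment(".debug_info.bar", ".text.bar"),
]
print(retained_fragments(frags, {".text.foo"}))
# ['.debug_info.hdr', '.debug_info.foo']
```

Because the decision only depends on section associations, the same linker logic would work for any fragmented metadata, not just DWARF.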
>>>
>>> I'm hopeful that changes could be made to the linker to improve the
>>> link-time cost. A significant amount of the link time seems to be spent
>>> creating the input sections. An alternative would be to devise a scheme
>>> that would avoid the literal splitting into section headers, in favour of
>>> some sort of list of split-points that the linker uses to split things up
>>> (a bit like it already does for .eh_frame or mergeable sections).
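One way to picture the split-point alternative: keep a single section, attach a list of (offset, associated code section) split points to it, and let the linker drop dead pieces and record an old-to-new offset map, much as it does for mergeable sections. The message does not specify an encoding, so this sketch assumes one; `compact_with_split_points` is a hypothetical name.

```python
def compact_with_split_points(data, pieces, live_sections):
    """Drop dead pieces of one section and return (new_data, offset_map).

    pieces: list of (start_offset, associated_section_or_None), sorted by
        start offset and assumed to begin at 0; each piece runs up to the
        next start (or the end of data).
    offset_map: old start offset -> new start offset for every kept piece,
        which is what relocations into the section would be adjusted by.
    """
    out = bytearray()
    offset_map = {}
    bounds = [start for start, _ in pieces] + [len(data)]
    for (start, assoc), end in zip(pieces, bounds[1:]):
        if assoc is not None and assoc not in live_sections:
            continue  # the code this piece describes was discarded
        offset_map[start] = len(out)
        out += data[start:end]
    return bytes(out), offset_map


data = b"AAAABBBBCC"
pieces = [(0, None), (4, ".text.a"), (8, ".text.b")]
print(compact_with_split_points(data, pieces, {".text.b"}))
# (b'AAAACC', {0: 0, 8: 4})
```

The linker only materialises one input section here, which is the cost the quoted paragraph hopes to avoid paying per fragment.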
>>>

James Henderson via llvm-dev

2020-Oct-29 10:52 UTC


[llvm-dev] Fragmented DWARF

Hi Alexey,

I've just started looking at running your patch on the clang and game
packages I used for the Fragmented DWARF experiment, and on both occasions,
I got "warning: Generated debug info is broken" near the end of the link.
Digging further, the actual error this represented (for the clang case) was
"invalid e_shentsize in ELF header: 16912" (aside: there are several
Expected instances around where the former warning was reported which are
being thrown away and will cause assertions under the right configuration).
I don't really follow the code enough to understand whether this is a bug
in the code or possibly some weird interaction with our downstream patches
(I don't expect the latter, for the clang build, as our patches are
supposed to be a no-op when not using our target). I'll check what happens
with the clang package if I try using a completely vanilla LLVM with your
patch applied.

I also got a large number of "no mapping for range" warnings when linking
the game package. I tried debugging the code in the area, but the data
types are all difficult to debug, and I don't really understand the
relevant area of code enough to be able to theorise what actually is
causing this. llvm-dwarfdump --verify doesn't flag up any issues, and
there's nothing obviously broken looking at the dump of the debug data
either. Any pointers as to what might be going wrong would be appreciated.
I assume with your experiments that you build with
-ffunction-sections/-fdata-sections for maximum GC opportunities?

Thanks,

James


Alexey Lapshin via llvm-dev

2020-Oct-29 15:04 UTC


[llvm-dev] Fragmented DWARF

Hi James,

Thank you very much for the information.
Regarding the first problem: could you send me the clang build
configuration that you used, so that I can reproduce the problem, please?

For the second problem: yes, I built the experiment with
-ffunction-sections -fdata-sections.
According to the error message, it seems that address ranges were read
incorrectly.
As a quick guess, could it be that the invalid address ranges are marked
with -1/-2 values? Then they might be handled incorrectly, since this
patch does not support (and was not tested with) the LowPC > HighPC case.
The simplest solution would be not to use -1/-2 values with this patch.
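To make the LowPC > HighPC case concrete: if a dead function's DW_AT_low_pc is resolved to the tombstone -1 and DW_AT_high_pc uses a constant form (an offset from low_pc), a reader working in unsigned 64-bit arithmetic computes a high address that wraps around below the low one. The sketch below emulates that and shows a reader-side filter that skips such ranges instead of mapping them; the function names are hypothetical and not taken from D74169.

```python
MASK64 = (1 << 64) - 1
# -1 and -2 as unsigned 64-bit addresses.
TOMBSTONES = {MASK64, MASK64 - 1}


def high_pc_from_offset(low_pc, size):
    """Constant-form DW_AT_high_pc is an offset from low_pc; emulate the
    64-bit unsigned wraparound a C/C++ reader would get."""
    return (low_pc + size) & MASK64


def split_ranges(ranges):
    """Partition (low, high) ranges into usable ones and ones to skip.

    A range is skipped if its low address is a tombstone or if low > high,
    rather than being added to an address map.
    """
    kept, skipped = [], []
    for low, high in ranges:
        if low in TOMBSTONES or low > high:
            skipped.append((low, high))
        else:
            kept.append((low, high))
    return kept, skipped


live = (0x401000, high_pc_from_offset(0x401000, 0x40))
dead = (MASK64, high_pc_from_offset(MASK64, 0x40))  # high wraps to 0x3f
kept, skipped = split_ranges([live, dead])
print([tuple(map(hex, r)) for r in kept])     # [('0x401000', '0x401040')]
print([tuple(map(hex, r)) for r in skipped])  # [('0xffffffffffffffff', '0x3f')]
```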

Thank you, Alexey.

On 29.10.2020 13:52, James Henderson wrote:
> Hi Alexey,
>
> I've just started looking at running your patch on the clang and game 
> packages I used for the Fragmented DWARF experiment, and on both 
> occasions, I got "warning: Generated debug info is broken" near the
> end of the link. Digging further, the actual error this represented 
> (for the clang case) was "invalid e_shentsize in ELF header: 16912"
> (aside: there are several Expected instances around where the former 
> warning was reported which are being thrown away and will cause 
> assertions under the right configuration). I don't really follow the 
> code enough to understand whether this is a bug in the code or 
> possibly some weird interaction with our downstream patches (I don't 
> expect the latter, for the clang build, as our patches are supposed to 
> be a no-op when not using our target). I'll check what happens with 
> the clang package if I try using a completely vanilla LLVM with your 
> patch applied.
>
> I also got a large number of "no mapping for range" warnings when
> linking the game package. I tried debugging the code in the area, but 
> the data types are all difficult to debug, and I don't really 
> understand the relevant area of code enough to be able to theorise 
> what actually is causing this. llvm-dwarfdump --verify doesn't flag up 
> any issues, and there's nothing obviously broken looking at the dump 
> of the debug data either. Any pointers as to what might be going wrong 
> would be appreciated. I assume with your experiments that you build 
> with -ffunction-sections/-fdata-sections for maximum GC opportunities?
>
> Thanks,
>
> James
>
> On Mon, 19 Oct 2020 at 09:50, James Henderson 
> <jh7370.2008 at my.bristol.ac.uk <mailto:jh7370.2008 at my.bristol.ac.uk>>
> wrote:
>
>     Great, thanks Alexey! I'll try to take a look at this in the near
>     future, and will report my results back here. I imagine our clang
>     results will differ, purely because we probably used different
>     toolchains to build the input in the first place.
>
>     On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin
>     <avl.lapshin at gmail.com <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>         On 13.10.2020 10:20, James Henderson wrote:
>>         The script included in the patch can be used to convert an
>>         object containing normal DWARF into an object using
>>         fragmented DWARF. It does this by using llvm-dwarfdump to
>>         dump the various sections, parsing the output to identify
>>         where it should split (using the offsets of the various
>>         entries), and then writing new section headers accordingly;
>>         you can see roughly what it's doing if you get a chance to
>>         watch the talk recording. The additional section headers are
>>         appended to the end of the ELF section header table, whilst
>>         the original DWARF is left in the same place it was before
>>         (making use of the fact that section headers don't have to
>>         appear in offset order). The script also parses and fragments
>>         the relocation sections targeting the DWARF sections so that
>>         they match up with the fragmented DWARF sections. This is
>>         clearly all suboptimal: in practice the compiler should be
>>         modified to do the fragmenting upfront, to save having to
>>         parse a tool's stdout, but that was just the simplest thing I
>>         could come up with to quickly write the script. Full details
>>         of the script usage are included in the patch description, if
>>         you want to play around with it.
>>
>>         If Alexey could point me at the latest version of his patch,
>>         I'd be happy to run that through either or both of the
>>         packages I used to see what happens. Equally, I'd be happy if
>>         Alexey is able to run my script to fragment and measure the
>>         performance of a couple of projects he's been working with.
>>         Based purely on the two packages I've tried this with, I can
>>         tell already that the results can vary wildly. My expectation
>>         is that Alexey's approach will be slower (at least in its
>>         current form, but probably more generally), but produce
>>         smaller output, though to what scale I have no idea.
>
>         James, I updated the patch - https://reviews.llvm.org/D74169.
>
>         To make it work, it is necessary to build the example with
>         -ffunction-sections and to specify the following options to the linker:
>
>         --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
>         For the clang binary I got the following results:
>
>         1. --gc-sections = binary size 1.5G, Debug Info size(*) 1.2G
>
>         2. --gc-sections --gc-debuginfo = binary size 840M, 8x
>         link-time performance decrease, Debug Info size 542M
>
>         3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary
>         size 1.3G, 16x link-time performance decrease, Debug Info size 1G
>
>         (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
>         I added the option --gc-debuginfo-no-odr so that the size
>         reduction can be compared fairly. Without that option, D74169
>         performs type deduplication, so it would not be correct to
>         compare the resulting size with the "Fragmented DWARF"
>         solution, which does not deduplicate types.
>
>         Also, I will look at your D89229 <https://reviews.llvm.org/D89229>
>         and share results some time later.
>
>         Thank you, Alexey.
>
>>
>>         I think linkers parse .eh_frame partly because they have no
>>         other choice. That being said, I think its format is not too
>>         complex, so similarly the parser isn't too complex. You can
>>         see LLD's ELF implementation in ELF/EhFrame.cpp, how it is
>>         used in ELF/InputSection.cpp (see the bits to do with
>>         EhInputSection) and EhFrameSection in ELF/SyntheticSections.h
>>         (plus various usages of these two throughout the LLD code). I
>>         think the key to any structural changes in the DWARF format
>>         to make them more amenable to link-time parsing is being able
>>         to read a minimal amount without needing to parse the payload
>>         (e.g. a length field, some sort of type, and then using the
>>         relocations to associate it accordingly).
>>
>>         James
>>
>>         On Mon, 12 Oct 2020 at 20:48, David Blaikie
>>         <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>>
>>             Awesome! Sorry I missed the lightning talk, but really
>>             interested to see this sort of thing (though it's not
>>             directly/immediately applicable to the use case I work
>>             with - Split DWARF, something similar could be used there
>>             with further work)
>>
>>             Though it looks like the patch has mostly linker changes
>>             - where/how do you generate the fragmented DWARF to begin
>>             with? Via the Python script? Run over assembly? I'd be
>>             surprised if it was achievable that way - curious to know
>>             more.
>>
>>             Got a rough sense/are you able to run apples-to-apples
>>             comparisons with Alexey's linker-based patches to compare
>>             linker time/memory overhead versus resulting output size
>>             gains?
>>
>>             (& yeah, I'm a bit curious about how the linkers do
>>             eh_frame rewriting, if the format is especially amenable
>>             to a lightweight parsing/rewriting and how we could make
>>             the DWARF more amenable to that too)
>>
>>             On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>             <jh7370.2008 at my.bristol.ac.uk
>>             <mailto:jh7370.2008 at my.bristol.ac.uk>> wrote:
>>
>>                 Hi all,
>>
>>                 At the recent LLVM developers' meeting, I presented a
>>                 lightning talk on an approach to reduce the amount of
>>                 dead debug data left in an executable following
>>                 operations such as --gc-sections and duplicate COMDAT
>>                 removal. In that presentation, I presented some
>>                 figures based on linking a game that had been built
>>                 by our downstream clang port and fragmented using the
>>                 described approach. Since recording the presentation,
>>                 I ran the same experiment on a clang package (this
>>                 time built with a GCC version). The comparable
>>                 figures are below:
>>
>>                 Link-time speed (s):
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>                 | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>                 | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>                 | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>                 | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>                 | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>                 +--------------------+-------+---------------+------+------+------+------+------+
>>
>>                 Output size - Game package (MB):
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>                 | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>                 | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>                 | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>                 | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>                 | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>
>>
>>                 Output size - Clang (MB):
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>                 | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>                 | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>                 | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>                 | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>                 | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>                 | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>                 +---------------------+-------+------+------+------+------+------+------+
>>
>>                 *DWARF size == total size of .debug_info +
>>                 .debug_line + .debug_ranges + .debug_aranges + .debug_loc
>>
>>                 Additionally, I have posted
>>                 https://reviews.llvm.org/D89229 which provides the
>>                 python script and linker patches used to reproduce
>>                 the above results on my machine. The GC 1/2/3/4/5/6
>>                 correspond to the linker option added in that patch
>>                 --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0
>>                 respectively.
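As a worked example of reading the output-size tables above, the following sketch pairs each GC column with the --mark-live-pc value listed for it. It then computes how much smaller the fragmented DWARF is than the plain DWARF, which stays constant across the columns. The sizes are copied from the Game and Clang tables. The helper name is hypothetical, and the percentages are derived arithmetic, not additional measurements.

```python
# --mark-live-pc value used for each GC column (as listed for D89229).
MARK_LIVE_PC = {"GC 1": 1.0, "GC 2": 0.8, "GC 3": 0.6, "GC 4": 0.4, "GC 5": 0.2, "GC 6": 0.0}

# DWARF* sizes in MB, copied from the output-size tables.
PLAIN_DWARF = {"Game": 845, "Clang": 1979}
FRAGMENTED_DWARF = {
    "Game": {"GC 1": 664, "GC 2": 384, "GC 3": 253, "GC 4": 194, "GC 5": 178, "GC 6": 173},
    "Clang": {"GC 1": 1780, "GC 2": 1738, "GC 3": 1716, "GC 4": 1703, "GC 5": 1696, "GC 6": 1691},
}

def dwarf_saving_pct(package: str, column: str) -> float:
    """Hypothetical helper: % by which fragmented DWARF is smaller than plain DWARF."""
    plain = PLAIN_DWARF[package]
    return round(100 * (plain - FRAGMENTED_DWARF[package][column]) / plain, 1)

for package in PLAIN_DWARF:
    for column, pc in MARK_LIVE_PC.items():
        print(f"{package:5} {column} (--mark-live-pc={pc}): "
              f"{dwarf_saving_pct(package, column)}% smaller")
```

For example, at GC 1 (plain --gc-sections behaviour), the game's DWARF shrinks by about 21.4% and clang's shrinks by about 10.1%. At GC 6, the reductions are about 79.5% and 14.6% respectively.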
>>
>>                 During the conference, the question was asked what
>>                 the memory usage and input size impact was. I've
>>                 summarised these below:
>>
>>                 Input file size total (GB):
>>                 +--------------------+------------+
>>                 | Package variant    | Total Size |
>>                 +--------------------+------------+
>>                 | Game (plain)       |     2.9    |
>>                 | Game (fragmented)  |     4.2    |
>>                 | Clang (plain)      |    10.9    |
>>                 | Clang (fragmented) |    12.3    |
>>                 +--------------------+------------+
>>
>>                 Peak Working Set Memory usage (GB):
>>                 +--------------------+-------+------+
>>                 | Package variant    | No GC | GC 1 |
>>                 +--------------------+-------+------+
>>                 | Game (plain)       |  4.3  |  4.7 |
>>                 | Game (fragmented)  |  8.9  |  8.6 |
>>                 | Clang (plain)      | 15.7  | 15.6 |
>>                 | Clang (fragmented) | 19.4  | 19.2 |
>>                 +--------------------+-------+------+
>>
>>                 I'm keen to hear what people's feedback is, and also
>>                 interested to see what results others might see by
>>                 running this experiment on other input packages.
>>                 Also, if anybody has any alternative ideas that meet
>>                 the goals listed below, I'd love to hear them!
>>
>>                 To reiterate some key goals of fragmented DWARF,
>>                 similar to what I said in the presentation:
>>                 1) Devise a scheme that gives significant size
>>                 savings without being too costly. It's clear from
>>                 just the two packages I've tried this on that there
>>                 is a fairly hefty link time performance cost,
>>                 although the exact cost depends on the nature of the
>>                 input package. On the other hand, depending on the
>>                 nature of the input package, there can also be some
>>                 big gains.
>>                 2) Devise a scheme that doesn't require any linker
>>                 knowledge of DWARF. The current approach doesn't
>>                 quite achieve this properly due to the slight misuse
>>                 of SHF_LINK_ORDER, but I expect that a pivot to using
>>                 non-COMDAT group sections should solve this problem.
>>                 3) Provide some kind of halfway house between simply
>>                 writing tombstone values into dead DWARF and fully
>>                 parsing the DWARF to reoptimise it/discard the dead
>>                 bits.
>>
>>                 I'm hopeful that changes could be made to the linker
>>                 to improve the link-time cost. There seems to be a
>>                 significant amount of the link time spent creating
>>                 the input sections. An alternative would be to devise
>>                 a scheme that would avoid the literal splitting into
>>                 section headers, in favour of some sort of list of
>>                 split-points that the linker uses to split things up
>>                 (a bit like it already does for .eh_frame or
>>                 mergeable sections).
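The split-point alternative might look roughly like the following minimal sketch. It assumes a hypothetical per-section list of piece start offsets, which is not an existing ELF feature or LLD data structure. Given the original section bytes, the split points, and a liveness flag per piece, it concatenates only the live pieces. It also builds a map from old offsets to new ones that a linker could use to rewrite relocations pointing into the section, similar in spirit to how mergeable sections and .eh_frame pieces are handled. All names here are illustrative.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

def compact_pieces(data: bytes, split_points: Sequence[int],
                   live: Sequence[bool]) -> Tuple[bytes, List[Optional[int]]]:
    """Hypothetical helper: keep only the live pieces of one section.

    split_points holds the start offset of every piece (the first must be 0);
    piece i spans [split_points[i], split_points[i + 1]) or runs to the end.
    Returns the compacted bytes and each piece's new start (None if discarded).
    """
    if not split_points or split_points[0] != 0 or len(live) != len(split_points):
        raise ValueError("need one liveness flag per piece, starting at offset 0")
    bounds = list(split_points) + [len(data)]
    out = bytearray()
    new_starts: List[Optional[int]] = []
    for i, keep in enumerate(live):
        if keep:
            new_starts.append(len(out))
            out += data[bounds[i]:bounds[i + 1]]
        else:
            new_starts.append(None)
    return bytes(out), new_starts

def remap_offset(old: int, split_points: Sequence[int],
                 new_starts: Sequence[Optional[int]]) -> Optional[int]:
    """Translate an offset in the original section to the compacted one (None if dead)."""
    piece = bisect_right(split_points, old) - 1  # last piece starting at or before old
    start = new_starts[piece]
    return None if start is None else start + (old - split_points[piece])

out, starts = compact_pieces(b"AAAABBBBBBCC", [0, 4, 10], [True, False, True])
print(out, starts, remap_offset(11, [0, 4, 10], starts))  # b'AAAACC' [0, None, 4] 5
```

This keeps one section header per input section and moves the per-piece bookkeeping into the linker. The trade-off is that the linker must read the split-point list, although it still never needs to understand the DWARF inside each piece.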

Alexey Lapshin via llvm-dev

2020-Nov-04 11:54 UTC


[llvm-dev] Fragmented DWARF

Hi James,

I did experiments with the clang code base and will do experiments with 
our local codebase later.
Overall, both solutions ("Fragmented DWARF" and "DWARFLinker without ODR
types deduplication") appear to give similar size savings for the
final binary. "DWARFLinker with ODR types deduplication" has a bigger
size-saving effect. "Fragmented DWARF" increases the size of the original
object files by up to 15%.
LLD with "fragmented DWARF" works significantly faster than with
"DWARFLinker".

Following are the results for the "llvm-strings" and "clang" binaries:

1. llvm-strings:

    source object files size: 381M.
    fragmented source object files size: 451M(18% increase).

    a. upstream version,
       command line options: --gc-sections
       binary size: 6,5M
       compilation time: 0:00.13 sec
       run-time memory: 111kb

    b. "fragmented DWARF" version,
       command line options: --gc-sections --mark-live-pc=0.45
       binary size: 3,7M
       compilation time: 0:00.10 sec
       run-time memory: 122kb

    c. DWARFLinker version,
       command line options: --gc-sections --gc-debuginfo
       binary size: 3,8M
       compilation time: 0:00.33 sec
       run-time memory: 141kb

    d. DWARFLinker no-odr version,
       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
       binary size: 4,3M
       compilation time: 0:00.38 sec
       run-time memory: 142kb


2. clang:

    source object files size: 6,5G.
    fragmented source object files size: 7,3G(13% increase).

    a. upstream version,
       command line options: --gc-sections
       binary size: 1,5G
       compilation time: 6 sec
       run-time memory: 9.7G

    b. "fragmented DWARF" version,
       command line options: --gc-sections --mark-live-pc=0.43
       binary size: 1,1G
       compilation time: 9 sec
       run-time memory: 11G

    c. DWARFLinker version,
       command line options: --gc-sections --gc-debuginfo
       binary size: 836M
       compilation time: 62 sec
       run-time memory: 15G

    d. DWARFLinker no-odr version,
       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
       binary size: 1,3G
       compilation time: 128 sec
       run-time memory: 17G
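To put these numbers side by side, the following sketch computes each variant's binary-size reduction for llvm-strings and its clang time ratio, both relative to the upstream --gc-sections build (variant a). The inputs are copied from the list above. Variant b used a --mark-live-pc value below 1, so it is not a like-for-like comparison with the others. The helper names are hypothetical.

```python
# Figures copied from the results above (variants a-d).
LLVM_STRINGS_SIZE_MB = {"a": 6.5, "b": 3.7, "c": 3.8, "d": 4.3}
CLANG_TIME_S = {"a": 6, "b": 9, "c": 62, "d": 128}

def size_reduction_pct(sizes, variant, baseline="a"):
    """Hypothetical helper: how much smaller a variant's binary is, in percent."""
    return round(100 * (sizes[baseline] - sizes[variant]) / sizes[baseline], 1)

def time_ratio(times, variant, baseline="a"):
    """Hypothetical helper: a variant's reported time as a multiple of the baseline."""
    return round(times[variant] / times[baseline], 1)

for v in "bcd":
    print(v, f"llvm-strings {size_reduction_pct(LLVM_STRINGS_SIZE_MB, v)}% smaller,",
          f"clang {time_ratio(CLANG_TIME_S, v)}x the upstream time")
```

The resulting clang ratios (about 10.3x and 21.3x for variants c and d) are in the same range as the 8x and 16x slowdowns reported earlier in the thread for --gc-debuginfo and --gc-debuginfo-no-odr.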

Detailed size results:

1. llvm-strings

    a)

     FILE SIZE        VM SIZE
  --------------  --------------
   41.1%  2.64Mi   0.0%       0    .debug_info
   24.9%  1.60Mi   0.0%       0    .debug_str
   12.6%   827Ki   0.0%       0    .debug_line
    6.5%   428Ki  63.8%   428Ki    .text
    4.8%   317Ki   0.0%       0    .strtab
    3.4%   223Ki   0.0%       0    .debug_ranges
    2.0%   133Ki  19.8%   133Ki    .eh_frame
    1.7%   110Ki   0.0%       0    .symtab
    1.2%  77.6Ki   0.0%       0    .debug_abbrev

    b)

     FILE SIZE        VM SIZE
  --------------  --------------
   50.3%  1.85Mi   0.0%       0    .debug_info
   43.6%  1.60Mi   0.0%       0    .debug_str
    2.6%  98.2Ki   0.0%       0    .debug_line
    2.1%  77.6Ki   0.0%       0    .debug_abbrev
    0.5%  17.5Ki  54.9%  17.4Ki    .text
    0.3%  9.94Ki   0.0%       0    .strtab
    0.2%  6.27Ki   0.0%       0    .symtab
    0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
    0.1%  3.28Ki   0.0%       0    .debug_ranges

    c)

     FILE SIZE        VM SIZE
  --------------  --------------
   33.0%  1.25Mi   0.0%       0    .debug_info
   29.2%  1.11Mi   0.0%       0    .debug_str
   11.0%   428Ki  63.8%   428Ki    .text
    8.2%   317Ki   0.0%       0    .strtab
    7.8%   304Ki   0.0%       0    .debug_line
    3.4%   133Ki  19.8%   133Ki    .eh_frame
    2.8%   110Ki   0.0%       0    .symtab
    1.7%  65.9Ki   0.0%       0    .debug_ranges
    1.0%  38.4Ki   5.7%  38.4Ki    .rodata

    d)

     FILE SIZE        VM SIZE
  --------------  --------------
   39.7%  1.68Mi   0.0%       0    .debug_info
   26.3%  1.11Mi   0.0%       0    .debug_str
    9.9%   428Ki  63.8%   428Ki    .text
    7.3%   317Ki   0.0%       0    .strtab
    7.0%   304Ki   0.0%       0    .debug_line
    3.1%   133Ki  19.8%   133Ki    .eh_frame
    2.6%   110Ki   0.0%       0    .symtab
    1.5%  65.9Ki   0.0%       0    .debug_ranges


2. clang

    a)

     FILE SIZE        VM SIZE
  --------------  --------------
   58.3%   878Mi   0.0%       0    .debug_info
   11.8%   177Mi   0.0%       0    .debug_str
    7.7%   115Mi  62.2%   115Mi    .text
    7.7%   115Mi   0.0%       0    .debug_line
    6.0%  90.7Mi   0.0%       0    .strtab
    2.4%  35.4Mi   0.0%       0    .debug_ranges
    1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
    1.5%  23.0Mi  12.4%  23.0Mi    .rodata
    1.2%  17.9Mi   0.0%       0    .symtab

    b)

     FILE SIZE        VM SIZE
  --------------  --------------
   71.5%   772Mi   0.0%       0    .debug_info
   16.5%   177Mi   0.0%       0    .debug_str
    3.7%  40.2Mi  59.2%  40.2Mi    .text
    2.4%  25.8Mi   0.0%       0    .debug_line
    2.1%  23.0Mi   0.0%       0    .strtab
    1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
    0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
    0.5%  5.60Mi   0.0%       0    .symtab
    0.4%  4.28Mi   0.0%       0    .debug_ranges
    0.4%  4.04Mi   0.0%       0    .debug_abbrev


    c)

     FILE SIZE        VM SIZE
  --------------  --------------
   35.1%   293Mi   0.0%       0    .debug_info
   21.2%   177Mi   0.0%       0    .debug_str
   13.9%   115Mi  62.2%   115Mi    .text
   10.9%  90.7Mi   0.0%       0    .strtab
    6.9%  57.4Mi   0.0%       0    .debug_line
    2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
    2.8%  23.0Mi  12.4%  23.0Mi    .rodata
    2.1%  17.9Mi   0.0%       0    .symtab
    1.5%  12.4Mi   0.0%       0    .debug_ranges
    1.3%  10.6Mi   5.7%  10.6Mi    .dynstr

    d)

     FILE SIZE        VM SIZE
  --------------  --------------
   58.3%   758Mi   0.0%       0    .debug_info
   13.6%   177Mi   0.0%       0    .debug_str
    8.9%   115Mi  62.2%   115Mi    .text
    7.0%  90.7Mi   0.0%       0    .strtab
    4.4%  57.4Mi   0.0%       0    .debug_line
    1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
    1.8%  23.0Mi  12.4%  23.0Mi    .rodata
    1.4%  17.9Mi   0.0%       0    .symtab
    1.0%  12.4Mi   0.0%       0    .debug_ranges
    0.8%  10.6Mi   5.7%  10.6Mi    .dynstr

Thank you, Alexey.

On 19.10.2020 11:50, James Henderson wrote:
> Great, thanks Alexey! I'll try to take a look at this in the near
> future, and will report my results back here. I imagine our clang 
> results will differ, purely because we probably used different 
> toolchains to build the input in the first place.
>
> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 13.10.2020 10:20, James Henderson wrote:
>>     The script included in the patch can be used to convert an object
>>     containing normal DWARF into an object using fragmented DWARF. It
>>     does this by using llvm-dwarfdump to dump the various sections,
>>     parses the output to identify where it should split (using the
>>     offsets of the various entries), and then writes new section
>>     headers accordingly - you can see roughly what it's doing if you
>>     get a chance to watch the talk recording. The additional section
>>     headers are appended to the end of the ELF section header table,
>>     whilst the original DWARF is left in the same place it was before
>>     (making use of the fact that section headers don't have to appear
>>     in offset order). The script also parses and fragments the
>>     relocation sections targeting the DWARF sections so that they
>>     match up with the fragmented DWARF sections. This is clearly all
>>     suboptimal - in practice the compiler should be modified to do
>>     the fragmenting upfront, to save having to parse a tool's stdout,
>>     but that was just the simplest thing I could come up with to
>>     quickly write the script. Full details of the script usage are
>>     included in the patch description, if you want to play around
>>     with it.
>>
>>     If Alexey could point me at the latest version of his patch, I'd
>>     be happy to run that through either or both of the packages I
>>     used to see what happens. Equally, I'd be happy if Alexey is able
>>     to run my script to fragment and measure the performance of a
>>     couple of projects he's been working with. Based purely on the
>>     two packages I've tried this with, I can tell already that the
>>     results can vary wildly. My expectation is that Alexey's approach
>>     will be slower (at least in its current form, but probably more
>>     generally), but produce smaller output, but to what scale I have
>>     no idea.
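The splitting step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the D89229 script. It takes the entry start offsets, which the real script obtains by parsing llvm-dwarfdump output, together with the section size, and turns them into fragment ranges. It then assigns each relocation to the fragment containing its offset and rebases the offset so it is relative to that fragment. The `Fragment` type and `fragment_section` function are hypothetical names.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

@dataclass
class Fragment:
    start: int  # offset of the fragment within the original section
    size: int
    # (offset relative to the fragment start, relocation payload) pairs
    relocs: List[Tuple[int, str]] = field(default_factory=list)

def fragment_section(section_size: int, entry_offsets: Sequence[int],
                     relocs: Sequence[Tuple[int, str]]) -> List[Fragment]:
    """Hypothetical helper: split a section at entry offsets and distribute relocations.

    Assumes every relocation offset lies inside the section.
    """
    starts = sorted(set(entry_offsets) | {0})
    ends = starts[1:] + [section_size]
    frags = [Fragment(s, e - s) for s, e in zip(starts, ends)]
    for off, payload in relocs:
        i = bisect_right(starts, off) - 1  # last fragment starting at or before off
        frags[i].relocs.append((off - starts[i], payload))
    return frags

for f in fragment_section(0x60, [0, 0x20, 0x48], [(0x08, "R1"), (0x24, "R2"), (0x50, "R3")]):
    print(hex(f.start), hex(f.size), f.relocs)
```

In the ELF output, each fragment would then get its own section header (pointing into the original, unmoved bytes) and its own relocation section, as described in the message.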
>
>     James, I updated the patch - https://reviews.llvm.org/D74169.
>
>     To make it work, it is necessary to build the example with
>     -ffunction-sections and specify the following options to the linker:
>
>     --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>
>     For the clang binary, I got the following results:
>
>     1. --gc-sections = binary size 1,5G, Debug Info size(*)1.2G
>
>     2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
>     decrease, Debug Info size 542M
>
>     3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary
>     size 1,3G, 16x performance decrease, Debug Info size 1G
>
>     (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>
>
>     I added the option --gc-debuginfo-no-odr so that the size reduction could
>     be compared correctly. Without that option D74169 does types
>     deduplication, and then it is not correct to compare the resulting size
>     with the "Fragmented DWARF" solution, which does not do types
>     deduplication.
>
>     Also, I will look at your D89229 <https://reviews.llvm.org/D89229> and
>     share the results some time later.
>
>     Thank you, Alexey.
>
>>
>>     I think linkers parse .eh_frame partly because they have no other
>>     choice. That being said, I think its format is not too complex,
>>     so similarly the parser isn't too complex. You can see LLD's ELF
>>     implementation in ELF/EhFrame.cpp, how it is used in
>>     ELF/InputSection.cpp (see the bits to do with EhInputSection) and
>>     EhFrameSection in ELF/SyntheticSections.h (plus various usages of
>>     these two throughout the LLD code). I think the key to any
>>     structural changes in the DWARF format to make them more amenable
>>     to link-time parsing is being able to read a minimal amount
>>     without needing to parse the payload (e.g. a length field, some
>>     sort of type, and then using the relocations to associate it
>>     accordingly).
>>
>>     James
>>
>>     On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com
>>     <mailto:dblaikie at gmail.com>> wrote:
>>
>>         Awesome! Sorry I missed the lightning talk, but really
>>         interested to see this sort of thing (though it's not
>>         directly/immediately applicable to the use case I work with -
>>         Split DWARF, something similar could be used there with
>>         further work)
>>
>>         Though it looks like the patch has mostly linker changes -
>>         where/how do you generate the fragmented DWARF to begin with?
>>         Via the Python script? Run over assembly? I'd be surprised if
>>         it was achievable that way - curious to know more.
>>
>>         Got a rough sense/are you able to run apples-to-apples
>>         comparisons with Alexey's linker-based patches to compare
>>         linker time/memory overhead versus resulting output size gains?
>>
>>         (& yeah, I'm a bit curious about how the linkers do eh_frame
>>         rewriting, if the format is especially amenable to a
>>         lightweight parsing/rewriting and how we could make the DWARF
>>         more amenable to that too)
>>
>>         On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>         <jh7370.2008 at my.bristol.ac.uk
>>         <mailto:jh7370.2008 at my.bristol.ac.uk>> wrote:
>>
>>             Hi all,
>>
>>             At the recent LLVM developers' meeting, I presented a
>>             lightning talk on an approach to reduce the amount of
>>             dead debug data left in an executable following
>>             operations such as --gc-sections and duplicate COMDAT
>>             removal. In that presentation, I presented some figures
>>             based on linking a game that had been built by our
>>             downstream clang port and fragmented using the described
>>             approach. Since recording the presentation, I ran the
>>             same experiment on a clang package (this time built with
>>             a GCC version). The comparable figures are below:
>>
>>             Link-time speed (s):
>>             +--------------------+-------+---------------+------+------+------+------+------+
>>             | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>             +--------------------+-------+---------------+------+------+------+------+------+
>>             | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>             | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>             | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>             | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>             +--------------------+-------+---------------+------+------+------+------+------+
>>
>>             Output size - Game package (MB):
>>             +---------------------+-------+------+------+------+------+------+------+
>>             | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>             +---------------------+-------+------+------+------+------+------+------+
>>             | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>             | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>             | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>             | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>             | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>             | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>             +---------------------+-------+------+------+------+------+------+------+
>>
>>
>>             Output size - Clang (MB):
>>             +---------------------+-------+------+------+------+------+------+------+
>>             | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>             +---------------------+-------+------+------+------+------+------+------+
>>             | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>             | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>             | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>             | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>             | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>             | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>             +---------------------+-------+------+------+------+------+------+------+
>>
>>             *DWARF size == total size of .debug_info + .debug_line +
>>             .debug_ranges + .debug_aranges + .debug_loc
>>
>>             Additionally, I have posted
>>             https://reviews.llvm.org/D89229 which provides the python
>>             script and linker patches used to reproduce the above
>>             results on my machine. The GC 1/2/3/4/5/6 correspond to
>>             the linker option added in that patch --mark-live-pc with
>>             values 1/0.8/0.6/0.4/0.2/0 respectively.
>>
>>             During the conference, the question was asked what the
>>             memory usage and input size impact was. I've summarised
>>             these below:
>>
>>             Input file size total (GB):
>>             +--------------------+------------+
>>             | Package variant    | Total Size |
>>             +--------------------+------------+
>>             | Game (plain)       |     2.9    |
>>             | Game (fragmented)  |     4.2    |
>>             | Clang (plain)      |    10.9    |
>>             | Clang (fragmented) |    12.3    |
>>             +--------------------+------------+
>>
>>             Peak Working Set Memory usage (GB):
>>             +--------------------+-------+------+
>>             | Package variant    | No GC | GC 1 |
>>             +--------------------+-------+------+
>>             | Game (plain)       |  4.3  |  4.7 |
>>             | Game (fragmented)  |  8.9  |  8.6 |
>>             | Clang (plain)      | 15.7  | 15.6 |
>>             | Clang (fragmented) | 19.4  | 19.2 |
>>             +--------------------+-------+------+
>>
>>             I'm keen to hear what people's feedback is, and also
>>             interested to see what results others might see by
>>             running this experiment on other input packages. Also, if
>>             anybody has any alternative ideas that meet the goals
>>             listed below, I'd love to hear them!
>>
>>             To reiterate some key goals of fragmented DWARF, similar
>>             to what I said in the presentation:
>>             1) Devise a scheme that gives significant size savings
>>             without being too costly. It's clear from just the two
>>             packages I've tried this on that there is a fairly hefty
>>             link time performance cost, although the exact cost
>>             depends on the nature of the input package. On the other
>>             hand, depending on the nature of the input package, there
>>             can also be some big gains.
>>             2) Devise a scheme that doesn't require any linker
>>             knowledge of DWARF. The current approach doesn't quite
>>             achieve this properly due to the slight misuse of
>>             SHF_LINK_ORDER, but I expect that a pivot to using
>>             non-COMDAT group sections should solve this problem.
>>             3) Provide some kind of halfway house between simply
>>             writing tombstone values into dead DWARF and fully
>>             parsing the DWARF to reoptimise it/discard the dead bits.
>>
>>             I'm hopeful that changes could be made to the linker to
>>             improve the link-time cost. There seems to be a
>>             significant amount of the link time spent creating the
>>             input sections. An alternative would be to devise a
>>             scheme that would avoid the literal splitting into
>>             section headers, in favour of some sort of list of
>>             split-points that the linker uses to split things up (a
>>             bit like it already does for .eh_frame or mergeable
>>             sections).

James Henderson via llvm-dev

2020-Nov-04 12:28 UTC


[llvm-dev] Fragmented DWARF

Hi Alexey,

Thanks for taking a look at these. I noticed you set --mark-live-pc to a
value other than 1 for the fragmented DWARF version. This means additional
GC-ing will be done beyond what --gc-sections does, so unless you use the
same value for the other versions, the results will not be comparable. (The
option is purely there to experiment with the effects of treating different
amounts of the input codebase as dead.) Would you be okay to run those
figures again without the option specified?

I'm still trying to figure out the problems on my end to try running your
experiment on the game package I used in my presentation, but have been
interrupted by other unrelated issues. I'll try to get back to this in the
coming days.

James

On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin <avl.lapshin at gmail.com>
wrote:
> Hi James,
>
> I did experiments with the clang code base and will do experiments with
> our local codebase later.
> Overall, both solutions ("Fragmented DWARF" and "DWARFLinker without ODR
> types deduplication") appear to give similar size savings for the
> final binary. "DWARFLinker with ODR types deduplication" has a bigger
> size-saving effect. "Fragmented DWARF" increases the size of the original
> object files by up to 15%.
> LLD with "fragmented DWARF" works significantly faster than with
> "DWARFLinker".
>
> Following are the results for the "llvm-strings" and "clang" binaries:
>
> 1. llvm-strings:
>
>    source object files size: 381M.
>    fragmented source object files size: 451M(18% increase).
>
>    a. upstream version,
>       command line options: --gc-sections
>       binary size: 6,5M
>       compilation time: 0:00.13 sec
>       run-time memory: 111kb
>
>    b. "fragmented DWARF" version,
>       command line options: --gc-sections --mark-live-pc=0.45
>       binary size: 3,7M
>       compilation time: 0:00.10 sec
>       run-time memory: 122kb
>
>    c. DWARFLinker version,
>       command line options: --gc-sections --gc-debuginfo
>       binary size: 3,8M
>       compilation time: 0:00.33 sec
>       run-time memory: 141kb
>
>    d. DWARFLinker no-odr version,
>       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>       binary size: 4,3M
>       compilation time: 0:00.38 sec
>       run-time memory: 142kb
>
>
> 2. clang:
>
>    source object files size: 6,5G.
>    fragmented source object files size: 7,3G(13% increase).
>
>    a. upstream version,
>       command line options: --gc-sections
>       binary size: 1,5G
>       compilation time: 6 sec
>       run-time memory: 9.7G
>
>    b. "fragmented DWARF" version,
>       command line options: --gc-sections --mark-live-pc=0.43
>       binary size: 1,1G
>       compilation time: 9 sec
>       run-time memory: 11G
>
>    c. DWARFLinker version,
>       command line options: --gc-sections --gc-debuginfo
>       binary size: 836M
>       compilation time: 62 sec
>       run-time memory: 15G
>
>    d. DWARFLinker no-odr version,
>       command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>       binary size: 1,3G
>       compilation time: 128 sec
>       run-time memory: 17G
>
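The headline comparison above can be made concrete with a little arithmetic. The following is a minimal Python sketch (not part of either patch) that computes each variant's binary size change and its time ratio relative to the upstream link. The numbers are copied from the results above; the "compilation time" figures are treated as LLD link times, as the summary implies. One assumption is loudly flagged: clang's 836M DWARFLinker output is converted to G using 1G = 1024M.

```python
# Numbers copied from the llvm-strings and clang results above:
# (binary size, reported time in seconds) per variant.
# Sizes use the units printed there (M for llvm-strings, G for clang).
RESULTS = {
    "llvm-strings": {
        "upstream": (6.5, 0.13),
        "fragmented DWARF": (3.7, 0.10),
        "DWARFLinker": (3.8, 0.33),
        "DWARFLinker no-odr": (4.3, 0.38),
    },
    "clang": {
        "upstream": (1.5, 6.0),
        "fragmented DWARF": (1.1, 9.0),
        # Reported as 836M; converted to G ASSUMING 1G = 1024M.
        "DWARFLinker": (836 / 1024, 62.0),
        "DWARFLinker no-odr": (1.3, 128.0),
    },
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of value relative to baseline (negative = smaller)."""
    return (value - baseline) / baseline * 100.0


def compare(results: dict) -> dict:
    """Map each non-upstream variant to (size change %, time ratio vs upstream)."""
    base_size, base_time = results["upstream"]
    return {
        variant: (relative_change(base_size, size), time / base_time)
        for variant, (size, time) in results.items()
        if variant != "upstream"
    }


if __name__ == "__main__":
    for binary, results in RESULTS.items():
        print(binary)
        for variant, (size_pct, ratio) in compare(results).items():
            print(f"  {variant:<20} size {size_pct:+6.1f}%   time x{ratio:.1f}")
```

For clang, this puts "fragmented DWARF" at about 27% smaller than upstream for 1.5x the time, and DWARFLinker at about 46% smaller for roughly 10x the time. The llvm-strings times are all well under a second, so those ratios carry little weight.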
> Detailed size results:
>
> 1. llvm-strings
>
>    a)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   41.1%  2.64Mi   0.0%       0    .debug_info
>   24.9%  1.60Mi   0.0%       0    .debug_str
>   12.6%   827Ki   0.0%       0    .debug_line
>    6.5%   428Ki  63.8%   428Ki    .text
>    4.8%   317Ki   0.0%       0    .strtab
>    3.4%   223Ki   0.0%       0    .debug_ranges
>    2.0%   133Ki  19.8%   133Ki    .eh_frame
>    1.7%   110Ki   0.0%       0    .symtab
>    1.2%  77.6Ki   0.0%       0    .debug_abbrev
>
>    b)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   50.3%  1.85Mi   0.0%       0    .debug_info
>   43.6%  1.60Mi   0.0%       0    .debug_str
>    2.6%  98.2Ki   0.0%       0    .debug_line
>    2.1%  77.6Ki   0.0%       0    .debug_abbrev
>    0.5%  17.5Ki  54.9%  17.4Ki    .text
>    0.3%  9.94Ki   0.0%       0    .strtab
>    0.2%  6.27Ki   0.0%       0    .symtab
>    0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
>    0.1%  3.28Ki   0.0%       0    .debug_ranges
>
>    c)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   33.0%  1.25Mi   0.0%       0    .debug_info
>   29.2%  1.11Mi   0.0%       0    .debug_str
>   11.0%   428Ki  63.8%   428Ki    .text
>    8.2%   317Ki   0.0%       0    .strtab
>    7.8%   304Ki   0.0%       0    .debug_line
>    3.4%   133Ki  19.8%   133Ki    .eh_frame
>    2.8%   110Ki   0.0%       0    .symtab
>    1.7%  65.9Ki   0.0%       0    .debug_ranges
>    1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>
>    d)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   39.7%  1.68Mi   0.0%       0    .debug_info
>   26.3%  1.11Mi   0.0%       0    .debug_str
>    9.9%   428Ki  63.8%   428Ki    .text
>    7.3%   317Ki   0.0%       0    .strtab
>    7.0%   304Ki   0.0%       0    .debug_line
>    3.1%   133Ki  19.8%   133Ki    .eh_frame
>    2.6%   110Ki   0.0%       0    .symtab
>    1.5%  65.9Ki   0.0%       0    .debug_ranges
>
>
> 2. clang
>
>    a)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   58.3%   878Mi   0.0%       0    .debug_info
>   11.8%   177Mi   0.0%       0    .debug_str
>    7.7%   115Mi  62.2%   115Mi    .text
>    7.7%   115Mi   0.0%       0    .debug_line
>    6.0%  90.7Mi   0.0%       0    .strtab
>    2.4%  35.4Mi   0.0%       0    .debug_ranges
>    1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>    1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>    1.2%  17.9Mi   0.0%       0    .symtab
>
>    b)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   71.5%   772Mi   0.0%       0    .debug_info
>   16.5%   177Mi   0.0%       0    .debug_str
>    3.7%  40.2Mi  59.2%  40.2Mi    .text
>    2.4%  25.8Mi   0.0%       0    .debug_line
>    2.1%  23.0Mi   0.0%       0    .strtab
>    1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
>    0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
>    0.5%  5.60Mi   0.0%       0    .symtab
>    0.4%  4.28Mi   0.0%       0    .debug_ranges
>    0.4%  4.04Mi   0.0%       0    .debug_abbrev
>
>
>    c)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   35.1%   293Mi   0.0%       0    .debug_info
>   21.2%   177Mi   0.0%       0    .debug_str
>   13.9%   115Mi  62.2%   115Mi    .text
>   10.9%  90.7Mi   0.0%       0    .strtab
>    6.9%  57.4Mi   0.0%       0    .debug_line
>    2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>    2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>    2.1%  17.9Mi   0.0%       0    .symtab
>    1.5%  12.4Mi   0.0%       0    .debug_ranges
>    1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>    d)
>
>     FILE SIZE        VM SIZE
>  --------------  --------------
>   58.3%   758Mi   0.0%       0    .debug_info
>   13.6%   177Mi   0.0%       0    .debug_str
>    8.9%   115Mi  62.2%   115Mi    .text
>    7.0%  90.7Mi   0.0%       0    .strtab
>    4.4%  57.4Mi   0.0%       0    .debug_line
>    1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>    1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>    1.4%  17.9Mi   0.0%       0    .symtab
>    1.0%  12.4Mi   0.0%       0    .debug_ranges
>    0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
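To compare how much debug data each variant keeps, the .debug_* rows of these tables can be summed. The tables appear to be Bloaty-style output (FILE SIZE / VM SIZE columns with Ki/Mi suffixes). The following is a small Python sketch under that assumption. The parser and its names are illustrative, not an existing tool. It only sums the rows that are actually shown, so any debug section below each table's cut-off is not counted.

```python
import re

# Bloaty-style size suffixes (binary units).
UNITS = {"": 1, "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_size(text: str) -> float:
    """Convert a size such as '2.64Mi', '827Ki' or '0' into bytes."""
    m = re.fullmatch(r"([0-9.]+)(Ki|Mi|Gi)?", text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2) or ""]


def debug_bytes(table: str) -> float:
    """Sum the FILE SIZE column of every .debug_* row in a table like those above.

    Leading '>' quote markers are stripped, so rows can be pasted from the mail.
    """
    total = 0.0
    for line in table.splitlines():
        fields = line.lstrip("> ").split()
        # Data rows: <file %> <file size> <vm %> <vm size> <section name>
        if len(fields) == 5 and fields[4].startswith(".debug_"):
            total += parse_size(fields[1])
    return total


# llvm-strings, upstream (a) and "fragmented DWARF" (b): the .debug_* rows only.
TABLE_A = """
>   41.1%  2.64Mi   0.0%       0    .debug_info
>   24.9%  1.60Mi   0.0%       0    .debug_str
>   12.6%   827Ki   0.0%       0    .debug_line
>    3.4%   223Ki   0.0%       0    .debug_ranges
>    1.2%  77.6Ki   0.0%       0    .debug_abbrev
"""
TABLE_B = """
>   50.3%  1.85Mi   0.0%       0    .debug_info
>   43.6%  1.60Mi   0.0%       0    .debug_str
>    2.6%  98.2Ki   0.0%       0    .debug_line
>    2.1%  77.6Ki   0.0%       0    .debug_abbrev
>    0.1%  3.28Ki   0.0%       0    .debug_ranges
"""

if __name__ == "__main__":
    for name, table in (("a", TABLE_A), ("b", TABLE_B)):
        print(name, f"{debug_bytes(table) / 2**20:.2f} MiB of .debug_* data")
```

With these rows, (a) comes to roughly 5.34 MiB and (b) to roughly 3.62 MiB of debug data.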
>
> Thank you, Alexey.
> On 19.10.2020 11:50, James Henderson wrote:
>
> Great, thanks Alexey! I'll try to take a look at this in the near future,
> and will report my results back here. I imagine our clang results will
> differ, purely because we probably used different toolchains to build the
> input in the first place.
>
> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com>
> wrote:
>
>>
>> On 13.10.2020 10:20, James Henderson wrote:
>>
>> The script included in the patch can be used to convert an object
>> containing normal DWARF into an object using fragmented DWARF. It does
>> this by using llvm-dwarfdump to dump the various sections, parsing the
>> output to identify where it should split (using the offsets of the
>> various entries), and then writing new section headers accordingly - you
>> can see roughly what it's doing if you get a chance to watch the talk
>> recording. The additional section headers are appended to the end of the
>> ELF section header table, whilst the original DWARF is left in the same
>> place it was before (making use of the fact that section headers don't
>> have to appear in offset order). The script also parses and fragments
>> the relocation sections targeting the DWARF sections so that they match
>> up with the fragmented DWARF sections. This is clearly all suboptimal -
>> in practice the compiler should be modified to do the fragmenting
>> upfront, to save having to parse a tool's stdout, but that was just the
>> simplest thing I could come up with to quickly write the script. Full
>> details of the script usage are included in the patch description, if
>> you want to play around with it.
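The layout step described above can be sketched in a few lines. This is a hypothetical Python illustration, not the D89229 script. It does not parse llvm-dwarfdump or write ELF headers. Given a section's size and the start offsets of its entries (which the real script gets from llvm-dwarfdump), it computes one (offset, size) window per entry over the unchanged section bytes. It then moves each relocation into the window that contains it, rebasing the relocation's offset. `Fragment`, `split_section` and `split_relocations` are illustrative names.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A window onto the unchanged section bytes: [offset, offset + size)."""
    offset: int
    size: int
    # Relocations as (offset relative to this fragment, target symbol).
    relocs: list = field(default_factory=list)


def split_section(section_size: int, entry_offsets: list) -> list:
    """Cut a section into one fragment per entry, given each entry's start offset."""
    starts = sorted({o for o in entry_offsets if 0 <= o < section_size})
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # bytes before the first entry get their own fragment
    ends = starts[1:] + [section_size]
    return [Fragment(s, e - s) for s, e in zip(starts, ends)]


def split_relocations(fragments: list, relocs: list) -> list:
    """Attach each (offset, symbol) relocation to the fragment that contains it,
    rebasing the offset so that it is relative to the start of that fragment."""
    starts = [f.offset for f in fragments]
    for offset, symbol in relocs:
        frag = fragments[bisect_right(starts, offset) - 1]
        frag.relocs.append((offset - frag.offset, symbol))
    return fragments


if __name__ == "__main__":
    # Hypothetical 100-byte section with entries starting at 0, 40 and 70.
    frags = split_relocations(
        split_section(100, [0, 40, 70]),
        [(10, "foo"), (44, "bar"), (80, "baz")],
    )
    for f in frags:
        print(f.offset, f.size, f.relocs)
```

Here the relocation at offset 44 ends up in the second fragment as offset 4. This per-fragment rebasing is what lets each new section header have its own matching relocation section.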
>>
>> If Alexey could point me at the latest version of his patch, I'd be
>> happy to run that through either or both of the packages I used to see
>> what happens. Equally, I'd be happy if Alexey is able to run my script
>> to fragment and measure the performance of a couple of projects he's
>> been working with. Based purely on the two packages I've tried this
>> with, I can tell already that the results can vary wildly. My
>> expectation is that Alexey's approach will be slower (at least in its
>> current form, but probably more generally) but produce smaller output;
>> to what scale, I have no idea.
>>
>> James, I updated the patch - https://reviews.llvm.org/D74169.
>>
>> To make it work, it is necessary to build the example with
>> -ffunction-sections and pass the following options to the linker:
>>
>> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>
>> For the clang binary I got the following results:
>>
>> 1. --gc-sections = binary size 1,5G, Debug Info size(*) 1.2G
>>
>> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
>> decrease, Debug Info size 542M
>>
>> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size 1,3G,
>> 16x performance decrease, Debug Info size 1G
>>
>> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>
>>
>> I added the option --gc-debuginfo-no-odr so that the size reduction can
>> be compared correctly. Without that option, D74169 does type
>> deduplication, and then it is not correct to compare the resulting size
>> with the "Fragmented DWARF" solution, which does not do type
>> deduplication.
>>
>> Also, I will look at your D89229 <https://reviews.llvm.org/D89229> and
>> share results some time later.
>>
>> Thank you, Alexey.
>>
>>
>> I think linkers parse .eh_frame partly because they have no other
>> choice. That being said, I think its format is not too complex, so
>> similarly the parser isn't too complex. You can see LLD's ELF
>> implementation in ELF/EhFrame.cpp, how it is used in
>> ELF/InputSection.cpp (see the bits to do with EhInputSection) and
>> EhFrameSection in ELF/SyntheticSections.h (plus various usages of these
>> two throughout the LLD code). I think the key to any structural changes
>> in the DWARF format to make them more amenable to link-time parsing is
>> being able to read a minimal amount without needing to parse the
>> payload (e.g. a length field, some sort of type, and then using the
>> relocations to associate it accordingly).
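As an illustration of the "read a minimal amount" idea, the following minimal Python sketch walks an .eh_frame-style section record by record. It reads only each record's 4-byte length and the 4-byte field after it, which is 0 for a CIE and a non-zero back-pointer to the CIE for an FDE. It never decodes the payload. This is not LLD's code (that lives in ELF/EhFrame.cpp, in C++). It assumes little-endian data and rejects the 64-bit extended-length form for simplicity. A linker would then typically use each FDE's relocation against its function to decide whether the record is kept, which is the "associate it via relocations" step this sketch leaves out.

```python
import struct


def split_eh_frame(data: bytes) -> list:
    """Split an .eh_frame-style section into (offset, size, kind) records.

    Only the 4-byte length and the 4-byte CIE ID / CIE pointer of each record
    are read; the rest of the payload is never decoded. Assumes little-endian
    data and 32-bit lengths.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:  # zero terminator
            break
        if length == 0xFFFFFFFF:
            raise ValueError("64-bit extended length is not handled in this sketch")
        if length < 4 or pos + 4 + length > len(data):
            raise ValueError(f"truncated record at offset {pos}")
        (cie_id,) = struct.unpack_from("<I", data, pos + 4)
        records.append((pos, 4 + length, "CIE" if cie_id == 0 else "FDE"))
        pos += 4 + length
    return records


if __name__ == "__main__":
    # One CIE (ID 0), then one FDE whose CIE pointer (16) points back from the
    # pointer field at offset 16 to the CIE at offset 0, then a terminator.
    # The payload bytes are just zero padding.
    section = (struct.pack("<II4x", 8, 0)
               + struct.pack("<II8x", 12, 16)
               + struct.pack("<I", 0))
    print(split_eh_frame(section))
```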
>>
>> James
>>
>> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> Awesome! Sorry I missed the lightning talk, but I'm really interested
>>> to see this sort of thing (though it's not directly/immediately
>>> applicable to the use case I work with - Split DWARF - something
>>> similar could be used there with further work).
>>>
>>> Though it looks like the patch has mostly linker changes - where/how do
>>> you generate the fragmented DWARF to begin with? Via the Python script?
>>> Run over assembly? I'd be surprised if it was achievable that way -
>>> curious to know more.
>>>
>>> Got a rough sense/are you able to run apples-to-apples comparisons with
>>> Alexey's linker-based patches to compare linker time/memory overhead
>>> versus resulting output size gains?
>>>
>>> (& yeah, I'm a bit curious about how the linkers do eh_frame rewriting,
>>> if the format is especially amenable to a lightweight parsing/rewriting
>>> and how we could make the DWARF more amenable to that too)
>>>
>>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
>>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>>
>>>> Hi all,
>>>>
>>>> At the recent LLVM developers' meeting, I presented a lightning talk on
>>>> an approach to reduce the amount of dead debug data left in an
>>>> executable following operations such as --gc-sections and duplicate
>>>> COMDAT removal. In that presentation, I showed some figures based on
>>>> linking a game that had been built by our downstream clang port and
>>>> fragmented using the described approach. Since recording the
>>>> presentation, I have run the same experiment on a clang package (this
>>>> time built with a GCC version). The comparable figures are below:
>>>>
>>>> Link time (s):
>>>>
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Game (plain)       |  4.5  |  4.9          |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>>> | Game (fragmented)  | 11.1  | 11.8          |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>>> | Clang (plain)      | 13.9  | 17.9          | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>> | Clang (fragmented) | 18.6  | 22.8          | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>
>>>> Output size - Game package (MB):
>>>>
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       | 1149  | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>>> | Plain (DWARF*)      |  845  |  845 |  845 |  845 |  845 |  845 |  845 |
>>>> | Plain (other)       |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>> | Fragmented (total)  | 1044  |  940 |  556 |  373 |  287 |  263 |  255 |
>>>> | Fragmented (DWARF*) |  740  |  664 |  384 |  253 |  194 |  178 |  173 |
>>>> | Fragmented (other)  |  304  |  276 |  172 |  120 |   93 |   85 |   82 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>>
>>>> Output size - Clang (MB):
>>>>
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       | 2596  | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>> | Plain (DWARF*)      | 1979  | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>> | Plain (other)       |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>> | Fragmented (total)  | 2397  | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>> | Fragmented (DWARF*) | 1780  | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>> | Fragmented (other)  |  616  |  567 |  426 |  353 |  314 |  294 |  272 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges
>>>> + .debug_aranges + .debug_loc
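The DWARF* rows of the two output-size tables can be turned into a per-GC-level saving with some simple arithmetic. The following Python sketch copies those rows verbatim and computes, for each column, how much smaller the fragmented DWARF* is than the plain DWARF*. The names are illustrative only.

```python
GC_LEVELS = ["No GC", "GC 1", "GC 2", "GC 3", "GC 4", "GC 5", "GC 6"]

# DWARF* sizes in MB, copied from the two output-size tables above.
DWARF_MB = {
    "Game": {"plain": [845] * 7,
             "fragmented": [740, 664, 384, 253, 194, 178, 173]},
    "Clang": {"plain": [1979] * 7,
              "fragmented": [1780, 1780, 1738, 1716, 1703, 1696, 1691]},
}


def dwarf_savings(package: str) -> list:
    """Percent reduction in DWARF* size from fragmenting, one value per GC level."""
    sizes = DWARF_MB[package]
    return [100.0 * (1 - frag / plain)
            for plain, frag in zip(sizes["plain"], sizes["fragmented"])]


if __name__ == "__main__":
    for package in DWARF_MB:
        pairs = zip(GC_LEVELS, dwarf_savings(package))
        print(package + ": " + ", ".join(f"{lvl} {pct:.1f}%" for lvl, pct in pairs))
```

At GC 6, fragmenting removes roughly 80% of the game's DWARF* but only about 15% of clang's. This puts a number on how widely the results vary between packages.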
>>>>
>>>> Additionally, I have posted https://reviews.llvm.org/D89229, which
>>>> provides the Python script and linker patches used to reproduce the
>>>> above results on my machine. The GC 1/2/3/4/5/6 columns correspond to
>>>> the --mark-live-pc linker option added in that patch, with values
>>>> 1/0.8/0.6/0.4/0.2/0 respectively.
>>>>
>>>> During the conference, someone asked what the impact on memory usage
>>>> and input size was. I've summarised these below:
>>>>
>>>> Input file size total (GB):
>>>> +--------------------+------------+
>>>> | Package variant    | Total Size |
>>>> +--------------------+------------+
>>>> | Game (plain)       |     2.9    |
>>>> | Game (fragmented)  |     4.2    |
>>>> | Clang (plain)      |    10.9    |
>>>> | Clang (fragmented) |    12.3    |
>>>> +--------------------+------------+
>>>>
>>>> Peak Working Set Memory usage (GB):
>>>> +--------------------+-------+------+
>>>> | Package variant    | No GC | GC 1 |
>>>> +--------------------+-------+------+
>>>> | Game (plain)       |  4.3  |  4.7 |
>>>> | Game (fragmented)  |  8.9  |  8.6 |
>>>> | Clang (plain)      | 15.7  | 15.6 |
>>>> | Clang (fragmented) | 19.4  | 19.2 |
>>>> +--------------------+-------+------+
>>>>
>>>> I'm keen to hear people's feedback, and I'm also interested to see what
>>>> results others might see by running this experiment on other input
>>>> packages. Also, if anybody has any alternative ideas that meet the goals
>>>> listed below, I'd love to hear them!
>>>>
>>>> To reiterate some key goals of fragmented DWARF, similar to what I said
>>>> in the presentation:
>>>> 1) Devise a scheme that gives significant size savings without being
>>>> too costly. It's clear from just the two packages I've tried this on
>>>> that there is a fairly hefty link-time performance cost, although the
>>>> exact cost depends on the nature of the input package. On the other
>>>> hand, depending on the nature of the input package, there can also be
>>>> some big gains.
>>>> 2) Devise a scheme that doesn't require any linker knowledge of DWARF.
>>>> The current approach doesn't quite achieve this properly due to the
>>>> slight misuse of SHF_LINK_ORDER, but I expect that a pivot to using
>>>> non-COMDAT group sections should solve this problem.
>>>> 3) Provide some kind of halfway house between simply writing tombstone
>>>> values into dead DWARF and fully parsing the DWARF to reoptimise it and
>>>> discard the dead bits.
>>>>
>>>> I'm hopeful that changes could be made to the linker to improve the
>>>> link-time cost. A significant amount of the link time seems to be
>>>> spent creating the input sections. An alternative would be to devise a
>>>> scheme that avoids literally splitting the data into separate section
>>>> headers, in favour of some sort of list of split-points that the linker
>>>> uses to split things up (a bit like it already does for .eh_frame or
>>>> mergeable sections).
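To make the split-point alternative a little more concrete, here is a hypothetical Python sketch of the bookkeeping a linker might do if, instead of extra section headers, each debug section came with a list of split offsets. Each resulting piece would also be tagged with the code section it describes. Pieces whose code section was discarded by --gc-sections are dropped. Neither this encoding nor the names `live_pieces`, `owners` and `live_sections` come from D89229. The sketch only shows which byte ranges survive and makes no attempt to fix up anything inside the kept pieces.

```python
def live_pieces(section_size, split_points, owners, live_sections):
    """Return (offset, size) of each piece of a debug section that should be kept.

    split_points:  offsets where a new piece starts (offset 0 is implicit).
    owners:        for each piece, the code section it describes, or None if the
                   piece is not tied to any code section (such pieces are always kept).
    live_sections: names of code sections that survived --gc-sections.
    """
    starts = [0] + sorted({p for p in split_points if 0 < p < section_size})
    ends = starts[1:] + [section_size]
    if len(owners) != len(starts):
        raise ValueError("expected exactly one owner per piece")
    return [
        (start, end - start)
        for start, end, owner in zip(starts, ends, owners)
        if owner is None or owner in live_sections
    ]


if __name__ == "__main__":
    # Hypothetical section: a shared header piece, then one piece per function.
    kept = live_pieces(
        100,
        [40, 70],
        [None, ".text.used", ".text.unused"],
        live_sections={".text.used"},
    )
    print(kept)
```

The appeal, as with .eh_frame, is that the pieces are just ranges within one input section, so the linker avoids creating a separate input section object per fragment.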
>>>>