Great, thanks! Those results are roughly what I was expecting. I assume "compilation time" is actually just the link time?

I find it particularly interesting that the DWARFLinker rewriting solution produces the same size improvement in .debug_line as the fragmented DWARF approach. That suggests that, in that case, the fragmented DWARF output is probably about as optimal as it can get. I'm not surprised that the same can't be said for other sections, but I'm also pleased to see that the full rewrite option isn't that much better in terms of size improvements.

Regarding the problems I was having with the patch, if you want to try reproducing them with clang: I built commit 05d02e5a of clang using gcc 7.5.0 on Ubuntu 18.04, to generate an ELF package. I then used LLD to relink it to create a reproducible package. As I'm primarily a Windows developer, I transferred this package to my Windows machine so that I could use my existing Windows checkout of LLVM, applied your patch, rebuilt LLD, and used that to try linking the package, getting the stated message. I'm going to have another try at the latter now to see if I can figure out what the issue is myself.

James

On Wed, 4 Nov 2020 at 13:35, Alexey Lapshin <avl.lapshin at gmail.com> wrote:

>
> On 04.11.2020 15:28, James Henderson wrote:
>
> Hi Alexey,
>
> Thanks for taking a look at these. I noticed you set the --mark-live-pc
> value to a value other than 1 for the fragmented DWARF version. This will
> mean additional GC-ing will be done beyond the amount that --gc-sections
> will do, so unless you use the same value for the option for other
> versions, the result will not be comparable. (The option is purely there to
> experiment with the effects were different amounts of the input codebase to
> be considered dead). Would you be okay to run those figures again without
> the option specified?
>
> Oh, I misinterpreted that option. The updated results follow:
>
> 1. llvm-strings:
>
> source object files size: 381M.
> fragmented source object files size: 451M (18% increase).
>
> a. upstream version,
>    command line options: --gc-sections
>    binary size: 6,5M
>    compilation time: 0:00.13 sec
>    run-time memory: 111kb
>
> b. "fragmented DWARF" version,
>    command line options: --gc-sections
>    binary size: 5,3M
>    compilation time: 0:00.11 sec
>    run-time memory: 125kb
>
> c. DWARFLinker version,
>    command line options: --gc-sections --gc-debuginfo
>    binary size: 3,8M
>    compilation time: 0:00.33 sec
>    run-time memory: 141kb
>
> d. DWARFLinker no-odr version,
>    command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>    binary size: 4,3M
>    compilation time: 0:00.38 sec
>    run-time memory: 142kb
>
>
> 2. clang:
>
> source object files size: 6,5G.
> fragmented source object files size: 7,3G (13% increase).
>
> a. upstream version,
>    command line options: --gc-sections
>    binary size: 1,5G
>    compilation time: 6 sec
>    run-time memory: 9.7G
>
> b. "fragmented DWARF" version,
>    command line options: --gc-sections
>    binary size: 1,4G
>    compilation time: 8 sec
>    run-time memory: 12G
>
> c. DWARFLinker version,
>    command line options: --gc-sections --gc-debuginfo
>    binary size: 836M
>    compilation time: 62 sec
>    run-time memory: 15G
>
> d. DWARFLinker no-odr version,
>    command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>    binary size: 1,3G
>    compilation time: 128 sec
>    run-time memory: 17G
>
> Detailed size results:
>
> 1. a)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    41.1%  2.64Mi   0.0%       0    .debug_info
>    24.9%  1.60Mi   0.0%       0    .debug_str
>    12.6%   827Ki   0.0%       0    .debug_line
>     6.5%   428Ki  63.8%   428Ki    .text
>     4.8%   317Ki   0.0%       0    .strtab
>     3.4%   223Ki   0.0%       0    .debug_ranges
>     2.0%   133Ki  19.8%   133Ki    .eh_frame
>     1.7%   110Ki   0.0%       0    .symtab
>     1.2%  77.6Ki   0.0%       0    .debug_abbrev
>
> b)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    40.2%  2.10Mi   0.0%       0    .debug_info
>    30.7%  1.60Mi   0.0%       0    .debug_str
>     8.0%   428Ki  63.8%   428Ki    .text
>     5.9%   317Ki   0.0%       0    .strtab
>     5.9%   313Ki   0.0%       0    .debug_line
>     2.5%   133Ki  19.8%   133Ki    .eh_frame
>     2.1%   110Ki   0.0%       0    .symtab
>     1.5%  77.6Ki   0.0%       0    .debug_abbrev
>     1.3%  69.2Ki   0.0%       0    .debug_ranges
>
> c)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    33.0%  1.25Mi   0.0%       0    .debug_info
>    29.2%  1.11Mi   0.0%       0    .debug_str
>    11.0%   428Ki  63.8%   428Ki    .text
>     8.2%   317Ki   0.0%       0    .strtab
>     7.8%   304Ki   0.0%       0    .debug_line
>     3.4%   133Ki  19.8%   133Ki    .eh_frame
>     2.8%   110Ki   0.0%       0    .symtab
>     1.7%  65.9Ki   0.0%       0    .debug_ranges
>     1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>
> d)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    39.7%  1.68Mi   0.0%       0    .debug_info
>    26.3%  1.11Mi   0.0%       0    .debug_str
>     9.9%   428Ki  63.8%   428Ki    .text
>     7.3%   317Ki   0.0%       0    .strtab
>     7.0%   304Ki   0.0%       0    .debug_line
>     3.1%   133Ki  19.8%   133Ki    .eh_frame
>     2.6%   110Ki   0.0%       0    .symtab
>     1.5%  65.9Ki   0.0%       0    .debug_ranges
>
>
> 2. a)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    58.3%   878Mi   0.0%       0    .debug_info
>    11.8%   177Mi   0.0%       0    .debug_str
>     7.7%   115Mi  62.2%   115Mi    .text
>     7.7%   115Mi   0.0%       0    .debug_line
>     6.0%  90.7Mi   0.0%       0    .strtab
>     2.4%  35.4Mi   0.0%       0    .debug_ranges
>     1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>     1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>     1.2%  17.9Mi   0.0%       0    .symtab
>
> b)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    59.6%   807Mi   0.0%       0    .debug_info
>    13.1%   177Mi   0.0%       0    .debug_str
>     8.5%   115Mi  62.2%   115Mi    .text
>     6.7%  90.7Mi   0.0%       0    .strtab
>     4.2%  57.4Mi   0.0%       0    .debug_line
>     1.7%  23.3Mi  12.5%  23.3Mi    .eh_frame
>     1.7%  23.0Mi  12.4%  23.0Mi    .rodata
>     1.3%  17.9Mi   0.0%       0    .symtab
>     1.0%  13.0Mi   0.0%       0    .debug_ranges
>     0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>
> c)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    35.1%   293Mi   0.0%       0    .debug_info
>    21.2%   177Mi   0.0%       0    .debug_str
>    13.9%   115Mi  62.2%   115Mi    .text
>    10.9%  90.7Mi   0.0%       0    .strtab
>     6.9%  57.4Mi   0.0%       0    .debug_line
>     2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>     2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>     2.1%  17.9Mi   0.0%       0    .symtab
>     1.5%  12.4Mi   0.0%       0    .debug_ranges
>     1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>
> d)
>
>      FILE SIZE        VM SIZE
>   --------------  --------------
>    58.3%   758Mi   0.0%       0    .debug_info
>    13.6%   177Mi   0.0%       0    .debug_str
>     8.9%   115Mi  62.2%   115Mi    .text
>     7.0%  90.7Mi   0.0%       0    .strtab
>     4.4%  57.4Mi   0.0%       0    .debug_line
>     1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>     1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>     1.4%  17.9Mi   0.0%       0    .symtab
>     1.0%  12.4Mi   0.0%       0    .debug_ranges
>     0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>
>
> I'm still trying to figure out the problems on my end to try running your
> experiment on the game package I used in my presentation, but have been
> interrupted by other unrelated issues. I'll try to get back to this in the
> coming days.
>
> James
>
> On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin <avl.lapshin at gmail.com> wrote:
>
>> Hi James,
>>
>> I did experiments with the clang code base and will do experiments with
>> our local codebase later.
>> Overall, both solutions ("Fragmented DWARF" and "DWARFLinker without odr
>> types deduplication") appear to have similar size savings for the
>> final binary. "DWARFLinker with odr types deduplication" has a bigger size
>> saving effect. "Fragmented DWARF" increases the size of original object
>> files up to 15%.
>> LLD with "fragmented DWARF" works significantly faster than with
>> "DWARFLinker".
>>
>> Following are the results for "llvm-strings" and "clang" binaries:
>>
>> 1. llvm-strings:
>>
>> source object files size: 381M.
>> fragmented source object files size: 451M (18% increase).
>>
>> a. upstream version,
>>    command line options: --gc-sections
>>    binary size: 6,5M
>>    compilation time: 0:00.13 sec
>>    run-time memory: 111kb
>>
>> b. "fragmented DWARF" version,
>>    command line options: --gc-sections --mark-live-pc=0.45
>>    binary size: 3,7M
>>    compilation time: 0:00.10 sec
>>    run-time memory: 122kb
>>
>> c. DWARFLinker version,
>>    command line options: --gc-sections --gc-debuginfo
>>    binary size: 3,8M
>>    compilation time: 0:00.33 sec
>>    run-time memory: 141kb
>>
>> d. DWARFLinker no-odr version,
>>    command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>    binary size: 4,3M
>>    compilation time: 0:00.38 sec
>>    run-time memory: 142kb
>>
>>
>> 2. clang:
>>
>> source object files size: 6,5G.
>> fragmented source object files size: 7,3G (13% increase).
>>
>> a. upstream version,
>>    command line options: --gc-sections
>>    binary size: 1,5G
>>    compilation time: 6 sec
>>    run-time memory: 9.7G
>>
>> b. "fragmented DWARF" version,
>>    command line options: --gc-sections --mark-live-pc=0.43
>>    binary size: 1,1G
>>    compilation time: 9 sec
>>    run-time memory: 11G
>>
>> c. DWARFLinker version,
>>    command line options: --gc-sections --gc-debuginfo
>>    binary size: 836M
>>    compilation time: 62 sec
>>    run-time memory: 15G
>>
>> d. DWARFLinker no-odr version,
>>    command line options: --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>    binary size: 1,3G
>>    compilation time: 128 sec
>>    run-time memory: 17G
>>
>> Detailed size results:
>>
>> 1. llvm-strings
>>
>> a)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    41.1%  2.64Mi   0.0%       0    .debug_info
>>    24.9%  1.60Mi   0.0%       0    .debug_str
>>    12.6%   827Ki   0.0%       0    .debug_line
>>     6.5%   428Ki  63.8%   428Ki    .text
>>     4.8%   317Ki   0.0%       0    .strtab
>>     3.4%   223Ki   0.0%       0    .debug_ranges
>>     2.0%   133Ki  19.8%   133Ki    .eh_frame
>>     1.7%   110Ki   0.0%       0    .symtab
>>     1.2%  77.6Ki   0.0%       0    .debug_abbrev
>>
>> b)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    50.3%  1.85Mi   0.0%       0    .debug_info
>>    43.6%  1.60Mi   0.0%       0    .debug_str
>>     2.6%  98.2Ki   0.0%       0    .debug_line
>>     2.1%  77.6Ki   0.0%       0    .debug_abbrev
>>     0.5%  17.5Ki  54.9%  17.4Ki    .text
>>     0.3%  9.94Ki   0.0%       0    .strtab
>>     0.2%  6.27Ki   0.0%       0    .symtab
>>     0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
>>     0.1%  3.28Ki   0.0%       0    .debug_ranges
>>
>> c)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    33.0%  1.25Mi   0.0%       0    .debug_info
>>    29.2%  1.11Mi   0.0%       0    .debug_str
>>    11.0%   428Ki  63.8%   428Ki    .text
>>     8.2%   317Ki   0.0%       0    .strtab
>>     7.8%   304Ki   0.0%       0    .debug_line
>>     3.4%   133Ki  19.8%   133Ki    .eh_frame
>>     2.8%   110Ki   0.0%       0    .symtab
>>     1.7%  65.9Ki   0.0%       0    .debug_ranges
>>     1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>>
>> d)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    39.7%  1.68Mi   0.0%       0    .debug_info
>>    26.3%  1.11Mi   0.0%       0    .debug_str
>>     9.9%   428Ki  63.8%   428Ki    .text
>>     7.3%   317Ki   0.0%       0    .strtab
>>     7.0%   304Ki   0.0%       0    .debug_line
>>     3.1%   133Ki  19.8%   133Ki    .eh_frame
>>     2.6%   110Ki   0.0%       0    .symtab
>>     1.5%  65.9Ki   0.0%       0    .debug_ranges
>>
>>
>> 2. clang
>>
>> a)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    58.3%   878Mi   0.0%       0    .debug_info
>>    11.8%   177Mi   0.0%       0    .debug_str
>>     7.7%   115Mi  62.2%   115Mi    .text
>>     7.7%   115Mi   0.0%       0    .debug_line
>>     6.0%  90.7Mi   0.0%       0    .strtab
>>     2.4%  35.4Mi   0.0%       0    .debug_ranges
>>     1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>>     1.2%  17.9Mi   0.0%       0    .symtab
>>
>> b)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    71.5%   772Mi   0.0%       0    .debug_info
>>    16.5%   177Mi   0.0%       0    .debug_str
>>     3.7%  40.2Mi  59.2%  40.2Mi    .text
>>     2.4%  25.8Mi   0.0%       0    .debug_line
>>     2.1%  23.0Mi   0.0%       0    .strtab
>>     1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
>>     0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
>>     0.5%  5.60Mi   0.0%       0    .symtab
>>     0.4%  4.28Mi   0.0%       0    .debug_ranges
>>     0.4%  4.04Mi   0.0%       0    .debug_abbrev
>>
>> c)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    35.1%   293Mi   0.0%       0    .debug_info
>>    21.2%   177Mi   0.0%       0    .debug_str
>>    13.9%   115Mi  62.2%   115Mi    .text
>>    10.9%  90.7Mi   0.0%       0    .strtab
>>     6.9%  57.4Mi   0.0%       0    .debug_line
>>     2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>     2.1%  17.9Mi   0.0%       0    .symtab
>>     1.5%  12.4Mi   0.0%       0    .debug_ranges
>>     1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>> d)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    58.3%   758Mi   0.0%       0    .debug_info
>>    13.6%   177Mi   0.0%       0    .debug_str
>>     8.9%   115Mi  62.2%   115Mi    .text
>>     7.0%  90.7Mi   0.0%       0    .strtab
>>     4.4%  57.4Mi   0.0%       0    .debug_line
>>     1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>     1.4%  17.9Mi   0.0%       0    .symtab
>>     1.0%  12.4Mi   0.0%       0    .debug_ranges
>>     0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>> Thank you, Alexey.
>>
>> On 19.10.2020 11:50, James Henderson wrote:
>>
>> Great, thanks Alexey! I'll try to take a look at this in the near future,
>> and will report my results back here. I imagine our clang results will
>> differ, purely because we probably used different toolchains to build the
>> input in the first place.
>>
>> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com>
>> wrote:
>>
>>>
>>> On 13.10.2020 10:20, James Henderson wrote:
>>>
>>> The script included in the patch can be used to convert an object
>>> containing normal DWARF into an object using fragmented DWARF. It does this
>>> by using llvm-dwarfdump to dump the various sections, parsing the output to
>>> identify where it should split (using the offsets of the various entries),
>>> and then writing new section headers accordingly - you can see roughly what
>>> it's doing if you get a chance to watch the talk recording. The additional
>>> section headers are appended to the end of the ELF section header table,
>>> whilst the original DWARF is left in the same place it was before (making
>>> use of the fact that section headers don't have to appear in offset order).
>>> The script also parses and fragments the relocation sections targeting the
>>> DWARF sections so that they match up with the fragmented DWARF sections.
>>> This is clearly all suboptimal - in practice the compiler should be
>>> modified to do the fragmenting upfront, to save having to parse a tool's
>>> stdout, but that was just the simplest thing I could come up with to
>>> quickly write the script. Full details of the script usage are included in
>>> the patch description, if you want to play around with it.
>>>
>>> If Alexey could point me at the latest version of his patch, I'd be
>>> happy to run that through either or both of the packages I used to see what
>>> happens. Equally, I'd be happy if Alexey is able to run my script to
>>> fragment and measure the performance of a couple of projects he's been
>>> working with. Based purely on the two packages I've tried this with, I can
>>> tell already that the results can vary wildly. My expectation is that
>>> Alexey's approach will be slower (at least in its current form, but
>>> probably more generally), but produce smaller output, but to what scale I
>>> have no idea.
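To make the "parse the dump, then split at entry offsets" step described above more concrete, here is a minimal sketch. It is not the script from D89229. It assumes that DIE header lines in the llvm-dwarfdump text look like `0x0000002a:   DW_TAG_subprogram`, and it naively cuts only at the start of each chosen tag; the names `split_points` and `fragments` are hypothetical.

```python
import re

# Assumed shape of a DIE header line in the dump, e.g.
# "0x0000002a:   DW_TAG_subprogram" (an assumption, not taken from the patch).
DIE_LINE = re.compile(r"^(0x[0-9a-fA-F]+):\s+(DW_TAG_\w+)")


def split_points(dump_text, split_tags=("DW_TAG_subprogram",)):
    """Return the section offsets of every DIE whose tag is in split_tags."""
    offsets = []
    for line in dump_text.splitlines():
        match = DIE_LINE.match(line)
        if match and match.group(2) in split_tags:
            offsets.append(int(match.group(1), 16))
    return offsets


def fragments(points, section_size):
    """Turn split offsets into (start, end) byte ranges covering the section."""
    bounds = [0] + sorted(set(points)) + [section_size]
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    sample = "\n".join([
        "0x0000000b: DW_TAG_compile_unit",
        "              DW_AT_name (\"a.c\")",
        "0x0000002a:   DW_TAG_subprogram",
        "0x00000050:   DW_TAG_subprogram",
    ])
    print(fragments(split_points(sample), 0x80))
```

A real implementation would also need each entry's end offset and the matching relocation ranges, which this sketch leaves out.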
>>>
>>> James, I updated the patch - https://reviews.llvm.org/D74169.
>>>
>>> To make it work, it is necessary to build the example with
>>> -ffunction-sections and specify the following options to the linker:
>>>
>>> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>>
>>> For the clang binary I got the following results:
>>>
>>> 1. --gc-sections = binary size 1,5G, Debug Info size(*) 1.2G
>>>
>>> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
>>> decrease, Debug Info size 542M
>>>
>>> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size
>>> 1,3G, 16x performance decrease, Debug Info size 1G
>>>
>>> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>>
>>>
>>> I added the option --gc-debuginfo-no-odr so that size reduction could be
>>> compared correctly. Without that option D74169 does types deduplication, and
>>> then it is not correct to compare the resulting size with the "Fragmented DWARF"
>>> solution, which does not do types deduplication.
>>>
>>> Also, I will look at your D89229 <https://reviews.llvm.org/D89229> and will
>>> share results some time later.
>>>
>>> Thank you, Alexey.
>>>
>>>
>>> I think linkers parse .eh_frame partly because they have no other
>>> choice. That being said, I think its format is not too complex, so
>>> similarly the parser isn't too complex. You can see LLD's ELF
>>> implementation in ELF/EhFrame.cpp, how it is used in ELF/InputSection.cpp
>>> (see the bits to do with EhInputSection) and EhFrameSection in
>>> ELF/SyntheticSections.h (plus various usages of these two throughout the
>>> LLD code). I think the key to any structural changes in the DWARF format to
>>> make them more amenable to link-time parsing is being able to read a
>>> minimal amount without needing to parse the payload (e.g. a length field,
>>> some sort of type, and then using the relocations to associate it
>>> accordingly).
>>>
>>> James
>>>
>>> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>>> Awesome!
>>>> Sorry I missed the lightning talk, but really interested to
>>>> see this sort of thing (though it's not directly/immediately applicable to
>>>> the use case I work with - Split DWARF, something similar could be used
>>>> there with further work)
>>>>
>>>> Though it looks like the patch has mostly linker changes - where/how do
>>>> you generate the fragmented DWARF to begin with? Via the Python script? Run
>>>> over assembly? I'd be surprised if it was achievable that way - curious to
>>>> know more.
>>>>
>>>> Got a rough sense/are you able to run apples-to-apples comparisons with
>>>> Alexey's linker-based patches to compare linker time/memory overhead versus
>>>> resulting output size gains?
>>>>
>>>> (& yeah, I'm a bit curious about how the linkers do eh_frame rewriting,
>>>> if the format is especially amenable to a lightweight parsing/rewriting and
>>>> how we could make the DWARF more amenable to that too)
>>>>
>>>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson <
>>>> jh7370.2008 at my.bristol.ac.uk> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> At the recent LLVM developers' meeting, I presented a lightning talk
>>>>> on an approach to reduce the amount of dead debug data left in an
>>>>> executable following operations such as --gc-sections and duplicate COMDAT
>>>>> removal. In that presentation, I presented some figures based on linking a
>>>>> game that had been built by our downstream clang port and fragmented using
>>>>> the described approach. Since recording the presentation, I ran the same
>>>>> experiment on a clang package (this time built with a GCC version). The
>>>>> comparable figures are below:
>>>>>
>>>>> Link-time speed (s):
>>>>>
>>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>> | Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>>>> | Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>>>> | Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>>> | Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>>
>>>>> Output size - Game package (MB):
>>>>>
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>> | Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>>>> | Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
>>>>> | Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>>>> | Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
>>>>> | Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
>>>>> | Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>>
>>>>>
>>>>> Output size - Clang (MB):
>>>>>
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>> | Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>>> | Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>>> | Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>>>> | Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>>> | Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>>> | Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>>
>>>>> *DWARF size == total size of .debug_info + .debug_line + .debug_ranges
>>>>> + .debug_aranges + .debug_loc
>>>>>
>>>>> Additionally, I have posted https://reviews.llvm.org/D89229 which
>>>>> provides the python script and linker patches used to reproduce the above
>>>>> results on my machine. GC 1/2/3/4/5/6 correspond to the linker option
>>>>> added in that patch, --mark-live-pc, with values 1/0.8/0.6/0.4/0.2/0
>>>>> respectively.
>>>>>
>>>>> During the conference, the question was asked what the memory usage
>>>>> and input size impact was. I've summarised these below:
>>>>>
>>>>> Input file size total (GB):
>>>>> +--------------------+------------+
>>>>> | Package variant    | Total Size |
>>>>> +--------------------+------------+
>>>>> | Game (plain)       |        2.9 |
>>>>> | Game (fragmented)  |        4.2 |
>>>>> | Clang (plain)      |       10.9 |
>>>>> | Clang (fragmented) |       12.3 |
>>>>> +--------------------+------------+
>>>>>
>>>>> Peak Working Set Memory usage (GB):
>>>>> +--------------------+-------+------+
>>>>> | Package variant    | No GC | GC 1 |
>>>>> +--------------------+-------+------+
>>>>> | Game (plain)       |   4.3 |  4.7 |
>>>>> | Game (fragmented)  |   8.9 |  8.6 |
>>>>> | Clang (plain)      |  15.7 | 15.6 |
>>>>> | Clang (fragmented) |  19.4 | 19.2 |
>>>>> +--------------------+-------+------+
>>>>>
>>>>> I'm keen to hear what people's feedback is, and also interested to see
>>>>> what results others might see by running this experiment on other input
>>>>> packages.
>>>>> Also, if anybody has any alternative ideas that meet the goals
>>>>> listed below, I'd love to hear them!
>>>>>
>>>>> To reiterate some key goals of fragmented DWARF, similar to what I
>>>>> said in the presentation:
>>>>> 1) Devise a scheme that gives significant size savings without being
>>>>> too costly. It's clear from just the two packages I've tried this on that
>>>>> there is a fairly hefty link time performance cost, although the exact cost
>>>>> depends on the nature of the input package. On the other hand, depending on
>>>>> the nature of the input package, there can also be some big gains.
>>>>> 2) Devise a scheme that doesn't require any linker knowledge of DWARF.
>>>>> The current approach doesn't quite achieve this properly due to the slight
>>>>> misuse of SHF_LINK_ORDER, but I expect that a pivot to using non-COMDAT
>>>>> group sections should solve this problem.
>>>>> 3) Provide some kind of halfway house between simply writing tombstone
>>>>> values into dead DWARF and fully parsing the DWARF to reoptimise
>>>>> it/discard the dead bits.
>>>>>
>>>>> I'm hopeful that changes could be made to the linker to improve the
>>>>> link-time cost. There seems to be a significant amount of the link time
>>>>> spent creating the input sections. An alternative would be to devise a
>>>>> scheme that would avoid the literal splitting into section headers, in
>>>>> favour of some sort of list of split-points that the linker uses to split
>>>>> things up (a bit like it already does for .eh_frame or mergeable sections).
>>>>>
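For readers unfamiliar with the .eh_frame comparison made in the thread above, the sketch below shows how a section of length-prefixed records can be cut into pieces by reading only each record's length field and ID word, without parsing the payload. It is an illustration under stated assumptions, not LLD's code from ELF/EhFrame.cpp: it assumes little-endian input, and the function name `split_eh_frame` is hypothetical.

```python
import struct


def split_eh_frame(data):
    """Return (offset, size, kind) for each record in a little-endian .eh_frame blob.

    Only the length field and the 4-byte ID that follows it are read;
    the rest of each record is treated as an opaque payload.
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:
            break  # a zero length marks the terminator
        header = 4
        if length == 0xFFFFFFFF:
            # Extended form: the real length is in the next 8 bytes.
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        body = pos + header
        if length < 4 or body + length > len(data):
            raise ValueError("corrupt record at offset %d" % pos)
        (cie_id,) = struct.unpack_from("<I", data, body)
        kind = "CIE" if cie_id == 0 else "FDE"  # an ID of 0 denotes a CIE
        records.append((pos, header + length, kind))
        pos = body + length
    return records


if __name__ == "__main__":
    blob = (
        struct.pack("<II4x", 8, 0)      # CIE: length 8, ID 0, 4 payload bytes
        + struct.pack("<II8x", 12, 16)  # FDE: length 12, CIE pointer 16
        + struct.pack("<I", 0)          # terminator
    )
    for record in split_eh_frame(blob):
        print(record)
```

A linker could then keep or drop each (offset, size) piece based on whether the relocations inside it point at live sections, which is the kind of "minimal read" the thread is asking DWARF to support.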
Hi Alexey,

Just an update - I identified the cause of the "Generated debug info is broken" error message when I tried to build things locally: the `outStreamer` instance is initialised with the host Triple, instead of whatever the target's triple is. For example, I build and run LLD on Windows, which means that a Windows triple will be generated, and consequently a COFF-emitting streamer will be created, rather than the ELF-emitting one I'd expect were the triple information to somehow be derived from the linker flavor/input objects etc. Hard-coding in my target triple resolved the issue (although I still got the other warnings mentioned from my game link).

I measured the performance figures using LLD patched as described, and using the same methodology as my earlier results, and got the following:

Link-time speed (s):
+-----------------------------+---------------+
| Package variant             | GC 1 (normal) |
+-----------------------------+---------------+
| Game (DWARF linker)         |          53.6 |
| Game (DWARF linker, no ODR) |          63.6 |
| Clang (DWARF linker)        |         200.6 |
+-----------------------------+---------------+

Output size - Game package (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         |  696 |
| DWARFLinker (DWARF*)        |  429 |
| DWARFLinker (other)         |  267 |
| DWARFLinker no ODR (total)  |  753 |
| DWARFLinker no ODR (DWARF*) |  485 |
| DWARFLinker no ODR (other)  |  268 |
+-----------------------------+------+

Output size - Clang (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         | 1294 |
| DWARFLinker (DWARF*)        |  743 |
| DWARFLinker (other)         |  551 |
| DWARFLinker no ODR (total)  | 1294 |
| DWARFLinker no ODR (DWARF*) |  743 |
| DWARFLinker no ODR (other)  |  551 |
+-----------------------------+------+

*DWARF = just .debug_info, .debug_line, .debug_loc, .debug_aranges, .debug_ranges.
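As a cross-check, the GC 1 figures can be related to each other as simple ratios. The sketch below only copies numbers from the tables in this thread (DWARF linker results from this message, plain and fragmented results from the original post); the dictionary and function names are illustrative and not part of any patch.

```python
# GC 1 figures copied from the tables in this thread.
CLANG_LINK_TIME_S = {"plain": 17.9, "fragmented": 22.8, "dwarf_linker": 200.6}
CLANG_TOTAL_MB = {"plain": 2546, "fragmented": 2346, "dwarf_linker": 1294}
GAME_TOTAL_MB = {"plain": 1121, "fragmented": 940, "dwarf_linker": 696}


def slowdown(times, baseline):
    """How many times longer the DWARF linker link takes than the baseline."""
    return round(times["dwarf_linker"] / times[baseline], 1)


def size_percent(sizes, baseline):
    """DWARF linker output size as a whole-number percentage of the baseline."""
    return round(100 * sizes["dwarf_linker"] / sizes[baseline])


if __name__ == "__main__":
    for baseline in ("fragmented", "plain"):
        print("clang link time vs", baseline, slowdown(CLANG_LINK_TIME_S, baseline))
        print("clang size % of", baseline, size_percent(CLANG_TOTAL_MB, baseline))
        print("game size % of", baseline, size_percent(GAME_TOTAL_MB, baseline))
```
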
Peak Working Set Memory usage (GB):
+-----------------------------+------+
| Package variant             | GC 1 |
+-----------------------------+------+
| Game (DWARFLinker)          |  5.7 |
| Game (DWARFLinker, no ODR)  |  5.8 |
| Clang (DWARFLinker)         | 22.4 |
| Clang (DWARFLinker, no ODR) | 22.5 |
+-----------------------------+------+

My opinion is that, in the current state of affairs, the time costs of the DWARF Linker approach are not really practical for larger packages except on build servers: clang takes 8.8x as long as the fragmented approach and 11.2x as long as the plain approach (without the no ODR option). The size saving is certainly good: for my version of clang, the total output size with the DWARF linker approach is 51% of that of the plain approach and 55% of that of the fragmented approach (though it is likely that further size savings might be possible for the latter). The game produced reasonable size savings too: 62% and 74%, but I'd be surprised if these gains would be enough for people to want to use the approach in day-to-day situations, which presumably is the main use-case for smaller DWARF, due to improved debugger load times.

Interesting to note is that the GCC 7.5 build of clang I've used for these figures produced no difference in size results between the two variants (with and without the no ODR option), unlike other packages. Consequently, a significant amount of time is saved for no penalty.

I'll be interested to see what the time results of the DWARF linker are once further improvements to it have been made.

Thanks,

James

On Wed, 4 Nov 2020 at 13:57, James Henderson <jh7370.2008 at my.bristol.ac.uk> wrote:

> Great, thanks! Those results are about roughly what I was expecting. I
> assume "compilation time" is actually just the link time?
>
> I find it particularly interesting that the DWARFLinker rewriting solution
> produces the same size improvement in .debug_line as the fragmented DWARF
> approach. That suggests that in that case, fragmented DWARF output is
> probably about as optimal as it can get.
"fragmented DWARF" version, >>> command line options: --gc-sections --mark-live-pc=0.43 >>> binary size: 1,1G >>> compilation time: 9 sec >>> run-time memory: 11G >>> >>> c. DWARFLinker version, >>> command line options: --gc-sections --gc-debuginfo >>> binary size: 836M >>> compilation time: 62 sec >>> run-time memory: 15G >>> >>> d. DWARFLinker no-odr version, >>> command line options: --gc-sections --gc-debuginfo >>> --gc-debuginfo-no-odr >>> binary size: 1,3G >>> compilation time: 128 sec >>> run-time memory: 17G >>> >>> Detailed size results: >>> >>> 1. llvm-strings >>> >>> a) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 41.1% 2.64Mi 0.0% 0 .debug_info >>> 24.9% 1.60Mi 0.0% 0 .debug_str >>> 12.6% 827Ki 0.0% 0 .debug_line >>> 6.5% 428Ki 63.8% 428Ki .text >>> 4.8% 317Ki 0.0% 0 .strtab >>> 3.4% 223Ki 0.0% 0 .debug_ranges >>> 2.0% 133Ki 19.8% 133Ki .eh_frame >>> 1.7% 110Ki 0.0% 0 .symtab >>> 1.2% 77.6Ki 0.0% 0 .debug_abbrev >>> >>> b) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 50.3% 1.85Mi 0.0% 0 .debug_info >>> 43.6% 1.60Mi 0.0% 0 .debug_str >>> 2.6% 98.2Ki 0.0% 0 .debug_line >>> 2.1% 77.6Ki 0.0% 0 .debug_abbrev >>> 0.5% 17.5Ki 54.9% 17.4Ki .text >>> 0.3% 9.94Ki 0.0% 0 .strtab >>> 0.2% 6.27Ki 0.0% 0 .symtab >>> 0.1% 5.09Ki 15.9% 5.03Ki .eh_frame >>> 0.1% 3.28Ki 0.0% 0 .debug_ranges >>> >>> c) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 33.0% 1.25Mi 0.0% 0 .debug_info >>> 29.2% 1.11Mi 0.0% 0 .debug_str >>> 11.0% 428Ki 63.8% 428Ki .text >>> 8.2% 317Ki 0.0% 0 .strtab >>> 7.8% 304Ki 0.0% 0 .debug_line >>> 3.4% 133Ki 19.8% 133Ki .eh_frame >>> 2.8% 110Ki 0.0% 0 .symtab >>> 1.7% 65.9Ki 0.0% 0 .debug_ranges >>> 1.0% 38.4Ki 5.7% 38.4Ki .rodata >>> >>> d) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 39.7% 1.68Mi 0.0% 0 .debug_info >>> 26.3% 1.11Mi 0.0% 0 .debug_str >>> 9.9% 428Ki 63.8% 428Ki .text >>> 7.3% 317Ki 0.0% 0 .strtab >>> 7.0% 304Ki 0.0% 0 .debug_line >>> 3.1% 133Ki 19.8% 133Ki 
.eh_frame >>> 2.6% 110Ki 0.0% 0 .symtab >>> 1.5% 65.9Ki 0.0% 0 .debug_ranges >>> >>> >>> 2. clang >>> >>> a) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 58.3% 878Mi 0.0% 0 .debug_info >>> 11.8% 177Mi 0.0% 0 .debug_str >>> 7.7% 115Mi 62.2% 115Mi .text >>> 7.7% 115Mi 0.0% 0 .debug_line >>> 6.0% 90.7Mi 0.0% 0 .strtab >>> 2.4% 35.4Mi 0.0% 0 .debug_ranges >>> 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame >>> 1.5% 23.0Mi 12.4% 23.0Mi .rodata >>> 1.2% 17.9Mi 0.0% 0 .symtab >>> >>> b) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 71.5% 772Mi 0.0% 0 .debug_info >>> 16.5% 177Mi 0.0% 0 .debug_str >>> 3.7% 40.2Mi 59.2% 40.2Mi .text >>> 2.4% 25.8Mi 0.0% 0 .debug_line >>> 2.1% 23.0Mi 0.0% 0 .strtab >>> 1.0% 10.6Mi 15.6% 10.6Mi .dynstr >>> 0.7% 7.18Mi 10.6% 7.18Mi .eh_frame >>> 0.5% 5.60Mi 0.0% 0 .symtab >>> 0.4% 4.28Mi 0.0% 0 .debug_ranges >>> 0.4% 4.04Mi 0.0% 0 .debug_abbrev >>> >>> >>> c) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 35.1% 293Mi 0.0% 0 .debug_info >>> 21.2% 177Mi 0.0% 0 .debug_str >>> 13.9% 115Mi 62.2% 115Mi .text >>> 10.9% 90.7Mi 0.0% 0 .strtab >>> 6.9% 57.4Mi 0.0% 0 .debug_line >>> 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame >>> 2.8% 23.0Mi 12.4% 23.0Mi .rodata >>> 2.1% 17.9Mi 0.0% 0 .symtab >>> 1.5% 12.4Mi 0.0% 0 .debug_ranges >>> 1.3% 10.6Mi 5.7% 10.6Mi .dynstr >>> >>> d) >>> >>> FILE SIZE VM SIZE >>> -------------- -------------- >>> 58.3% 758Mi 0.0% 0 .debug_info >>> 13.6% 177Mi 0.0% 0 .debug_str >>> 8.9% 115Mi 62.2% 115Mi .text >>> 7.0% 90.7Mi 0.0% 0 .strtab >>> 4.4% 57.4Mi 0.0% 0 .debug_line >>> 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame >>> 1.8% 23.0Mi 12.4% 23.0Mi .rodata >>> 1.4% 17.9Mi 0.0% 0 .symtab >>> 1.0% 12.4Mi 0.0% 0 .debug_ranges >>> 0.8% 10.6Mi 5.7% 10.6Mi .dynstr >>> >>> Thank you, Alexey. >>> On 19.10.2020 11:50, James Henderson wrote: >>> >>> Great, thanks Alexey! I'll try to take a look at this in the near >>> future, and will report my results back here. 
I imagine our clang results >>> will differ, purely because we probably used different toolchains to build >>> the input in the first place. >>> >>> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com> >>> wrote: >>> >>>> >>>> On 13.10.2020 10:20, James Henderson wrote: >>>> >>>> The script included in the patch can be used to convert an object >>>> containing normal DWARF into an object using fragmented DWARF. It does this >>>> by using llvm-dwarfdump to dump the various sections, parses the output to >>>> identify where it should split (using the offsets of the various entries), >>>> and then writes new section headers accordingly - you can see roughly what >>>> it's doing if you get a chance to watch the talk recording. The additional >>>> section headers are appended to the end of the ELF section header table, >>>> whilst the original DWARF is left in the same place it was before (making >>>> use of the fact that section headers don't have to appear in offset order). >>>> The script also parses and fragments the relocation sections targeting the >>>> DWARF sections so that they match up with the fragmented DWARF sections. >>>> This is clearly all suboptimal - in practice the compiler should be >>>> modified to do the fragmenting upfront, to save having to parse a tool's >>>> stdout, but that was just the simplest thing I could come up with to >>>> quickly write the script. Full details of the script usage are included in >>>> the patch description, if you want to play around with it. >>>> >>>> If Alexey could point me at the latest version of his patch, I'd be >>>> happy to run that through either or both of the packages I used to see what >>>> happens. Equally, I'd be happy if Alexey is able to run my script to >>>> fragment and measure the performance of a couple of projects he's been >>>> working with. Based purely on the two packages I've tried this with, I can >>>> tell already that the results can vary wildly. 
My expectation is that >>>> Alexey's approach will be slower (at least in its current form, but >>>> probably more generally), but produce smaller output, but to what scale I >>>> have no idea. >>>> >>>> James, I updated the patch - https://reviews.llvm.org/D74169. >>>> >>>> To make it working it is necessary to build example with >>>> -ffunction-sections and specify following options to the linker : >>>> >>>> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr >>>> >>>> For clang binary I got following results: >>>> >>>> 1. --gc-sections = binary size 1,5G, Debug Info size(*)1.2G >>>> >>>> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance >>>> decrease, Debug Info size 542M >>>> >>>> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size >>>> 1,3G, 16x performance decrease, Debug Info size 1G >>>> >>>> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc >>>> >>>> >>>> I added option --gc-debuginfo-no-odr, so that size reduction could be >>>> compared correctly. Without that option D74169 does types deduplication and >>>> then it is not correct to compare resulting size with "Fragmented DWARF" >>>> solution which does not do types deduplication. >>>> >>>> Also, I look at your D89229 <https://reviews.llvm.org/D89229> and >>>> would share results some time later. >>>> >>>> Thank you, Alexey. >>>> >>>> >>>> I think linkers parse .eh_frame partly because they have no other >>>> choice. That being said, I think it's format is not too complex, so >>>> similarly the parser isn't too complex. You can see LLD's ELF >>>> implementation in ELF/EhFrame.cpp, how it is used in ELF/InputSection.cpp >>>> (see the bits to do with EhInputSection) and EhFrameSection in >>>> ELF/SyntheticSections.h (plus various usages of these two throughout the >>>> LLD code). 
I think the key to any structural changes in the DWARF format to >>>> make them more amenable to link-time parsing is being able to read a >>>> minimal amount without needing to parse the payload (e.g. a length field, >>>> some sort of type, and then using the relocations to associate it >>>> accordingly). >>>> >>>> James >>>> >>>> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com> wrote: >>>> >>>>> Awesome! Sorry I missed the lightning talk, but really interested to >>>>> see this sort of thing (though it's not directly/immediately applicable to >>>>> the use case I work with - Split DWARF, something similar could be used >>>>> there with further work) >>>>> >>>>> Though it looks like the patch has mostly linker changes - where/how >>>>> do you generate the fragmented DWARF to begin with? Via the Python script? >>>>> Run over assembly? I'd be surprised if it was achievable that way - curious >>>>> to know more. >>>>> >>>>> Got a rough sense/are you able to run apples-to-apples comparisons >>>>> with Alexey's linker-based patches to compare linker time/memory overhead >>>>> versus resulting output size gains? >>>>> >>>>> (& yeah, I'm a bit curious about how the linkers do eh_frame >>>>> rewriting, if the format is especially amenable to a lightweight >>>>> parsing/rewriting and how we could make the DWARF more amenable to that too) >>>>> >>>>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson < >>>>> jh7370.2008 at my.bristol.ac.uk> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> At the recent LLVM developers' meeting, I presented a lightning talk >>>>>> on an approach to reduce the amount of dead debug data left in an >>>>>> executable following operations such as --gc-sections and duplicate COMDAT >>>>>> removal. In that presentation, I presented some figures based on linking a >>>>>> game that had been built by our downstream clang port and fragmented using >>>>>> the described approach. 
Since recording the presentation, I ran the same >>>>>> experiment on a clang package (this time built with a GCC version). The >>>>>> comparable figures are below: >>>>>> >>>>>> Link-time speed (s): >>>>>> >>>>>> +--------------------+-------+---------------+------+------+------+------+------+ >>>>>> | Package variant | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | >>>>>> GC 5 | GC 6 | >>>>>> >>>>>> +--------------------+-------+---------------+------+------+------+------+------+ >>>>>> | Game (plain) | 4.5 | 4.9 | 4.2 | 3.6 | 3.4 | >>>>>> 3.3 | 3.2 | >>>>>> | Game (fragmented) | 11.1 | 11.8 | 9.7 | 8.6 | 7.9 | >>>>>> 7.7 | 7.5 | >>>>>> | Clang (plain) | 13.9 | 17.9 | 17.0 | 16.7 | 16.3 | >>>>>> 16.2 | 16.1 | >>>>>> | Clang (fragmented) | 18.6 | 22.8 | 21.6 | 21.1 | 20.8 | >>>>>> 20.5 | 20.2 | >>>>>> >>>>>> +--------------------+-------+---------------+------+------+------+------+------+ >>>>>> >>>>>> Output size - Game package (MB): >>>>>> >>>>>> +---------------------+-------+------+------+------+------+------+------+ >>>>>> | Category | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC >>>>>> 6 | >>>>>> >>>>>> +---------------------+-------+------+------+------+------+------+------+ >>>>>> | Plain (total) | 1149 | 1121 | 1017 | 965 | 938 | 930 | >>>>>> 928 | >>>>>> | Plain (DWARF*) | 845 | 845 | 845 | 845 | 845 | 845 | >>>>>> 845 | >>>>>> | Plain (other) | 304 | 276 | 172 | 120 | 93 | 85 | >>>>>> 82 | >>>>>> | Fragmented (total) | 1044 | 940 | 556 | 373 | 287 | 263 | >>>>>> 255 | >>>>>> | Fragmented (DWARF*) | 740 | 664 | 384 | 253 | 194 | 178 | >>>>>> 173 | >>>>>> | Fragmented (other) | 304 | 276 | 172 | 120 | 93 | 85 | >>>>>> 82 | >>>>>> +---------------------+-------+------+------+------+------+------+------+ >>>>>> >>>>>> >>>>>> Output size - Clang (MB): >>>>>> >>>>>> +---------------------+-------+------+------+------+------+------+------+ >>>>>> | Category | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC >>>>>> 6 | >>>>>> >>>>>> 
+---------------------+-------+------+------+------+------+------+------+ >>>>>> | Plain (total) | 2596 | 2546 | 2406 | 2332 | 2293 | 2273 | >>>>>> 2251 | >>>>>> | Plain (DWARF*) | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 | >>>>>> 1979 | >>>>>> | Plain (other) | 616 | 567 | 426 | 353 | 314 | 294 | >>>>>> 272 | >>>>>> | Fragmented (total) | 2397 | 2346 | 2164 | 2069 | 2017 | 1990 | >>>>>> 1963 | >>>>>> | Fragmented (DWARF*) | 1780 | 1780 | 1738 | 1716 | 1703 | 1696 | >>>>>> 1691 | >>>>>> | Fragmented (other) | 616 | 567 | 426 | 353 | 314 | 294 | >>>>>> 272 | >>>>>> >>>>>> +---------------------+-------+------+------+------+------+------+------+ >>>>>> >>>>>> *DWARF size == total size of .debug_info + .debug_line + >>>>>> .debug_ranges + .debug_aranges + .debug_loc >>>>>> >>>>>> Additionally, I have posted https://reviews.llvm.org/D89229 which >>>>>> provides the python script and linker patches used to reproduce the above >>>>>> results on my machine. The GC 1/2/3/4/5/6 correspond to the linker option >>>>>> added in that patch --mark-live-pc with values 1/0.8/0.6/0.4/0.2/0 >>>>>> respectively. >>>>>> >>>>>> During the conference, the question was asked what the memory usage >>>>>> and input size impact was. 
I've summarised these below: >>>>>> >>>>>> Input file size total (GB): >>>>>> +--------------------+------------+ >>>>>> | Package variant | Total Size | >>>>>> +--------------------+------------+ >>>>>> | Game (plain) | 2.9 | >>>>>> | Game (fragmented) | 4.2 | >>>>>> | Clang (plain) | 10.9 | >>>>>> | Clang (fragmented) | 12.3 | >>>>>> +--------------------+------------+ >>>>>> >>>>>> Peak Working Set Memory usage (GB): >>>>>> +--------------------+-------+------+ >>>>>> | Package variant | No GC | GC 1 | >>>>>> +--------------------+-------+------+ >>>>>> | Game (plain) | 4.3 | 4.7 | >>>>>> | Game (fragmented) | 8.9 | 8.6 | >>>>>> | Clang (plain) | 15.7 | 15.6 | >>>>>> | Clang (fragmented) | 19.4 | 19.2 | >>>>>> +--------------------+-------+------+ >>>>>> >>>>>> I'm keen to hear what people's feedback is, and also interested to >>>>>> see what results others might see by running this experiment on other input >>>>>> packages. Also, if anybody has any alternative ideas that meet the goals >>>>>> listed below, I'd love to hear them! >>>>>> >>>>>> To reiterate some key goals of fragmented DWARF, similar to what I >>>>>> said in the presentation: >>>>>> 1) Devise a scheme that gives significant size savings without being >>>>>> too costly. It's clear from just the two packages I've tried this on that >>>>>> there is a fairly hefty link time performance cost, although the exact cost >>>>>> depends on the nature of the input package. On the other hand, depending on >>>>>> the nature of the input package, there can also be some big gains. >>>>>> 2) Devise a scheme that doesn't require any linker knowledge of >>>>>> DWARF. The current approach doesn't quite achieve this properly due to the >>>>>> slight misuse of SHF_LINK_ORDER, but I expect that a pivot to using >>>>>> non-COMDAT group sections should solve this problem. 
>>>>>> 3) Provide some kind of halfway house between simply writing
>>>>>> tombstone values into dead DWARF and fully parsing the DWARF to reoptimise
>>>>>> its/discard the dead bits.
>>>>>>
>>>>>> I'm hopeful that changes could be made to the linker to improve the
>>>>>> link-time cost. There seems to be a significant amount of the link time
>>>>>> spent creating the input sections. An alternative would be to devise a
>>>>>> scheme that would avoid the literal splitting into section headers, in
>>>>>> favour of some sort of list of split-points that the linker uses to split
>>>>>> things up (a bit like it already does for .eh_frame or mergeable sections).
>>>>>>
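The "list of split-points" idea, and the earlier remark about reading "a length field" without parsing the payload, can be illustrated with a minimal sketch. It assumes a little-endian section made of length-prefixed records in the style of .eh_frame (a 4-byte length, the value 0xffffffff meaning an 8-byte extended length follows, and a zero length acting as terminator). The function name split_length_prefixed is hypothetical, and this is not LLD's implementation or the scheme from either patch.

```python
import struct


def split_length_prefixed(data: bytes):
    """Return (offset, size) pairs for each length-prefixed record.

    Only the length field of each record is read; the payload is skipped.
    Assumes little-endian data (an assumption of this sketch).
    """
    records = []
    pos = 0
    while pos + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, pos)
        if length == 0:
            break  # zero-length terminator record
        header = 4
        if length == 0xFFFFFFFF:
            # Extended form: the real length is in the next 8 bytes.
            if pos + 12 > len(data):
                raise ValueError("truncated extended length")
            (length,) = struct.unpack_from("<Q", data, pos + 4)
            header = 12
        end = pos + header + length
        if end > len(data):
            raise ValueError("record runs past end of section")
        records.append((pos, end - pos))
        pos = end
    return records


# Two records (payloads of 4 and 2 bytes) followed by a terminator.
section = (struct.pack("<I", 4) + b"abcd"
           + struct.pack("<I", 2) + b"xy"
           + struct.pack("<I", 0))
print(split_length_prefixed(section))
```

Each (offset, size) pair is a candidate split point; a linker following this approach would still need the relocations to decide which pieces are live.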
(Resending with history trimmed to avoid it getting stuck in moderator
queue).

Hi Alexey,

Just an update - I identified the cause of the "Generated debug info is
broken" error message when I tried to build things locally: the
`outStreamer` instance is initialised with the host Triple, instead of
whatever the target's triple is. For example, I build and run LLD on
Windows, which means that a Windows triple will be generated, and
consequently a COFF-emitting streamer will be created, rather than the
ELF-emitting one I'd expect were the triple information to somehow be
derived from the linker flavor/input objects etc. Hard-coding in my target
triple resolved the issue (although I still got the other warnings
mentioned from my game link). I measured the performance figures using LLD
patched as described, and using the same methodology as my earlier results,
and got the following:

Link-time speed (s):
+-----------------------------+---------------+
| Package variant             | GC 1 (normal) |
+-----------------------------+---------------+
| Game (DWARF linker)         | 53.6          |
| Game (DWARF linker, no ODR) | 63.6          |
| Clang (DWARF linker)        | 200.6         |
+-----------------------------+---------------+

Output size - Game package (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         | 696  |
| DWARFLinker (DWARF*)        | 429  |
| DWARFLinker (other)         | 267  |
| DWARFLinker no ODR (total)  | 753  |
| DWARFLinker no ODR (DWARF*) | 485  |
| DWARFLinker no ODR (other)  | 268  |
+-----------------------------+------+

Output size - Clang (MB):
+-----------------------------+------+
| Category                    | GC 1 |
+-----------------------------+------+
| DWARFLinker (total)         | 1294 |
| DWARFLinker (DWARF*)        | 743  |
| DWARFLinker (other)         | 551  |
| DWARFLinker no ODR (total)  | 1294 |
| DWARFLinker no ODR (DWARF*) | 743  |
| DWARFLinker no ODR (other)  | 551  |
+-----------------------------+------+

*DWARF = just .debug_info, .debug_line, .debug_loc, .debug_aranges,
.debug_ranges.

Peak Working Set Memory usage (GB):
+-----------------------------+------+
| Package variant             | GC 1 |
+-----------------------------+------+
| Game (DWARFLinker)          | 5.7  |
| Game (DWARFLinker, no ODR)  | 5.8  |
| Clang (DWARFLinker)         | 22.4 |
| Clang (DWARFLinker, no ODR) | 22.5 |
+-----------------------------+------+

My opinion is that the time costs of the DWARF Linker approach are not
really practical except on build servers, in the current state of affairs
for larger packages: clang takes 8.8x as long as the fragmented approach
and 11.2x as long as the plain approach (without the no ODR option). The
size saving is certainly good, with my version of clang 51% of the total
output size for the DWARF linker approach versus the plain approach and 55%
of the fragmented approach (though it is likely that further size savings
might be possible for the latter). The game produced reasonable size
savings too: 62% and 74%, but I'd be surprised if these gains would be
enough for people to want to use the approach in day-to-day situations,
which presumably is the main use-case for smaller DWARF, due to improved
debugger load times.

Interesting to note is that the GCC 7.5 build of clang I've used these
figures with produced no difference in size results between the two
variants, unlike other packages. Consequently, a significant amount of time
is saved for no penalty. I'll be interested to see what the time results of
the DWARF linker are once further improvements to it have been made.

Thanks,

James
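For readers who want to check the ratios quoted above, here is a small sketch of the arithmetic. The inputs are the "GC 1" figures from the tables in this thread (link time in seconds, total output size in MB); the helper names slowdown and percent_of are introduced here purely for illustration.

```python
def slowdown(baseline_s: float, new_s: float) -> float:
    """How many times longer the new link takes than the baseline."""
    return new_s / baseline_s


def percent_of(part_mb: float, whole_mb: float) -> float:
    """Size of one output as a percentage of another."""
    return 100.0 * part_mb / whole_mb


# Clang link times at GC 1 (seconds), taken from the tables in this thread.
clang_plain_s, clang_fragmented_s, clang_dwarflinker_s = 17.9, 22.8, 200.6

# Total output sizes at GC 1 (MB), taken from the tables in this thread.
clang_plain_mb, clang_fragmented_mb, clang_dwarflinker_mb = 2546, 2346, 1294
game_plain_mb, game_fragmented_mb, game_dwarflinker_mb = 1121, 940, 696

print(round(slowdown(clang_fragmented_s, clang_dwarflinker_s), 1))  # 8.8
print(round(slowdown(clang_plain_s, clang_dwarflinker_s), 1))       # 11.2
print(round(percent_of(clang_dwarflinker_mb, clang_plain_mb)))      # 51
print(round(percent_of(clang_dwarflinker_mb, clang_fragmented_mb))) # 55
print(round(percent_of(game_dwarflinker_mb, game_plain_mb)))        # 62
print(round(percent_of(game_dwarflinker_mb, game_fragmented_mb)))   # 74
```

The percentages are output-size ratios (smaller is better), so 51% means the DWARF linker output is roughly half the size of the plain link.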
On 04.11.2020 16:57, James Henderson wrote:
> Great, thanks! Those results are about roughly what I was expecting. I
> assume "compilation time" is actually just the link time?

yep, that is link time.

>
> I find it particularly interesting that the DWARFLinker rewriting
> solution produces the same size improvement in .debug_line as the
> fragmented DWARF approach. That suggests that in that case, fragmented
> DWARF output is probably about as optimal as it can get. I'm not
> surprised that the same can't be said for other sections, but I'm also
> pleased to see that the full rewrite option isn't so much better in
> size improvements.
>
> Regarding the problems I was having with the patch, if you want to try
> reproducing the problems with clang, I built commit 05d02e5a of clang
> using gcc 7.5.0 on Ubuntu 18.04, to generate an ELF package. I then
> used LLD to relink it to create a reproducible package. As I'm
> primarily a Windows developer, I transferred this package to my
> Windows machine so that I could use my existing Windows checkout of
> LLVM, applied your patch, rebuilt LLD, and used that to try linking
> the package, getting the stated message. I'm going to have another try
> at the latter now to see if I can figure out what the issue is myself.
>
> James
>
> On Wed, 4 Nov 2020 at 13:35, Alexey Lapshin <avl.lapshin at gmail.com
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
> On 04.11.2020 15:28, James Henderson wrote:
>> Hi Alexey,
>>
>> Thanks for taking a look at these. I noticed you set the
>> --mark-live-pc value to a value other than 1 for the fragmented
>> DWARF version. This will mean additional GC-ing will be done
>> beyond the amount that --gc-sections will do, so unless you use
>> the same value for the option for other versions, the result will
>> not be comparable. (The option is purely there to experiment with
>> the effects were different amounts of the input codebase to be
>> considered dead).
Would you be okay to run those figures again >> without the option specified? > > Oh, mis-interpreted that option. Following are updated results: > > 1. llvm-strings: > > source object files size: 381M. > fragmented source object files size: 451M(18% increase). > > a. upstream version, > command line options: --gc-sections > binary size: 6,5M > compilation time: 0:00.13 sec > run-time memory: 111kb > > b. "fragmented DWARF" version, > command line options: --gc-sections > binary size: 5,3M > compilation time: 0:00.11 sec > run-time memory: 125kb > > c. DWARFLinker version, > command line options: --gc-sections --gc-debuginfo > binary size: 3,8M > compilation time: 0:00.33 sec > run-time memory: 141kb > > d. DWARFLinker no-odr version, > command line options: --gc-sections --gc-debuginfo > --gc-debuginfo-no-odr > binary size: 4,3M > compilation time: 0:00.38 sec > run-time memory: 142kb > > > 2. clang: > > source object files size: 6,5G. > fragmented source object files size: 7,3G(13% increase). > > a. upstream version, > command line options: --gc-sections > binary size: 1,5G > compilation time: 6 sec > run-time memory: 9.7G > > b. "fragmented DWARF" version, > command line options: --gc-sections > binary size: 1,4G > compilation time: 8 sec > run-time memory: 12G > > c. DWARFLinker version, > command line options: --gc-sections --gc-debuginfo > binary size: 836M > compilation time: 62 sec > run-time memory: 15G > > d. DWARFLinker no-odr version, > command line options: --gc-sections --gc-debuginfo > --gc-debuginfo-no-odr > binary size: 1,3G > compilation time: 128 sec > run-time memory: 17G > > Detailed size results: > > 1. 
a) > > FILE SIZE VM SIZE > -------------- -------------- > 41.1% 2.64Mi 0.0% 0 .debug_info > 24.9% 1.60Mi 0.0% 0 .debug_str > 12.6% 827Ki 0.0% 0 .debug_line > 6.5% 428Ki 63.8% 428Ki .text > 4.8% 317Ki 0.0% 0 .strtab > 3.4% 223Ki 0.0% 0 .debug_ranges > 2.0% 133Ki 19.8% 133Ki .eh_frame > 1.7% 110Ki 0.0% 0 .symtab > 1.2% 77.6Ki 0.0% 0 .debug_abbrev > > b) > > FILE SIZE VM SIZE > -------------- -------------- > 40.2% 2.10Mi 0.0% 0 .debug_info > 30.7% 1.60Mi 0.0% 0 .debug_str > 8.0% 428Ki 63.8% 428Ki .text > 5.9% 317Ki 0.0% 0 .strtab > 5.9% 313Ki 0.0% 0 .debug_line > 2.5% 133Ki 19.8% 133Ki .eh_frame > 2.1% 110Ki 0.0% 0 .symtab > 1.5% 77.6Ki 0.0% 0 .debug_abbrev > 1.3% 69.2Ki 0.0% 0 .debug_ranges > > c) > > FILE SIZE VM SIZE > -------------- -------------- > 33.0% 1.25Mi 0.0% 0 .debug_info > 29.2% 1.11Mi 0.0% 0 .debug_str > 11.0% 428Ki 63.8% 428Ki .text > 8.2% 317Ki 0.0% 0 .strtab > 7.8% 304Ki 0.0% 0 .debug_line > 3.4% 133Ki 19.8% 133Ki .eh_frame > 2.8% 110Ki 0.0% 0 .symtab > 1.7% 65.9Ki 0.0% 0 .debug_ranges > 1.0% 38.4Ki 5.7% 38.4Ki .rodata > > d) > > FILE SIZE VM SIZE > -------------- -------------- > 39.7% 1.68Mi 0.0% 0 .debug_info > 26.3% 1.11Mi 0.0% 0 .debug_str > 9.9% 428Ki 63.8% 428Ki .text > 7.3% 317Ki 0.0% 0 .strtab > 7.0% 304Ki 0.0% 0 .debug_line > 3.1% 133Ki 19.8% 133Ki .eh_frame > 2.6% 110Ki 0.0% 0 .symtab > 1.5% 65.9Ki 0.0% 0 .debug_ranges > > > 2. 
a) > > FILE SIZE VM SIZE > -------------- -------------- > 58.3% 878Mi 0.0% 0 .debug_info > 11.8% 177Mi 0.0% 0 .debug_str > 7.7% 115Mi 62.2% 115Mi .text > 7.7% 115Mi 0.0% 0 .debug_line > 6.0% 90.7Mi 0.0% 0 .strtab > 2.4% 35.4Mi 0.0% 0 .debug_ranges > 1.5% 23.3Mi 12.5% 23.3Mi .eh_frame > 1.5% 23.0Mi 12.4% 23.0Mi .rodata > 1.2% 17.9Mi 0.0% 0 .symtab > > b) > > FILE SIZE VM SIZE > -------------- -------------- > 59.6% 807Mi 0.0% 0 .debug_info > 13.1% 177Mi 0.0% 0 .debug_str > 8.5% 115Mi 62.2% 115Mi .text > 6.7% 90.7Mi 0.0% 0 .strtab > 4.2% 57.4Mi 0.0% 0 .debug_line > 1.7% 23.3Mi 12.5% 23.3Mi .eh_frame > 1.7% 23.0Mi 12.4% 23.0Mi .rodata > 1.3% 17.9Mi 0.0% 0 .symtab > 1.0% 13.0Mi 0.0% 0 .debug_ranges > 0.8% 10.6Mi 5.7% 10.6Mi .dynstr > > c) > > FILE SIZE VM SIZE > -------------- -------------- > 35.1% 293Mi 0.0% 0 .debug_info > 21.2% 177Mi 0.0% 0 .debug_str > 13.9% 115Mi 62.2% 115Mi .text > 10.9% 90.7Mi 0.0% 0 .strtab > 6.9% 57.4Mi 0.0% 0 .debug_line > 2.8% 23.3Mi 12.5% 23.3Mi .eh_frame > 2.8% 23.0Mi 12.4% 23.0Mi .rodata > 2.1% 17.9Mi 0.0% 0 .symtab > 1.5% 12.4Mi 0.0% 0 .debug_ranges > 1.3% 10.6Mi 5.7% 10.6Mi .dynstr > > d) > > FILE SIZE VM SIZE > -------------- -------------- > 58.3% 758Mi 0.0% 0 .debug_info > 13.6% 177Mi 0.0% 0 .debug_str > 8.9% 115Mi 62.2% 115Mi .text > 7.0% 90.7Mi 0.0% 0 .strtab > 4.4% 57.4Mi 0.0% 0 .debug_line > 1.8% 23.3Mi 12.5% 23.3Mi .eh_frame > 1.8% 23.0Mi 12.4% 23.0Mi .rodata > 1.4% 17.9Mi 0.0% 0 .symtab > 1.0% 12.4Mi 0.0% 0 .debug_ranges > 0.8% 10.6Mi 5.7% 10.6Mi .dynstr > > >> >> I'm still trying to figure out the problems on my end to try >> running your experiment on the game package I used in my >> presentation, but have been interrupted by other unrelated >> issues. I'll try to get back to this in the coming days. 
>>
>> James
>>
>> On Wed, 4 Nov 2020 at 11:54, Alexey Lapshin <avl.lapshin at gmail.com>
>> wrote:
>>
>> Hi James,
>>
>> I did experiments with the clang code base and will do experiments
>> with our local codebase later.
>> Overall, both solutions ("Fragmented DWARF" and "DWARFLinker without
>> ODR type deduplication") appear to give similar size savings for the
>> final binary. "DWARFLinker with ODR type deduplication" gives a bigger
>> size saving.
>> "Fragmented DWARF" increases the size of the original object files by
>> up to 15%.
>> LLD with "fragmented DWARF" works significantly faster than with
>> "DWARFLinker".
>>
>> Following are the results for the "llvm-strings" and "clang" binaries:
>>
>> 1. llvm-strings:
>>
>> source object files size: 381M.
>> fragmented source object files size: 451M (18% increase).
>>
>> a. upstream version,
>>    command line options: --gc-sections
>>    binary size: 6,5M
>>    compilation time: 0:00.13 sec
>>    run-time memory: 111kb
>>
>> b. "fragmented DWARF" version,
>>    command line options: --gc-sections --mark-live-pc=0.45
>>    binary size: 3,7M
>>    compilation time: 0:00.10 sec
>>    run-time memory: 122kb
>>
>> c. DWARFLinker version,
>>    command line options: --gc-sections --gc-debuginfo
>>    binary size: 3,8M
>>    compilation time: 0:00.33 sec
>>    run-time memory: 141kb
>>
>> d. DWARFLinker no-odr version,
>>    command line options: --gc-sections --gc-debuginfo
>>    --gc-debuginfo-no-odr
>>    binary size: 4,3M
>>    compilation time: 0:00.38 sec
>>    run-time memory: 142kb
>>
>>
>> 2. clang:
>>
>> source object files size: 6,5G.
>> fragmented source object files size: 7,3G (13% increase).
>>
>> a. upstream version,
>>    command line options: --gc-sections
>>    binary size: 1,5G
>>    compilation time: 6 sec
>>    run-time memory: 9.7G
>>
>> b. "fragmented DWARF" version,
>>    command line options: --gc-sections --mark-live-pc=0.43
>>    binary size: 1,1G
>>    compilation time: 9 sec
>>    run-time memory: 11G
>>
>> c. DWARFLinker version,
>>    command line options: --gc-sections --gc-debuginfo
>>    binary size: 836M
>>    compilation time: 62 sec
>>    run-time memory: 15G
>>
>> d. DWARFLinker no-odr version,
>>    command line options: --gc-sections --gc-debuginfo
>>    --gc-debuginfo-no-odr
>>    binary size: 1,3G
>>    compilation time: 128 sec
>>    run-time memory: 17G
>>
>> Detailed size results:
>>
>> 1. llvm-strings
>>
>> a)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    41.1%  2.64Mi   0.0%       0    .debug_info
>>    24.9%  1.60Mi   0.0%       0    .debug_str
>>    12.6%   827Ki   0.0%       0    .debug_line
>>     6.5%   428Ki  63.8%   428Ki    .text
>>     4.8%   317Ki   0.0%       0    .strtab
>>     3.4%   223Ki   0.0%       0    .debug_ranges
>>     2.0%   133Ki  19.8%   133Ki    .eh_frame
>>     1.7%   110Ki   0.0%       0    .symtab
>>     1.2%  77.6Ki   0.0%       0    .debug_abbrev
>>
>> b)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    50.3%  1.85Mi   0.0%       0    .debug_info
>>    43.6%  1.60Mi   0.0%       0    .debug_str
>>     2.6%  98.2Ki   0.0%       0    .debug_line
>>     2.1%  77.6Ki   0.0%       0    .debug_abbrev
>>     0.5%  17.5Ki  54.9%  17.4Ki    .text
>>     0.3%  9.94Ki   0.0%       0    .strtab
>>     0.2%  6.27Ki   0.0%       0    .symtab
>>     0.1%  5.09Ki  15.9%  5.03Ki    .eh_frame
>>     0.1%  3.28Ki   0.0%       0    .debug_ranges
>>
>> c)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    33.0%  1.25Mi   0.0%       0    .debug_info
>>    29.2%  1.11Mi   0.0%       0    .debug_str
>>    11.0%   428Ki  63.8%   428Ki    .text
>>     8.2%   317Ki   0.0%       0    .strtab
>>     7.8%   304Ki   0.0%       0    .debug_line
>>     3.4%   133Ki  19.8%   133Ki    .eh_frame
>>     2.8%   110Ki   0.0%       0    .symtab
>>     1.7%  65.9Ki   0.0%       0    .debug_ranges
>>     1.0%  38.4Ki   5.7%  38.4Ki    .rodata
>>
>> d)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    39.7%  1.68Mi   0.0%       0    .debug_info
>>    26.3%  1.11Mi   0.0%       0    .debug_str
>>     9.9%   428Ki  63.8%   428Ki    .text
>>     7.3%   317Ki   0.0%       0    .strtab
>>     7.0%   304Ki   0.0%       0    .debug_line
>>     3.1%   133Ki  19.8%   133Ki    .eh_frame
>>     2.6%   110Ki   0.0%       0    .symtab
>>     1.5%  65.9Ki   0.0%       0    .debug_ranges
>>
>>
>> 2. clang
>>
>> a)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    58.3%   878Mi   0.0%       0    .debug_info
>>    11.8%   177Mi   0.0%       0    .debug_str
>>     7.7%   115Mi  62.2%   115Mi    .text
>>     7.7%   115Mi   0.0%       0    .debug_line
>>     6.0%  90.7Mi   0.0%       0    .strtab
>>     2.4%  35.4Mi   0.0%       0    .debug_ranges
>>     1.5%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     1.5%  23.0Mi  12.4%  23.0Mi    .rodata
>>     1.2%  17.9Mi   0.0%       0    .symtab
>>
>> b)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    71.5%   772Mi   0.0%       0    .debug_info
>>    16.5%   177Mi   0.0%       0    .debug_str
>>     3.7%  40.2Mi  59.2%  40.2Mi    .text
>>     2.4%  25.8Mi   0.0%       0    .debug_line
>>     2.1%  23.0Mi   0.0%       0    .strtab
>>     1.0%  10.6Mi  15.6%  10.6Mi    .dynstr
>>     0.7%  7.18Mi  10.6%  7.18Mi    .eh_frame
>>     0.5%  5.60Mi   0.0%       0    .symtab
>>     0.4%  4.28Mi   0.0%       0    .debug_ranges
>>     0.4%  4.04Mi   0.0%       0    .debug_abbrev
>>
>>
>> c)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    35.1%   293Mi   0.0%       0    .debug_info
>>    21.2%   177Mi   0.0%       0    .debug_str
>>    13.9%   115Mi  62.2%   115Mi    .text
>>    10.9%  90.7Mi   0.0%       0    .strtab
>>     6.9%  57.4Mi   0.0%       0    .debug_line
>>     2.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     2.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>     2.1%  17.9Mi   0.0%       0    .symtab
>>     1.5%  12.4Mi   0.0%       0    .debug_ranges
>>     1.3%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>> d)
>>
>>      FILE SIZE        VM SIZE
>>   --------------  --------------
>>    58.3%   758Mi   0.0%       0    .debug_info
>>    13.6%   177Mi   0.0%       0    .debug_str
>>     8.9%   115Mi  62.2%   115Mi    .text
>>     7.0%  90.7Mi   0.0%       0    .strtab
>>     4.4%  57.4Mi   0.0%       0    .debug_line
>>     1.8%  23.3Mi  12.5%  23.3Mi    .eh_frame
>>     1.8%  23.0Mi  12.4%  23.0Mi    .rodata
>>     1.4%  17.9Mi   0.0%       0    .symtab
>>     1.0%  12.4Mi   0.0%       0    .debug_ranges
>>     0.8%  10.6Mi   5.7%  10.6Mi    .dynstr
>>
>> Thank you, Alexey.
>>
>> On 19.10.2020 11:50, James Henderson wrote:
>>> Great, thanks Alexey! I'll try to take a look at this in the near
>>> future, and will report my results back here. I imagine our clang
>>> results will differ, purely because we probably used different
>>> toolchains to build the input in the first place.
>>>
>>> On Thu, 15 Oct 2020 at 10:08, Alexey Lapshin <avl.lapshin at gmail.com>
>>> wrote:
>>>
>>>
>>> On 13.10.2020 10:20, James Henderson wrote:
>>>> The script included in the patch can be used to convert an object
>>>> containing normal DWARF into an object using fragmented DWARF. It
>>>> does this by using llvm-dwarfdump to dump the various sections,
>>>> parses the output to identify where it should split (using the
>>>> offsets of the various entries), and then writes new section headers
>>>> accordingly - you can see roughly what it's doing if you get a
>>>> chance to watch the talk recording. The additional section headers
>>>> are appended to the end of the ELF section header table, whilst the
>>>> original DWARF is left in the same place it was before (making use
>>>> of the fact that section headers don't have to appear in offset
>>>> order). The script also parses and fragments the relocation sections
>>>> targeting the DWARF sections so that they match up with the
>>>> fragmented DWARF sections. This is clearly all suboptimal - in
>>>> practice the compiler should be modified to do the fragmenting
>>>> upfront, to save having to parse a tool's stdout, but that was just
>>>> the simplest thing I could come up with to quickly write the script.
>>>> Full details of the script usage are included in the patch
>>>> description, if you want to play around with it.
>>>>
>>>> If Alexey could point me at the latest version of his patch, I'd be
>>>> happy to run that through either or both of the packages I used to
>>>> see what happens. Equally, I'd be happy if Alexey is able to run my
>>>> script to fragment and measure the performance of a couple of
>>>> projects he's been working with. Based purely on the two packages
>>>> I've tried this with, I can tell already that the results can vary
>>>> wildly.
>>>> My expectation is that Alexey's approach will be slower (at least in
>>>> its current form, but probably more generally) but produce smaller
>>>> output, though to what extent I have no idea.
>>>
>>> James, I updated the patch - https://reviews.llvm.org/D74169.
>>>
>>> To make it work, it is necessary to build the example with
>>> -ffunction-sections and pass the following options to the linker:
>>>
>>> --gc-sections --gc-debuginfo --gc-debuginfo-no-odr
>>>
>>> For the clang binary I got the following results:
>>>
>>> 1. --gc-sections = binary size 1,5G, Debug Info size (*) 1.2G
>>>
>>> 2. --gc-sections --gc-debuginfo = binary size 840M, 8x performance
>>>    decrease, Debug Info size 542M
>>>
>>> 3. --gc-sections --gc-debuginfo --gc-debuginfo-no-odr = binary size
>>>    1,3G, 16x performance decrease, Debug Info size 1G
>>>
>>> (*) .debug_info+.debug_str+.debug_line+.debug_ranges+.debug_loc
>>>
>>>
>>> I added the option --gc-debuginfo-no-odr so that the size reduction
>>> can be compared correctly. Without that option, D74169 performs type
>>> deduplication, and it is then not correct to compare the resulting
>>> size with the "Fragmented DWARF" solution, which does not do type
>>> deduplication.
>>>
>>> Also, I will look at your D89229 <https://reviews.llvm.org/D89229>
>>> and share results some time later.
>>>
>>> Thank you, Alexey.
>>>
>>>>
>>>> I think linkers parse .eh_frame partly because they have no other
>>>> choice. That being said, I think its format is not too complex, so
>>>> similarly the parser isn't too complex. You can see LLD's ELF
>>>> implementation in ELF/EhFrame.cpp, how it is used in
>>>> ELF/InputSection.cpp (see the bits to do with EhInputSection) and
>>>> EhFrameSection in ELF/SyntheticSections.h (plus various usages of
>>>> these two throughout the LLD code).
>>>> I think the key to any structural changes in the DWARF format to
>>>> make them more amenable to link-time parsing is being able to read a
>>>> minimal amount without needing to parse the payload (e.g. a length
>>>> field, some sort of type, and then using the relocations to
>>>> associate it accordingly).
>>>>
>>>> James
>>>>
>>>> On Mon, 12 Oct 2020 at 20:48, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>> Awesome! Sorry I missed the lightning talk, but really interested to
>>>> see this sort of thing (though it's not directly/immediately
>>>> applicable to the use case I work with - Split DWARF, something
>>>> similar could be used there with further work)
>>>>
>>>> Though it looks like the patch has mostly linker changes - where/how
>>>> do you generate the fragmented DWARF to begin with? Via the Python
>>>> script? Run over assembly? I'd be surprised if it was achievable
>>>> that way - curious to know more.
>>>>
>>>> Got a rough sense/are you able to run apples-to-apples comparisons
>>>> with Alexey's linker-based patches to compare linker time/memory
>>>> overhead versus resulting output size gains?
>>>>
>>>> (& yeah, I'm a bit curious about how the linkers do eh_frame
>>>> rewriting, if the format is especially amenable to a lightweight
>>>> parsing/rewriting and how we could make the DWARF more amenable to
>>>> that too)
>>>>
>>>> On Mon, Oct 12, 2020 at 6:41 AM James Henderson
>>>> <jh7370.2008 at my.bristol.ac.uk> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> At the recent LLVM developers' meeting, I presented a lightning talk
>>>> on an approach to reduce the amount of dead debug data left in an
>>>> executable following operations such as --gc-sections and duplicate
>>>> COMDAT removal.
>>>> In that presentation, I presented some figures based on linking a
>>>> game that had been built by our downstream clang port and fragmented
>>>> using the described approach. Since recording the presentation, I
>>>> ran the same experiment on a clang package (this time built with a
>>>> GCC version). The comparable figures are below:
>>>>
>>>> Link-time speed (s):
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Package variant    | No GC | GC 1 (normal) | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>> | Game (plain)       |   4.5 |           4.9 |  4.2 |  3.6 |  3.4 |  3.3 |  3.2 |
>>>> | Game (fragmented)  |  11.1 |          11.8 |  9.7 |  8.6 |  7.9 |  7.7 |  7.5 |
>>>> | Clang (plain)      |  13.9 |          17.9 | 17.0 | 16.7 | 16.3 | 16.2 | 16.1 |
>>>> | Clang (fragmented) |  18.6 |          22.8 | 21.6 | 21.1 | 20.8 | 20.5 | 20.2 |
>>>> +--------------------+-------+---------------+------+------+------+------+------+
>>>>
>>>> Output size - Game package (MB):
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       |  1149 | 1121 | 1017 |  965 |  938 |  930 |  928 |
>>>> | Plain (DWARF*)      |   845 |  845 |  845 |  845 |  845 |  845 |  845 |
>>>> | Plain (other)       |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>>> | Fragmented (total)  |  1044 |  940 |  556 |  373 |  287 |  263 |  255 |
>>>> | Fragmented (DWARF*) |   740 |  664 |  384 |  253 |  194 |  178 |  173 |
>>>> | Fragmented (other)  |   304 |  276 |  172 |  120 |   93 |   85 |   82 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>>
>>>> Output size - Clang (MB):
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Category            | No GC | GC 1 | GC 2 | GC 3 | GC 4 | GC 5 | GC 6 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>> | Plain (total)       |  2596 | 2546 | 2406 | 2332 | 2293 | 2273 | 2251 |
>>>> | Plain (DWARF*)      |  1979 | 1979 | 1979 | 1979 | 1979 | 1979 | 1979 |
>>>> | Plain (other)       |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>>> | Fragmented (total)  |  2397 | 2346 | 2164 | 2069 | 2017 | 1990 | 1963 |
>>>> | Fragmented (DWARF*) |  1780 | 1780 | 1738 | 1716 | 1703 | 1696 | 1691 |
>>>> | Fragmented (other)  |   616 |  567 |  426 |  353 |  314 |  294 |  272 |
>>>> +---------------------+-------+------+------+------+------+------+------+
>>>>
>>>> *DWARF size == total size of .debug_info + .debug_line +
>>>> .debug_ranges + .debug_aranges + .debug_loc
>>>>
>>>> Additionally, I have posted https://reviews.llvm.org/D89229 which
>>>> provides the python script and linker patches used to reproduce the
>>>> above results on my machine. The GC 1/2/3/4/5/6 correspond to the
>>>> linker option added in that patch --mark-live-pc with values
>>>> 1/0.8/0.6/0.4/0.2/0 respectively.
>>>>
>>>> During the conference, the question was asked what the memory usage
>>>> and input size impact was.
>>>> I've summarised these below:
>>>>
>>>> Input file size total (GB):
>>>> +--------------------+------------+
>>>> | Package variant    | Total Size |
>>>> +--------------------+------------+
>>>> | Game (plain)       |        2.9 |
>>>> | Game (fragmented)  |        4.2 |
>>>> | Clang (plain)      |       10.9 |
>>>> | Clang (fragmented) |       12.3 |
>>>> +--------------------+------------+
>>>>
>>>> Peak Working Set Memory usage (GB):
>>>> +--------------------+-------+------+
>>>> | Package variant    | No GC | GC 1 |
>>>> +--------------------+-------+------+
>>>> | Game (plain)       |   4.3 |  4.7 |
>>>> | Game (fragmented)  |   8.9 |  8.6 |
>>>> | Clang (plain)      |  15.7 | 15.6 |
>>>> | Clang (fragmented) |  19.4 | 19.2 |
>>>> +--------------------+-------+------+
>>>>
>>>> I'm keen to hear what people's feedback is, and also interested to
>>>> see what results others might see by running this experiment on
>>>> other input packages. Also, if anybody has any alternative ideas
>>>> that meet the goals listed below, I'd love to hear them!
>>>>
>>>> To reiterate some key goals of fragmented DWARF, similar to what I
>>>> said in the presentation:
>>>> 1) Devise a scheme that gives significant size savings without being
>>>> too costly. It's clear from just the two packages I've tried this on
>>>> that there is a fairly hefty link time performance cost, although
>>>> the exact cost depends on the nature of the input package. On the
>>>> other hand, depending on the nature of the input package, there can
>>>> also be some big gains.
>>>> 2) Devise a scheme that doesn't require any linker knowledge of
>>>> DWARF. The current approach doesn't quite achieve this properly due
>>>> to the slight misuse of SHF_LINK_ORDER, but I expect that a pivot to
>>>> using non-COMDAT group sections should solve this problem.
>>>> 3) Provide some kind of halfway house between simply writing
>>>> tombstone values into dead DWARF and fully parsing the DWARF to
>>>> reoptimise it/discard the dead bits.
>>>>
>>>> I'm hopeful that changes could be made to the linker to improve the
>>>> link-time cost. There seems to be a significant amount of the link
>>>> time spent creating the input sections. An alternative would be to
>>>> devise a scheme that would avoid the literal splitting into section
>>>> headers, in favour of some sort of list of split-points that the
>>>> linker uses to split things up (a bit like it already does for
>>>> .eh_frame or mergeable sections).
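
To make the split-point idea from the last paragraph concrete, here is a minimal Python sketch of what it might look like. It is only an illustration under stated assumptions: the function names (split_section, fragment_index, live_fragments) are hypothetical and do not exist in LLD or in D89229, and the liveness rule (drop a fragment only when it has relocations and none of them target a live symbol) is a simplification chosen for the example, not the behaviour of the patch.

```python
from __future__ import annotations

from bisect import bisect_right


def split_section(data: bytes, split_points: list[int]) -> list[tuple[int, bytes]]:
    """Cut one contiguous section into (start_offset, bytes) fragments.

    split_points are offsets inside the section where a new fragment
    starts; offset 0 always starts the first fragment.
    """
    starts = sorted({0, *(p for p in split_points if 0 < p < len(data))})
    ends = starts[1:] + [len(data)]
    return [(s, data[s:e]) for s, e in zip(starts, ends)]


def fragment_index(starts: list[int], offset: int) -> int:
    """Map a section offset (e.g. a relocation offset) to its fragment."""
    return bisect_right(starts, offset) - 1


def live_fragments(fragments, relocs, live_symbols):
    """Keep fragments that have no relocations or reference a live symbol.

    relocs is a list of (offset, symbol_name) pairs. live_symbols is the
    set of symbols assumed to have survived section GC (hypothetical input).
    """
    starts = [s for s, _ in fragments]
    has_reloc = [False] * len(fragments)
    keep = [False] * len(fragments)
    for offset, symbol in relocs:
        i = fragment_index(starts, offset)
        has_reloc[i] = True
        if symbol in live_symbols:
            keep[i] = True
    return [f for i, f in enumerate(fragments) if keep[i] or not has_reloc[i]]
```

With a 12-byte section split at offsets 4 and 10, a relocation at offset 5 against a dead symbol and one at offset 10 against a live symbol, the middle fragment is discarded and the other two are kept. The point of the design is that the section stays a single input section with a side table of offsets, so the linker never has to materialise one section header per fragment.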