Robinson, Paul via llvm-dev
2017-Dec-04 18:49 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
Thanks for providing the experimental data! It clearly shows the value of type sections in DWARF.

Regarding why type sections are off by default, aside from the issue of consumers needing to understand them, there is a size penalty to type sections that becomes more evident in smaller projects (meaning, fewer compilation units). The size penalty can be balanced against the amount of deduplication for a net win, if you have enough duplicates that you can eliminate. But it is a tradeoff.

In Sony's case, it is not uncommon for studios to do what are called "unity" builds, where you have basically one master .cpp file that does #include of each other .cpp file, giving you an LTO-like build. In this case the debug-info production will automatically produce only one copy of each type, and so using type sections would probably make the net debug info bigger. And of course an LTO build will deduplicate type info at the metadata level, with a similar effect.

So, I think whether type sections help or hurt will depend on how a particular project's build procedure is set up. Clang/LLVM are set up to do lots of smaller compilations and link them all together, in a fairly traditional model, and that is where type sections will provide the most benefit. Your data, then, is essentially for a best-case scenario. Other kinds of projects will not benefit as much.

Regarding DWARF 5 and emitting type sections into the .debug_info section rather than the .debug_types section: The work to support DWARF 5 in LLVM has not gotten very far yet. Conforming to the standard in this respect is certainly on my list; however, there are other features that Sony considers higher priority. If you or someone else wants to contribute that feature sooner, that would be excellent! Otherwise, we will get to it in due time.
Thanks,
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of George Rimar via llvm-dev
Sent: Monday, December 04, 2017 7:11 AM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).

Hi all!

We have an issue with LLD: "relocation R_X86_64_32 out of range" (PR31109), which occurs while resolving relocations in debug sections. It seems to happen because the .debug_info section can sometimes be too large, so a 32-bit relocation is not enough to represent the value. One possible solution is to deduplicate information to reduce the size of .debug_info. The rest of this mail describes the experiments I did, the results obtained, and some questions and suggestions.

I was investigating the idea of deduplicating debug type information. The idea is described on p276 of the DWARF4 specification (http://www.dwarfstd.org/doc/DWARF4.pdf). It suggests splitting type information out of .debug_info and emitting multiple .debug_types sections using COMDATs. Both clang and gcc, in the versions I tested, implement the -fdebug-types-section flag for that:

-fdebug-types-section, -fno-debug-types-section
    Place debug types in their own section (ELF Only)

gcc's description is here: https://gcc.gnu.org/onlinedocs/gcc-6.4.0/gcc/Debugging-Options.html#Debugging-Options.

This flag is disabled by default. I compared clang binaries to see the difference with and without the linker-side optimisation.
1) Clang built with -g has a size of 1.7 GB; the .debug_info section size is 894.5 MB.
2) Clang built with -g -fdebug-types-section has a size of 1.0 GB; the .debug_types size is 26.267 MB and the .debug_info size is 227.7 MB.

The difference is huge and, I believe, shows (though for most readers here it was probably already obvious) that the optimization can be useful. Yet -fdebug-types-section is disabled by default. It looks like it was initially disabled because not all DWARF consumers were aware of the .debug_types section.

Now, in 2017, the situation is different. I think most DWARF consumers know about .debug_types, but:
1) The DWARF5 specification explicitly eliminates the .debug_types section introduced in DWARF4: p8, "1.4 Changes from Version 4 to Version 5", http://dwarfstd.org/doc/DWARF5.pdf
2) Instead of emitting multiple .debug_types sections, it suggests emitting multiple .debug_info COMDAT sections (p375, p376).

It seems there is currently no way to make clang emit multiple .debug_info sections with type information as DWARF5 suggests. I tried the command line below:

-g -fdebug-types-section -gdwarf-5

It still emits .debug_types, and there does not appear to be a flag for emitting multiple .debug_info sections. Looking at the LLVM code as a whole (lib/MC, lib/CodeGen), it seems it is always assumed that .debug_info is a unique section in an object. (I am also not sure why clang emits .debug_types when the -gdwarf-5 flag is set, as this section is incompatible with v5; it is probably a bug.)

So my questions are the following:
1) Do we want to try to implement the multiple .debug_info approach? It seems it can be very useful sometimes.
2) For now, in LLD, might we want to extend our error message from "relocation X out of range" to something suggesting the use of -fdebug-types-section (only for relocations in debug sections)?
3) Why is -fdebug-types-section disabled by default?

Best regards,
George | Developer | Access Softek, Inc
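To make the reported failure mode concrete: R_X86_64_32 is a direct, zero-extended 32-bit relocation, so any relocated value above 0xFFFFFFFF cannot be encoded. The following is a minimal Python sketch of that range check. The function name and the sample offsets are hypothetical, chosen only for illustration, and this is not LLD's actual code.

```python
# Illustrative only: not LLD's implementation.
# R_X86_64_32 is a direct, zero-extended 32-bit relocation, so the
# relocated value must be representable as an unsigned 32-bit integer.
UINT32_MAX = 0xFFFFFFFF


def fits_r_x86_64_32(value: int) -> bool:
    """Return True if `value` can be encoded by a zero-extended 32-bit relocation."""
    return 0 <= value <= UINT32_MAX


# Hypothetical offsets into a merged debug section.
small_offset = 894_500_000   # roughly the .debug_info size reported for clang
large_offset = 5 * 1024**3   # an offset past the 4 GiB boundary

print(fits_r_x86_64_32(small_offset))
print(fits_r_x86_64_32(large_offset))
```

The sketch only shows why shrinking .debug_info helps: once every offset into the section stays below 4 GiB, the check passes for all of them.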
UE US via llvm-dev
2017-Dec-04 21:23 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
An old co-worker told me that writing a dwarf support library was the most painful experience of his life due to the confusing standards documents, so it's not surprising DWARF5 is going slow.

GNOMETOYS
Eric Christopher via llvm-dev
2017-Dec-04 22:59 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
This isn't a particularly productive email - especially as a number of people on this list are current contributors to the standard. Mostly dwarf5 support is lined up behind one of us having the spare cycles to implement it rather than anything else FWIW :)

That said, if you have specific feedback about confusing items I'm definitely happy to help figure out:

a) some better way to say it,
b) some other implementation to avoid it being confusing

Having partially implemented a couple of readers and writers at this point I agree that it's not the friendliest of documents, but sometimes being inside of it makes it harder to see where it's causing issues.

Thanks!

-eric
Rui Ueyama via llvm-dev
2017-Dec-05 05:08 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
On Mon, Dec 4, 2017 at 10:49 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Regarding why type sections are off by default, aside from the issue of
> consumers needing to understand them, there is a size penalty to type
> sections that becomes more evident in smaller projects (meaning, fewer
> compilation units). The size penalty can be balanced against the amount of
> deduplication for a net win, if you have enough duplicates that you can
> eliminate. But it is a tradeoff.

By a size penalty, which do you mean, the size of the final executable or the intermediate object files? If it is a size penalty of object files, how much is that? I wonder if the current situation is a reasonable trade-off.
George Rimar via llvm-dev
2017-Dec-05 13:50 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
Thanks for the answers, Paul!

> So, I think whether type sections help or hurt will depend on how a particular project's build procedure is set up. Clang/LLVM are set up
> to do lots of smaller compilations and link them all together, in a fairly traditional model, and that is where type sections will provide the
> most benefit. Your data, then, is essentially for a best-case scenario. Other kinds of projects will not benefit as much.

This inspired me to run additional tests on the LLVM binaries to see how much they gain if we enable -fdebug-types-section. (The full table of results is at the end of this mail.)

During the experiment I observed both an object size penalty and a size penalty for a single final executable:
1) The size of the .a files in LLVM/lib increases from 6.5 GB to 7.7 GB.
2) One binary, llvm-PerfectShuffle, was larger with the flag: its size changed from 120064 to 124952 bytes.

For all the others, using the flag usually gives a noticeable win (a size reduction of up to 41%).

> Regarding DWARF 5 and emitting type sections into the .debug_info section rather than the .debug_types section: The work to support
> DWARF 5 in LLVM has not gotten very far yet. Conforming to the standard in this respect is certainly on my list, however there are other
> features that Sony considers higher priority. If you or someone else wants to contribute that feature sooner, that would be excellent!
> Otherwise, we will get to it in due time.
> Thanks,
> --paulr

I am going to look at it more closely. At the very least, I do not think LLD would currently work correctly with multiple .debug_info sections when building .gdb_index. We expect to see a unique .debug_info section in an object file and will probably do something wrong otherwise. It looks like llvm/DebugInfo needs to be fixed first, which also affects tools like llvm-dwarfdump and probably others. I am going to investigate all of that.
Testing results: ---------------------------------------------------------------- Name Size change (-g / -g -fdebug-types-section) ---------------------------------------------------------------- arcmt-test 461644608 / 322758048 = 1.430x bugpoint 938191280 / 552402624 = 1.698x c-arcmt-test 20968 / 20968 = 1.000x c-index-test 941325408 / 613643776 = 1.533x clang-6.0 1697417824 / 1025908400 = 1.654x clang-check 1440335448 / 864954472 = 1.665x clang-diff 422183328 / 293650384 = 1.437x clang-format 67763352 / 51596584 = 1.313x clang-func-mapping 423746376 / 294311536 = 1.439x clang-import-test 611477912 / 410019056 = 1.491x clang-offload-bundler 76254024 / 61321152 = 1.243x clang-refactor 448153976 / 311549496 = 1.438x clang-rename 441661264 / 307777416 = 1.435x clang-tblgen 17489504 / 16802744 = 1.040x count 18392 / 18392 = 1.000x diagtool 415912688 / 289701512 = 1.435x FileCheck 6681280 / 6489896 = 1.029x llc 903308048 / 529531864 = 1.705x lld 1009754992 / 620445232 = 1.627x lli 419682176 / 270912680 = 1.549x lli-child-target 77237632 / 63011888 = 1.225x llvm-ar 131787624 / 102692104 = 1.283x llvm-as 72916752 / 57792456 = 1.261x llvm-bcanalyzer 6464984 / 6259992 = 1.032x llvm-cat 73318016 / 57999784 = 1.264x llvm-cfi-verify 160259072 / 125738440 = 1.274x llvm-config 5947768 / 5776752 = 1.029x llvm-cov 80728632 / 65663448 = 1.229x llvm-c-test 843631952 / 498768912 = 1.691x llvm-cvtres 72163840 / 58065104 = 1.242x llvm-cxxdump 74284720 / 59261168 = 1.253x llvm-cxxfilt 7046752 / 6865368 = 1.026x llvm-demangle-fuzzer 70156288 / 55760784 = 1.258x llvm-diff 58551832 / 46506104 = 1.259x llvm-dis 52982824 / 42252624 = 1.253x llvm-dsymutil 883071928 / 517877728 = 1.705x llvm-dwarfdump 121679064 / 95079960 = 1.279x llvm-dwp 879362280 / 514570584 = 1.708x llvm-extract 115790888 / 87646504 = 1.321x llvm-isel-fuzzer 887217736 / 519910464 = 1.706x llvm-link 79158192 / 62087976 = 1.274x llvm-lto 932838656 / 553536912 = 1.685x llvm-lto2 926319416 / 550018696 = 1.684x llvm-mc 
118139784 / 89656216 = 1.317x llvm-mcmarkup 5974664 / 5775368 = 1.034x llvm-modextract 68740776 / 54352208 = 1.264x llvm-mt 6749720 / 6440088 = 1.048x llvm-nm 131633536 / 102825080 = 1.280x llvm-objcopy 73991272 / 60029840 = 1.232x llvm-objdump 150270880 / 118629456 = 1.266x llvm-opt-fuzzer 891258608 / 527493664 = 1.689x llvm-opt-report 8814368 / 8585952 = 1.026x llvm-pdbutil 110919744 / 93010704 = 1.192x llvm-PerfectShuffle 120064 / 124952 = 0.960x llvm-profdata 41889560 / 32957976 = 1.270x llvm-rc 8954768 / 8551192 = 1.047x llvm-readobj 85723040 / 70542776 = 1.215x llvm-rtdyld 138255056 / 108085992 = 1.279x llvm-size 71567376 / 57589872 = 1.259x llvm-split 125299816 / 95063600 = 1.318x llvm-stress 46366576 / 37211688 = 1.246x llvm-strings 5746216 / 5563216 = 1.032x llvm-symbolizer 87280568 / 71248216 = 1.225x llvm-tblgen 49304088 / 42580848 = 1.157x llvm-xray 93953928 / 77434112 = 1.213x not 5495536 / 5325816 = 1.031x obj2yaml 97146752 / 81415480 = 1.193x opt 955386696 / 564492184 = 1.692x sancov 146145680 / 114837520 = 1.272x sanstats 87031832 / 71004312 = 1.225x scan-build 53444 / 53444 = 1.000x scan-view 4504 / 4504 = 1.000x verify-uselistorder 73506560 / 58211520 = 1.262x yaml2obj 27882712 / 26506184 = 1.051x yaml-bench 7001024 / 6763952 = 1.035x From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of George Rimar via llvm-dev Sent: Monday, December 04, 2017 7:11 AM To: llvm-dev at lists.llvm.org Subject: [llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD). Hi all ! We have an issue with LLD, it is "relocation R_X86_64_32 out of range" (PR31109) which occurs during resolving relocations in debug sections. It looks happens because .debug_info section can be too large sometimes and 32x relocation is not enough to represent the value. One of possible solutions looks to be to deduplicate information to reduce .debug_info size. 
The rest of mail contains information about experiments I did, the obtained results and some questions and suggestions as well. I was investigating idea to deduplicate debug types information. Idea is described at p276 of DWARF4 specification (http://www.dwarfstd.org/doc/DWARF4.pdf). It suggests to split types information out of .debug_info and emit multiple .debug_types sections with use of COMDATs. Both clang and gcc I tested implements -fdebug-types-section flag for that: -fdebug-types-section, -fno-debug-types-section Place debug types in their own section (ELF Only) gcc's description is here: https://gcc.gnu.org/onlinedocs/gcc-6.4.0/gcc/Debugging-Options.html#Debugging-Options. This flag is disabled by default. I compared clang binaries to see the difference with and without the linker side optimisation. 1) Clang built with -g has size of 1.7 GB, .debug_info section size is 894.5 Mb. 2) Clang built with -g -fdebug-types-section has size of 1.0 GB. .debug_types size is 26.267 MB, .debug_info size is 227.7 MB. Difference is huge and I believe shows (though probably for most of readers here it was already obvious) that optimization can be useful. Though -fdebug-types-section is disabled by default. Looks it was initially disabled because not all of DWARF consumers were aware of .debug_types section. Now in 2017 situation is different. I think most of DWARF consumers knows about .debug_types, but: 1) DWARF5 specification explicitly eliminates the .debug_types section introduced in DWARF4: p8, "1.4 Changes from Version 4 to Version 5" http://dwarfstd.org/doc/DWARF5.pdf 2) Instead of emiting multiple .debug_types it suggests to emit multiple .debug_info COMDAT sections. (p375, p376). And it seems currently there is no way to make clang to emit multiple .debug_info with type information like DWARF5 suggests. I tried command line below: -g -fdebug-types-section -gdwarf-5 It still emits .debug_types and does not look there is a flag for emiting multiple .debug_info. 
Looking at the LLVM code as a whole (lib/MC, lib/CodeGen), it actually seems to be always assumed that .debug_info is a unique section in an object. (I am also not sure why clang emits .debug_types when the -gdwarf-5 flag is set, as this section is incompatible with v5; it is probably a bug.)

So my questions are the following:

1) Do we want to try to implement the multiple .debug_info approach? It seems it can be very useful in some cases.
2) For now, in LLD, might we want to extend our error message from "relocation X out of range" to something suggesting the use of -fdebug-types-section (only for relocations in debug sections)?
3) Why is -fdebug-types-section disabled by default?

Best regards,
George | Developer | Access Softek, Inc
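
The section sizes George quotes for the clang binary can be combined into a single reduction figure. The sketch below only does that arithmetic; the constant and function names are made up for illustration, and all sizes are taken as MB as quoted in the mail.

```python
# Section sizes quoted in the mail for the clang binary, in MB.
DEBUG_INFO_PLAIN_MB = 894.5   # .debug_info, built with -g
DEBUG_INFO_TYPES_MB = 227.7   # .debug_info, built with -g -fdebug-types-section
DEBUG_TYPES_MB = 26.267       # .debug_types, same build


def combined_type_build_mb() -> float:
    """Total of .debug_info and .debug_types in the type-section build."""
    return DEBUG_INFO_TYPES_MB + DEBUG_TYPES_MB


def reduction_factor() -> float:
    """How many times smaller the type-section build's debug data is."""
    return DEBUG_INFO_PLAIN_MB / combined_type_build_mb()


if __name__ == "__main__":
    print(f"combined: {combined_type_build_mb():.3f} MB")
    print(f"reduction: {reduction_factor():.2f}x")
```

For these two sections alone, the type-section build carries roughly 254 MB against 894.5 MB, which is about a 3.5x reduction. Other debug sections are not part of the quoted figures and are not modelled here.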
Robinson, Paul via llvm-dev
2017-Dec-05 15:13 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
From: Rui Ueyama [mailto:ruiu at google.com]
Sent: Monday, December 04, 2017 9:09 PM
To: Robinson, Paul
Cc: George Rimar; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).

On Mon, Dec 4, 2017 at 10:49 AM, Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Thanks for providing the experimental data! It clearly shows the value of type sections in DWARF. Regarding why type sections are off by default, aside from the issue of consumers needing to understand them, there is a size penalty to type sections that becomes more evident in smaller projects (meaning, fewer compilation units). The size penalty can be balanced against the amount of deduplication for a net win, if you have enough duplicates that you can eliminate. But it is a tradeoff.

By a size penalty, which do you mean, the size of the final executable or the intermediate object files? If it is a size penalty of object files, how much is that? I wonder if the current situation is a reasonable trade-off.

When we emit a type section instead of directly emitting the type to .debug_info, we effectively extract the type description and move it into the type section; however the type section also has overhead, consisting of a header and some wrapper around the type information, and possibly some additional context. This is obviously bigger than the original description. Also references to the type become bigger; at a minimum, they are each 8 bytes, rather than the usual 4 bytes. Repeat this overhead for each type moved to a type section. All of this results in a bigger intermediate object file. I have not tried to measure how much this is for "typical" compilation units. IIRC, LLVM chooses to move enums and aggregates into type units; it does not assess the size of a type description as part of its heuristic.
If none of the type sections are duplicated in other object files, then the final executable will be bigger by just as much as the object files are. To the extent that there are duplicates the linker can eliminate, you start to claw back the space consumed by the overhead. If you have enough duplicates to eliminate, you have a net size win in the executable.

--paulr
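
The tradeoff described above can be sketched as a toy size model for a single type. This is a simplification under stated assumptions, not a measurement: the only figures taken from the mail are the 4-byte and 8-byte reference sizes, while the type size, the per-section overhead, and the reference counts are hypothetical inputs, and all function names are made up for illustration.

```python
# Toy size model for one type; every number passed in is hypothetical.

REF_BYTES_INLINE = 4     # "the usual 4 bytes" for a reference inside .debug_info
REF_BYTES_TYPE_UNIT = 8  # "at a minimum ... 8 bytes" for a reference to a type section


def size_inline(type_bytes: int, units: int, refs_per_unit: int) -> int:
    """Bytes used when every compilation unit carries its own copy of the type."""
    return units * (type_bytes + REF_BYTES_INLINE * refs_per_unit)


def size_type_sections(type_bytes: int, units: int, refs_per_unit: int,
                       overhead_bytes: int, deduplicated: bool) -> int:
    """Bytes used when the type lives in a type section.

    deduplicated=False models the object files (one copy per unit);
    deduplicated=True models the executable after the linker keeps one copy.
    """
    copies = 1 if deduplicated else units
    sections = copies * (type_bytes + overhead_bytes)
    references = units * REF_BYTES_TYPE_UNIT * refs_per_unit
    return sections + references


if __name__ == "__main__":
    # Hypothetical type: 200 bytes, 40 bytes of overhead, 3 references per unit.
    for units in (1, 2, 10):
        inline = size_inline(200, units, 3)
        linked = size_type_sections(200, units, 3, 40, deduplicated=True)
        print(units, inline, linked, inline - linked)
```

With these made-up inputs, a type that appears in only one unit costs more with type sections than without, and the saving turns positive once the linker has duplicates to discard. In this model the object files (deduplicated=False) are always larger with type sections, which matches the description of a bigger intermediate object file.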