George Rimar via llvm-dev
2017-Dec-06 11:15 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
>If you're interested in things you can do in the linker for this - you might consider something more aggressive: Fully DWARF aware deduplication.
>
>This could be done hopefully by reusing some of the code in the dsymutil implementation in LLVM.
>
>This would be much more effective (and without the possible context-sensitive tradeoffs) than using type units.
>Though it'd possibly have a big tradeoff in link time and/or linker memory usage (I'm not sure how much dsymutil needs/uses of either).

+ Rui.

I think the direction of LLD development is currently to avoid teaching the linker about things it naturally should not be aware of. Ideally it should work with sections as pieces and should not know about their content. That is not always possible: for example, we have to look inside .eh_frame to deduplicate FDEs, but that is probably what we would want to avoid in general.

>It doesn't seem especially important to implement the DWARF5 types -> debug_info thing for this situation, the type units
>as they are (in debug_types) offer the same size benefits here. But sure, if anyone wanted to implement it at some point, that'd be fine.

But there is no .debug_types in DWARF5, so it is a deprecated approach as far as I understand.

>I think Paul covered some of the reasons type units might not be a reasonable default.
>
>One additional reason is that if you use Split DWARF (another great way to massively reduce the amount of debug info going to the linker)
>type units are mostly /just/ overhead in the .dwo files: since the debug info is not linked, there's no opportunity to remove the
>duplication anyway (unless you're making a DWP - like a dsym file)

Yeah. It looks like -gsplit-dwarf and -fdebug-types-section are harmful together. It is probably worth restricting their combined use or emitting a warning (both clang and gcc silently allow the combination, and the output has the size penalty you describe).

But then does it make sense to emit multiple .debug_info sections with -gsplit-dwarf, so that objects contain a skeleton .debug_info and .debug_info sections with type units as described in DWARF5? That way the linker would be able to deduplicate types at the section level, as expected.

George.
George Rimar via llvm-dev
2017-Dec-06 12:15 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
>But then does it make sense to emit multiple .debug_info sections with -gsplit-dwarf, so that objects contain a skeleton .debug_info and
>.debug_info sections with type units as described in DWARF5? That way the linker would be able to deduplicate
>types at the section level, as expected.

It looks like that just would not work, as a skeleton CU has no children according to the spec.

George.
George Rimar via llvm-dev
2017-Dec-06 13:25 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
>>But then does it make sense to emit multiple .debug_info sections with -gsplit-dwarf, so that objects contain a skeleton .debug_info and
>>.debug_info sections with type units as described in DWARF5? That way the linker would be able to deduplicate
>>types at the section level, as expected.
>
>It looks like that just would not work, as a skeleton CU has no children according to the spec.
>
>George.

Ah, please ignore the above. It looks like a skeleton CU does not need children for that.

So the theoretical scenario I meant was:

1) The test.o file has:
   .debug_info skeleton
   [0..N] .debug_info sections with types
2) Then test.dwo would have the full .debug_info with declarations of the types whose definitions are still in test.o.

I am not sure whether that is possible to represent or whether it makes sense. Assuming it is possible and there are enough duplicate types, it would add some work for the linker to deduplicate sections (though that should be fast) and would increase the output binary by the size of the deduplicated types. But it would also reduce the size of the whole set (executable + *.dwo), which could be useful.

George.
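As an illustration of the section-level deduplication discussed above, here is a minimal Python sketch, assuming each type unit arrives in its own COMDAT-style group keyed by its type signature. The `SectionGroup` class and `dedup_groups` function are hypothetical names invented for this example and are not LLD code; the point is only that the linker can drop duplicates by comparing signatures without ever parsing the section contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionGroup:
    """Hypothetical stand-in for a COMDAT group holding one type unit."""
    object_file: str
    signature: int   # group signature, e.g. the 8-byte type signature
    payload: bytes   # opaque section contents; never parsed here


def dedup_groups(groups):
    """Keep the first group seen for each signature, drop later duplicates."""
    seen = set()
    kept = []
    for group in groups:
        if group.signature in seen:
            continue  # a copy of this type unit was already kept
        seen.add(group.signature)
        kept.append(group)
    return kept


# Two objects both carry a type unit for the same type (signature 0x1111).
groups = [
    SectionGroup("a.o", 0x1111, b"struct Foo"),
    SectionGroup("a.o", 0x2222, b"struct Bar"),
    SectionGroup("b.o", 0x1111, b"struct Foo"),
]
kept = dedup_groups(groups)
saved = sum(len(g.payload) for g in groups) - sum(len(g.payload) for g in kept)
print([g.object_file for g in kept], "bytes saved:", saved)
```

The cost is one set lookup per group, which is consistent with the expectation above that this kind of deduplication should be fast.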
David Blaikie via llvm-dev
2017-Dec-06 22:22 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
On Wed, Dec 6, 2017 at 3:15 AM George Rimar <grimar at accesssoftek.com> wrote:

> >If you're interested in things you can do in the linker for this - you
> >might consider something more aggressive: Fully DWARF aware deduplication.
> >
> >This could be done hopefully by reusing some of the code in the dsymutil
> >implementation in LLVM.
> >
> >This would be much more effective (and without the possible
> >context-sensitive tradeoffs) than using type units.
> >Though it'd possibly have a big tradeoff in link time and/or linker
> >memory usage (I'm not sure how much dsymutil needs/uses of either).
>
> + Rui.
>
> I think the direction of LLD development is currently to avoid teaching
> the linker about things it naturally should not be aware of.

*nod* That's been the historic ELF+DWARF approach, but both MacOS (with dsyms+DWARF) and Windows (COFF+CodeView+PDB) don't do it that way, and instead involve the linker to a degree.

Mostly I'm wondering if it'd be reasonable to (and if anyone would be interested in doing it) do something more like the PDB support - fully debug-aware linking.

> Ideally it should work with sections as pieces and should not know
> about their content. That is not always possible: for example, we have
> to look inside .eh_frame to deduplicate FDEs, but that is probably what
> we would want to avoid in general.

Yeah, I can totally understand that & it's historically how it's been done, so I'm not expecting a change there, just floating the idea.

> >It doesn't seem especially important to implement the DWARF5 types ->
> >debug_info thing for this situation, the type units
> >as they are (in debug_types) offer the same size benefits here. But sure,
> >if anyone wanted to implement it at some point, that'd be fine.
>
> But there is no .debug_types in DWARF5, so it is a deprecated approach as
> far as I understand.

Sure - but it works/is supported/is implemented. If someone wants to implement the newer thing, that's cool, but I don't have any personal motivation to do so for example. (& honestly we've been throwing around some ideas about how to further generalize the debug_info contributions to reduce some of the overhead of isolating types - so maybe if we're lazy enough, we might leapfrog this particular state and just implement that future better thing)

> >I think Paul covered some of the reasons type units might not be a
> >reasonable default.
> >
> >One additional reason is that if you use Split DWARF (another great way
> >to massively reduce the amount of debug info going to the linker)
> >type units are mostly /just/ overhead in the .dwo files: since the debug
> >info is not linked, there's no opportunity to remove the
> >duplication anyway (unless you're making a DWP - like a dsym file)
>
> Yeah. It looks like -gsplit-dwarf and -fdebug-types-section are harmful
> together. It is probably worth restricting their combined use or
> emitting a warning (both clang and gcc silently allow the combination,
> and the output has the size penalty you describe).

Nah, only if you're not producing a DWP at the end ( https://gcc.gnu.org/wiki/DebugFissionDWP ).

In short, I probably wouldn't change any of LLVM's defaults. But there are certainly flags people can use to reduce their debug info size.

You mentioned starting with this because LLVM's defaults mean the DWARF is too large to link with DWARF 32 bit? How does gold cope with this? I haven't seen failures/error messages/etc from either gold or lld related to this? (though I mostly use Split DWARF myself)

> But then does it make sense to emit multiple .debug_info sections with
> -gsplit-dwarf, so that objects contain a skeleton .debug_info and
> .debug_info sections with type units as described in DWARF5? That way
> the linker would be able to deduplicate types at the section level,
> as expected.
>
> George.
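For context on the "too large to link with DWARF 32 bit" question above: the 32-bit DWARF format stores section offsets in 4 bytes, so data placed beyond the 4 GiB mark of an output debug section cannot be referenced. The Python sketch below is a deliberately simplified model with a hypothetical function name (`first_unreachable_input`); it only checks where each input contribution would start after concatenation and does not reflect how gold or LLD actually detect the overflow.

```python
DWARF32_MAX_OFFSET = 0xFFFFFFFF  # largest value a 4-byte section offset can hold


def first_unreachable_input(sizes):
    """Return the index of the first input contribution whose start offset
    in the concatenated output section does not fit in 32 bits, or None
    if every contribution starts at a representable offset."""
    offset = 0
    for index, size in enumerate(sizes):
        if offset > DWARF32_MAX_OFFSET:
            return index
        offset += size
    return None


# Three hypothetical .debug_info contributions: 2 GiB, 2 GiB, 4 KiB.
# The third one would start exactly at 4 GiB, which a 32-bit offset cannot express.
print(first_unreachable_input([2**31, 2**31, 4096]))
# Two contributions totalling 4 GiB still start at representable offsets.
print(first_unreachable_input([2**31, 2**31]))
```

A real linker sees this as a relocation overflow rather than a size check, since references can point anywhere inside a contribution and not just at its start.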
George Rimar via llvm-dev
2017-Dec-07 12:47 UTC
[llvm-dev] [RFC] - Deduplication of debug information in linkers (LLD).
>*nod* That's been the historic ELF+DWARF approach, but both MacOS (with dsyms+DWARF) and Windows
>(COFF+CodeView+PDB) don't do it that way, and instead involve the linker to a degree.
>Mostly I'm wondering if it'd be reasonable to (and if anyone would be interested in doing it) do
>something more like the PDB support - fully debug-aware linking.

Honestly, I only know how an ELF linker works, so my thoughts below may be silly for some reason or may duplicate some already existing approach.

Looking at what .dwp does, there seem to be two main things that reduce the size of debug data:

1) "It must allow for the removal of duplicate type units".
2) "It must allow for the removal of duplicate strings".

The linker already deduplicates strings by itself, though it could delegate that to some API for debug sections. What it could probably do is call a library API: the linker would hand it some set of (or all of) the .debug_* sections, and the library would rebuild and optimize the DWARF data, eliminate duplicates, and return the optimized debug sections to the linker. The linker would then apply relocations and emit the result to the output. That way the library could probably also be used in a standalone post-processing tool, and the linker would still work with data at the section level only and would not need to be DWARF aware.

>Sure - but it works/is supported/is implemented. If someone wants to implement the newer thing, that's cool, but I don't have any
>personal motivation to do so for example. (& honestly we've been throwing around some ideas about how to further generalize the
>debug_info contributions to reduce some of the overhead of isolating types - so maybe if we're lazy enough, we might leapfrog
>this particular state and just implement that future better thing)

I see. Based on all the comments in this thread, I am inclined to agree that implementing the newer thing does not make much sense at the moment.

For now I have prepared a patch (D40950) that makes LLD error out when it encounters objects with multiple .debug_* sections in the cases where we do not support them. (LLD supports deduplicating COMDATs, so in general such an object is not a problem; but for error reporting and for --gdb-index we assume debug sections are unique within an object, so in those cases it looks like we want to error out.)

I have some last thoughts/questions about this though :)

Currently clang -gdwarf-5 -fdebug-types-section works, and so the linker can deduplicate types. That probably violates the specification, which says there are no more .debug_types sections, but the behavior is convenient for users of -fdebug-types-section.

I do not know how the transition from v4 to v5 will happen (or how transitions between DWARF standards usually happen). I suppose one day clang will just start to produce v5 debug data by default. At the same time, multiple .debug_info sections are mentioned in the DWARF5 spec as an optimization, so it should not be mandatory to implement them.

If so, it seems that either we will need to implement this optimization before switching to v5 by default, or allow -gdwarf-5 -fdebug-types-section in order to support the existing use case. And since it already works and is already allowed in releases, that probably means it is acceptable to keep (and use) this behavior? (If so, an attempt to leapfrog could be a nice strategy IMO.)

>>>I think Paul covered some of the reasons type units might not be a reasonable default.
>>>
>>>One additional reason is that if you use Split DWARF (another great way to massively reduce the amount of debug info going to the linker)
>>>type units are mostly /just/ overhead in the .dwo files: since the debug info is not linked, there's no opportunity to remove the
>>>duplication anyway (unless you're making a DWP - like a dsym file)
>>
>>Yeah. It looks like -gsplit-dwarf and -fdebug-types-section are harmful together. It is probably worth restricting their combined use or
>>emitting a warning (both clang and gcc silently allow the combination, and the output has the size penalty you describe).
>
>Nah, only if you're not producing a DWP at the end ( https://gcc.gnu.org/wiki/DebugFissionDWP ).

Sure, DWP seems to do a great job here, but even for the DWP flow it does not seem to make sense to force the compiler to do extra work to produce type sections, because DWP-producing tools probably get no benefit at all from larger .dwo files with .debug_types, I think. The only thing I can imagine is that somebody might use -gsplit-dwarf and -fdebug-types-section together in order to parse .debug_types.dwo instead of .debug_info.dwo and look up types in a slightly more convenient way, but that looks like too synthetic a case.

>In short, I probably wouldn't change any of LLVM's defaults. But there are certainly flags people can use to reduce their debug info size.
>
>You mentioned starting with this because LLVM's defaults mean the DWARF is too large to link with DWARF 32 bit? How does gold cope with this?
>I haven't seen failures/error messages/etc from either gold or lld related to this? (though I mostly use Split DWARF myself)

I posted some results earlier here: https://bugs.llvm.org//show_bug.cgi?id=31109#c3. In short: gold 2.26.1 silently ignored this (and probably produced broken output), while newer versions of gold are able to catch and report the same error.

I think it is simply still uncommon to have such large debug sections; we have had only a single bug about this so far. Hopefully DWARF64 can be a solution, though it may just hide the issue; it still looks like it would be nice to reduce the amount of debug data we produce.

Best regards,
George | Developer | Access Softek, Inc
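To make the "removal of duplicate strings" step concrete, here is a small Python sketch of what a section-level string merge could look like, assuming each input is a well-formed table of NUL-terminated strings (as in .debug_str). The function name `merge_string_tables` and the shape of the returned remap table are assumptions made for illustration; this is not the API of any existing library, and a real implementation would also have to rewrite every reference into the table using the remap.

```python
def merge_string_tables(tables):
    """Merge NUL-terminated string tables into one deduplicated table.

    Returns (merged, remap), where remap[(table_index, old_offset)] gives
    the offset of the same string in the merged table.
    Assumes every string in every input table is NUL-terminated.
    """
    merged = bytearray()
    offset_of = {}   # string bytes -> offset in merged table
    remap = {}       # (input table index, old offset) -> new offset
    for table_index, table in enumerate(tables):
        pos = 0
        while pos < len(table):
            end = table.index(b"\0", pos)   # raises ValueError if unterminated
            string = table[pos:end]
            if string not in offset_of:
                offset_of[string] = len(merged)
                merged += string + b"\0"
            remap[(table_index, pos)] = offset_of[string]
            pos = end + 1
    return bytes(merged), remap


# "Foo" appears in both inputs and is stored only once in the output.
merged, remap = merge_string_tables([b"int\0Foo\0", b"Foo\0char\0"])
print(merged)
print(remap)
```

The remap table is the piece that lets the caller stay at the section level: it can patch offsets that point into the string table without understanding the rest of the DWARF data.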