David Blaikie via llvm-dev
2020-Jul-17 01:19 UTC
[llvm-dev] Switch to ld.bfd tombstone behavior by default
In short: Perhaps we should switch lld to the bfd-style tombstoning behavior for a release or two, letting users opt-in to testing with the new -1/-2 tombstoning in the interim, before switching to the new tombstone by default (while still having the flag to switch back when users find surprise places that can't handle the new behavior). In long: https://reviews.llvm.org/D81784 and follow-on patches modified the behavior of lld with regards to resolving relocations from debug sections to dead code (either comdat deduplicated, or gc-sections use). A very quick summary of the situation: Original Behavior: - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0 elsewhere - gold/lld: 0+addend everywhere Limitations/bugs: - bfd/gold/lld - doesn't support 0 as a valid executable address without ambiguities - gold/lld - ambiguities with large gc'd functions combined with a .text mapping that starts in relative low addresses - premature debug_range termination with zero-length functions (Clang produces these with __builtin_unreachable or non-void return type functions without a return statement) New behavior: - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier there) - -1 elsewhere - linker flag to customize to other values if desired Known issues: - lldb's line table parsing can't handle -1 well at all (essentially unusable) - gdb's line table parsing ends up with different handling when breaking on gc'd functions (minor functionality issue) - chromium/firefox have some tools that were broken: https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5 I think there's enough risk in this work (even given the small number of bugs found so far), given there's a pretty wide array of debug info consumers out there, that we should change lld's default to match the long-lived bfd strategy. This would address my original motivation for raising all this (empty functions prematurely terminating the list), while letting users who want to experiment with it, or need it (like Alexey), can opt-in to the -1/-2 behavior. I'm not sure how to get the word out to DWARF consumers that they should consider this new experimental behavior. Ray's done a good job evangelizing/discussing this with gdb and lldb at least - and of course having turned it on by default briefly has found some users (like Chromium) that we probably wouldn't have found no matter how long we left this as an experimental option... so some things are going to break when we switch no matter what. P.S: Sony's already been using the -1 technique with their debugger and linker for a while, so they may want to keep this on by default for SCE - but I'm not sure how to do that in-tree. Clang doesn't know which lld version it's running, so whether the flag can be specified, I would think? (so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use -1", I think) - if there is a way to make that decision in the compiler driver+linker, then we'd have a question of "default new behavior except when tuning for LLDB and GDB" or "default bfd behavior except when tuning for SCE". -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200716/2d9c4bc6/attachment-0001.html>
Fangrui Song via llvm-dev
2020-Jul-17 07:03 UTC
[llvm-dev] Switch to ld.bfd tombstone behavior by default
Thanks for the write-up! On 2020-07-16, David Blaikie wrote:>In short: Perhaps we should switch lld to the bfd-style tombstoning >behavior for a release or two, letting users opt-in to testing with the new >-1/-2 tombstoning in the interim, before switching to the new tombstone by >default (while still having the flag to switch back when users find >surprise places that can't handle the new behavior). > >In long: >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior >of lld with regards to resolving relocations from debug sections to dead >code (either comdat deduplicated, or gc-sections use). > >A very quick summary of the situation: > >Original Behavior: > > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0 > elsewhere > - gold/lld: 0+addend everywhere > >Limitations/bugs: > > - bfd/gold/lld > - doesn't support 0 as a valid executable address without ambiguities > - gold/lld > - ambiguities with large gc'd functions combined with a .text mapping > that starts in relative low addresses > - premature debug_range termination with zero-length functions (Clang > produces these with __builtin_unreachable or non-void return >type functions > without a return statement) > >New behavior: > > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier > there) > - -1 elsewhere > - linker flag to customize to other values if desired > >Known issues: > > - lldb's line table parsing can't handle -1 well at all (essentially > unusable)Pavel Labath will fix this soon https://reviews.llvm.org/D83957 This is an unhandled address-space wraparound problem. This pattern is potentially common - and other downstream DWARF consumers might make similar line table handling mistakes.> - gdb's line table parsing ends up with different handling when breaking > on gc'd functions (minor functionality issue)This is just a behavior difference, not affecting users. It did break a test if linked with LLD (gdb intrinsically has lots of failing tests even if built with GCC+GNU ld). Previous behavior (when an address is zero): a breakpoint on a --gc-sections discarded function will be redirected to a larger line number with debug info, even if that line can be an unrelated different function. New behavior is that the breakpoint is on a wrapped-around small address. GDB 9.3 will restore the previous behavior (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)> >I think there's enough risk in this work (even given the small number of >bugs found so far), given there's a pretty wide array of debug info >consumers out there, that we should change lld's default to match the >long-lived bfd strategy. This would address my original motivation for >raising all this (empty functions prematurely terminating the list), while >letting users who want to experiment with it, or need it (like Alexey), can >opt-in to the -1/-2 behavior.I think we can only confidently say that there is enough risk in using tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in other .debug_* can cause problems. Hope others can chime in. With consideration for satefy for the upcoming release/11.x, we can make two choices: a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1 b) .debug_ranges&.debug_loc => -2, other .debug_* => 0 Delaying .debug_line => -1 for one or two release sounds good to me. So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a nice property. This write-up proposes b), but I'd say a) is likely sufficient. With the available information, I cannot yet say that a) will have more risk.> - chromium/firefox have some tools that were broken: > https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5This is potentially related to other .debug_* (not .debug_line) I hope Chromium developers can chime in here:) The breakage was unfortunate but I don't know how we could have avoided that. IMHO this is no different from "clang started to emit a new DW_FORM_* and a postprocessing tool of .debug chokes on that" Whether we want to suppress that particular DW_FORM_* definitely should depend on how likely it can cause problems, but we can't yet say we have to hold off on a feature for a solved (precisely, mitigated) problem.>I'm not sure how to get the word out to DWARF consumers that they should >consider this new experimental behavior. Ray's done a good job >evangelizing/discussing this with gdb and lldb at least - and of course >having turned it on by default briefly has found some users (like Chromium) >that we probably wouldn't have found no matter how long we left this as an >experimental option... so some things are going to break when we switch no >matter what.Thank you for following up with some GNU folks on their lists! If folks want to follow along the thread: https://sourceware.org/pipermail/binutils/2020-June/111376.html We have informed binutils, elfutils-devel (elfutils has a few debug tools) and gdb. I don't recall that anyone has thought about problems with a tombstone value.> >P.S: Sony's already been using the -1 technique with their debugger and >linker for a while, so they may want to keep this on by default for SCE - >but I'm not sure how to do that in-tree. > > >Clang doesn't know which lld >version it's running, so whether the flag can be specified, I would think? >(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use >-1", I think) - if there is a way to make that decision in the compiler >driver+linker, then we'd have a question of "default new behavior except >when tuning for LLDB and GDB" or "default bfd behavior except when tuning >for SCE".I've been involed in another thread on SHF_LINK_ORDER (https://sourceware.org/pipermail/binutils/2020-July/112415.html ). We may need a way to tell codegen about the used linker. pcc proposed -mbinutils-version= - This is nice in that some MC decisions related to -fno-integrated-as can use this option as well. jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34 This may get more complex if the generated object file want to be linked with more than one linker. This discussion probably deserves its own thread.
James Henderson via llvm-dev
2020-Jul-17 09:05 UTC
[llvm-dev] Switch to ld.bfd tombstone behavior by default
On Fri, 17 Jul 2020 at 02:20, David Blaikie <dblaikie at gmail.com> wrote:> P.S: Sony's already been using the -1 technique with their debugger and > linker for a while, so they may want to keep this on by default for SCE - > but I'm not sure how to do that in-tree. Clang doesn't know which lld > version it's running, so whether the flag can be specified, I would think? > (so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use > -1", I think) - if there is a way to make that decision in the compiler > driver+linker, then we'd have a question of "default new behavior except > when tuning for LLDB and GDB" or "default bfd behavior except when tuning > for SCE". >Currently, where we want different defaults in our LLD to upstream LLD, our downstream port overrides the behaviour, if the linker name matches our downstream linker name. The behaviour isn't at all related to the input file targets etc. This is so that the linker can be run without using the clang driver to provide all the various arguments/keep things simpler in our restricted environment etc. We might change this in the future, but either way, as long as there's a configurable behaviour we can easily override, then it's fine. It would help if we get a heads up if a default behaviour like this is changing, since sometimes it might cause us problems/not get noticed by our testing etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200717/e458f056/attachment.html>
David Blaikie via llvm-dev
2020-Jul-17 16:42 UTC
[llvm-dev] Switch to ld.bfd tombstone behavior by default
On Fri, Jul 17, 2020 at 12:03 AM Fangrui Song <maskray at google.com> wrote:> > Thanks for the write-up! > > On 2020-07-16, David Blaikie wrote: > >In short: Perhaps we should switch lld to the bfd-style tombstoning > >behavior for a release or two, letting users opt-in to testing with the new > >-1/-2 tombstoning in the interim, before switching to the new tombstone by > >default (while still having the flag to switch back when users find > >surprise places that can't handle the new behavior). > > > >In long: > >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior > >of lld with regards to resolving relocations from debug sections to dead > >code (either comdat deduplicated, or gc-sections use). > > > >A very quick summary of the situation: > > > >Original Behavior: > > > > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0 > > elsewhere > > - gold/lld: 0+addend everywhere > > > >Limitations/bugs: > > > > - bfd/gold/lld > > - doesn't support 0 as a valid executable address without ambiguities > > - gold/lld > > - ambiguities with large gc'd functions combined with a .text mapping > > that starts in relative low addresses > > - premature debug_range termination with zero-length functions (Clang > > produces these with __builtin_unreachable or non-void return > >type functions > > without a return statement) > > > >New behavior: > > > > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier > > there) > > - -1 elsewhere > > - linker flag to customize to other values if desired > > > >Known issues: > > > > - lldb's line table parsing can't handle -1 well at all (essentially > > unusable) > > Pavel Labath will fix this soon https://reviews.llvm.org/D83957 > This is an unhandled address-space wraparound problem. > This pattern is potentially common - and other downstream DWARF > consumers might make similar line table handling mistakes.That's the thing - I'm not sure we can really classify them as "mistakes". I think bfd.ld's tombstoning behavior is about the only thing we can reasonably say DWARF consumers should /probably/ be expected to handle - and I'd imagine in many cases they haven't been written intentionally to handle it, but whatever behavior they have has accidentally been sufficient for their needs.> > - gdb's line table parsing ends up with different handling when breaking > > on gc'd functions (minor functionality issue) > > This is just a behavior difference, not affecting users. > It did break a test if linked with LLD (gdb intrinsically has lots of > failing tests even if built with GCC+GNU ld). > > Previous behavior (when an address is zero): a breakpoint on a > --gc-sections discarded function will be redirected to a larger line > number with debug info, even if that line can be an unrelated different > function. > New behavior is that the breakpoint is on a wrapped-around small address. > > GDB 9.3 will restore the previous behavior > (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510) > > > > >I think there's enough risk in this work (even given the small number of > >bugs found so far), given there's a pretty wide array of debug info > >consumers out there, that we should change lld's default to match the > >long-lived bfd strategy. This would address my original motivation for > >raising all this (empty functions prematurely terminating the list), while > >letting users who want to experiment with it, or need it (like Alexey), can > >opt-in to the -1/-2 behavior. > > I think we can only confidently say that there is enough risk in using > tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in > other .debug_* can cause problems.Given how many DWARF parsers we've had to cleanup or migrate off in transitioning to DWARFv5 inside Google, I think that's a fair bit of evidence the set of parsers isn't as narrow/closed as we'd like and thus the number of places that might have issues isn't known/easily exhaustively tested. So I'm not so concerned about the bugs we've seen, but what that might indicate about the things that we don't know about and can't test (because we don't know about them). (of course, the flipside of that is that if we don't know about them, we can't tell people who own them to go check if they work with this opt-in feature - so they'll break whenever we turn this on by default - but perhaps in the interim we can get at least a few big LLD customers to deploy the feature and flush out some of the issues - happy for Google to use this internally, hopefully we can encourage Apple folks, Sony of course already has this semantic so nothing to find there most likely - Chromium, maybe Firefox, etc)> With consideration for satefy for the upcoming release/11.x, we can make > two choices: > > a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1 > b) .debug_ranges&.debug_loc => -2, other .debug_* => 0 > > Delaying .debug_line => -1 for one or two release sounds good to me. > So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a > nice property. > > This write-up proposes b), but I'd say a) is likely sufficient. With the > available information, I cannot yet say that a) will have more risk.Risk is about the unknowns - and it still seems like a lot of unknowns. While there are probably many more consumers that read .debug_line than other sections, reading debug_info (for instance) is necessary for inline frames in symbolizing - still probably one of the most common uses of DWARF I'd guess. (what about stack unwinding using debug_frame? that'd worry me a bit if anyone got /that/ wrong because of this change)> > - chromium/firefox have some tools that were broken: > > https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5 > > This is potentially related to other .debug_* (not .debug_line) > I hope Chromium developers can chime in here:) The breakage was > unfortunate but I don't know how we could have avoided that. IMHO this > is no different from "clang started to emit a new DW_FORM_* and a > postprocessing tool of .debug chokes on that" Whether we want to > suppress that particular DW_FORM_* definitely should depend on how > likely it can cause problems, but we can't yet say we have to hold off > on a feature for a solved (precisely, mitigated) problem.LLVM has no custom forms and I'd be super cautious about adding any that were on by default because of how bad that breakage would be. I'm not so concerned about the problems we know - but what they tell us about the problems that might arise from use cases we don't know. All the other projects out there that might have custom DWARF parsers to do some ad-hoc things. (also, ultimately - given how far-reaching this is, I think we'll want some tidier flags that are more user-focussed. I'd hope for a flag that gives BFD-like semantics (though I'd be OK with fixing debug_loc (using 1 instead of 0) to work the same as debug_ranges while we're there - a minor divergence from BFD, but highly likely to not cause problems/fall out naturally from a simple implementation of parsing that section) - something that's been in-use and tested by basically everyone for decades. And another flag for the new semantics (-2 for debug_loc/debug_ranges, -1 everywhere else). Customizable per-section expression-based support I think is a recipe for platform divergence & I'd rather it not be available/supported at all, but if you really want to keep it in, I'd at least rather it not be the feature we promote to users about how they can test/opt in/out of the behavior when they're seeing breakages or want to test the future semantics)> >I'm not sure how to get the word out to DWARF consumers that they should > >consider this new experimental behavior. Ray's done a good job > >evangelizing/discussing this with gdb and lldb at least - and of course > >having turned it on by default briefly has found some users (like Chromium) > >that we probably wouldn't have found no matter how long we left this as an > >experimental option... so some things are going to break when we switch no > >matter what. > > Thank you for following up with some GNU folks on their lists! > If folks want to follow along the thread: > https://sourceware.org/pipermail/binutils/2020-June/111376.html > > We have informed binutils, elfutils-devel (elfutils has a few debug > tools) and gdb. I don't recall that anyone has thought about problems > with a tombstone value. > > > > >P.S: Sony's already been using the -1 technique with their debugger and > >linker for a while, so they may want to keep this on by default for SCE - > >but I'm not sure how to do that in-tree. > > > > > >Clang doesn't know which lld > >version it's running, so whether the flag can be specified, I would think? > >(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use > >-1", I think) - if there is a way to make that decision in the compiler > >driver+linker, then we'd have a question of "default new behavior except > >when tuning for LLDB and GDB" or "default bfd behavior except when tuning > >for SCE". > > I've been involed in another thread on SHF_LINK_ORDER (https://sourceware.org/pipermail/binutils/2020-July/112415.html ). > We may need a way to tell codegen about the used linker. > > pcc proposed -mbinutils-version= - This is nice in that some MC > decisions related to -fno-integrated-as can use this option as well. > jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34 > > This may get more complex if the generated object file want to be linked > with more than one linker. This discussion probably deserves its own > thread.