Alexey Lapshin via llvm-dev
2020-May-29 19:08 UTC
[llvm-dev] Range lists, zero-length functions, linker gc
> Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
>
> On 2020-05-28, David Blaikie wrote:
> >On Thu, May 28, 2020 at 2:52 PM Robinson, Paul <paul.robinson at sony.com>
> >wrote:
> >
> >> As has been mentioned elsewhere, Sony generally fixes up references from
> >> debug info to stripped functions (of any length) using -1, because that’s a
> >> less-likely-to-be-real address than 0x0 or 0x1. (0x0 is a typical base
> >> address for shared libraries, I’d think using it has the potential to
> >> mislead various consumers.) For .debug_ranges we use -2, because both a
> >> 0/0 pair and a -1/-1 pair have a reserved meaning in that section.
> >>
> >
> >Any harm in using -2 everywhere, for consistency?
>
> When resolving a relocation, in certain cases we have to give an undefined
> symbol a value.
> This can happen with:
>
> * an undefined weak symbol
> * an undefined global symbol in --noinhibit-exec mode (a buggy --gc-sections
>   implementation can trigger this as well)
> * a relocation referencing an undefined symbol in a non-SHF_ALLOC section
>
> We always respect the addend in a relocation entry for an absolute/PC-relative
> (I can use "most" here) relocation (R_ARM_THM_PC8, R_AARCH64_ADR_PREL_PG_HI21,
> R_X86_64_64, local exec TLS relocation types, ...)
> Ignoring the addend (using -2 everywhere) will break this consistency.
>
> The relocated code may do pointer subtraction which would work if addends were
> respected, but will break using -2 everywhere.

>I suspect David meant "any harm to using -2 in all .debug_* sections?"
>and not literally everywhere. Sony does special cases only for the
>.debug_* sections.

>I've been meaning to propose that DWARF v6 reserve a special address for
>this kind of situation. Whether the committee would be willing to make
>it be -1 or -2 for all targets, or make it target-defined, I don't know.

>(Dreading the inevitable argument over whether addresses are signed or
>unsigned, or more to the point whether they wrap. They've been unsigned
>and wrapping was undefined on the small set of machines I'm familiar with.)
>Certainly the toolchain community would benefit from making it be the
>same everywhere.

>Personally I'd vote for -1, and make pre-v5 .debug_loc/.debug_ranges
>sections be an extra-special case using -2. We can (I hope) standardize
>on -1 for v6 onward, and document -1/-2 on the DWARF wiki as recommended
>practice for prior versions.

Would it make sense for the DWARF documentation to use "LowPC > HighPC" as
the marker for that case, instead of -1 or -2?

Or, more precisely: to indicate that an address range points into deleted
code, should either a zero-length range or a LowPC > HighPC range be used?

A zero-length address range is already defined in the DWARF documentation.
LowPC > HighPC is currently not described. It could be documented and
used as a general representation instead of a concrete special value.

An implementation could still use -2 when resolving relocations, and that
would satisfy the above definition.

Thank you, Alexey.
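
As an illustration of the rule proposed above, here is a minimal Python sketch of how a consumer might treat "zero length or LowPC > HighPC" as the marker for deleted code. The function names and the fixed 64-bit address size are assumptions made for this example; they are not taken from LLVM or from the DWARF specification.

```python
ADDR_BITS = 64                      # assumed address size for this sketch
MASK = (1 << ADDR_BITS) - 1


def to_unsigned(value):
    """Map a possibly negative value such as -2 to its unsigned 64-bit form."""
    return value & MASK


def high_from_length(low_pc, length):
    """Compute high_pc when it is encoded as a length; the sum wraps at 64 bits."""
    return to_unsigned(low_pc + length)


def range_is_dead(low_pc, high_pc):
    """Hypothetical rule: a zero-length or inverted range describes deleted code."""
    return to_unsigned(low_pc) >= to_unsigned(high_pc)


if __name__ == "__main__":
    # Both ends resolved to -2: a zero-length range.
    print(range_is_dead(-2, -2))
    # low_pc resolved to -2, high_pc encoded as a length of 0x40 that wraps.
    print(range_is_dead(-2, high_from_length(-2, 0x40)))
    # An ordinary live function.
    print(range_is_dead(0x1000, 0x1040))
```

Under these assumptions, a linker that resolves the relocated low_pc to -2 produces a range the rule classifies as dead, whether high_pc is an address or a length.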
Robinson, Paul via llvm-dev
2020-May-29 19:52 UTC
[llvm-dev] Range lists, zero-length functions, linker gc
> -----Original Message-----
> From: Alexey Lapshin <alapshin at accesssoftek.com>
> Sent: Friday, May 29, 2020 3:09 PM
> To: Robinson, Paul <paul.robinson at sony.com>; Fangrui Song
> <maskray at google.com>; David Blaikie <dblaikie at gmail.com>
> Cc: Sriraman Tallam <tmsriram at google.com>; Wei Mi <wmi at google.com>; Adrian
> Prantl <aprantl at apple.com>; Jonas Devlieghere <jdevlieghere at apple.com>;
> Alexey Lapshin <a.v.lapshin at mail.ru>; Eric Christopher
> <echristo at gmail.com>; peter.smith at arm.com; George Rimar
> <grimar at accesssoftek.com>; llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
>
> Would it make sense to use "LowPC > HighPC" in DWARF documentation as a
> sign for that case, instead of -1 or -2 ?
>
> Or more correct: To indicate that address range points into deleted code
> there should be used either zero length, either LowPC>HighPc range ?
>
> zero length address range is already defined in DWARF documentation.
> LowPC>HighPc is currently not described. It could be documented and
> used as general representation instead of concrete special value.
>
> Implementation could still use -2 for resolving relocations and it would
> satisfy above definition.
>
> Thank you, Alexey.

For addresses that are part of a range, that sounds reasonable.

Addresses are not always part of a range, however. I can think of
two cases where they do not: DW_TAG_label points to a single
instruction, not a range; and the .debug_line section doesn't really
identify ranges, at least not directly. I still think we'd want to
specify a reserved value.

Thanks,
--paulr
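
To illustrate why one reserved value is awkward in pre-v5 .debug_ranges but workable for single addresses, here is a small Python sketch. It assumes the DWARF v4 encoding as I understand it, where a 0/0 pair ends a list and a pair whose first value is the largest representable address selects a new base address. The -2 and -1 conventions are the ones discussed in this thread, and all function names are hypothetical.

```python
def max_address(addr_size):
    """Largest representable address for the given size in bytes (the "-1" value)."""
    return (1 << (8 * addr_size)) - 1


def classify_range_entry(begin, end, addr_size=8):
    """Classify one (begin, end) pair from a pre-v5 .debug_ranges list."""
    if begin == 0 and end == 0:
        return "end-of-list"
    if begin == max_address(addr_size):
        return "base-address-selection"
    if begin == max_address(addr_size) - 1:
        # Assumed convention from this thread: -2 marks stripped code here.
        return "dead-code"
    return "range"


def is_reserved_address(addr, addr_size=8):
    """Check a single address (e.g. a DW_TAG_label low_pc) against an assumed -1 marker."""
    return addr == max_address(addr_size)


if __name__ == "__main__":
    print(classify_range_entry(0, 0))
    print(classify_range_entry(max_address(8), 0x1000))
    print(classify_range_entry(max_address(8) - 1, max_address(8) - 1))
    print(is_reserved_address(max_address(4), addr_size=4))
```

A single-address check like is_reserved_address needs no range at all, which is the case that a "LowPC > HighPC" rule cannot cover.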
David Blaikie via llvm-dev
2020-May-29 20:21 UTC
[llvm-dev] Range lists, zero-length functions, linker gc
On Fri, May 29, 2020 at 12:52 PM Robinson, Paul <paul.robinson at sony.com> wrote:
>
> > -----Original Message-----
> > From: Alexey Lapshin <alapshin at accesssoftek.com>
> > Sent: Friday, May 29, 2020 3:09 PM
> > To: Robinson, Paul <paul.robinson at sony.com>; Fangrui Song
> > <maskray at google.com>; David Blaikie <dblaikie at gmail.com>
> > Cc: Sriraman Tallam <tmsriram at google.com>; Wei Mi <wmi at google.com>; Adrian
> > Prantl <aprantl at apple.com>; Jonas Devlieghere <jdevlieghere at apple.com>;
> > Alexey Lapshin <a.v.lapshin at mail.ru>; Eric Christopher
> > <echristo at gmail.com>; peter.smith at arm.com; George Rimar
> > <grimar at accesssoftek.com>; llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] Range lists, zero-length functions, linker gc
> >
> > Would it make sense to use "LowPC > HighPC" in DWARF documentation as a
> > sign for that case, instead of -1 or -2 ?
> >
> > Or more correct: To indicate that address range points into deleted code
> > there should be used either zero length, either LowPC>HighPc range ?
> >
> > zero length address range is already defined in DWARF documentation.
> > LowPC>HighPc is currently not described. It could be documented and
> > used as general representation instead of concrete special value.
> >
> > Implementation could still use -2 for resolving relocations and it would
> > satisfy above definition.
> >
> > Thank you, Alexey.
>
> For addresses that are part of a range, that sounds reasonable.

I think it'd still be tricky to work with, even just considering ranges,
for a few reasons:

* Ranges described in split DWARF by low_pc(address),
  high_pc(data/length) - the high_pc can't be fixed up.
* Ranges that aren't at the start of a section - e.g.
  "void f1() { } nodebug void f2() { } void f3() { }" - without function
  sections, f3 will start and end at some offset relative to the base
  address of the .text section. This means, for instance, for the
  low/high_pc of f3: let's say low_pc was a relocatable address and
  high_pc was a length, and f3 gets linker-gc'd. Now the base address
  resolves to -2, but -2 + offset (OK, I'm stretching a bit here, we
  aren't doing this yet - but see my thread back in February or so, when
  I discussed the idea of using base address+offset to reduce the number
  of relocations/size of the address pool) wraps around and becomes <

> Addresses are not always part of a range, however. I can think of
> two cases where they do not: DW_TAG_label points to a single
> instruction, not a range; and the .debug_line section doesn't really
> identify ranges, at least not directly. I still think we'd want to
> specify a reserved value.

The new/3rd case here is DW_TAG_call_site, which uses DW_AT_call_pc to
identify the call instruction.

But, yeah - I think we'd want a blessed value, one that taints any
address computation you may do based on it (so it requires explicit
support in the debugger, so that it doesn't casually wrap around
"max - 1" (aka "-2") back to a positive value just because you do
"addr+offset" on it in a DWARF expression or form, etc).
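
The wrap-around concern can be sketched in a few lines of Python. This assumes 64-bit unsigned addresses and uses "-2" (max - 1) as the reserved value. The names TOMBSTONE, naive_add and tainted_add are hypothetical, and returning None to mean "no valid address" is just one possible way a debugger could propagate the taint.

```python
MASK64 = (1 << 64) - 1
TOMBSTONE = MASK64 - 1          # "max - 1", i.e. -2 as an unsigned 64-bit value


def naive_add(addr, offset):
    """addr + offset with 64-bit wrap-around and no knowledge of the reserved value."""
    return (addr + offset) & MASK64


def tainted_add(addr, offset):
    """addr + offset that propagates the reserved value as None instead of wrapping."""
    if addr is None or addr == TOMBSTONE:
        return None
    return (addr + offset) & MASK64


if __name__ == "__main__":
    # Without explicit support, the dead address wraps to a small, plausible one.
    print(hex(naive_add(TOMBSTONE, 0x10)))
    # With explicit support, the result stays marked as invalid.
    print(tainted_add(TOMBSTONE, 0x10))
    # Live addresses are unaffected.
    print(hex(tainted_add(0x1000, 0x10)))
```

The naive version turns the reserved value into a small address that looks like a real one, which is the failure mode the explicit check is meant to prevent.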