thr3ads.net - llvm dev - [llvm-dev] Range lists, zero-length functions, linker gc [May 2020]

David Blaikie via llvm-dev

2020-May-27 20:44 UTC

[llvm-dev] Range lists, zero-length functions, linker gc

So there have been several recent discussions about the issues around
DWARF-agnostic linking and gc-sections, linkonce function definitions being
dropped, etc - and just how much DWARF-awareness would be suitable in a
linker to help with this situation.

I'd like to discuss a narrower instance of this issue: Zero length
gc'd/deduplicated functions.

LLVM seems to at least produce zero length functions in a few cases:
* non-void function without a return statement
* function definition containing only llvm_unreachable
(both of these trap at -O0, but at higher optimization levels even the trap
instruction is removed & you get the full power UB of control flowing off
the end of the function into whatever other bytes are after that function)

So, for context, debug_ranges (this whole issue doesn't exist in DWARFv5,
FWIW) is a list of address pairs, terminated by a pair of zeros.
With function sections, or even just with normal C++ inline functions, the
CU will have a range entry for that function that consists of two
relocations - to the start and end of the function. Generally the start of
the function is the start of the section, and the end is "start of
function + length of function (aka addend)".
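
As a rough illustration of the format described above (not any real tool's
code), a DWARFv4 .debug_ranges list can be read as fixed-size pairs until the
(0, 0) terminator. The function name, the little-endian assumption and the
sample bytes are made up for the example, and base address selection entries
are ignored for simplicity.

```python
import struct

def read_range_list(data, offset=0, addr_size=8):
    """Read (begin, end) pairs until the (0, 0) end-of-list entry."""
    fmt = "<QQ" if addr_size == 8 else "<II"  # little-endian is an assumption
    step = struct.calcsize(fmt)
    entries = []
    while offset + step <= len(data):
        begin, end = struct.unpack_from(fmt, data, offset)
        offset += step
        if begin == 0 and end == 0:  # end-of-list marker
            break
        entries.append((begin, end))
    return entries

# Hypothetical list: two functions, then the terminator.
raw = struct.pack("<QQQQQQ", 0x1000, 0x1010, 0x2000, 0x2030, 0, 0)
ranges = read_range_list(raw)
```

Anything that appears after a (0, 0) pair is never seen by such a reader,
which is why an accidental (0, 0) in the middle of a list matters.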

Usually any relocation to the section would keep that section "alive"
during linking - but that would cause debug info to defeat linker GC and
deduplication. So there are special rules for how linkers handle these
relocations in debug info to allow the sections to be dropped. The question
then is: what do you write in the bytes that requested the relocation?

Binutils ld: Special cases only debug_ranges, resolving all relocations to
dead code to 1. In other debug sections, these values are all resolved to
zero.
Gold and lld: Special cases all debug info sections - resolving all
relocations to "addend" (so begin usually goes to zero, end goes to
"size of function")

These special rules are designed to ensure omitted/gc'd/deduplicated
functions don't cause the range list to terminate prematurely (which would
happen if begin/end were both resolved to zero).


But with an empty function, gold and lld's strategy here fails to avoid
terminating a range list by accident.
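
The failure mode can be sketched in a few lines. This only models the two
strategies as described above; the strategy names and helper functions are
invented for the illustration and are not linker code.

```python
def resolve_dead(addend, strategy):
    """Value written for a relocation that targets a discarded section."""
    if strategy == "bfd-ranges":  # binutils ld, .debug_ranges only
        return 1
    if strategy == "addend":      # gold / lld, all debug sections
        return addend
    raise ValueError(strategy)

def range_entry(func_size, strategy):
    # The begin relocation has addend 0, the end relocation has addend == size.
    return (resolve_dead(0, strategy), resolve_dead(func_size, strategy))

def is_terminator(entry):
    return entry == (0, 0)

normal_addend = range_entry(16, "addend")  # (0, 16): harmless
empty_addend = range_entry(0, "addend")    # (0, 0): looks like end of list
empty_bfd = range_entry(0, "bfd-ranges")   # (1, 1): empty, not a terminator
```

With the "addend" strategy a zero-length function is the one case where both
values come out as zero, so every entry after it in the list is lost.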

What should we do about it?

1) Ensure no zero-length functions exist? (doesn't address backwards
compatibility/existing functions/other compilers)
2) adopt the binutils approach to this (at least in debug_ranges - maybe in
all debug sections? (doing it in other sections could break )
3) Revisit the discussion about using an even more 'blessed' value, like
int max-1? ( https://reviews.llvm.org/D59553 )

(I don't have links to all the recent threads about this discussion - I
think D59553 might've spawned a separate broader discussion/non-review -
oh, Alexey wrote a good summary with links to other discussions here:
http://lists.llvm.org/pipermail/llvm-dev/2019-September/135068.html )

Thoughts?

Alexey Lapshin via llvm-dev

2020-May-28 13:03 UTC

[llvm-dev] Range lists, zero-length functions, linker gc

Hi David,

I think for the problem of "zero length functions and .debug_ranges" the
binutils approach looks good:
>Special cases only debug_ranges, resolving all relocations to
>dead code to 1. In other debug sections, these values are all resolved to
>zero.
But this would not completely solve the problem from
https://reviews.llvm.org/D59553 - overlapping address ranges. The binutils
approach solves the problem if the address range is specified as
start_address:end_address: while resolving relocations, it would replace such
a range with 1:1.
However, it would not work if address ranges were specified as
start_address:length, since the length is not relocated. This case could
additionally be fixed by a fast scan of debug_info for High_PC defined as a
length, changing it to 1. Something like what you suggested here:
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141599.html.

So it looks like the following solution could fix both problems and be
relatively fast:

"Resolve all relocations from debug sections into dead code to 1. Parse
debug sections and replace HighPc of an address range pointing to dead code and
specified as length to 1".

As a result, all address ranges pointing into dead code would be marked as
zero length.

There is still another problem:

DWARF4: "A range list entry (but not a base address selection or end of
list entry) whose beginning and ending addresses are equal has no effect
because the size of the range covered by such an entry is zero."

DWARF5: "A bounded range entry whose beginning and ending address offsets
are equal (including zero) indicates an empty range and may be ignored."

These rules allow consumers to ignore zero-length address ranges. I.e., a tool
reading DWARF is permitted to ignore the related DWARF entries. In that case,
essential descriptions could be ignored. That problem could happen with the
-flto=thin example https://reviews.llvm.org/D54747#1503720 . In this example,
all type definitions except one were replaced with declarations by ThinLTO. The
remaining definition is in a piece of debug info related to deleted code.
According to the zero-length rule, that definition could be ignored, and
incomplete debug info would end up being used.

So generating debug_info that could become incomplete after removing the
pieces related to zero-length address ranges should probably be forbidden.
Otherwise, creating zero-length address ranges could lead to incomplete debug
info.

Thank you, Alexey.




Alexey Lapshin via llvm-dev

2020-May-28 13:14 UTC

[llvm-dev] Range lists, zero-length functions, linker gc

There is a typo in my previous message:
>"Resolve all relocations from debug sections into dead code to 1.
>Parse debug sections and replace HighPc of an address range
>pointing to dead code and specified as length to 1".
It should read:
"Resolve all relocations from debug sections into dead code to 1. Parse
debug sections and replace HighPc of an address range pointing to dead code and
specified as length to 0".
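
A small sketch of why the length form needs the extra step, assuming the
corrected rule (relocations to 1, length-form HighPc to 0). The constant and
function names are invented for the illustration; this is a model of the
proposal, not an implementation of it.

```python
DEAD = 1  # value written for relocations into discarded code (per the proposal)

def patch_dead_range(high_pc, high_pc_is_offset):
    """Return (low_pc, high_pc) for a DIE whose code was discarded.

    low_pc is always a relocated address, so it becomes DEAD.
    high_pc is relocated only when encoded as an address; when encoded
    as an offset from low_pc it has no relocation and must be rewritten.
    """
    low_pc = DEAD
    if high_pc_is_offset:
        high_pc = 0      # explicit rewrite: length 0
    else:
        high_pc = DEAD   # relocation resolved the same way as low_pc
    return low_pc, high_pc

def covered(low_pc, high_pc, high_pc_is_offset):
    """Addresses covered by the [low_pc, end) range."""
    end = low_pc + high_pc if high_pc_is_offset else high_pc
    return range(low_pc, end)

# Without the rewrite, a 0x40-byte dead function still claims [1, 0x41).
unpatched = covered(DEAD, 0x40, True)
patched_len = covered(*patch_dead_range(0x40, True), True)
patched_addr = covered(*patch_dead_range(0x2040, False), False)
```

Only the length form needs the debug_info scan, since the address form is
already covered by ordinary relocation processing.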
 
Alexey.

Robinson, Paul via llvm-dev

2020-May-28 21:52 UTC

[llvm-dev] Range lists, zero-length functions, linker gc

As has been mentioned elsewhere, Sony generally fixes up references from debug
info to stripped functions (of any length) using -1, because that's a
less-likely-to-be-real address than 0x0 or 0x1.  (0x0 is a typical base address
for shared libraries; I'd think using it has the potential to mislead
various consumers.)  For .debug_ranges we use -2, because both a 0/0 pair and a
-1/-1 pair have a reserved meaning in that section.
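
The scheme described above could be sketched as follows. The function names
are hypothetical and this is not Sony's linker code, just the arithmetic: -1
and -2 are taken as unsigned values of the target address size, and in
DWARFv4 .debug_ranges an entry whose first value is the largest address is a
base address selection entry.

```python
def tombstone(section, addr_size=8):
    """Address written for a reference to stripped code."""
    max_addr = (1 << (8 * addr_size)) - 1  # -1 as an unsigned address
    if section == ".debug_ranges":
        return max_addr - 1                # -2: avoids both reserved forms
    return max_addr                        # -1 everywhere else

def is_reserved_entry(begin, end, addr_size=8):
    """True for entries with a special meaning in DWARFv4 .debug_ranges."""
    max_addr = (1 << (8 * addr_size)) - 1
    end_of_list = (begin, end) == (0, 0)
    base_selection = begin == max_addr
    return end_of_list or base_selection

t = tombstone(".debug_ranges")
```

A (-2, -2) pair is then an ordinary empty entry that a reader can skip
without ending the list or changing the base address.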

If you're looking only at zero-length functions, you can stop there; but
I'm not sure why stopping there solves much of a real problem, as
zero-length functions seem like a weird corner case. Linkers know how to strip
dead functions (gc) or deduplicate them (icf, COMDAT) and people do this all the
time, in some cases (COMDAT) without explicitly asking for it, so
non-zero-length functions seem like the much more interesting case.  In that
situation, -1 (or -2) seems like a much wiser choice of blessed-as-not-real
address, versus 0x0 or 0x1.

Stripping non-zero-length functions does mean you have to care about more
sections.  For example .debug_loc would want to be fixed up the same way as
.debug_ranges, not because a debugger would care but so that dumpers would not
run into the 0/0 brick wall.  We also fix up lengths in .debug_aranges to zero,
although there might be history behind that tactic that I'm not aware of; it
seems like it ought to be unnecessary, if consumers are aware of the special
address(es).

--paulr


David Blaikie via llvm-dev

2020-May-28 22:55 UTC

[llvm-dev] Range lists, zero-length functions, linker gc

On Thu, May 28, 2020 at 6:03 AM Alexey Lapshin <alapshin at accesssoftek.com>
wrote:
> I think for the problem of "zero length functions and .debug_ranges"
> binutils approach looks good:
>
> >Special cases only debug_ranges, resolving all relocations to
> >dead code to 1. In other debug sections, these values are all resolved to
> >zero.
>
> But, this would not completely solve the problem from
> https://reviews.llvm.org/D59553 - Overlapped address ranges. Binutils
> approach will solve the problem if the address range specified as
> start_address:end_address. While resolving relocations, it would replace
> such a range with 1:1.
> However, It would not work if address ranges were specified as
> start_address:length since the length is not relocated.
> This case could be additionally fixed by fast scan debug_info for High_PC
> defined as length and changing it to 1. Something which you suggested here:
> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141599.html.
>
>
Hmm, I don't /think/ I intended to suggest anything that would have to
parse all the debug_info, even if just to fixup high_pc. I meant that
debug_rnglists for the CU at least could be modified by a DWARF-aware linker
to remove the unused chunks. (rnglists has fewer problems - you can't
accidentally terminate it early - but it still has the problem that large
functions in programs that use relatively low code addresses can't just be
resolved to "addend", because then [0, length) of the large function might
overlap into that code address range.) The DWARF that describes a specific
function using low_pc/high_pc may be split into a .dwo file and unreachable
by the linker - so it /needs/ a magic value for the address referenced by the
low_pc to indicate that it is invalid.

Which all comes back to "we probably need to pick a value that's explicitly
invalid" and -2 (max - 1) seems to be about the right thing.
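
To make the overlap concern concrete, here is a toy comparison of the
"addend" result with a -2 style value. The helper names and the sample
addresses are made up, and it assumes both ends of a dead range list entry
resolve to the same invalid value.

```python
def overlaps(a, b):
    """True if half-open ranges a and b share at least one address."""
    return a[0] < b[1] and b[0] < a[1]

def dead_range(size, strategy, addr_size=8):
    """Range left behind for a discarded function of the given size."""
    if strategy == "addend":
        return (0, size)                     # begin -> 0, end -> size
    if strategy == "max-1":
        t = (1 << (8 * addr_size)) - 2       # -2 as an unsigned address
        return (t, t)                        # assumption: both ends -> -2
    raise ValueError(strategy)

# Hypothetical program whose real code lives at low addresses.
live_code = (0x400, 0x2000)
big_dead_function = 0x1000

addend_clash = overlaps(dead_range(big_dead_function, "addend"), live_code)
max1_clash = overlaps(dead_range(big_dead_function, "max-1"), live_code)
```

In this toy setup the "addend" range [0, 0x1000) runs into the live code at
0x400, while the -2 range is empty and far away from any plausible code
address.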

>
> So it looks like following solution could fix both problems and be
> relatively fast:
>
> "Resolve all relocations from debug sections into dead code to 1. Parse
> debug sections and replace HighPc of an address range pointing to dead code
> and specified as length to 1".
>
>
That second part seems pretty expensive compared to anything else the
linker is doing with debug info. I'd try to avoid it if at all possible.

> As the result all address ranges pointing into dead code would be marked
> as zero length.
>
> There still exist another problem:
>
> DWARF4: "A range list entry (but not a base address selection or end of
> list entry) whose beginning and ending addresses are equal has no effect
> because the size of the range covered by such an entry is zero."
>
> DWARF5: "A bounded range entry whose beginning and ending address offsets
> are equal (including zero) indicates an empty range and may be ignored."
>
> These rules allow us to ignore zero-length address ranges. I.e., some tool
> reading DWARF is permitted to ignore related DWARF entries.
>
I agree it allows consumers to ignore that entry in the range list because
that entry is zero-length/equivalent to not being present at all - I don't
think that means consumers can ignore the DIE that refers to this range
list. I think it's valid DWARF to have a CU that only describes types,
without any code attached to it at all. Or for a subprogram that's been
eliminated to still be used by a consumer for name lookup purposes - so the
consumer can tell the user there is a function called "f1" and tell the
user what parameter types, return type it has, etc - not ignore it entirely.
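
The distinction between ignoring an entry and ignoring the DIE might look
like this in a consumer. The Die class and lookup functions are a made-up
toy model, not any debugger's API.

```python
from dataclasses import dataclass, field

@dataclass
class Die:
    name: str
    ranges: list = field(default_factory=list)  # list of (begin, end) pairs

def live_ranges(die):
    """Drop zero-length entries, as the DWARF wording permits."""
    return [(b, e) for b, e in die.ranges if b != e]

def lookup_by_name(dies, name):
    # Name lookup does not depend on the DIE having any code.
    return [d for d in dies if d.name == name]

def lookup_by_address(dies, addr):
    return [d for d in dies
            if any(b <= addr < e for b, e in live_ranges(d))]

dies = [Die("f1", [(1, 1)]), Die("main", [(0x1000, 0x1040)])]
by_name = lookup_by_name(dies, "f1")
by_addr_dead = lookup_by_address(dies, 1)
by_addr_live = lookup_by_address(dies, 0x1000)
```

Here "f1" has no addresses after filtering, yet it is still found by name,
so its parameter and return types would remain available to the user.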

> In that case, there could be ignored essential descriptions. That problem
> could happen with -flto=thin example
> https://reviews.llvm.org/D54747#1503720 . In this example, all type
> definitions except one were replaced with declarations by thinlto. The
> definition, which was left, is in a piece of debug info related to deleted
> code. According to zero-length rule, that definition could be ignored, and
> finally, incomplete debug info could be used.
>
Yeah, I think the bug there is the linker dropping object files just
because they have no executable code in them - I think the patch that did
that was reverted, if I'm remembering correctly.

>
> So, it probably should be forbidden to generate debug_info, which could
> become incomplete after removing pieces related to zero length address
> ranges. Otherwise, creating zero-length address ranges could lead to
> incomplete debug info.
>
> Thank you, Alexey.
>
>
>
