thr3ads.net - llvm dev - [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5 [Feb 2021]

David Blaikie via llvm-dev

2021-Feb-11 01:48 UTC

[llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

All 3 options are now implemented & I've tidied up a flag name (still an
-mllvm flag - I don't think this should ever be a user-visible flag).

-mllvm -minimize-addr-in-v5=Ranges
  Uses debug_rnglists even for contiguous ranges if doing so would avoid
adding another entry to .debug_addr. E.g.: a CU with 3 functions, two in the
same section. The first function in each section uses low/high; the CU has
a rnglist and can share/reuse the low_pc of those two functions. But a
function that comes later in a section that already has another function in
it would use the low_pc of the first function in the section as its
base address, plus an offset pair - avoiding the need for a 3rd debug_addr
entry and its associated relocation.

-mllvm -minimize-addr-in-v5=Expressions
  This uses the exprloc idea - using a non-trivial expression for a
DW_AT_low_pc or other address classed attribute. This reduces the overhead
compared to the 'Ranges' technique, and allows more cases - including
DW_TAG_labels and DW_TAG_call_sites.

-mllvm -minimize-addr-in-v5=Form
   Similar to Expressions, but using a custom form to make things a bit
more compact (has the drawback that consumers who don't recognize the form
can't parse any of the DWARF because they can't skip over the attribute due
to not knowing its size)
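
To make the address-pool effect of these modes concrete, here is a minimal Python sketch. It is a simplified model, not LLVM's implementation: it counts only function low_pc addresses (no lexical blocks, labels or call sites), and the helper name `addr_pool_sizes` and the function/section names are made up for illustration.

```python
def addr_pool_sizes(functions):
    """functions: list of (name, section) pairs in emission order.

    Returns (per_function, per_section): the number of .debug_addr entries
    needed if every function's low_pc gets its own entry, versus the number
    needed if only the first function in each section gets an entry and
    later functions are described as "base address + offset".
    """
    per_function = len(functions)
    bases = {}
    for name, section in functions:
        # The first function seen in a section becomes that section's base.
        bases.setdefault(section, name)
    return per_function, len(bases)


# A CU with 3 functions, two of them in the same section.
cu = [("f2", ".text"), ("f3", ".text"), ("f4", ".other")]
print(addr_pool_sizes(cu))
```

Under this model the three-function CU needs 3 entries when each low_pc is a separate address and 2 when addresses are shared per section, which is the "avoid a 3rd debug_addr entry" case described for =Ranges.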

For comparisons, a few different build modes using 'Ranges':

I should say all these builds are with compressed debug info enabled (in
object files) and type units. The asan build uses compressed debug info in
the linked binary and only gmlt.

But the main takeaway is this seems probably (to me) worth turning on for
Split DWARF - it does mean the final build assets (exe+dwp) are slightly
larger (1.28%), but the benefit in object and executable size seems
probably generically worthwhile.

I plan to roll =Ranges out inside Google for cases that use Split DWARF,
see if it sticks, and if so, change upstream to enable the feature by default
under Split DWARF.

The other two modes generally make things better/reduce the tradeoff
cost:
So with the custom form, we can even get to a total savings in both
intermediate (.o/.dwo) and linked (exe/dwp) files, so it might even be
applicable to non-split DWARF. (Though, again, the tradeoffs will look
somewhat different without compression enabled, and building without type
units might swing it one way or another a bit (probably not much though).)

I'd love to have the Form version supported in lldb and enabled by default
when tuning/targeting lldb, but not sure I have the lldb expertise/time to
implement that just yet.

Anyone have thoughts/ideas/interest in collaborating on any of this?


On Tue, Jan 5, 2021 at 4:43 PM David Blaikie <dblaikie at gmail.com> wrote:
> Coming back around to this...
>
> https://github.com/llvm/llvm-project/commit/ad18b075fd63935148b460f9c6b4dce130c56b15
> Added the "always use ranges" option, currently off-by-default, usable with
> -gdwarf-5 -mllvm -always-use-ranges-in-v5=Enable (as the name implies, this
> has no effect on DWARFv4 and below, because there's no benefit there). I
> have plans to make this the default behavior for Split DWARF since moving
> bytes from .o to .dwo is valuable even if it breaks pretty even - enough to
> justify this even though it's a wash or maybe a slight cost to linked
> binary size (compared to unlinked object size).
>
> I did come across a couple of lldb bugs related to using ranges on
> subprograms ("Ranges everywhere" can use ranges on subprograms where the
> subprogram is in the same section as another subprogram), sent fixes for
> them in: https://reviews.llvm.org/D94063 and
> https://reviews.llvm.org/D94064 - if anyone has a chance to look at
> those, it'd be most appreciated.
>
> Once those lldb fixes are in, I'll make the change to enable this feature
> by default when using Split DWARF unless anyone's got objections to that.
>
> & in the mean time I'm also working on patches for the other two
> candidates - novel DWARF expressions and an LLVM extension form.
>
> On Mon, Jan 13, 2020 at 2:15 PM David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Mon, Jan 13, 2020 at 1:39 PM Vedant Kumar <vedant_kumar at apple.com>
>> wrote:
>>
>>>
>>>
>>> On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>
>>>
>>> On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <vedant_kumar at apple.com>
>>> wrote:
>>>
>>>> I think I get it now, thanks for explaining!
>>>>
>>>> On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> 
>>>>
>>>>
>>>> On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <vedant_kumar at apple.com>
>>>> wrote:
>>>>
>>>>> I don't totally follow the proposed encoding change & would appreciate
>>>>> a small example.
>>>>>
>>>>> Is the idea to replace e.g. an 'AT_low_pc (<direct address>) +
>>>>> relocation for <direct address>' with an 'AT_low_pc (<indirection into a
>>>>> pool of addresses> + offset)',
>>>>>
>>>>
>>>> With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses
>>>> are indirected already. So it's:
>>>>
>>>> Replace "AT_low_pc (<indirection into a pool of addresses>)" with an
>>>> "AT_low_pc (<indirection into a pool of addresses> + offset)".
>>>>
>>>>
>>>>> s.t. the cost of a relocation for the address is paid down the more
>>>>> it's used?
>>>>>
>>>>
>>>> Right - specifically to reduce the pool of addresses down to, ideally,
>>>> one address per section/indivisible chunk of machine code (per subsection
>>>> in MachO, for instance) (whereas currently there are many addresses per
>>>> section)
>>>>
>>>>
>>>>> How do you figure the offset out?
>>>>>
>>>>
>>>> Label difference - same as is done for DW_AT_high_pc today in DWARFv4
>>>> and DWARFv5 in LLVM. high_pc currently uses the low_pc address to be
>>>> relative to; in this proposed situation, we'd use a symbol that's in the
>>>> first bit of debug info in the section (or subsection in MachO). So the
>>>> low_pc of the subprogram/function, for instance, or if there are two
>>>> functions in the same section with debug info for both, the low_pc of the
>>>> first of those functions, etc...
>>>>
>>>>
>>>> If the label difference in a low_pc attribute is relative to the start
>>>> of a section, could a linker orderfile pass break the dwarf unless it
>>>> updates the offset?
>>>>
>>>
>>> Nah - terminologically, ELF sections are indivisible - more akin to
>>> MachO subsections. ELF files can have multiple sections with the same name
>>> (as is used for comdat sections for inline functions, and for
>>> -ffunction-sections (roughly equivalent to MachO's "subsections via
>>> symbols", as I understand it) (or can use ".text.suffix" naming to give
>>> each separate .text section its own name - but the linker strips the
>>> suffixes and concatenates all these together into the final linked .text
>>> section)
>>>
>>>
>>> I see, so an ELF linker may reorder sections relative to each other, but
>>> not the contents of a section. (That matches up with what I've read
>>> elsewhere - you'd use -ffunction-sections to reorder function symbols,
>>> IIRC.)
>>>
>>
>> Right.
>>
>>
>>> And in this proposal to increase address pool reuse, label differences
>>> in a MachO would be relative to the subsection.
>>>
>>
>> Even before my proposal, there are already many cases where rnglists and
>> loclists in DWARFv5 (& location lists in DWARFv4) will use selectively
>> chosen base addresses and symbol differences as often as possible (insofar
>> as I could do that when working/experimenting with ELF).
>>
>> So without function sections, for instance - rnglists for sub-function
>> ranges (ignoring PROPELLER for now/in this part of the discussion).
>>
>> Perhaps an example would be helpful. Here's LLVM's current behavior with
>> DWARFv5 and ELF, without function sections:
>>
>> int f1();
>> void f2() {
>>   if (int i = f1()) {
>>     f1();
>>   }
>> }
>> void f3() {
>>   if (f1()) {
>>     int i = f1();
>>   }
>> }
>> __attribute__((section(".other"))) void f4() {
>> }
>>
>> In this code there are only two ELF sections (".text" contains the
>> definitions of f2 and f3, ".other" contains the definition of f4) and so we
>> /should/ be able to only have 2 relocations in the debug info.
>>
>> (I'm exploiting something of a bug/quirk in Clang/LLVM's debug info that
>> causes, even at -O0, the lexical_block for the 'if' to have a hole in it,
>> where the call to f1 is, so it has ranges rather than low/high pc)
>>
>> In DWARFv4 this example would've used 10 relocations. (on the CU ranges,
>> there would be begin/end for the ".text" range covering f2 and f3, and
>> begin/end for the ".other" range covering f4, then the range list for the
>> "if" lexical_block would contain another 2 pairs (4 addresses/relocations),
>> one relocation for f2's low_pc, one for f3's 'if' lexical_block).
>>
>> In DWARFv5, we see the following:
>>
>> 0x00000014: [DW_RLE_base_addressx]:  0x0000000000000000
>> 0x00000016: [DW_RLE_offset_pair  ]:  0x0000000000000008, 0x0000000000000014
>> 0x00000019: [DW_RLE_offset_pair  ]:  0x000000000000001a, 0x000000000000001f
>> 0x0000001c: [DW_RLE_end_of_list  ]
>> 0x0000001d: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000036
>> 0x00000020: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000006
>> 0x00000023: [DW_RLE_end_of_list  ]
>>
>> The first range list is for the 'if' scope, the second is for the CU.
>> Both are able to efficiently select encodings and base addresses.
>>
>> But the debug_addr has 4 addresses in it - the address at index 1 (not
>> used in the rnglists shown above - we see index 0 and index 2 are used
>> there) is for the low_pc of f3's subprogram, and the address at index 2 is
>> for the low_pc of f3's if block/scope.
>>
>> That's the address/relocation that would be... addressed by the change
>> I'm proposing. One way to avoid that relocation would be to encode f3's
>> address range using a rnglist - this is fully backwards compatible, and
>> would produce a rnglist like this:
>>
>> [DW_RLE_base_addressx]:  0x0000000000000000
>> [DW_RLE_offset_pair  ]:  0x0000000000000030, 0x0000000000000036
>> [DW_RLE_end_of_list  ]
>>
>> Similarly, f3's if block could use a rangelist like:
>>
>> [DW_RLE_base_addressx]:  0x0000000000000000
>> [DW_RLE_offset_pair  ]:  0x0000000000000046, 0x0000000000000054
>> [DW_RLE_end_of_list  ]
>>
>> As you can imagine, there are quite a few ranges (especially once you get
>> inlining) that use low/high_pc, and could benefit from the reduction in
>> relocations by using this strategy. Though it isn't optimal (the range list
>> encoding isn't intended to be good for this use case) in terms of size cost
>> - hence the possibility of using DWARF expressions for address class
>> attributes, or a custom form that would more directly encode the <indirect
>> address> + <offset>.
>>
>>> In Propeller, is basic block reordering done after a .o is emitted?
>>>
>>
>> Yes.
>>
>>
>>> If so, I suppose I don't yet see how the proposed scheme is resilient to
>>> this reordering.
>>>
>>
>> With PROPELLER any function that is fragmented into reorderable sections
>> must necessarily use ranges to describe the function's address range - but,
>> again, choosing base addresses strategically & using relative references
>> whenever possible, would help reduce the cost of PROPELLER's debug info.
>>
>>
>>> OTOH if block reordering is done just before the label difference is
>>> evaluated, then there shouldn't be any issue.
>>>
>>>
>>>> Ditto, I suppose, for an intra-function offset when something like
>>>> propeller is used to reorder basic blocks (I’m thinking of
>>>> At_call_return_pc now).
>>>>
>>>
>>> Yeah - currently the "base address" for each section is determined by
>>> the first function with debug info being emitted in that section (
>>> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787
>>> ) - with PROPELLER we'd need to add similar code when function fragments are
>>> emitted. (I'm planning to check the PROPELLER work in progress tree soon
>>> and do another sanity pass over the debug info emitted to check this is
>>> working as intended - in part because this base address selection, coupled
>>> with DWARFv5 and maybe with the changes I'm suggesting in this thread (&
>>> will commit under flags "soon" (might take me a week or two judging by my
>>> review/bug/investigation load right now... *fingers crossed*)) might make
>>> PROPELLER less expensive in terms of debug info size, or more expensive
>>> relative to the significant improvements this provides)
>>>
>>>
>>> Thanks for investigating!
>>>
>>> Owing to the way MachO debug info distribution works differently & if I
>>> understand correctly doesn't need relocations in many cases due to
>>> DWARF-aware parsing/linking (& if it does use relocations, I've no
>>> knowledge of when/how and how big they are compared to the ELF relocations
>>> I've been measuring) it's quite possible MachO would have different
>>> tradeoffs in this space.
>>>
>>>
>>> A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain
>>> relocations for AT_low_pc or AT_call_return_pc in the simple examples I
>>> tried out. We do emit relocations for those attributes in MachO object
>>> files (there isn't something analogous to a .dwo on MachO, the debug info
>>> just goes into a different set of sections in the .o). My understanding
>>> (based on the definition of `macho_relocation_info` in the ld64 sources) is
>>> that MachO relocations are 8 bytes in size. It looks like ELF rel/rela
>>> relocations are 16/24 bytes in size, but I'm not sure why (perhaps they're
>>> more extensible / encode more information).
>>>
>>
>> OK *nod* with the smaller encoding it may be less of a pressing issue for
>> you & the tradeoff may be different.
>>
>>
>>> Would a vanilla DWARFv4 .dwp (without your patches applied) contain a
>>> relocation for each 'AT_low_pc (<direct address>)'?
>>>
>>
>> DWP files contain no direct addresses - they are all indirect through the
>> address pool. But, yes, for a DWARFv4 Split DWARF build, low_pcs don't have
>> an opportunity to reuse a strategically chosen base address - they have to
>> use an addrx form & the debug_addr section would have that specific address
>> with a relocation for it.
>>
>>
>>>
>>> vedant
>>>
>>>
>>>
>>>> Apologies if this has been answered elsewhere, I suppose there must be
>>>> a solution for this for At_high_pc to work.
>>>>
>>>> vedant
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> thanks,
>>>>> vedant
>>>>>
>>>>> On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> Sounds good all round - I'll commit these two modes, and maybe even
>>>>> the third (given Sony's interest & possible interest in changing their
>>>>> consumer to handle it) of a custom form to eke out the last few bytes from
>>>>> the more direct addr+offset encoding.
>>>>>
>>>>> I'll follow up here with flag names and revision numbers once they're
>>>>> in.
>>>>>
>>>>> On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul <paul.robinson at sony.com>
>>>>> wrote:
>>>>>
>>>>>> On some previous occasion that introduced additional indirection
>>>>>> (don't remember the details) my debugger people groused about the
>>>>>> additional performance cost of chasing down data in a different
>>>>>> object-file section.  So we (Sony) might be happier with low_pc as
>>>>>> expressions, than with a ranges-always solution.
>>>>>>
>>>>>> But hard to say without data, and getting both modes in at least
>>>>>> as a temporary thing sounds like a good plan.
>>>>>> --paulr
>>>>>>
>>>>>>
>>>>>> > -----Original Message-----
>>>>>> > From: aprantl at apple.com <aprantl at apple.com>
>>>>>> > Sent: Wednesday, January 8, 2020 1:49 PM
>>>>>> > To: David Blaikie <dblaikie at gmail.com>
>>>>>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Jonas Devlieghere
>>>>>> > <jdevlieghere at apple.com>; Robinson, Paul <paul.robinson at sony.com>; Eric
>>>>>> > Christopher <echristo at gmail.com>; Frederic Riss <friss at apple.com>
>>>>>> > Subject: Re: Increasing address pool reuse/reducing .o file size in
>>>>>> > DWARFv5
>>>>>> >
>>>>>> > I think this sounds like a good plan for Linux. I would like to see the
>>>>>> > numbers for Darwin (= non-split DWARF) to decide whether we should just
>>>>>> > make that the default. Eric's suggestion of having this committed as an
>>>>>> > option first seems like a good step in that direction. If it is an
>>>>>> > advantage across the board we can remove the option and just make this the
>>>>>> > default behavior.
>>>>>> >
>>>>>> > thanks,
>>>>>> > adrian
>>>>>> >
>>>>>> > > On Dec 30, 2019, at 12:08 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>> > >
>>>>>> > > tl;dr: in DWARFv5, using DW_AT_ranges even when the range is contiguous
>>>>>> > > reduces linked, uncompressed debug_addr size for optimized builds by 93%
>>>>>> > > and reduces total .o file size (with compression and split) by 15%. It
>>>>>> > > does grow .dwo file size a bit - DWARFv5, no compression, not split shows
>>>>>> > > the net effect if all bytes are equal: -O3 clang binary grows by 0.4%, -O0
>>>>>> > > clang binary shrinks by 0.1%
>>>>>> > > Should we enable this strategy by default for DWARFv5, for DWARFv5+Split
>>>>>> > > DWARF, or not by default at all/only under a flag?
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > So, I've brought this up a few times before - that DWARFv5 does a pretty
>>>>>> > > good job of reducing relocations (& reducing .o file size with Split
>>>>>> > > DWARF) by allowing many uses of addresses to include some kind of
>>>>>> > > address+offset (debug_rnglists and loclists allowing "base_address" then
>>>>>> > > offset_pairs (an improvement over similar functionality in DWARFv4 because
>>>>>> > > the offset pairs can be uleb encoded - so they can be quite compact))
>>>>>> > >
>>>>>> > > But one place that DWARFv5 misses to reduce relocations further is
>>>>>> > > direct addresses from debug_info, such as DW_AT_low_pc.
>>>>>> > >
>>>>>> > > For a while I've wondered if we could use an extension form for
>>>>>> > > addr+offset, and I prototyped this without an extension attribute, but
>>>>>> > > instead using exprloc. This has slightly higher overhead to express the...
>>>>>> > > expression. (it's 9 bytes in total, could be as few as 5 with a custom
>>>>>> > > form)
>>>>>> > >
>>>>>> > > But I had another idea that's more instantly deployable: Why not use
>>>>>> > > DW_AT_ranges even when the range is contiguous? That way the low_pc that
>>>>>> > > previously couldn't use an existing address pool entry + offset, could use
>>>>>> > > the rnglist support for base address.
>>>>>> > >
>>>>>> > > The only unnecessary address pool entries that remain that I've found
>>>>>> > > are DW_AT_low_pc for DW_TAG_labels - but there's only a handful of those
>>>>>> > > in most code. So the "ranges everywhere" strategy gets the addresses for
>>>>>> > > optimized clang down from 4758 (v4 address pool used 9923 addresses... )
>>>>>> > > to 342, with about ~4 "extra" addresses for DW_TAG_labels.
>>>>>> > >
>>>>>> > > This could also be a bit less costly if DWARFv5 rnglists didn't use a
>>>>>> > > separate offset table (instead encoding the offsets directly in
>>>>>> > > debug_info, rather than using indexes)
>>>>>> > >
>>>>>> > > I have patches for both the addr+offset exprloc and for the ranges-
>>>>>> > > always, both with -mllvm flags - do people think they're both worth
>>>>>> > > committing for experimentation? Neither? Default on in some cases (like
>>>>>> > > Split DWARF)?
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > - Dave
>>>>>>
>>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Fangrui Song via llvm-dev

2021-Feb-11 05:34 UTC

Hi, David, this looks great! I just started to play with this under llc
-minimize-addr-in-v5= and I will study it in the coming days.

On 2021-02-10, David Blaikie via llvm-dev wrote:
>All 3 options are now implemented & I've tidied up a flag name (still an
>-mllvm flag - I don't think this should ever be a user-visible flag).
>
>-mllvm -minimize-addr-in-v5=Ranges
>  Uses debug_rnglists even for contiguous ranges if doing so would avoid
>adding another entry to .debug_addr eg: a CU with 3 functions, two in the
>same section. The first function in each section uses low/high, the CU has
>a rnglist, and can share/reuse the low_pc of those two functions. But for a
>function that is later in a section that already has another function in it
>- that one would use the low_pc of the first function in the section as its
>base address, and an offset pair - avoiding the need for a 3rd debug_addr
>entry and associated relocation
>
>-mllvm -minimize-addr-in-v5=Expressions
>  This uses the exprloc idea - using a non-trivial expression for a
>DW_AT_low_pc or other address classed attribute. This reduces the overhead
>compared to the 'Ranges' technique, and allows more cases - including
>DW_TAG_labels and DW_TAG_call_sites.
This option emits: DW_OP_addrx 0, DW_OP_const4u 9, DW_OP_plus.

DW_OP_const4u is a bit wasteful. This could be changed to DW_OP_addrx 0,
DW_OP_plus_uconst 9. However, the current implementation requires the size of the
DWARF expression up front, and we don't know the encoded size of the
DW_OP_plus_uconst operand.

   .byte size_of_exprloc   # This would be dependent on the size of .uleb128
   ...
   .byte 35
   .long .Ltmp1-.Lfunc_begin0
   # it'd be nice if we can use .uleb128 .Ltmp1-.Lfunc_begin0

size_of_exprloc could be changed to a subtraction of two labels.

When .uleb128 is used, we should be careful about assembler convergence.

* GNU as hacked around the problem specifically for .gcc_except_table by
inserting additional .align https://sourceware.org/bugzilla/show_bug.cgi?id=4029
It works for .gcc_except_table but can be a problem for our .uleb128 + .byte
scheme.
* LLVM MC's solution is generic.
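
To illustrate why the expression size depends on the offset once a ULEB128 operand is used, here is a small Python sketch that encodes both variants by hand. The opcode values are taken from the DWARF v5 specification (where the add-with-ULEB128-operand opcode is named DW_OP_plus_uconst); the helper names are made up for this example, and this is not what LLVM itself emits.

```python
def uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Opcode values from the DWARF v5 specification.
DW_OP_const4u = 0x0C
DW_OP_plus = 0x22
DW_OP_plus_uconst = 0x23
DW_OP_addrx = 0xA1


def expr_const4u(addr_index, offset):
    """DW_OP_addrx <index>, DW_OP_const4u <offset>, DW_OP_plus."""
    return (bytes([DW_OP_addrx]) + uleb128(addr_index)
            + bytes([DW_OP_const4u]) + offset.to_bytes(4, "little")
            + bytes([DW_OP_plus]))


def expr_plus_uconst(addr_index, offset):
    """DW_OP_addrx <index>, DW_OP_plus_uconst <offset>."""
    return (bytes([DW_OP_addrx]) + uleb128(addr_index)
            + bytes([DW_OP_plus_uconst]) + uleb128(offset))


def exprloc(expr):
    """Prefix an expression with its ULEB128 length, as DW_FORM_exprloc does."""
    return uleb128(len(expr)) + expr


for offset in (9, 0x7F, 0x80, 0x4000):
    fixed = len(exprloc(expr_const4u(0, offset)))
    variable = len(exprloc(expr_plus_uconst(0, offset)))
    print(f"offset={offset:#x}: const4u form {fixed} bytes, uleb form {variable} bytes")
```

With address index 0 the fixed-width variant is always 9 bytes, while the ULEB128 variant starts at 5 bytes and grows by one byte each time the offset crosses a 7-bit boundary. That growth is why the length byte cannot be a constant when the offset is a label difference that is only known after layout.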
>-mllvm -minimize-addr-in-v5=Form
>   Similar to Expressions, but using a custom form to make things a bit
>more compact (has the drawback that consumers who don't recognize the form
>can't parse any of the DWARF because they can't skip over the attribute due
>to not knowing its size)
This option emits a new form: DW_FORM_LLVM_addrx_offset, which is the composite
of DW_FORM_addrx and DW_FORM_data4. This is superior to Expressions because the
bytes for the exprloc size and the plus operation can be saved.

Similar to Expressions, there is a question whether DW_FORM_udata would be
better for the offset.
It could save 3 bytes compared with the DW_OP_plus_uconst expression.
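
The byte counts being compared can be checked with a short Python sketch. It assumes the layouts described above: the custom form as an addrx index followed by a 4-byte offset, a hypothetical variant with a ULEB128 offset instead, and an exprloc holding "addrx, add unsigned constant" with one-byte opcodes. The function names are illustrative only.

```python
def uleb128_size(value):
    """Number of bytes needed to encode a non-negative integer as ULEB128."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def form_addrx_data4_size(addr_index):
    """addrx index (ULEB128) followed by a fixed 4-byte offset."""
    return uleb128_size(addr_index) + 4


def form_addrx_udata_size(addr_index, offset):
    """Hypothetical variant: addrx index followed by a ULEB128 offset."""
    return uleb128_size(addr_index) + uleb128_size(offset)


def exprloc_plus_uconst_size(addr_index, offset):
    """exprloc length prefix + addrx opcode/operand + add-constant opcode/operand."""
    body = 1 + uleb128_size(addr_index) + 1 + uleb128_size(offset)
    return uleb128_size(body) + body


index, offset = 0, 9
print("form, 4-byte offset:  ", form_addrx_data4_size(index))
print("form, ULEB128 offset: ", form_addrx_udata_size(index, offset))
print("exprloc, ULEB128 add: ", exprloc_plus_uconst_size(index, offset))
```

For index 0 and offset 9 this gives 5, 2 and 5 bytes respectively, so a ULEB128-offset form would be 3 bytes smaller than the compact expression, consistent with the estimate above. Larger indexes or offsets change the absolute numbers but not the 3-byte gap, as long as the expression body stays under 128 bytes.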
>
>For comparisons, a few different build modes using 'Ranges':
>
>I should say all these builds are with compressed debug info enabled (in
>object files) and type units. the asan build uses compressed debug info in
>the linked binary and only gmlt.
>
>But the main takeaway is this seems probably (to me) worth turning on for
>Split DWARF - it does mean the final build assets (exe+dwp) are slightly
>larger (1.28%), but the benefit in object and executable size seems
>probably generically worthwhile.
>
>I plan to roll =Ranges out inside google for cases that use Split DWARF,
>see if sticks, and if so, change upstream to default to enable the feature
>under Split DWARF.
>
>For the other two modes generally make things better/reduce the tradeoff
>cost:
>So with the custom form, we can even get to a total savings in both
>intermediate (.o/.dwo) and linked (exe/dwp) files, so it might even be
>applicable to non-split DWARF. (though, again, the tradeoffs will look
>somewhat different without compression enabled and maybe without type units
>might swing it one way or another a bit (probably not much though))
>
>I'd love to have the Form version supported in lldb and enabled by
default
>when tuning/targeting lldb, but not sure I have the lldb expertise/time to
>implement that just yet.
>
>Anyone have thoughts/ideas/interest in collaborating on any of this?
>
>On Tue, Jan 5, 2021 at 4:43 PM David Blaikie <dblaikie at gmail.com>
wrote:
>
>> Coming back around to this...
>>
>>
>>
https://github.com/llvm/llvm-project/commit/ad18b075fd63935148b460f9c6b4dce130c56b15
>> Added the "always use ranges" option, currently
off-by-default, usable with
>> -gdwarf-5 -mllvm -always-use-ranges-in-v5=Enable (as the name implies,
this
>> has no effect on DWARFv4 and below, because there's no benefit
there). I
>> have plans to make this the default behavior for Split DWARF since
moving
>> bytes from .o to .dwo is valuable even if it breaks pretty even -
enough to
>> justify this even though it's a wash or maybe a slight cost to
linked
>> binary size (compared to unlinked object size).
>>
>> I did come across a couple of lldb bugs related to using ranges on
>> subprograms ("Ranges everywhere" can use ranges on
subprograms where the
>> subprogram is in the same section as another subprogram), sent fixes
for
>> them in: https://reviews.llvm.org/D94063 and
>> https://reviews.llvm.org/D94064 - if anyone has a chance to look at
>> those, it'd be most appreciated.
>>
>> Once those lldb fixes are in, I'll make the change to enable this
feature
>> by default when using Split DWARF unless anyone's got objections to
that.
>>
>> & in the mean time I'm also working on patches for the other
two
>> candidates - novel DWARF expressions and an LLVM extension form.
>>
>> On Mon, Jan 13, 2020 at 2:15 PM David Blaikie <dblaikie at
gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Jan 13, 2020 at 1:39 PM Vedant Kumar <vedant_kumar at
apple.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <vedant_kumar
at apple.com>
>>>> wrote:
>>>>
>>>>> I think I get it now, thanks for explaining!
>>>>>
>>>>> On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev
<
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> 
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar
<vedant_kumar at apple.com>
>>>>> wrote:
>>>>>
>>>>>> I don't totally follow the proposed encoding change
& would appreciate
>>>>>> a small example.
>>>>>>
>>>>>> Is the idea to replace e.g. an 'AT_low_pc
(<direct address>) +
>>>>>> relocation for <direct address>' with an
'AT_low_pc (<indirection into a
>>>>>> pool of addresses> + offset)',
>>>>>>
>>>>>
>>>>> With Split DWARF or with DWARFv5 in LLVM at the moment, all
addresses
>>>>> are indirected already. So it's:
>>>>>
>>>>> Replace "AT_low_pc (<indirection into a pool of
addresses>)" with an
>>>>> "AT_low_pc (<indirection into a pool of
addresses> + offset)".
>>>>>
>>>>>
>>>>>> s.t. the cost of a relocation for the address is paid
down the more
>>>>>> it's used?
>>>>>>
>>>>>
>>>>> Right - specifically to reduce the pool of addresses down
to, ideally,
>>>>> one address per section/indivisible chunk of machine code
(per subsection
>>>>> in MachO, for instance) (whereas currently there are many
addresses per
>>>>> section)
>>>>>
>>>>>
>>>>>> How do you figure the offset out?
>>>>>>
>>>>>
>>>>> Label difference - same as is done for DW_AT_high_pc today
in DWARFv4
>>>>> and DWARFv5 in LLVM. high_pc currently uses the low_pc
addresse to be
>>>>> relative to, in this proposed situation, we'd use a
symbol that's in the
>>>>> first bit of debug info in the section (or subsection in
MachO). So the
>>>>> low_pc of the subprogram/function, for instance, or if
there are two
>>>>> functions in the same section with debug info for both, the
low_pc of the
>>>>> first of those functions, etc...
>>>>>
>>>>>
>>>>> If the label difference in a low_pc attribute is relative
to the start
>>>>> of a section, could a linker orderfile pass break the dwarf
unless it
>>>>> updates the offset?
>>>>>
>>>>
>>>> Nah - terminologically, ELF sections are indivisible - more
akin to
>>>> MachO subsections. ELF files can have multiple sections with
the same name
>>>> (as is used for comdat sections for inline functions, and for
>>>> -ffunction-sections (roughly equivalent to MachO's
"subsections via
>>>> symbols", as I understand it) (or can use
".text.suffix" naming to give
>>>> each separate .text section its own name - but the linker
strips the
>>>> suffixes and concatenates all these together into the final
linked .text
>>>> section)
>>>>
>>>>
>>>> I see, so an ELF linker may reorder sections relative to each
other, but
>>>> not the contents of a section. (That matches up with what
I've read
>>>> elsewhere - you'd use -ffunction-sections to reorder
function symbols,
>>>> IIRC.)
>>>>
>>>
>>> Right.
>>>
>>>
>>>> And in this proposal to increase address pool reuse, label
differences
>>>> in a MachO would be relative to the subsection.
>>>>
>>>
>>> Even before my proposal, there are already many cases where
rnglists and
>>> loclists in DWARFv5 (& location lists in DWARFv4) will use
selectively
>>> chosen base addresses and symbol differences as often as possible
(insofar
>>> as I could do that when working/experimenting with ELF).
>>>
>>> So without function sections, for instance - rnglists for
sub-function
>>> ranges (ignoring PROPELLER for now/in this part of the discussion).
>>>
>>> Perhaps an example would be helpful. Here's LLVM's current
behavior with
>>> DWARFv5 and ELF, without function sections:
>>>
>>> int f1();
>>> void f2() {
>>>   if (int i = f1()) {
>>>     f1();
>>>   }
>>> }
>>> void f3() {
>>>   if (f1()) {
>>>     int i = f1();
>>>   }
>>> }
>>> __attribute__((section(".other"))) void f4() {
>>> }
>>>
>>> In this code there are only two ELF sections (".text"
contains the
>>> definitions of f2 and f3, ".other" contains the
definition of f4) and so we
>>> /should/ be able to only have 2 relocations in the debug info.
>>>
>>> (I'm exploiting something of a bug/quirk in Clang/LLVM's
debug info that
>>> causes, even at -O0, the lexical_block for the 'if' to have
a hole in it,
>>> where the call to f1 is, so it has ranges rather than low/high pc)
>>>
>>> In DWARFv4 this example would've used 10 relocations. (on the
CU ranges,
>>> there would be begin/end for the ".text" range covering
f2 and f3, and
>>> begin/end for the ".other" range covering f4, then the
range list for the
>>> "if" lexical_block would contain another 2 pairs (4
addresses/relocations),
>>> one relocation for f2's low_pc, one for f3's 'if'
lexical_block).
>>>
>>> In DWARFv5, we see the following:
>>>
>>> 0x00000014: [DW_RLE_base_addressx]:  0x0000000000000000
>>> 0x00000016: [DW_RLE_offset_pair  ]:  0x0000000000000008, 0x0000000000000014
>>> 0x00000019: [DW_RLE_offset_pair  ]:  0x000000000000001a, 0x000000000000001f
>>> 0x0000001c: [DW_RLE_end_of_list  ]
>>> 0x0000001d: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000036
>>> 0x00000020: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000006
>>> 0x00000023: [DW_RLE_end_of_list  ]
>>>
>>> The first range list is for the 'if' scope, the second is for the CU.
>>> Both are able to efficiently select encodings and base addresses.
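
To make the listing above concrete, here is a minimal C++ sketch of how a consumer might resolve such entries into absolute ranges. It is a simplified model, not LLVM's or any debugger's implementation: `RleKind`, `RleEntry`, and `resolveRanges` are hypothetical names, the entries are assumed to be already decoded rather than read from ULEB128 bytes, and the address pool is just a vector standing in for the relocated contents of debug_addr. The point it illustrates is that the 'if' scope list consults the address pool only once (index 0), however many offset pairs follow.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model of three DW_RLE_* entry kinds.
// Real rnglists are byte streams with ULEB128 operands.
enum class RleKind { BaseAddressx, OffsetPair, StartxLength };

struct RleEntry {
  RleKind kind;
  uint64_t a;  // address pool index (BaseAddressx, StartxLength) or start offset
  uint64_t b;  // end offset (OffsetPair) or length (StartxLength); unused otherwise
};

// Resolve one list (without its terminating end_of_list) to [low, high) pairs.
// addrPool is an assumption: it stands in for the relocated debug_addr section.
std::vector<std::pair<uint64_t, uint64_t>>
resolveRanges(const std::vector<RleEntry>& list,
              const std::vector<uint64_t>& addrPool) {
  std::vector<std::pair<uint64_t, uint64_t>> out;
  uint64_t base = 0;
  for (const RleEntry& e : list) {
    switch (e.kind) {
    case RleKind::BaseAddressx:
      base = addrPool.at(e.a);  // the only pool lookup an offset_pair run needs
      break;
    case RleKind::OffsetPair:
      assert(e.a <= e.b);
      out.emplace_back(base + e.a, base + e.b);
      break;
    case RleKind::StartxLength: {
      uint64_t start = addrPool.at(e.a);
      out.emplace_back(start, start + e.b);
      break;
    }
    }
  }
  return out;
}
```

With a made-up pool of `{0x1000, 0x1030, 0x2000, 0x1046}`, the first list in the dump resolves to [0x1008, 0x1014) and [0x101a, 0x101f), and the CU list resolves to [0x1000, 0x1036) and [0x2000, 0x2006).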
>>>
>>> But the debug_addr has 4 addresses in it - the address at index 1 (not
>>> used in the rnglists shown above - we see index 0 and index 2 are used
>>> there) is for the low_pc of f3's subprogram, and the address at index 3
>>> is for the low_pc of f3's if block/scope.
>>>
>>> That's the address/relocation that would be... addressed by the change
>>> I'm proposing. One way to avoid that relocation would be to encode
>>> f3's address range using a rnglist - this is fully backwards
>>> compatible, and would produce a rnglist like this:
>>>
>>> [DW_RLE_base_addressx]:  0x0000000000000000
>>> [DW_RLE_offset_pair  ]:  0x0000000000000030, 0x0000000000000036
>>> [DW_RLE_end_of_list  ]
>>>
>>> Similarly, f3's if block could use a rangelist like:
>>>
>>> [DW_RLE_base_addressx]:  0x0000000000000000
>>> [DW_RLE_offset_pair  ]:  0x0000000000000046, 0x0000000000000054
>>> [DW_RLE_end_of_list  ]
>>>
>>> As you can imagine, there are quite a few ranges (especially once you
>>> get inlining) that use low/high_pc, and could benefit from the
>>> reduction in relocations by using this strategy. Though it isn't
>>> optimal (the range list encoding isn't intended to be good for this
>>> use case) in terms of size cost - hence the possibility of using DWARF
>>> expressions for address class attributes, or a custom form that would
>>> more directly encode the <indirect address> + <offset>.
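
The size cost mentioned above can be estimated with a short sketch. It assumes the DWARFv5 rnglists encoding in which each entry is a one-byte kind followed by ULEB128 operands, so a contiguous range expressed as base_addressx + offset_pair + end_of_list costs the bytes computed below. `ulebSize` and `contiguousRnglistSize` are hypothetical helper names, and the count deliberately excludes the DW_AT_ranges attribute value itself and any entry in the rnglists offset table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to ULEB128-encode a value (7 payload bits per byte).
std::size_t ulebSize(uint64_t value) {
  std::size_t n = 1;
  while (value >>= 7)
    ++n;
  return n;
}

// Bytes for the three-entry list:
//   DW_RLE_base_addressx <addrIndex>
//   DW_RLE_offset_pair   <lo> <hi>
//   DW_RLE_end_of_list
std::size_t contiguousRnglistSize(uint64_t addrIndex, uint64_t lo, uint64_t hi) {
  assert(lo <= hi);
  std::size_t baseEntry = 1 + ulebSize(addrIndex);
  std::size_t pairEntry = 1 + ulebSize(lo) + ulebSize(hi);
  std::size_t endEntry = 1;
  return baseEntry + pairEntry + endEntry;
}
```

For the f3 list above (index 0, offsets 0x30 and 0x36) this gives 6 bytes, consistent with the 2-byte and 3-byte entry spacing visible in the earlier dump. Offsets of 128 or more take an extra byte each.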
>>>
>>>> In Propeller, is basic block reordering done after a .o is emitted?
>>>>
>>>
>>> Yes.
>>>
>>>
>>>> If so, I suppose I don't yet see how the proposed scheme is resilient
>>>> to this reordering.
>>>>
>>>
>>> With PROPELLER any function that is fragmented into reorderable
>>> sections must necessarily use ranges to describe the function's
>>> address range - but, again, choosing base addresses strategically &
>>> using relative references whenever possible, would help reduce the
>>> cost of PROPELLER's debug info.
>>>
>>>
>>>> OTOH if block reordering is done just before the label difference is
>>>> evaluated, then there shouldn't be any issue.
>>>>
>>>>
>>>>> Ditto, I suppose, for an intra-function offset when something like
>>>>> propeller is used to reorder basic blocks (I’m thinking of
>>>>> At_call_return_pc now).
>>>>>
>>>>
>>>> Yeah - currently the "base address" for each section is determined by
>>>> the first function with debug info being emitted in that section (
>>>> https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787
>>>> ) - with PROPELLER we'd need to add similar code when function
>>>> fragments are emitted. (I'm planning to check the PROPELLER work in
>>>> progress tree soon and do another sanity pass over the debug info
>>>> emitted to check this is working as intended - in part because this
>>>> base address selection, coupled with DWARFv5 and maybe with the
>>>> changes I'm suggesting in this thread (& will commit under flags
>>>> "soon" (might take me a week or two judging by my
>>>> review/bug/investigation load right now... *fingers crossed*)) might
>>>> make PROPELLER less expensive in terms of debug info size, or more
>>>> expensive relative to the significant improvements this provides)
>>>>
>>>>
>>>> Thanks for investigating!
>>>>
>>>> Owing to the way MachO debug info distribution works differently & if
>>>> I understand correctly doesn't need relocations in many cases due to
>>>> DWARF-aware parsing/linking (& if it does use relocations, I've no
>>>> knowledge of when/how and how big they are compared to the ELF
>>>> relocations I've been measuring) it's quite possible MachO would have
>>>> different tradeoffs in this space.
>>>>
>>>>
>>>> A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain
>>>> relocations for AT_low_pc or AT_call_return_pc in the simple examples
>>>> I tried out. We do emit relocations for those attributes in MachO
>>>> object files (there isn't something analogous to a .dwo on MachO, the
>>>> debug info just goes into a different set of sections in the .o). My
>>>> understanding (based on the definition of `macho_relocation_info` in
>>>> the ld64 sources) is that MachO relocations are 8 bytes in size. It
>>>> looks like ELF rel/rela relocations are 16/24 bytes in size, but I'm
>>>> not sure why (perhaps they're more extensible / encode more
>>>> information).
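
On the 16/24-byte question: the sizes follow from the ELF64 record layouts, where every field is 64 bits wide, while the Mach-O record packs a 32-bit address plus 32 bits of bit-fields. The sketch below mirrors those layouts under local, hypothetical struct names (they stand in for `Elf64_Rel`, `Elf64_Rela`, and the `relocation_info` struct from Mach-O's reloc.h). Bit-field packing is implementation-defined, so the 8-byte assertion is an assumption that holds on mainstream compilers rather than a language guarantee.

```cpp
#include <cstdint>

// Stand-in for Elf64_Rel: where to patch, plus symbol index and type.
struct Elf64RelSketch {
  uint64_t r_offset;  // location to be relocated
  uint64_t r_info;    // symbol table index (high 32 bits) and type (low 32 bits)
};

// Stand-in for Elf64_Rela: same as above with an explicit addend.
struct Elf64RelaSketch {
  uint64_t r_offset;
  uint64_t r_info;
  int64_t r_addend;   // constant added to the symbol value
};

// Stand-in for Mach-O's relocation_info.
struct MachORelocSketch {
  int32_t r_address;          // offset within the section
  uint32_t r_symbolnum : 24,  // symbol index or section ordinal
           r_pcrel : 1,
           r_length : 2,
           r_extern : 1,
           r_type : 4;
};

static_assert(sizeof(Elf64RelSketch) == 16, "two 64-bit fields");
static_assert(sizeof(Elf64RelaSketch) == 24, "three 64-bit fields");
static_assert(sizeof(MachORelocSketch) == 8, "32-bit address + 32 bits of fields");
```

So each debug_addr relocation saved is worth two to three times as many bytes in a 64-bit ELF object as in a Mach-O object, which fits the different tradeoff discussed here.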
>>>>
>>>
>>> OK *nod* with the smaller encoding it may be less of a pressing issue
>>> for you & the tradeoff may be different.
>>>
>>>
>>>> Would a vanilla DWARFv4 .dwp (without your patches applied) contain a
>>>> relocation for each 'AT_low_pc (<direct address>)'?
>>>>
>>>
>>> DWP files contain no direct addresses - they are all indirect through
>>> the address pool. But, yes, for a DWARFv4 Split DWARF build, low_pcs
>>> don't have an opportunity to reuse a strategically chosen base address
>>> - they have to use an addrx form & the debug_addr section would have
>>> that specific address with a relocation for it.
>>>
>>>
>>>>
>>>> vedant
>>>>
>>>>
>>>>
>>>>> Apologies if this has been answered elsewhere, I suppose there must
>>>>> be a solution for this for At_high_pc to work.
>>>>>
>>>>> vedant
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> vedant
>>>>>>
>>>>>> On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev
>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>> Sounds good all round - I'll commit these two modes, and maybe even
>>>>>> the third (given Sony's interest & possible interest in changing
>>>>>> their consumer to handle it) of a custom form to eke out the last few
>>>>>> bytes from the more direct addr+offset encoding.
>>>>>>
>>>>>> I'll follow up here with flag names and revision numbers once
>>>>>> they're in.
>>>>>>
>>>>>> On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul
>>>>>> <paul.robinson at sony.com> wrote:
>>>>>>
>>>>>>> On some previous occasion that introduced additional indirection
>>>>>>> (don't remember the details) my debugger people groused about the
>>>>>>> additional performance cost of chasing down data in a different
>>>>>>> object-file section.  So we (Sony) might be happier with low_pc as
>>>>>>> expressions, than with a ranges-always solution.
>>>>>>>
>>>>>>> But hard to say without data, and getting both modes in at least
>>>>>>> as a temporary thing sounds like a good plan.
>>>>>>> --paulr
>>>>>>>
>>>>>>>
>>>>>>> > -----Original Message-----
>>>>>>> > From: aprantl at apple.com <aprantl at apple.com>
>>>>>>> > Sent: Wednesday, January 8, 2020 1:49 PM
>>>>>>> > To: David Blaikie <dblaikie at gmail.com>
>>>>>>> > Cc: llvm-dev <llvm-dev at lists.llvm.org>; Jonas Devlieghere
>>>>>>> > <jdevlieghere at apple.com>; Robinson, Paul <paul.robinson at sony.com>;
>>>>>>> > Eric Christopher <echristo at gmail.com>; Frederic Riss <friss at apple.com>
>>>>>>> > Subject: Re: Increasing address pool reuse/reducing .o file size in
>>>>>>> > DWARFv5
>>>>>>> >
>>>>>>> > I think this sounds like a good plan for Linux. I would like to see
>>>>>>> > the numbers for Darwin (= non-split DWARF) to decide whether we
>>>>>>> > should just make that the default. Eric's suggestion of having this
>>>>>>> > committed as an option first seems like a good step in that
>>>>>>> > direction. If it is an advantage across the board we can remove the
>>>>>>> > option and just make this the default behavior.
>>>>>>> >
>>>>>>> > thanks,
>>>>>>> > adrian
>>>>>>> >
>>>>>>> > > On Dec 30, 2019, at 12:08 PM, David Blaikie <dblaikie at gmail.com>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > > tl;dr: in DWARFv5, using DW_AT_ranges even when the range is
>>>>>>> > > contiguous reduces linked, uncompressed debug_addr size for
>>>>>>> > > optimized builds by 93% and reduces total .o file size (with
>>>>>>> > > compression and split) by 15%. It does grow .dwo file size a bit -
>>>>>>> > > DWARFv5, no compression, not split shows the net effect if all
>>>>>>> > > bytes are equal: -O3 clang binary grows by 0.4%, -O0 clang binary
>>>>>>> > > shrinks by 0.1%
>>>>>>> > > Should we enable this strategy by default for DWARFv5, for
>>>>>>> > > DWARFv5+Split DWARF, or not by default at all/only under a flag?
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > So, I've brought this up a few times before - that DWARFv5 does a
>>>>>>> > > pretty good job of reducing relocations (& reducing .o file size
>>>>>>> > > with Split DWARF) by allowing many uses of addresses to include
>>>>>>> > > some kind of address+offset (debug_rnglists and loclists allowing
>>>>>>> > > "base_address" then offset_pairs (an improvement over similar
>>>>>>> > > functionality in DWARFv4 because the offset pairs can be uleb
>>>>>>> > > encoded - so they can be quite compact))
>>>>>>> > >
>>>>>>> > > But one place where DWARFv5 misses a chance to reduce relocations
>>>>>>> > > further is direct addresses from debug_info, such as DW_AT_low_pc.
>>>>>>> > >
>>>>>>> > > For a while I've wondered if we could use an extension form for
>>>>>>> > > addr+offset, and I prototyped this without an extension attribute,
>>>>>>> > > but instead using exprloc. This has slightly higher overhead to
>>>>>>> > > express the... expression. (it's 9 bytes in total, could be as few
>>>>>>> > > as 5 with a custom form)
>>>>>>> > >
>>>>>>> > > But I had another idea that's more instantly deployable: Why not
>>>>>>> > > use DW_AT_ranges even when the range is contiguous? That way the
>>>>>>> > > low_pc that previously couldn't use an existing address pool entry
>>>>>>> > > + offset, could use the rnglist support for base address.
>>>>>>> > >
>>>>>>> > > The only unnecessary address pool entries that remain that I've
>>>>>>> > > found are DW_AT_low_pc for DW_TAG_labels - but there's only a
>>>>>>> > > handful of those in most code. So the "ranges everywhere" strategy
>>>>>>> > > gets the addresses for optimized clang down from 4758 (v4 address
>>>>>>> > > pool used 9923 addresses... ) to 342, with about 4 "extra"
>>>>>>> > > addresses for DW_TAG_labels.
>>>>>>> > >
>>>>>>> > > This could also be a bit less costly if DWARFv5 rnglists didn't
>>>>>>> > > use a separate offset table (instead encoding the offsets directly
>>>>>>> > > in debug_info, rather than using indexes)
>>>>>>> > >
>>>>>>> > > I have patches for both the addr+offset exprloc and for the
>>>>>>> > > ranges-always, both with -mllvm flags - do people think they're
>>>>>>> > > both worth committing for experimentation? Neither? Default on in
>>>>>>> > > some cases (like Split DWARF)?
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > - Dave
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
