thr3ads.net - llvm dev - [llvm-dev] DWARF .debug_aranges data objects and address spaces [Mar 2020]

David Blaikie via llvm-dev

2020-Mar-16 17:20 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

On Mon, Mar 16, 2020 at 9:31 AM Robinson, Paul <paul.robinson at sony.com>
wrote:
> With AVR being affected, upstreaming a patch to put segment selectors into
> .debug_aranges becomes completely reasonable.  There would likely want to
> be a target hook somewhere to return a value saying what size to use, with
> the default implementation returning zero.
>
*nod* something along those lines
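
A minimal sketch of the kind of hook being discussed, written in Python for brevity rather than LLVM's C++. The class names, the method name `aranges_segment_selector_size`, and the helper `encode_size_fields` are all hypothetical; nothing here is an existing LLVM API. It only illustrates "default returns zero, a Harvard-style target overrides it".

```python
import struct


class Target:
    """Hypothetical stand-in for a compiler target description."""

    address_size = 8

    def aranges_segment_selector_size(self) -> int:
        # Default: flat address space, so no segment selector is emitted.
        return 0


class HarvardTarget(Target):
    """Hypothetical Harvard-style target with separate code and data spaces."""

    address_size = 4

    def aranges_segment_selector_size(self) -> int:
        # One byte is enough to tell a code address from a data address.
        return 1


def encode_size_fields(target: Target) -> bytes:
    """Encode the address-size and selector-size bytes of a set header."""
    return struct.pack(
        "<BB", target.address_size, target.aranges_segment_selector_size()
    )
```

With this shape, existing targets would keep emitting a selector size of zero, and only a target that opts in would change the section contents.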

>  > If the producer has put ranges on the CU it's not a lot of work - it's
> parsing one DIE & looking for a couple of attributes.
>
>
>
> It’s walking through all the CUs, picking up the associated abbrevs,
> trolling down the list of attributes… “not a lot” indeed, but not as
> trivial as running through a single section linearly, which is what
> .debug_aranges gets you.  I’ve been lectured by @clayborg on what consumers
> really want for performance gains.
>
Sure enough - though I don't believe aranges is used by default on any
target/platform LLVM supports, so this time/space tradeoff doesn't seem to
have been important to any of them?

>  > It's enough at least at Google for us to not use them & use CU ranges
> for the same purpose.
>
>
> Google is much more seriously concerned about debug-info size than about
> debugger performance, IIRC.  This is not universally the preferred
> tradeoff.  Just sayin’.
>
Sure enough.

I've just had a couple of people ask about aranges recently (~year or so) &
when pressing a little further, using the CU's address ranges turned out to
be sufficient for their needs without having to change Clang's defaults or
have their users specify extra flags to explicitly request them, etc.

Out of curiosity/for data/usage/etc - does Sony use aranges? (changing the
default when targeting SCE or the like)

- Dave

> --paulr
>
>
>
> *From:* Dylan McKay <me at dylanmckay.io>
> *Sent:* Monday, March 16, 2020 1:32 AM
> *To:* David Blaikie <dblaikie at gmail.com>
> *Cc:* Robinson, Paul <paul.robinson at sony.com>; llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] DWARF .debug_aranges data objects and address
> spaces
>
>
>
> I'm not across most of this debug info stuff but I'll stomp in here to
> confirm that AVR is a Harvard architecture, with separate addressing for
> the data and program buses via specialized instructions which will load
> from either one, or the other, but never both.
>
>
>
> It makes sense that this particular problem would also affect AVR - the
> backend does have some issues with debug info generation.
>
>
>
>
>
>
>
> On Fri, Mar 13, 2020 at 12:22 PM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
>
>
>
> On Thu, Mar 12, 2020 at 1:51 PM Robinson, Paul <paul.robinson at sony.com>
> wrote:
>
> I’ve encountered this kind of architecture before, a long time ago
> (academically).    In a flat-address-space machine such as X64, there is
> still an instruction/data distinction, but usually only down at the level
> of I-cache versus D-cache (instruction fetch versus data fetch).  A Harvard
> architecture machine exposes that to the programmer, which effectively
> doubles the available address space.  Code and data live in different
> address spaces, although the address space identifier per se is not
> explicit.  A Move instruction would implicitly use the data address space,
> while an indirect Branch would implicitly target the code address space.
> An OS running on a Harvard architecture would require the loader to be
> privileged, so it can map data from an object file into the code address
> space and implement any necessary fixups.  Self-modifying code is at least
> wicked hard if not impossible to achieve.
>
>
>
> In DWARF this would indeed be described by a segment selector.  It’s up to
> the target ABI to specify what the segment selector numbers actually are.
> For a Harvard architecture machine this is pretty trivial, you say
> something like 0 for code and 1 for data.  Boom done.
>
>
>
> LLVM basically doesn’t have targets like this, or at least it has never
> come up before that I’m aware of.  So, when we emit DWARF, we assume a flat
> address space (unconditionally setting the segment selector size to zero),
> and llvm-dwarfdump will choke (hopefully cleanly, but still) on an object
> file that uses DWARF segment selectors.
>
>
> FWIW Luke mentioned in the original email the AVR in-tree backend seems to
> have this problem with ambiguous debug_aranges entries.
>
>
>  The point of .debug_aranges is to accelerate the search for the
> appropriate CU.  Yes you can spend time trolling through .debug_info and
> .debug_abbrev, decoding the CU DIEs looking for low_pc/high_pc pairs (or
> perhaps pointers to .debug_ranges) and effectively rebuild a .debug_aranges
> section yourself, if the compiler/linker isn’t kind enough to pre-build the
> table for you.  I don’t understand why .debug_aranges should be
> discouraged; I shouldn’t think they would be huge, and consumers can avoid
> loading lots of data just to figure out what’s not worth looking at.
> Forcing all consumers to do things the slow way seems unnecessarily
> inefficient.
>
>
> If the producer has put ranges on the CU it's not a lot of work - it's
> parsing one DIE & looking for a couple of attributes. With Split DWARF the
> cost becomes a bit more prominent - Sema.o from clang, with split dwarf
> (v4 or v5 about the same) is about 3.5% larger with debug aranges (not sure
> about the overall data). It's enough at least at Google for us to not use
> them & use CU ranges for the same purpose.
>
> I thought I might be able to find some email history about why we turned
> it off by default, but seems we never turned it /on/ by default to begin
> with & it wasn't implemented until relatively late in the game (well, what
> I think of as relatively late - after I started on the project at least).
>
>
>  Thinking about Harvard architecture specifically, you **need** the
> segment selector only when an address could be ambiguous about whether it’s
> a code or data address.  This basically comes up **only** in
> .debug_aranges, he said thinking about it for about 30 seconds.  Within
> .debug_info you don’t need it because when you pick up the address of an
> entity, you know whether it’s for a code or data entity.  Location lists
> and range lists always point to code.  For .debug_aranges you would need
> the segment selector, but I think that’s the only place.
>
>
>
> For an architecture with multiple code or data segments, then you’d need
> the segment selector more widely, but I should think this case wouldn’t be
> all that difficult to make work.  Even factoring in the llvm-dwarfdump
> part, it has to understand the selector only for the .debug_aranges
> section; everything else can remain as it is, pretending there’s a flat
> address space.
>
>
>
> Now, if your target is downstream, that would make upstreaming the LLVM
> support a bit dicier, because we’d not want to have that feature in the
> upstream repo if there are no targets using it.  You’d be left maintaining
> that patch on your own.  But as I described above, I don’t think it would
> be a huge deal.
>
>
>
> HTH,
>
> --paulr
>
>
>
> *From:* David Blaikie <dblaikie at gmail.com>
> *Sent:* Thursday, March 12, 2020 2:20 PM
> *To:* Luke Drummond <luke.drummond at codeplay.com>; Adrian Prantl
> <aprantl at apple.com>; Jonas Devlieghere <jdevlieghere at apple.com>;
> Robinson, Paul <paul.robinson at sony.com>
> *Cc:* llvm-dev at lists.llvm.org
> *Subject:* Re: [llvm-dev] DWARF .debug_aranges data objects and address
> spaces
>
>
>
>
>
>
>
> On Thu, Mar 12, 2020 at 11:00 AM Luke Drummond <luke.drummond at codeplay.com>
> wrote:
>
> On Thu Mar 12, 2020 at 5:37 PM, David Blaikie wrote:
> > On Wed, Mar 11, 2020 at 8:09 AM Luke Drummond
> > <luke.drummond at codeplay.com>
> > wrote:
> >
> > > On Tue Mar 10, 2020 at 7:45 PM, David Blaikie wrote:
> > > > If you only want code addresses, why not use the CU's
> > > > low_pc/high_pc/ranges
> > > > - those are guaranteed to be only code addresses, I think?
> > > >
> > > In the common case, for most targets LLVM supports I think you're
> > > right, but for my case, regrettably, not. Because my target is a Harvard
> > > Architecture, any code address can have the same ordinal value as any
> > > data address: the code and data reside on different buses so the whole
> > > 4GiB space is available to both code, and data. `DW_AT_low_pc` and
> > > `DW_AT_high_pc` can be used to find the range of the code segment, but
> > > given an arbitrary address, cannot be used to conclusively determine
> > > whether that address belongs to code or data when both segments contain
> > > addresses in that numeric range.
> >
> >
> > Sorry I'm not following, partly probably due to my not having worked
> > with such machines before.
> >
> > But how are the code addresses and data addresses differentiated then
> > (eg: if you had segment selectors in debug_aranges, how would they be
> > used? The addresses taken from the system at runtime have some kind of
> > segment selector associated with them, that you can then use to match
> > with the addr+segment selector in aranges?).
> Yes. This. The system mostly provides us the ability to disambiguate
> addresses because the device's simulator / debugger make this
> unambiguous, but the current .debug_aranges does not allow us to do this
> because it's missing such info.
> >
> > Actually, coming at it from a different angle: It sounds like in the
> > original email you're suggesting if debug_aranges did not contain data
> > addresses, this would be good/sufficient for you? So somehow you'd be
> > ensuring you only query debug_aranges using things you know are code
> > addresses, not data addresses? So why would the same solution/approach
> > not hold to querying low/high/ranges on a CU that's already guaranteed
> > not to contain data addresses?
> That's the root of the issue: the .debug_aranges section emitted by llvm
> *does* contain data addresses by default and therefore can be ambiguous.
> I've worked around this locally by hacking llvm to only emit aranges for
> text objects,
>
>
> Sorry, but I'm still not understanding why "aranges for only text objects"
> is more usable for your use case than "high/low/ranges on the CU"? Could
> you help me understand how those are different in your situation?
>
>
> but I was wondering if it's something that's valuable to
> fix upstream. My guess is that it's probably too niche to worry about
> for the moment, but if there's interest I can propose a design (probably
> a target hook to ask if segment selectors are required and how to get
> their number from an object).
>
>
> Added a few debug info folks in case they've got opinions. I don't really
> mind if we removed data objects from debug_aranges, though as you say, it's
> arguably correct/maybe useful as-is. Supporting it properly - probably
> using address segment selectors would be fine too, I guess AVR uses address
> spaces for its pointers to differentiate data and code addresses? In which
> case we could encode the LLVM address space as the segment selector (&
> probably would need to query the target to decide if it has non-zero
> address spaces and use that to decide whether to use segment selectors in
> debug_aranges)
>
> But in general, I'm mostly just discouraging people from using aranges -
> the data is duplicated in the CU's ranges anyway (there's some small
> caveats there - a producer doesn't /have/ to produce ranges on the CU, but
> I'd just say lower performance on such DWARF would be acceptable) & makes
> object files/executables larger for minimal value/mostly duplicate data.
>
> - Dave
>
>
>
> Thanks for your help
>
> Luke
>
> --
> Codeplay Software Ltd.
> Company registered in England and Wales, number: 04567874
> Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Robinson, Paul via llvm-dev

2020-Mar-16 17:49 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

SCE tuning does turn on the .debug_aranges section.  Our debugger team really
cares about startup cost. Turnaround time in general is huge for our licensees,
to the point where we support edit-and-continue (minimal rebuild, live-patch the
running process).
--paulr
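
As a rough illustration of why a prebuilt address table helps startup cost: once the (start, length, CU offset) tuples are available from one linear pass, a consumer can sort them once and answer "which CU owns this address?" with a binary search. This is only a sketch under stated assumptions: the function names are made up, the ranges are assumed not to overlap, and a flat address space is assumed.

```python
import bisect


def build_index(aranges):
    """Sort (start, length, cu_offset) tuples into (start, end, cu_offset)."""
    return sorted((start, start + length, cu) for start, length, cu in aranges)


def find_cu(index, address):
    """Return the CU offset whose range contains address, or None."""
    # Find the last range whose start is <= address.
    key = (address, float("inf"), float("inf"))
    i = bisect.bisect_right(index, key) - 1
    if i >= 0:
        start, end, cu = index[i]
        if start <= address < end:
            return cu
    return None
```

Without such a table, the same index has to be rebuilt by visiting every CU DIE and reading its low_pc/high_pc or ranges attributes, which is the cost being debated in this thread.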


David Blaikie via llvm-dev

2020-Mar-16 17:56 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

On Mon, Mar 16, 2020 at 10:50 AM Robinson, Paul <paul.robinson at sony.com>
wrote:
> SCE tuning does turn on the .debug_aranges section.  Our debugger team
> really cares about startup cost. Turnaround time in general is huge for our
> licensees, to the point where we support edit-and-continue (minimal
> rebuild, live-patch the running process).
>
Ah, good to know! I'd be curious to know about the performance tradeoff
when they're disabled if you ever happen to have data around that.
I guess a related question: Does SCE use the non-.text entries (or
otherwise have an opinion on having them) in debug_aranges?


Pavel Labath via llvm-dev

2020-Mar-17 07:02 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

On 16/03/2020 18:20, David Blaikie via llvm-dev wrote:
> On Mon, Mar 16, 2020 at 9:31 AM Robinson, Paul <paul.robinson at sony.com
> <mailto:paul.robinson at sony.com>> wrote:
> 
>     With AVR being affected, upstreaming a patch to put segment
>     selectors into .debug_aranges becomes completely reasonable.  There
>     would likely want to be a target hook somewhere to return a value
>     saying what size to use, with the default implementation returning
>     zero.
> 
> 
> *nod* something along those lines
>  
Does that mean putting the selector *only* into debug_aranges (and not
debug_line, debug_frame, etc.)?

Even though they are not really needed if the target only ever has one
code address space, it seems somewhat odd to have different values for
segment_selector_size in different sections.

In the DWARF spec these are described as "... containing the size in
bytes of a segment selector on the _target system_". I would interpret
the "target system" portion of that as meaning that the segment selector
size is a property of a target, and hence, it should be consistent
across all relevant sections.

pl
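
For readers unfamiliar with where this field lives, here is a minimal sketch of building and parsing one .debug_aranges set with a non-zero selector size. It assumes 32-bit DWARF, little-endian data, and that the first tuple is padded to a multiple of the tuple size (selector size plus twice the address size), which is my reading of the DWARF v5 text; real producers and consumers may pad differently. The function names are made up for the example.

```python
import struct

# DWARF32 set header: unit_length, version, debug_info_offset,
# address_size, segment_selector_size (little-endian assumed).
HEADER = struct.Struct("<IHIBB")


def padding(addr_size, seg_size):
    """Bytes needed after the header to reach a tuple-size boundary."""
    tuple_size = seg_size + 2 * addr_size
    return -HEADER.size % tuple_size


def build_set(cu_offset, addr_size, seg_size, entries):
    """Encode (segment, address, length) entries plus the zero terminator."""
    body = bytearray(padding(addr_size, seg_size))
    for seg, addr, length in list(entries) + [(0, 0, 0)]:
        body += seg.to_bytes(seg_size, "little")
        body += addr.to_bytes(addr_size, "little")
        body += length.to_bytes(addr_size, "little")
    unit_length = HEADER.size - 4 + len(body)  # excludes the length field
    return HEADER.pack(unit_length, 2, cu_offset, addr_size, seg_size) + bytes(body)


def parse_set(data):
    """Decode one set; returns (cu_offset, seg_size, entries)."""
    unit_length, _version, cu_offset, addr_size, seg_size = HEADER.unpack_from(data, 0)
    end = 4 + unit_length
    pos = HEADER.size + padding(addr_size, seg_size)
    tuple_size = seg_size + 2 * addr_size
    entries = []
    while pos + tuple_size <= end:
        fields = []
        for width in (seg_size, addr_size, addr_size):
            fields.append(int.from_bytes(data[pos:pos + width], "little"))
            pos += width
        if fields == [0, 0, 0]:  # all-zero tuple terminates the set
            break
        entries.append(tuple(fields))
    return cu_offset, seg_size, entries
```

Note that the selector size is read from this section's own header, which is what makes a per-section value mechanically possible even if it reads oddly against the "target system" wording.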

Robinson, Paul via llvm-dev

2020-Mar-17 13:46 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

> -----Original Message-----
> From: Pavel Labath <pavel at labath.sk>
> Sent: Tuesday, March 17, 2020 3:02 AM
> To: David Blaikie <dblaikie at gmail.com>; Robinson, Paul
> <paul.robinson at sony.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] DWARF .debug_aranges data objects and address
> spaces
> 
> On 16/03/2020 18:20, David Blaikie via llvm-dev wrote:
> > On Mon, Mar 16, 2020 at 9:31 AM Robinson, Paul <paul.robinson at sony.com
> > <mailto:paul.robinson at sony.com>> wrote:
> >
> >     With AVR being affected, upstreaming a patch to put segment
> >     selectors into .debug_aranges becomes completely reasonable.  There
> >     would likely want to be a target hook somewhere to return a value
> >     saying what size to use, with the default implementation returning
> >     zero.
> >
> >
> > *nod* something along those lines
> >
> 
> Does that mean putting the selector *only* into debug_aranges (and not
> debug_line, debug_frame, etc.)?
That was my thought, yes.  It's the only section where there is no other
context to determine whether a raw address is for code or for data.
> 
> Even though they are not really needed if the target only ever has one
> code address space, it seems somewhat odd to have different values for
> segment_selector_size in different sections.
> 
> In the DWARF spec these are described as "... containing the size in
> bytes of a segment selector on the _target system_". I would interpret
> the "target system" portion of that as meaning that the segment selector
> size is a property of a target, and hence, it should be consistent
> across all relevant sections.
For a target with actual segments (like 80x86) the selector would always
have to be present.

For a Harvard target there is no explicit selector in the machine code,
and a strict reading of the DWARF spec would require the segment selector
size to be zero in all cases; but that leaves us where we are today, with
.debug_aranges being impossible to interpret correctly.

IMO, having a segment selector in .debug_aranges and nowhere else, for a
Harvard architecture, falls within the "permissive" aspect of DWARF.  It
solves an actual problem using what is IMO a reasonable interpretation of
the existing DWARF feature set.  If the AVR (+other Harvard-like) targets
prefer, I wouldn't stop them from adding a segment selector to all DWARF
sections, but it seems like a waste of space in other sections.

I'd be happy to propose a DWARF wiki item or even a non-normative bit of
text in the spec, to codify this.  It would affect consumers that target
a Harvard architecture, but they have to contend with this somehow in any
case.
--paulr
> 
> pl
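
To make the ambiguity concrete, here is a small sketch of an address lookup with and without a selector. The numbering (0 for code, 1 for data) follows the "something like 0 for code and 1 for data" suggestion earlier in the thread and is not taken from any ABI; the entry names and function names are invented for the example.

```python
CODE, DATA = 0, 1  # hypothetical selector numbering, not from a real ABI

# (segment, start, length, name): a function and a variable that share
# the same numeric address in different address spaces.
EXAMPLE = [
    (CODE, 0x100, 0x40, "main"),
    (DATA, 0x100, 0x10, "counter"),
]


def lookup_flat(entries, address):
    """Match on the numeric address only, as with selector size zero."""
    return [
        name
        for _seg, start, length, name in entries
        if start <= address < start + length
    ]


def lookup_segmented(entries, segment, address):
    """Match on (segment, address), as a selector-aware table allows."""
    return [
        name
        for seg, start, length, name in entries
        if seg == segment and start <= address < start + length
    ]
```

With the example table, a flat lookup of 0x104 matches both entries, while a lookup that also supplies the selector matches only one.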
