thr3ads.net - llvm dev - [llvm-dev] DWARF .debug_aranges data objects and address spaces [Mar 2020]

Luke Drummond via llvm-dev

2020-Mar-12 18:00 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

On Thu Mar 12, 2020 at 5:37 PM, David Blaikie wrote:
> On Wed, Mar 11, 2020 at 8:09 AM Luke Drummond
> <luke.drummond at codeplay.com>
> wrote:
>
> > On Tue Mar 10, 2020 at 7:45 PM, David Blaikie wrote:
> > > If you only want code addresses, why not use the CU's
> > > low_pc/high_pc/ranges
> > > - those are guaranteed to be only code addresses, I think?
> > >
> > In the common case, for most targets LLVM supports I think you're right,
> > but for my case, regrettably, not. Because my target is a Harvard
> > Architecture, any code address can have the same ordinal value as any
> > data address: the code and data reside on different buses so the whole
> > 4GiB space is available to both code, and data. `DW_AT_low_pc` and
> > `DW_AT_high_pc` can be used to find the range of the code segment, but
> > given an arbitrary address, cannot be used to conclusively determine
> > whether that address belongs to code or data when both segments contain
> > addresses in that numeric range.
>
>
> Sorry I'm not following, partly probably due to my not having worked with
> such machines before.
>
> But how are the code addresses and data addresses differentiated then (eg:
> if you had segment selectors in debug_aranges, how would they be used? The
> addresses taken from the system at runtime have some kind of segment
> selector associated with them, that you can then use to match with the
> addr+segment selector in aranges?).

Yes. This. The system mostly provides us the ability to disambiguate
addresses because the device's simulator / debugger make this
unambiguous, but the current .debug_aranges does not allow us to do this
because it's missing such info.

> Actually, coming at it from a different angle: It sounds like in the
> original email you're suggesting if debug_aranges did not contain data
> addresses, this would be good/sufficient for you? So somehow you'd be
> ensuring you only query debug_aranges using things you know are code
> addresses, not data addresses? So why would the same solution/approach not
> hold to querying low/high/ranges on a CU that's already guaranteed not to
> contain data addresses?

That's the root of the issue: the .debug_aranges section emitted by llvm
*does* contain data addresses by default and therefore can be ambiguous.
I've worked around this locally by hacking llvm to only emit aranges for
text objects, but I was wondering if it's something that's valuable to
fix upstream. My guess is that it's probably too niche to worry about
for the moment, but if there's interest I can propose a design (probably
a target hook to ask if segment selectors are required and how to get
their number from an object).
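
To make the ambiguity concrete, here is a minimal sketch in Python (not
LLVM code). The `Range` record, the CU names, and the sample addresses are
assumptions made up for illustration. In particular the `kind` field is not
stored in a real .debug_aranges tuple, which carries only an address and a
length; that missing information is the problem being described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    cu: str       # name of the owning compile unit (illustrative)
    start: int    # first address covered
    length: int   # number of bytes covered
    kind: str     # "text" or "data"; NOT present in real aranges tuples

def lookup(ranges, address):
    """Return every CU whose range covers `address`, as a flat table would."""
    return [r.cu for r in ranges if r.start <= address < r.start + r.length]

# On a Harvard target, code and data may occupy the same numeric addresses.
ARANGES = [
    Range("a.c", 0x1000, 0x200, "text"),
    Range("b.c", 0x1000, 0x040, "data"),
]

# Flat lookup: a program counter of 0x1010 matches both CUs.
ambiguous = lookup(ARANGES, 0x1010)

# Keeping only text ranges (the local workaround) removes the clash.
text_only = lookup([r for r in ARANGES if r.kind == "text"], 0x1010)
```

With the flat table, `ambiguous` names both CUs; once the data range is
filtered out, `text_only` names only the CU that actually holds the code.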

Thanks for your help

Luke

-- 
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF

David Blaikie via llvm-dev

2020-Mar-12 18:19 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

On Thu, Mar 12, 2020 at 11:00 AM Luke Drummond <luke.drummond at codeplay.com>
wrote:
> On Thu Mar 12, 2020 at 5:37 PM, David Blaikie wrote:
> > On Wed, Mar 11, 2020 at 8:09 AM Luke Drummond
> > <luke.drummond at codeplay.com>
> > wrote:
> >
> > > On Tue Mar 10, 2020 at 7:45 PM, David Blaikie wrote:
> > > > If you only want code addresses, why not use the CU's
> > > > low_pc/high_pc/ranges
> > > > - those are guaranteed to be only code addresses, I think?
> > > >
> > > In the common case, for most targets LLVM supports I think you're right,
> > > but for my case, regrettably, not. Because my target is a Harvard
> > > Architecture, any code address can have the same ordinal value as any
> > > data address: the code and data reside on different buses so the whole
> > > 4GiB space is available to both code, and data. `DW_AT_low_pc` and
> > > `DW_AT_high_pc` can be used to find the range of the code segment, but
> > > given an arbitrary address, cannot be used to conclusively determine
> > > whether that address belongs to code or data when both segments contain
> > > addresses in that numeric range.
> >
> >
> > Sorry I'm not following, partly probably due to my not having worked with
> > such machines before.
> >
> > But how are the code addresses and data addresses differentiated then (eg:
> > if you had segment selectors in debug_aranges, how would they be used? The
> > addresses taken from the system at runtime have some kind of segment
> > selector associated with them, that you can then use to match with the
> > addr+segment selector in aranges?).
> Yes. This. The system mostly provides us the ability to disambiguate
> addresses because the device's simulator / debugger make this
> unambiguous, but the current .debug_aranges does not allow us to do this
> because it's missing such info.
> >
> > Actually, coming at it from a different angle: It sounds like in the
> > original email you're suggesting if debug_aranges did not contain data
> > addresses, this would be good/sufficient for you? So somehow you'd be
> > ensuring you only query debug_aranges using things you know are code
> > addresses, not data addresses? So why would the same solution/approach not
> > hold to querying low/high/ranges on a CU that's already guaranteed not to
> > contain data addresses?
> That's the root of the issue: the .debug_aranges section emitted by llvm
> *does* contain data addresses by default and therefore can be ambiguous.
> I've worked around this locally by hacking llvm to only emit aranges for
> text objects,

Sorry, but I'm still not understanding why "aranges for only text objects"
is more usable for your use case than "high/low/ranges on the CU"? Could
you help me understand how those are different in your situation?

> but I was wondering if it's something that's valuable to
> fix upstream. My guess is that it's probably too niche to worry about
> for the moment, but if there's interest I can propose a design (probably
> a target hook to ask if segment selectors are required and how to get
> their number from an object).
>
Added a few debug info folks in case they've got opinions. I don't really
mind if we removed data objects from debug_aranges, though as you say, it's
arguably correct/maybe useful as-is. Supporting it properly, probably
using address segment selectors, would be fine too. I guess AVR uses address
spaces for its pointers to differentiate data and code addresses? In which
case we could encode the LLVM address space as the segment selector (and
probably would need to query the target to decide if it has non-zero
address spaces and use that to decide whether to use segment selectors in
debug_aranges).
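
As a rough illustration of that idea, the following Python sketch (not LLVM
code) uses the address space number as the segment selector and only emits
selectors when some object lives outside address space 0. The `Obj` record
and both function names are hypothetical, and inspecting the objects stands
in for whatever query the target would really answer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obj:
    name: str
    address_space: int  # LLVM-style address space number; 0 is the default
    start: int
    length: int

def needs_segment_selectors(objects):
    """Hypothetical target query: true if any object is outside address space 0."""
    return any(o.address_space != 0 for o in objects)

def arange_tuples(objects):
    """Build (segment, start, length) tuples, or (start, length) when flat."""
    if needs_segment_selectors(objects):
        return [(o.address_space, o.start, o.length) for o in objects]
    return [(o.start, o.length) for o in objects]

flat = [Obj("f", 0, 0x1000, 0x20)]
harvard = [Obj("f", 0, 0x1000, 0x20), Obj("g", 1, 0x1000, 0x4)]
```

A flat target keeps today's two-field tuples, while the Harvard-style input
gets a selector on every tuple, so `f` and `g` stay distinct even though
they share a start address.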

But in general, I'm mostly just discouraging people from using aranges:
the data is duplicated in the CU's ranges anyway (there are some small
caveats there: a producer doesn't /have/ to produce ranges on the CU, but
I'd just say lower performance on such DWARF would be acceptable), and it
makes object files/executables larger for minimal value/mostly duplicate data.

- Dave

>
> Thanks for your help
>
> Luke
>
> --
> Codeplay Software Ltd.
> Company registered in England and Wales, number: 04567874
> Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF

Robinson, Paul via llvm-dev

2020-Mar-12 20:51 UTC

[llvm-dev] DWARF .debug_aranges data objects and address spaces

I’ve encountered this kind of architecture before, a long time ago
(academically).  In a flat-address-space machine such as X64, there is still
an instruction/data distinction, but usually only down at the level of I-cache
versus D-cache (instruction fetch versus data fetch).  A Harvard architecture
machine exposes that to the programmer, which effectively doubles the available
address space.  Code and data live in different address spaces, although the
address space identifier per se is not explicit.  A Move instruction would
implicitly use the data address space, while an indirect Branch would implicitly
target the code address space.  An OS running on a Harvard architecture would
require the loader to be privileged, so it can map data from an object file into
the code address space and implement any necessary fixups.  Self-modifying code
is at least wicked hard if not impossible to achieve.

In DWARF this would indeed be described by a segment selector.  It’s up to the
target ABI to specify what the segment selector numbers actually are.  For a
Harvard architecture machine this is pretty trivial, you say something like 0
for code and 1 for data.  Boom done.
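
A small Python sketch of what a selector-qualified lookup could look like.
The values 0 for code and 1 for data come from the example above; the table
layout, the CU names, and the function name `find_cu` are hypothetical and
stand in for whatever the target ABI and the consumer actually define.

```python
CODE, DATA = 0, 1  # selector values from the example above; an ABI would fix these

# (segment, start, length, cu) entries, as a segment-aware table might hold them.
TABLE = [
    (CODE, 0x1000, 0x200, "a.c"),
    (DATA, 0x1000, 0x040, "b.c"),
]

def find_cu(table, segment, address):
    """Return the CU covering `address` within `segment`, or None."""
    for seg, start, length, cu in table:
        if seg == segment and start <= address < start + length:
            return cu
    return None
```

The same numeric address now resolves differently depending on the selector
supplied by the debugger: `find_cu(TABLE, CODE, 0x1010)` and
`find_cu(TABLE, DATA, 0x1010)` name different CUs.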

LLVM basically doesn’t have targets like this, or at least it has never come up
before that I’m aware of.  So, when we emit DWARF, we assume a flat address
space (unconditionally setting the segment selector size to zero), and
llvm-dwarfdump will choke (hopefully cleanly, but still) on an object file that
uses DWARF segment selectors.
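
For reference, the selector size lives in the header of each .debug_aranges
set. The Python sketch below decodes one set under stated assumptions:
32-bit DWARF, little-endian, and a segment selector size of zero. The input
bytes are hand-built for illustration and do not come from a real object
file, and `parse_aranges_set` is a made-up name, not an LLVM or
llvm-dwarfdump API.

```python
import struct

# unit_length, version, debug_info_offset, address_size, segment_selector_size
HEADER = "<IHIBB"
HEADER_SIZE = struct.calcsize(HEADER)  # 12 bytes for 32-bit DWARF

def parse_aranges_set(data):
    """Decode one set; returns (header dict, list of (start, length))."""
    unit_length, version, cu_offset, addr_size, seg_size = struct.unpack_from(
        HEADER, data, 0)
    header = {
        "unit_length": unit_length,
        "version": version,
        "debug_info_offset": cu_offset,
        "address_size": addr_size,
        "segment_selector_size": seg_size,
    }
    if seg_size != 0:
        raise NotImplementedError("this sketch only handles a flat address space")
    tuple_size = 2 * addr_size
    fmt = "<II" if addr_size == 4 else "<QQ"
    # Tuples start at the first offset that is a multiple of the tuple size.
    offset = -(-HEADER_SIZE // tuple_size) * tuple_size
    end = 4 + unit_length  # unit_length excludes its own 4 bytes
    ranges = []
    while offset + tuple_size <= end:
        start, length = struct.unpack_from(fmt, data, offset)
        offset += tuple_size
        if start == 0 and length == 0:  # terminating all-zero tuple
            break
        ranges.append((start, length))
    return header, ranges

# Hand-built set: one code range and one data range at the same address.
body = (struct.pack("<II", 0x1000, 0x200)
        + struct.pack("<II", 0x1000, 0x40)
        + struct.pack("<II", 0, 0))
blob = struct.pack(HEADER, 8 + 4 + len(body), 2, 0, 4, 0) + b"\x00" * 4 + body
header, ranges = parse_aranges_set(blob)
```

With the selector size at zero, the two decoded tuples both start at 0x1000
and nothing in the table says which one is code.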

The point of .debug_aranges is to accelerate the search for the appropriate CU. 
Yes you can spend time trolling through .debug_info and .debug_abbrev, decoding
the CU DIEs looking for low_pc/high_pc pairs (or perhaps pointers to
.debug_ranges) and effectively rebuild a .debug_aranges section yourself, if the
compiler/linker isn’t kind enough to pre-build the table for you.  I don’t
understand why .debug_aranges should be discouraged; I shouldn’t think they
would be huge, and consumers can avoid loading lots of data just to figure out
what’s not worth looking at.  Forcing all consumers to do things the slow way
seems unnecessarily inefficient.
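
The difference between the two approaches can be sketched in Python. The
`CU_RANGES` dictionary is a made-up stand-in for what a consumer would
collect by decoding every CU DIE; the sorted index plays the role of a
pre-built table. The sketch assumes code ranges do not overlap, and all
names are hypothetical.

```python
import bisect

# Hypothetical per-CU code ranges gathered by decoding every CU DIE:
# {cu_name: [(low_pc, high_pc), ...]}, with high_pc exclusive.
CU_RANGES = {
    "a.c": [(0x1000, 0x1200)],
    "b.c": [(0x1200, 0x1300), (0x2000, 0x2080)],
}

def build_index(cu_ranges):
    """Flatten per-CU ranges into a list sorted by start address."""
    return sorted((lo, hi, cu) for cu, rs in cu_ranges.items() for lo, hi in rs)

def find_cu(index, starts, pc):
    """Binary-search the sorted index for the CU containing `pc`."""
    i = bisect.bisect_right(starts, pc) - 1
    if i >= 0 and index[i][0] <= pc < index[i][1]:
        return index[i][2]
    return None

INDEX = build_index(CU_RANGES)
STARTS = [lo for lo, _, _ in INDEX]
```

Building `INDEX` is the expensive step, since it touches every CU; each
lookup afterwards is a binary search. A producer that emits the table up
front saves every consumer from repeating the build.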

Thinking about Harvard architecture specifically, you *need* the segment
selector only when an address could be ambiguous about whether it’s a code or
data address.  This basically comes up *only* in .debug_aranges, he said
thinking about it for about 30 seconds.  Within .debug_info you don’t need it
because when you pick up the address of an entity, you know whether it’s for a
code or data entity.  Location lists and range lists always point to code.  For
.debug_aranges you would need the segment selector, but I think that’s the only
place.

For an architecture with multiple code or data segments, then you’d need the
segment selector more widely, but I should think this case wouldn’t be all that
difficult to make work.  Even factoring in the llvm-dwarfdump part, it has to
understand the selector only for the .debug_aranges section; everything else can
remain as it is, pretending there’s a flat address space.

Now, if your target is downstream, that would make upstreaming the LLVM support
a bit dicier, because we’d not want to have that feature in the upstream repo if
there are no targets using it.  You’d be left maintaining that patch on your
own.  But as I described above, I don’t think it would be a huge deal.

HTH,
--paulr

