thr3ads.net - llvm dev - [llvm-dev] Debug Locations for Optimized Code [Dec 2016]

David Blaikie via llvm-dev

2016-Dec-07 18:26 UTC

[llvm-dev] Debug Locations for Optimized Code

On Wed, Dec 7, 2016 at 10:20 AM Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> > From: "Paul Robinson" <paul.robinson at sony.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie" <dblaikie at gmail.com>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Wednesday, December 7, 2016 9:39:16 AM
> > Subject: RE: [llvm-dev] Debug Locations for Optimized Code
> >
> > >> I don't know what the right, if any, solution to this is - but I
> > >> thought I should bring it up in case you or anyone else wanted to
> > >> puzzle it over & see if the competing needs/desires might need to
> > >> be considered.
> > > One thing that I recall being discussed was changing the way that we
> > > set the is_stmt flag in the DWARF line-table information. As I
> > > understand it, we currently set this flag for the first instruction in
> > > any sequence that is on the same line. This is, in part, why the
> > > debugger appears to jump around when stepping through code with
> > > speculated instructions, etc. If we did not do this for out-of-place
> > > instructions, then we might be able to keep the debugging information
> > > for tools while still providing a reasonable debugging experience.
> >
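The is_stmt idea quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Instr` record, its `out_of_place` marker, and both function names are hypothetical and are not part of LLVM or of the DWARF format. It compares the behaviour described in the thread (flag the first instruction of each same-line run) with a variant that never flags out-of-place instructions and does not let them break the surrounding run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    address: int
    line: int
    # Hypothetical marker: the instruction was hoisted or speculated away
    # from the code that belongs to its source line.
    out_of_place: bool = False


def is_stmt_current(instrs: List[Instr]) -> List[bool]:
    """Flag the first instruction of every run of consecutive same-line instructions."""
    flags = []
    prev_line = None
    for ins in instrs:
        flags.append(ins.line != prev_line)
        prev_line = ins.line
    return flags


def is_stmt_proposed(instrs: List[Instr]) -> List[bool]:
    """Like is_stmt_current, but out-of-place instructions are never flagged
    and are ignored when deciding where the next run starts (an assumption)."""
    flags = []
    prev_line = None
    for ins in instrs:
        if ins.out_of_place:
            flags.append(False)
            continue
        flags.append(ins.line != prev_line)
        prev_line = ins.line
    return flags


# Line 12's instruction was speculated into the middle of line 10's code.
instrs = [
    Instr(0x1000, 10),
    Instr(0x1004, 10),
    Instr(0x1008, 12, out_of_place=True),
    Instr(0x100C, 10),
    Instr(0x1010, 11),
]
current_flags = is_stmt_current(instrs)
proposed_flags = is_stmt_proposed(instrs)
```

In this model every address still maps to its original line, so tools that read the table keep the attribution, while a debugger that stops only at is_stmt rows would step from line 10 to line 11 without bouncing through line 12.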
> > When we are looking at a situation where an instruction is merely *moved*
> > from one place to another, retaining the source location and having a
> > less naïve statement-marking tactic could help the debugging experience
> > without perturbing other consumers (although one still wonders whether
> > profiles will get messed up in cases where e.g. a loop invariant gets
> > hoisted out of a cold loop into a hot predecessor).
> >
> > When we are looking at a situation where two instructions are *merged* or
> > *combined* into one, and the original two instructions had different
> > source locations, that's a separate problem.  In that case there is no
> > single correct source location for the new instruction, and typically
> > erasing the source location will give a better debugging experience (also
> > a less misleading profile).
>
> Is there a reason why we must only have one location for every
> instruction? If not, why not merge them and keep them all?
>
Not a requirement - of course we could keep them all with some kind of
ordered list and even potentially include a "this is the one we would've
picked" info (eg: the first one's the one we would pick today, if we
would've picked one rather than none) so we could be backwards compatible
if desired.
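To make the ordered-list idea concrete, here is a minimal sketch. All names (`Loc`, `MultiLoc`, `merge_drop`, `merge_keep_all`) are hypothetical illustrations, not LLVM types or APIs. It contrasts the erase-on-conflict policy discussed earlier in the thread with a list that keeps every location and treats the first entry as "the one we would've picked".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Loc:
    file: str
    line: int


def merge_drop(a: Loc, b: Loc) -> Optional[Loc]:
    """Erase-on-conflict: keep the location only when both inputs agree."""
    return a if a == b else None


@dataclass(frozen=True)
class MultiLoc:
    # Ordered list of locations; locs[0] is the backwards-compatible pick.
    locs: Tuple[Loc, ...]

    @property
    def preferred(self) -> Loc:
        return self.locs[0]


def merge_keep_all(a: MultiLoc, b: MultiLoc) -> MultiLoc:
    """Keep every distinct location, preserving order and a's preference."""
    merged = list(a.locs)
    for loc in b.locs:
        if loc not in merged:
            merged.append(loc)
    return MultiLoc(tuple(merged))


first = Loc("f.c", 10)
second = Loc("f.c", 20)
dropped = merge_drop(first, second)
kept = merge_keep_all(MultiLoc((first,)), MultiLoc((second,)))
```

A consumer that only understands a single location could read `preferred` and ignore the rest, which is the backwards-compatibility point made above. The size cost is also visible here: every merge can grow the tuple.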

That would be a lot of engineering work to plumb through LLVM the notion of
multiple debug locations, I think.

I'm not sure how DWARF (or CodeView) and its consumers currently cope with
multiple locations - it's probably technically possible to describe using
the line table format (not sure if it's intentional/documented for that
purpose), but existing consumers might have to be fixed not to trip over it.

It'd certainly be cute/fun/nice to have the extra fidelity (though all
extra fidelity also comes at a size cost to the IR and the resulting
object/executable files).

Not sure anyone's in a position to sign up for that work right now - but
maybe someone is. (looks like Apple's making a bit of a push on optimized
debug info quality at the moment)

- David

>
>  -Hal
>
> >
> > My personal opinion is that having sanitizers *rely* on debug info for
> > accurate source attribution is just asking for trouble.  It happens to
> > work at -O0 but cannot be considered reliable in the face of optimization.
> > IMO this is a fundamental design flaw; debug info is best-effort and full
> > of ambiguities, as shown above. Sanitizers need a more reliable
> > source-of-truth, i.e. they should encode source info into their own
> > instrumentation.
> >
> > --paulr
> >
> >
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

Robinson, Paul via llvm-dev

2016-Dec-07 21:11 UTC

> > Is there a reason why we must only have one location for every instruction? If
> > not, why not merge them and keep them all?
>
> Not a requirement - of course we could keep them all with some kind of ordered
> list and even potentially include a "this is the one we would've picked" info
> (eg: the first one's the one we would pick today, if we would've picked one
> rather than none) so we could be backwards compatible if desired.
>
> That would be a lot of engineering work to plumb through LLVM the notion of
> multiple debug locations, I think.
>
> I'm not sure how DWARF (or CodeView) and its consumers currently cope with
> multiple locations - it's probably technically possible to describe using
> the line table format (not sure if it's intentional/documented for that
> purpose), but existing consumers might have to be fixed not to trip over it.

Technically the DWARF encoding of the line table does allow it. I've seen it
happen, but not with the intent of describing two real source locations; it was
by accident.  (And it was one of the things that prompted me to submit patch
D27492.)  I seriously doubt any DWARF consumer takes the trouble to look for it.
It's really not clear how a debugger *should* respond to seeing two source
locations for one instruction.
--paulr
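
As a rough illustration of what "two source locations for one instruction" looks like to a consumer, the sketch below models a line table as plain (address, line) pairs and reports addresses that carry more than one line. This is a deliberately simplified model, not a DWARF parser: a real line-number program is a state machine with more registers (file, column, is_stmt and so on), and the function name and sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def addresses_with_multiple_lines(
    rows: Iterable[Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Return {address: [lines...]} for addresses mapped to more than one line.

    rows is a simplified line table: (address, line) pairs in table order.
    """
    lines_at: Dict[int, List[int]] = defaultdict(list)
    for address, line in rows:
        if line not in lines_at[address]:
            lines_at[address].append(line)
    return {addr: lines for addr, lines in lines_at.items() if len(lines) > 1}


# Hypothetical table in which address 0x1004 appears with two different lines.
rows = [(0x1000, 10), (0x1004, 12), (0x1004, 15), (0x1008, 11)]
ambiguous = addresses_with_multiple_lines(rows)
```

A consumer that simply takes the first or the last row for an address would silently pick one of the two lines, which is one way such entries can go unnoticed.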

Kostya Serebryany via llvm-dev

2016-Dec-07 21:44 UTC

My 2c.

The sanitizers rely on debug info to produce human-readable error messages,
and I agree with Reid that it's unwise to have a parallel way of encoding
the source locations.

Well, we have something like this in the clang coverage already... Right?
(I never particularly liked this design decision).
But since the debug info is known to be unreliable it kind of made sense.
Grrr.
And since the coverage instrumentation is applied early (in clang) we can
do it.
ASan etc. don't have this luxury.

The sanitizers do not actually rely hard on the correctness of debug info,
but lots of tests in compiler-rt expect the debug info to be sane.

If we break debug info in a way that affects the sanitizers two things may
happen:
a) some of the existing *san tests in compiler-rt will start failing.
That's usually easy to fix.
b) all tests will continue working but users will be getting less readable
reports -- and we will learn about it 6 months from the time of breakage.
That's less welcome, but I am not sure if we can do something here.

--kcc
