thr3ads.net - llvm dev - [llvm-dev] Debug Locations for Optimized Code [Dec 2016]

Robinson, Paul via llvm-dev

2016-Dec-07 15:39 UTC

[llvm-dev] Debug Locations for Optimized Code

>> I don't know what the right, if any, solution to this is - but I
>> thought I should bring it up in case you or anyone else wanted to
>> puzzle it over & see if the competing needs/desires might need to be
>> considered.
> One thing that I recall being discussed was changing the way that we
> set the is_stmt flag in the DWARF line-table information. As I
> understand it, we currently set this flag for the first instruction in
> any sequence that is on the same line. This is, in part, why the
> debugger appears to jump around when stepping through code with
> speculated instructions, etc. If we did not do this for out-of-place
> instructions, then we might be able to keep for debugging information
> for tools while still providing a reasonable debugging experience.
When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a 
less naïve statement-marking tactic could help the debugging experience 
without perturbing other consumers (although one still wonders whether 
profiles will get messed up in cases where e.g. a loop invariant gets 
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different 
source locations, that's a separate problem.  In that case there is no 
single correct source location for the new instruction, and typically 
erasing the source location will give a better debugging experience (also 
a less misleading profile).

My personal opinion is that having sanitizers *rely* on debug info for 
accurate source attribution is just asking for trouble.  It happens to 
work at -O0 but cannot be considered reliable in the face of optimization.
IMO this is a fundamental design flaw; debug info is best-effort and full 
of ambiguities, as shown above. Sanitizers need a more reliable 
source-of-truth, i.e. they should encode source info into their own 
instrumentation.

--paulr
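
The distinction drawn above between a *moved* instruction and a *merged* one can be sketched as a small policy. This is only an illustration under stated assumptions: `SrcLoc`, `locationAfterMove`, and `locationAfterMerge` are hypothetical names, not LLVM APIs, and line 0 stands in for "no source location", following the DWARF line-table convention for code that cannot be attributed to a source line.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a debug location (not LLVM's DILocation).
// Assumption: line 0 means "no source location".
struct SrcLoc {
    std::string file;
    unsigned line;
    SrcLoc(std::string f, unsigned l) : file(std::move(f)), line(l) {}
    bool known() const { return line != 0; }
    bool operator==(const SrcLoc &o) const {
        return file == o.file && line == o.line;
    }
};

// A moved (hoisted or sunk) instruction keeps its original location.
SrcLoc locationAfterMove(const SrcLoc &original) { return original; }

// A merged instruction keeps a location only when both inputs agree;
// otherwise the location is erased instead of picking one arbitrarily.
SrcLoc locationAfterMerge(const SrcLoc &a, const SrcLoc &b) {
    if (a.known() && a == b)
        return a;
    return SrcLoc("", 0);
}
```

Under this sketch, merging two instructions from different lines yields an unknown location, which matches the "erase rather than mislead" position argued in the message.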

Reid Kleckner via llvm-dev

2016-Dec-07 16:22 UTC

On Wed, Dec 7, 2016 at 7:39 AM, Robinson, Paul via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> When we are looking at a situation where an instruction is merely *moved*
> from one place to another, retaining the source location and having a
> less naïve statement-marking tactic could help the debugging experience
> without perturbing other consumers (although one still wonders whether
> profiles will get messed up in cases where e.g. a loop invariant gets
> hoisted out of a cold loop into a hot predecessor).
>
> When we are looking at a situation where two instructions are *merged* or
> *combined* into one, and the original two instructions had different
> source locations, that's a separate problem.  In that case there is no
> single correct source location for the new instruction, and typically
> erasing the source location will give a better debugging experience (also
> a less misleading profile).
>
> My personal opinion is that having sanitizers *rely* on debug info for
> accurate source attribution is just asking for trouble.  It happens to
> work at -O0 but cannot be considered reliable in the face of optimization.
> IMO this is a fundamental design flaw; debug info is best-effort and full
> of ambiguities, as shown above. Sanitizers need a more reliable
> source-of-truth, i.e. they should encode source info into their own
> instrumentation.
>
I don't see how ASan and debuggers are different. It feels like both need
reasonably accurate source location attribution for any instruction. ASan
just happens to care more about loads and stores than interactive stepping
debuggers.

It really doesn't make sense for ASan to invent another mechanism to track
source location information. Any mechanism we build would be so redundant
with debug info that, as an implementation detail, we would find a way to
make them use the same storage when possible. With that in mind, maybe we
should really find a way to mark source locations as "hoisted" or "sunk" so
that we can suppress them from our line tables or do something else clever.

Robinson, Paul via llvm-dev

2016-Dec-07 17:01 UTC

> I don't see how ASan and debuggers are different. It feels like both need
> reasonably accurate source location attribution for any instruction. ASan just
> happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about *accurate* source location info for *specific* instructions,
the ones that do something ASan cares about.  The source attribution for any
other instruction is irrelevant to ASan.  The source attributions for these
instructions *must* survive optimization.

Debuggers care about *useful* source location info for *sets* of instructions,
i.e. the instructions related to some particular source statement.  If that set
is only 90% complete/accurate, instead of 100%, generally that doesn't
adversely affect the user experience.  If you step past statement A, and happen
to execute one or two instructions from the next statement B before you actually
stop, generally that is not important to the user.  Debuggers are able to
tolerate a moderate amount of slop in the source attributions, because absolute
accuracy is not critical to correct operation of the debugger.  This is why
optimizations can get away with dropping attributions that are difficult to
represent accurately.

ASan should be able to encode source info for just the instructions it cares
about, e.g. pass an index or other encoded representation to the RT calls. 
Being actual parameters, they will survive any correct optimization, unlike
today's situation where multiple calls might be merged by an optimization,
damaging the correctness of ASan reports.  (We've seen this exact thing
happen.)  ASan does not need a line table mapping all instructions back to their
source; it needs a parameter at each call (more or less).  It does need a file
table; that's the main bit of redundancy with debug info that I see happening.
--paulr
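
The "index passed to the RT call" idea above might look roughly like the following. This is a sketch only: `SiteTable`, `AccessSite`, and `describeBadAccess` are hypothetical names invented for illustration and do not correspond to the actual ASan runtime interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one instrumented memory access.
struct AccessSite {
    std::string file;
    unsigned line;
};

// Hypothetical per-module table filled in at instrumentation time.
class SiteTable {
public:
    // Registers a site and returns the index that would be baked into
    // the instrumentation call as an ordinary argument.
    std::size_t add(const std::string &file, unsigned line) {
        sites_.push_back(AccessSite{file, line});
        return sites_.size() - 1;
    }

    const AccessSite &at(std::size_t index) const {
        assert(index < sites_.size() && "site index out of range");
        return sites_[index];
    }

private:
    std::vector<AccessSite> sites_;
};

// Hypothetical runtime entry point. The report is built from the index
// argument alone and never consults the debug-info line table.
std::string describeBadAccess(const SiteTable &table, std::size_t siteIndex) {
    const AccessSite &site = table.at(siteIndex);
    return site.file + ":" + std::to_string(site.line);
}
```

Because the index is a real argument, two calls with different indices are different calls as far as the optimizer is concerned, which is the property the message relies on. The table itself is the "file table" redundancy with debug info that the message mentions.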

Hal Finkel via llvm-dev

2016-Dec-07 18:19 UTC

----- Original Message -----
> From: "Paul Robinson" <paul.robinson at sony.com>
> To: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie" <dblaikie at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Wednesday, December 7, 2016 9:39:16 AM
> Subject: RE: [llvm-dev] Debug Locations for Optimized Code
> 
> >> I don't know what the right, if any, solution to this is - but I
> >> thought I should bring it up in case you or anyone else wanted to
> >> puzzle it over & see if the competing needs/desires might need to be
> >> considered.
> > One thing that I recall being discussed was changing the way that we
> > set the is_stmt flag in the DWARF line-table information. As I
> > understand it, we currently set this flag for the first instruction in
> > any sequence that is on the same line. This is, in part, why the
> > debugger appears to jump around when stepping through code with
> > speculated instructions, etc. If we did not do this for out-of-place
> > instructions, then we might be able to keep for debugging information
> > for tools while still providing a reasonable debugging experience.
>
> When we are looking at a situation where an instruction is merely *moved*
> from one place to another, retaining the source location and having a
> less naïve statement-marking tactic could help the debugging experience
> without perturbing other consumers (although one still wonders whether
> profiles will get messed up in cases where e.g. a loop invariant gets
> hoisted out of a cold loop into a hot predecessor).
>
> When we are looking at a situation where two instructions are *merged* or
> *combined* into one, and the original two instructions had different
> source locations, that's a separate problem.  In that case there is no
> single correct source location for the new instruction, and typically
> erasing the source location will give a better debugging experience (also
> a less misleading profile).
Is there a reason why we must only have one location for every instruction? If
not, why not merge them and keep them all?

 -Hal
> 
> My personal opinion is that having sanitizers *rely* on debug info for
> accurate source attribution is just asking for trouble.  It happens to
> work at -O0 but cannot be considered reliable in the face of optimization.
> IMO this is a fundamental design flaw; debug info is best-effort and full
> of ambiguities, as shown above. Sanitizers need a more reliable
> source-of-truth, i.e. they should encode source info into their own
> instrumentation.
> 
> --paulr
> 
> 
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

David Blaikie via llvm-dev

2016-Dec-07 18:26 UTC

On Wed, Dec 7, 2016 at 10:20 AM Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
> > From: "Paul Robinson" <paul.robinson at sony.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie" <dblaikie at gmail.com>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Wednesday, December 7, 2016 9:39:16 AM
> > Subject: RE: [llvm-dev] Debug Locations for Optimized Code
> >
> > >> I don't know what the right, if any, solution to this is - but I
> > >> thought I should bring it up in case you or anyone else wanted to
> > >> puzzle it over & see if the competing needs/desires might need to be
> > >> considered.
> > > One thing that I recall being discussed was changing the way that we
> > > set the is_stmt flag in the DWARF line-table information. As I
> > > understand it, we currently set this flag for the first instruction in
> > > any sequence that is on the same line. This is, in part, why the
> > > debugger appears to jump around when stepping through code with
> > > speculated instructions, etc. If we did not do this for out-of-place
> > > instructions, then we might be able to keep for debugging information
> > > for tools while still providing a reasonable debugging experience.
> >
> > When we are looking at a situation where an instruction is merely *moved*
> > from one place to another, retaining the source location and having a
> > less naïve statement-marking tactic could help the debugging experience
> > without perturbing other consumers (although one still wonders whether
> > profiles will get messed up in cases where e.g. a loop invariant gets
> > hoisted out of a cold loop into a hot predecessor).
> >
> > When we are looking at a situation where two instructions are *merged* or
> > *combined* into one, and the original two instructions had different
> > source locations, that's a separate problem.  In that case there is no
> > single correct source location for the new instruction, and typically
> > erasing the source location will give a better debugging experience (also
> > a less misleading profile).
>
> Is there a reason why we must only have one location for every
> instruction? If not, why not merge them and keep them all?
>
Not a requirement - of course we could keep them all with some kind of
ordered list and even potentially include a "this is the one we would've
picked" info (eg: the first one's the one we would pick today, if we
would've picked one rather than none) so we could be backwards compatible
if desired.

That would be a lot of engineering work to plumb through LLVM the notion of
multiple debug locations, I think.

I'm not sure how DWARF (or CodeView) and its consumers currently cope with
multiple locations - it's probably technically possible to describe using
the line table format (not sure if it's intentional/documented for that
purpose), but existing consumers might have to be fixed not to trip over it.

It'd certainly be cute/fun/nice to have the extra fidelity (though all
extra fidelity also comes at a size cost to the IR and the resulting
object/executable files).

Not sure anyone's in a position to sign up for that work right now - but
maybe someone is. (looks like Apple's making a bit of a push on optimized
debug info quality at the moment)

- David
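
The "ordered list with a preferred entry" idea could be modelled along these lines. This is a hypothetical sketch, not anything that exists in LLVM: `MultiLoc` and its members are invented names, and the convention that the first entry is the backward-compatible choice is taken from the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a single debug location.
struct SrcLoc {
    std::string file;
    unsigned line;
};

// Hypothetical ordered set of locations attached to one instruction.
class MultiLoc {
public:
    // Appends a location, skipping exact duplicates to limit size growth.
    void add(const SrcLoc &loc) {
        for (const SrcLoc &existing : locs_)
            if (existing.file == loc.file && existing.line == loc.line)
                return;
        locs_.push_back(loc);
    }

    // Merging keeps this instruction's entries first, then the other's.
    void mergeFrom(const MultiLoc &other) {
        for (const SrcLoc &loc : other.locs_)
            add(loc);
    }

    std::size_t size() const { return locs_.size(); }

    // Backward-compatible view: a consumer that understands only one
    // location per instruction sees the first entry.
    const SrcLoc &preferred() const {
        assert(!locs_.empty() && "no location recorded");
        return locs_.front();
    }

private:
    std::vector<SrcLoc> locs_;
};
```

Even in this toy form the size cost mentioned above is visible: every merge can grow the list, so a real design would need to bound or deduplicate it.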

>
>  -Hal
>
> >
> > My personal opinion is that having sanitizers *rely* on debug info for
> > accurate source attribution is just asking for trouble.  It happens to
> > work at -O0 but cannot be considered reliable in the face of optimization.
> > IMO this is a fundamental design flaw; debug info is best-effort and full
> > of ambiguities, as shown above. Sanitizers need a more reliable
> > source-of-truth, i.e. they should encode source info into their own
> > instrumentation.
> >
> > --paulr
> >
> >
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

Andrea Di Biagio via llvm-dev

2016-Dec-15 15:05 UTC

On Wed, Dec 7, 2016 at 3:39 PM, Robinson, Paul via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> >> I don't know what the right, if any, solution to this is - but I
> >> thought I should bring it up in case you or anyone else wanted to
> >> puzzle it over & see if the competing needs/desires might need to be
> >> considered.
> > One thing that I recall being discussed was changing the way that we
> > set the is_stmt flag in the DWARF line-table information. As I
> > understand it, we currently set this flag for the first instruction in
> > any sequence that is on the same line. This is, in part, why the
> > debugger appears to jump around when stepping through code with
> > speculated instructions, etc. If we did not do this for out-of-place
> > instructions, then we might be able to keep for debugging information
> > for tools while still providing a reasonable debugging experience.
>
> When we are looking at a situation where an instruction is merely *moved*
> from one place to another, retaining the source location and having a
> less naïve statement-marking tactic could help the debugging experience
> without perturbing other consumers (although one still wonders whether
> profiles will get messed up in cases where e.g. a loop invariant gets
> hoisted out of a cold loop into a hot predecessor).
>
It would be nice to have a way to mark the debugloc of an instruction which
has been hoisted or sunk. Marked locations would not be treated as
"recommended breakpoint locations". That means we could take advantage of
that bit of information to decide how to emit the 'is_stmt' flag.
For the purpose of sample PGO, is_stmt=0 would not be enough. So, it would
be nice if we could encode that information within the discriminator. For
example, we could emit a 'special' discriminator to notify consumers
(example: autofdo) that the location should not be used for the purpose of
PGO. This is obviously just an idea; not sure whether it makes sense, nor
if it would negatively affect this RFC:
lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html.

> When we are looking at a situation where two instructions are *merged* or
> *combined* into one, and the original two instructions had different
> source locations, that's a separate problem.  In that case there is no
> single correct source location for the new instruction, and typically
> erasing the source location will give a better debugging experience (also
> a less misleading profile).
>
For the record: revision 289661
(llvm.org/viewvc/llvm-project?view=revision&revision=289661) added a new API
to obtain a merged/combined location from multiple source locations. That
said, the new functionality added at that revision is just a stub. So, it is
still unclear at the moment how we could 'use' that in future.


-Andrea
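
A rough sketch of how a "moved" mark could feed into the is_stmt decision follows. It is an illustration under assumptions, not how LLVM emits line tables: `Row`, its `moved` flag, and `computeIsStmt` are hypothetical, and the baseline rule (flag the first instruction of each run on a line) is the behaviour as described earlier in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical line-table input: one entry per instruction, in address order.
struct Row {
    unsigned line;  // source line; 0 means no location
    bool moved;     // assumed mark: instruction was hoisted or sunk
};

// Returns one is_stmt flag per row. A row is a recommended breakpoint
// location only if it is not marked as moved and its line differs from
// the line of the most recent row that was flagged.
std::vector<bool> computeIsStmt(const std::vector<Row> &rows) {
    std::vector<bool> flags;
    unsigned lastStmtLine = 0;  // 0: no statement flagged yet
    for (const Row &row : rows) {
        bool isStmt = !row.moved && row.line != 0 && row.line != lastStmtLine;
        if (isStmt)
            lastStmtLine = row.line;
        flags.push_back(isStmt);
    }
    return flags;
}
```

For rows on lines 3, 5 (moved), 3, 4, only the first and last rows are flagged, so stepping would not bounce from line 3 to line 5 and back. The moved row keeps its line number for other consumers, which is why the message notes that sample PGO would still need a separate signal.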

Hal Finkel via llvm-dev

2016-Dec-15 15:31 UTC

----- Original Message -----
> From: "Andrea Di Biagio" <andrea.dibiagio at gmail.com>
> To: "Paul Robinson" <paul.robinson at sony.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie"
> <dblaikie at gmail.com>, llvm-dev at lists.llvm.org
> Sent: Thursday, December 15, 2016 9:05:00 AM
> Subject: Re: [llvm-dev] Debug Locations for Optimized Code
> On Wed, Dec 7, 2016 at 3:39 PM, Robinson, Paul via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
> > >> I don't know what the right, if any, solution to this is - but I
> > >> thought I should bring it up in case you or anyone else wanted to
> > >> puzzle it over & see if the competing needs/desires might need to be
> > >> considered.
> > > One thing that I recall being discussed was changing the way that we
> > > set the is_stmt flag in the DWARF line-table information. As I
> > > understand it, we currently set this flag for the first instruction in
> > > any sequence that is on the same line. This is, in part, why the
> > > debugger appears to jump around when stepping through code with
> > > speculated instructions, etc. If we did not do this for out-of-place
> > > instructions, then we might be able to keep for debugging information
> > > for tools while still providing a reasonable debugging experience.
> >
> > When we are looking at a situation where an instruction is merely *moved*
> > from one place to another, retaining the source location and having a
> > less naïve statement-marking tactic could help the debugging experience
> > without perturbing other consumers (although one still wonders whether
> > profiles will get messed up in cases where e.g. a loop invariant gets
> > hoisted out of a cold loop into a hot predecessor).
>
> It would be nice to have a way to mark the debugloc of an instruction
> which has been hoisted or sunk. Marked locations would not be
> treated as "recommended breakpoint locations". That means we could
> take advantage of that bit of information to decide how to emit the
> 'is_stmt' flag.
> For the purpose of sample PGO, is_stmt=0 would not be enough. So, it
> would be nice if we could encode that information within the
> discriminator. For example, we could emit a 'special' discriminator
> to notify consumers (example: autofdo) that the location should not
> be used for the purpose of PGO. This is obviously just an idea; not
> sure whether it makes sense, nor if it would negatively affect this
> RFC: lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html.
I think that you could use that scheme: just encode a duplication factor of
zero. Dehao, is that correct?

-Hal 
> > When we are looking at a situation where two instructions are *merged* or
> > *combined* into one, and the original two instructions had different
> > source locations, that's a separate problem. In that case there is no
> > single correct source location for the new instruction, and typically
> > erasing the source location will give a better debugging experience (also
> > a less misleading profile).
>
> For the record: revision 289661
> (llvm.org/viewvc/llvm-project?view=revision&revision=289661)
> added a new API to obtain a merged/combined location from multiple
> source locations. That said, the new functionality added at that
> revision is just a stub. So, it is still unclear at the moment how
> we could 'use' that in future.
> -Andrea
-- 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 