thr3ads.net - llvm dev - [llvm-dev] Debug Locations for Optimized Code [Dec 2016]

Robinson, Paul via llvm-dev

2016-Dec-07 17:01 UTC

[llvm-dev] Debug Locations for Optimized Code

> I don't see how ASan and debuggers are different. It feels like both need
> reasonably accurate source location attribution for any instruction. ASan just
> happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about *accurate* source location info for *specific* instructions,
the ones that do something ASan cares about.  The source attributions for any
other instruction are irrelevant to ASan.  The source attributions for these
instructions *must* survive optimization.

Debuggers care about *useful* source location info for *sets* of instructions,
i.e. the instructions related to some particular source statement.  If that set
is only 90% complete/accurate, instead of 100%, generally that doesn't
adversely affect the user experience.  If you step past statement A, and happen
to execute one or two instructions from the next statement B before you actually
stop, generally that is not important to the user.  Debuggers are able to
tolerate a moderate amount of slop in the source attributions, because absolute
accuracy is not critical to correct operation of the debugger.  This is why
optimizations can get away with dropping attributions that are difficult to
represent accurately.

ASan should be able to encode source info for just the instructions it cares
about, e.g. pass an index or other encoded representation to the RT calls.
Being actual parameters, they will survive any correct optimization, unlike
today's situation where multiple calls might be merged by an optimization,
damaging the correctness of ASan reports.  (We've seen this exact thing
happen.)  ASan does not need a line table mapping all instructions back to their
source; it needs a parameter at each call (more or less).  It does need a file
table; that's the main bit of redundancy with debug info that I see
happening.
--paulr
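
A minimal sketch of the "index as a call parameter" idea described above, written as a toy Python model rather than real ASan or LLVM code. The names `LocationTable`, `intern`, and `report_bad_access` are hypothetical and exist only for illustration. The point it tries to show is that the location travels with the call as an ordinary argument, and the only structure shared with debug info is a small file table.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LocationTable:
    """Hypothetical side table owned by the instrumentation, not by debug info."""
    files: List[str] = field(default_factory=list)                # the file table
    entries: List[Tuple[int, int]] = field(default_factory=list)  # (file index, line)

    def intern(self, filename: str, line: int) -> int:
        """Return a stable index for (filename, line), adding it if needed."""
        if filename not in self.files:
            self.files.append(filename)
        entry = (self.files.index(filename), line)
        if entry not in self.entries:
            self.entries.append(entry)
        return self.entries.index(entry)

    def describe(self, index: int) -> str:
        file_index, line = self.entries[index]
        return f"{self.files[file_index]}:{line}"


def report_bad_access(table: LocationTable, loc_index: int) -> str:
    """Stand-in for a runtime report call that receives the location as a parameter."""
    return f"bad access at {table.describe(loc_index)}"


table = LocationTable()
load_a = table.intern("a.c", 10)   # index passed at the first instrumented access
store_b = table.intern("a.c", 20)  # index passed at the second instrumented access

print(report_bad_access(table, load_a))   # bad access at a.c:10
print(report_bad_access(table, store_b))  # bad access at a.c:20
```

Because the two calls carry different argument values, treating them as one call would change what the program reports, which is the sense in which the parameter is expected to survive a correct optimization.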

From: Reid Kleckner [mailto:rnk at google.com]
Sent: Wednesday, December 07, 2016 8:23 AM
To: Robinson, Paul
Cc: Hal Finkel; David Blaikie; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Debug Locations for Optimized Code

On Wed, Dec 7, 2016 at 7:39 AM, Robinson, Paul via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem.  In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).
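
The two cases above (moved versus merged) can be sketched as a pair of small policy functions. This is a toy Python model under stated assumptions, not LLVM's actual API; `Loc`, `location_after_move`, and `location_after_merge` are hypothetical names, and `None` stands in for an erased source location.

```python
from typing import NamedTuple, Optional


class Loc(NamedTuple):
    """A source location; None is used below to mean 'no location'."""
    file: str
    line: int


def location_after_move(original: Optional[Loc]) -> Optional[Loc]:
    """A moved (hoisted or sunk) instruction still has one true origin, so keep it."""
    return original


def location_after_merge(a: Optional[Loc], b: Optional[Loc]) -> Optional[Loc]:
    """A merged instruction keeps a location only if both inputs agree on it."""
    return a if a == b else None


hoisted = location_after_move(Loc("loop.c", 42))
same = location_after_merge(Loc("a.c", 7), Loc("a.c", 7))
different = location_after_merge(Loc("a.c", 7), Loc("a.c", 9))

print(hoisted)    # Loc(file='loop.c', line=42)
print(same)       # Loc(file='a.c', line=7)
print(different)  # None
```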

My personal opinion is that having sanitizers *rely* on debug info for
accurate source attribution is just asking for trouble.  It happens to
work at -O0 but cannot be considered reliable in the face of optimization.
IMO this is a fundamental design flaw; debug info is best-effort and full
of ambiguities, as shown above. Sanitizers need a more reliable
source-of-truth, i.e. they should encode source info into their own
instrumentation.

I don't see how ASan and debuggers are different. It feels like both need
reasonably accurate source location attribution for any instruction. ASan just
happens to care more about loads and stores than interactive stepping debuggers.

It really doesn't make sense for ASan to invent another mechanism to track
source location information. Any mechanism we build would be so redundant with
debug info that, as an implementation detail, we would find a way to make them
use the same storage when possible. With that in mind, maybe we should really
find a way to mark source locations as "hoisted" or "sunk"
so that we can suppress them from our line tables or do something else clever.

Hal Finkel via llvm-dev

2016-Dec-07 18:19 UTC

----- Original Message -----
> From: "Paul Robinson" <paul.robinson at sony.com>
> To: "Reid Kleckner" <rnk at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "David Blaikie"
> <dblaikie at gmail.com>, llvm-dev at lists.llvm.org
> Sent: Wednesday, December 7, 2016 11:01:57 AM
> Subject: RE: [llvm-dev] Debug Locations for Optimized Code
> > I don't see how ASan and debuggers are different. It feels like both
> > need reasonably accurate source location attribution for any
> > instruction. ASan just happens to care more about loads and stores
> > than interactive stepping debuggers.
>
> Actually they are pretty different in their requirements.
>
> ASan cares about *accurate* source location info for *specific*
> instructions, the ones that do something ASan cares about. The
> source attributions for any other instruction are irrelevant to ASan.
> The source attributions for these instructions *must* survive
> optimization.
>
> Debuggers care about *useful* source location info for *sets* of
> instructions, i.e. the instructions related to some particular
> source statement. If that set is only 90% complete/accurate, instead
> of 100%, generally that doesn't adversely affect the user
> experience. If you step past statement A, and happen to execute one
> or two instructions from the next statement B before you actually
> stop, generally that is not important to the user. Debuggers are
> able to tolerate a moderate amount of slop in the source
> attributions, because absolute accuracy is not critical to correct
> operation of the debugger. This is why optimizations can get away
> with dropping attributions that are difficult to represent
> accurately.
>
> ASan should be able to encode source info for just the instructions
> it cares about, e.g. pass an index or other encoded representation
> to the RT calls. Being actual parameters, they will survive any
> correct optimization, unlike today's situation where multiple calls
> might be merged by an optimization, damaging the correctness of ASan
> reports. (We've seen this exact thing happen.) ASan does not need a
> line table mapping all instructions back to their source; it needs a
> parameter at each call (more or less). It does need a file table;
> that's the main bit of redundancy with debug info that I see
> happening.

I suspect that you misunderstand where ASan instrumentation is added. Unlike
UBSan, which is added by Clang during initial IR generation, ASan
instrumentation is added late (at the EP_OptimizerLast extension point). I
don't see any better way to get the location information at that point than
using the existing debug info.

-Hal 

Hal Finkel 
Lead, Compiler Technology and Programming Languages 
Leadership Computing Facility 
Argonne National Laboratory 

David Blaikie via llvm-dev

2016-Dec-07 18:22 UTC

On Wed, Dec 7, 2016 at 9:02 AM Robinson, Paul <paul.robinson at sony.com>
wrote:
> I don't see how ASan and debuggers are different. It feels like both need
> reasonably accurate source location attribution for any instruction. ASan
> just happens to care more about loads and stores than interactive stepping
> debuggers.
>
> Actually they are pretty different in their requirements.
>
I think they're closer than they appear below.

> ASan cares about *accurate* source location info for *specific*
> instructions, the ones that do something ASan cares about.  The source
> attributions for any other instruction are irrelevant to ASan.  The source
> attributions for these instructions *must* survive optimization.
>
Kostya can correct me if I'm wrong - but I don't believe there's a
requirement that they must survive any more than debug info locations.

I believe the sanitizers operate under similar requirements about impact on
optimizations - they probably don't want to adversely perturb optimizations
by adding a stricter, undroppable location tracking system, such as
intrinsics (maybe I'm wrong here). I think this is perhaps the critical
point - if ASan has the same "don't mess with optimization" requirement as
debug info, and it needs high accuracy, that accuracy can be no higher than
debug info /can/ be (even if it's not that accurate now). If that's the
case, then we should endeavor to make debug info (if only for the
instructions ASan cares about) as accurate as ASan needs, and that benefits
all debug info consumers.

Now, if there are competing needs for what information is kept (as I brought
up in this thread), hopefully we can have a conversation about what those
competing needs look like and how to address them (whether we can reconcile
the different needs, or need different tuning modes, etc.).


Philip Reames via llvm-dev

2016-Dec-07 18:35 UTC

FYI, if we do end up deciding that asan needs a stronger guarantee than 
debug info can provide, we have another mechanism in tree which is 
available for this purpose. Operand bundles can be associated with a 
callsite today to provide a guaranteed side table of value locations at 
runtime.  We use the "deopt" bundle type for exactly this purpose and 
it's explicitly part of the design to be stronger than debug info and 
accept the performance impact that implies while trying to minimize it 
as much as possible.  We might have to extend the notion of operand 
bundles to other instruction types, but the fundamental mechanism is 
already in the IR.

Philip


Robinson, Paul via llvm-dev

2016-Dec-07 21:03 UTC

> I suspect that you misunderstand where ASan instrumentation is added. Unlike
> UBSan, which is added by Clang during initial IR generation, ASan
> instrumentation is added late (at the EP_OptimizerLast extension point).

You are correct, I did not know that.

> I don't see any better way to get the location information at that point
> than using the existing debug info.

Let us distinguish between information carried around in metadata, and
information emitted to the object file.
Seems like it would be completely feasible to flag a DebugLoc instance as
"do not emit" and yet retain it in the metadata, rather than erasing
it (overwriting it with DebugLoc()).  Then the ASan instrumentation can extract
the file/line info from it, while we still decline to generate it in the DWARF
line table.  ASan gets the info it needs, the debugger doesn't get the
information that will distress the user.
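
A minimal sketch of the "do not emit" idea, as a toy Python model. It assumes a hypothetical `emit` field on a simplified `DebugLoc` record; no such flag is claimed to exist in LLVM, and `build_line_table` and `asan_location` are illustrative stand-ins for line-table emission and late instrumentation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DebugLoc:
    file: str
    line: int
    emit: bool = True  # hypothetical flag: False means "keep in metadata, do not emit"


@dataclass
class Instr:
    opcode: str
    address: int
    loc: Optional[DebugLoc]


def build_line_table(instrs: List[Instr]) -> List[Tuple[int, str, int]]:
    """Emit (address, file, line) rows only for locations that are allowed out."""
    return [(i.address, i.loc.file, i.loc.line)
            for i in instrs
            if i.loc is not None and i.loc.emit]


def asan_location(instr: Instr) -> Optional[Tuple[str, int]]:
    """Late instrumentation reads the retained location regardless of the flag."""
    if instr.loc is None:
        return None
    return (instr.loc.file, instr.loc.line)


instrs = [
    Instr("add", 0x00, DebugLoc("a.c", 10)),
    Instr("load", 0x04, DebugLoc("a.c", 12, emit=False)),  # moved or combined instruction
    Instr("ret", 0x08, DebugLoc("a.c", 13)),
]

print(build_line_table(instrs))  # [(0, 'a.c', 10), (8, 'a.c', 13)]
print(asan_location(instrs[1]))  # ('a.c', 12)
```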

Another possibility is to flag the DebugLoc for a moved/combined instruction as
"not a statement" and cause this flag to suppress the DWARF
"is_stmt" flag.  That idea would need a little more baking but feels
like it could have potential.  Some debuggers might have to learn to pay
attention to the flag, but that's the debugger's problem.
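
The "not a statement" variant can be sketched the same way. This toy Python model assumes a simplified line table whose rows carry an `is_stmt` bit, and a hypothetical `next_stop` helper that imitates how a stepping debugger might pick its next stopping point; real debuggers differ in the details.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    address: int
    line: int
    is_stmt: bool  # cleared for rows that come from moved or combined instructions


def next_stop(rows: List[Row], current: int, honor_is_stmt: bool = True) -> Optional[int]:
    """Return the index of the next row on a different line where stepping would stop."""
    current_line = rows[current].line
    for i in range(current + 1, len(rows)):
        if honor_is_stmt and not rows[i].is_stmt:
            continue  # row stays in the table but is not a stopping point
        if rows[i].line != current_line:
            return i
    return None


rows = [
    Row(0x00, 10, True),
    Row(0x04, 12, False),  # instruction hoisted from line 12, flagged "not a statement"
    Row(0x08, 11, True),
    Row(0x0C, 12, True),
]

aware = next_stop(rows, 0)                         # debugger that honors is_stmt
unaware = next_stop(rows, 0, honor_is_stmt=False)  # debugger that ignores it

print(rows[aware].line)    # 11
print(rows[unaware].line)  # 12
```

In this model the row for the hoisted instruction stays in the table, so address-to-line lookups still work, while a debugger that honors the flag steps from line 10 to line 11 instead of jumping ahead to line 12.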
--paulr

