thr3ads.net - llvm dev - [llvm-dev] [DebugInfo] A value-tracking variable location update [Nov 2020]

Jeremy Morse via llvm-dev

2020-Nov-06 18:26 UTC

[llvm-dev] [DebugInfo] A value-tracking variable location update

Hi debug-info folks,

Time for another update on the variable location "instruction referencing"
implementation I've been doing, see this RFC [0, 1] for background. It's now at
the point where I'd call it "done" (as far as software ever is), and so it's a
good time to look at what results it produces. And here are the
scores-on-the-doors using llvm-locstats, on clang-3.4 RelWithDebInfo first in
"normal" mode and then with -Xclang -fexperimental-debug-variable-locations.

"normal":

 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%               765406               22%
   (0%,10%)          45179                1%
   [10%,20%)         51699                1%
   [20%,30%)         52044                1%
   [30%,40%)         46905                1%
   [40%,50%)         48292                1%
   [50%,60%)         61342                1%
   [60%,70%)         58315                1%
   [70%,80%)         69848                2%
   [80%,90%)         81937                2%
   [90%,100%)       101384                2%
   100%            2032034               59%
 =================================================
 -the number of debug variables processed: 3414385
 -PC ranges covered: 61%
 -------------------------------------------------
 -total availability: 64%
 =================================================

With instruction referencing:

 =================================================
     cov%           samples         percentage(~)
 -------------------------------------------------
   0%               751201               21%
   (0%,10%)          40708                1%
   [10%,20%)         44909                1%
   [20%,30%)         47544                1%
   [30%,40%)         41630                1%
   [40%,50%)         42742                1%
   [50%,60%)         56692                1%
   [60%,70%)         53796                1%
   [70%,80%)         64476                1%
   [80%,90%)         73836                2%
   [90%,100%)        74423                2%
   100%            2123749               62%
 =================================================
 -the number of debug variables processed: 3415706
 -PC ranges covered: 68%
 -------------------------------------------------
 -total availability: 64%
 =================================================

The first observation: a significant increase in the byte-coverage statistic,
meaning that we're able to track variable locations for longer and across more
code. This was one of the main aims of this work, having better tracking of
the locations that we know. The increase of seven percentage points includes an
additional two percentage points of entry-value locations. If we disable entry
value production then the scope-bytes-covered statistic moves from 59% to 64%,
which is still a decent improvement.
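
One way to cross-check the two llvm-locstats tables above is to diff them
bucket by bucket. The following is a minimal Python sketch: the sample counts
are copied from the tables, while the names (BUCKETS, bucket_deltas,
truncated_percent) are made up for illustration and are not part of
llvm-locstats. Truncating the percentage to a whole number is an assumption
that happens to reproduce the printed column for these inputs.

```python
# Sample counts per coverage bucket, copied from the two tables above.
BUCKETS = ["0%", "(0%,10%)", "[10%,20%)", "[20%,30%)", "[30%,40%)",
           "[40%,50%)", "[50%,60%)", "[60%,70%)", "[70%,80%)",
           "[80%,90%)", "[90%,100%)", "100%"]
NORMAL = [765406, 45179, 51699, 52044, 46905, 48292,
          61342, 58315, 69848, 81937, 101384, 2032034]
INSTR_REF = [751201, 40708, 44909, 47544, 41630, 42742,
             56692, 53796, 64476, 73836, 74423, 2123749]


def bucket_deltas(before, after):
    """Return the per-bucket change in sample count (after - before)."""
    return [a - b for b, a in zip(before, after)]


def truncated_percent(count, total):
    """Share of total as a whole-number percentage, truncated (assumption)."""
    return 100 * count // total


if __name__ == "__main__":
    for name, delta in zip(BUCKETS, bucket_deltas(NORMAL, INSTR_REF)):
        print(f"{name:<12}{delta:>+10}")
    print("variables processed:", sum(NORMAL), "->", sum(INSTR_REF))
```

With these inputs the 100% bucket grows while every other bucket shrinks,
which is the shift towards fully covered variables described above.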

The next observation is that the "total availability" of variables hasn't
changed. This isn't the full story -- if you give an absolute name to every
variable with a location in the clang binary, there are 6949 dropped locations
and 22564 completely new locations, meaning roughly 1% of all variables in the
program have changed, it's just hidden by the statistics rounding. More detail
on the nature of the changes is below. I was hoping for more false locations
to be dropped; it's quite likely that there are many more false locations
dropped within variables that have more than one value, which aren't readily
reflected in these statistics.
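
The "roughly 1%" figure can be reproduced with a few lines of Python. This is
only a sketch of the arithmetic: the counts come from the text above, and
using the "variables processed" count of the normal build as the denominator
is an assumption made here to get a sense of scale (llvm-locstats may compute
"total availability" against a different total).

```python
# Figures quoted in the text above.
TOTAL_VARIABLES = 3414385  # variables processed in the "normal" build
DROPPED = 6949             # variables that had a location and lost it
NEW = 22564                # variables that gained a location


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


changed = DROPPED + NEW    # variables whose availability flipped either way
net_gain = NEW - DROPPED   # net change in variables that have a location

if __name__ == "__main__":
    print(f"changed:  {changed} ({percent(changed, TOTAL_VARIABLES):.2f}%)")
    print(f"net gain: {net_gain} ({percent(net_gain, TOTAL_VARIABLES):.2f}%)")
```

Under that assumption the net gain is below half a percentage point, which is
small enough to disappear in a statistic reported as a whole number.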

A natural question is: are all these new locations wrong, and the dropped
locations only dropped because of bugs? To address that, I picked 20 new
locations and 20 dropped locations at random and analysed why they happened.
The input samples can be found here [2], along with an llvm-reduce'd version of
each IR file. I confirmed the reason for the new/dropped location in the
reduced and original file, as llvm-reducing them can alter the reason why
something is dropped or not. Of the new locations, we previously could not
track the location because:
 * 14 DBG_VALUEs come after the vreg operand is out of liveness and are dropped
   by LiveDebugVariables.
 * 2 DBG_VALUEs are out of liveness and dropped by RegisterCoalescing
   out of conservativeness.
 * 2 DBG_VALUEs that appear before their operand is defined. This is out of
   liveness, instruction referencing saves them through preserving debug
   use-before-defs.
 * 2 DBG_VALUEs that are out of liveness after a branch, but the value is live
   down the other branch path.

All of these locations can be tracked with instruction referencing because
liveness is not a consideration, only availability in physical registers. 19 of
the new locations were correct, while one tracked the right value but picked
the wrong location for it, which I've now got a patch for.
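
The difference between "live" and "available in a physical register" can be
illustrated with a toy model. The Python sketch below is not how
LiveDebugVariables or InstrRefBasedLDV are implemented; the Instr type, the
helper names and the example block are all hypothetical. It only shows that a
value can remain in a register for several instructions after its last use.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Instr(NamedTuple):
    defines: Optional[str]   # register written by the instruction, if any
    uses: Tuple[str, ...]    # registers read by the instruction


def last_use(block: Sequence[Instr], def_index: int) -> int:
    """Index of the last read of the value defined at def_index.

    This is where the value's liveness ends in this toy model.
    """
    reg = block[def_index].defines
    end = def_index
    for i in range(def_index + 1, len(block)):
        if reg in block[i].uses:
            end = i
        if block[i].defines == reg:
            break  # the register now holds a different value
    return end


def clobber_point(block: Sequence[Instr], def_index: int) -> int:
    """Index of the first later write to the register, or len(block)."""
    reg = block[def_index].defines
    for i in range(def_index + 1, len(block)):
        if block[i].defines == reg:
            return i
    return len(block)


# Hypothetical straight-line block after register allocation.
BLOCK = [
    Instr("rax", ()),        # 0: defines the value we care about
    Instr("rbx", ("rax",)),  # 1: last use of that value
    Instr("rcx", ("rbx",)),  # 2: rax is dead here but still holds the value
    Instr("rdx", ("rcx",)),  # 3: same
    Instr("rax", ()),        # 4: rax is overwritten
]

if __name__ == "__main__":
    print("liveness ends after instruction", last_use(BLOCK, 0))
    print("value stays in rax until instruction", clobber_point(BLOCK, 0))
```

In this model a location that is cut off when liveness ends stops at
instruction 1, whereas one that follows the value itself could describe the
variable up to instruction 4.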

For the dropped locations:
 * 8 false locations are dropped, they used to refer to the wrong value because
   of a failure in register coalescing, see the body of [3].
 * 3 locations are unnecessarily dropped when different subregisters are
   merged together in register coalescing.
 * 3 locations are unnecessarily dropped due to conservative tracking of PHI
   values (the code in D86814, can be fixed with more C++).
 * 2 of the samples didn't actually have a dropped location; instead they
   preserved an undef debug instruction in early-taildup, and my scripts picked
   this up as dropping a location.
 * 2 locations aren't tracked by InstrRefBasedLDV through a block that's
   out of scope, meaning the location never covers instructions that are in
   scope. VarLocBasedLDV is vulnerable to this too, but MachineSink can drop a
   DBG_VALUE on the far side of the scope gap, saving the location. See
   "Limitations" below.
 * 2 locations dropped during tail duplication: one in early-taildup which
   I haven't tried to address yet (see "Limitations"), one in late taildup
   where a block containing only debug instructions isn't correctly duplicated.

To summarise: all the new locations found were correct and not trackable by
DBG_VALUE variable-location tracking, although there are some bugs in picking
locations. Roughly half of the dropped locations are actual false locations,
the other half are due to unimplemented or limited handling of optimisations in
the instruction referencing code so far.
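
As a quick sanity check of the two lists above, the sample counts can be
tallied in Python. The counts are taken from the bullet points; the short
labels and the three-way grouping of the dropped locations ("false",
"unnecessary", "not dropped") are my own reading of those bullets, not a
classification produced by any tool.

```python
from collections import Counter

# Reasons the 20 sampled new locations were previously untrackable.
NEW_LOCATIONS = {
    "DBG_VALUE after vreg liveness ends (LiveDebugVariables)": 14,
    "dropped conservatively by RegisterCoalescing": 2,
    "DBG_VALUE before its operand is defined": 2,
    "out of liveness after a branch": 2,
}

# Reasons for the 20 sampled dropped locations: (count, assumed grouping).
DROPPED_LOCATIONS = {
    "wrong value after register coalescing": (8, "false"),
    "subregisters merged in register coalescing": (3, "unnecessary"),
    "conservative tracking of PHI values": (3, "unnecessary"),
    "undef preserved in early-taildup": (2, "not dropped"),
    "block out of scope": (2, "unnecessary"),
    "tail duplication": (2, "unnecessary"),
}


def tally(samples):
    """Sum the sample counts per grouping."""
    totals = Counter()
    for count, kind in samples.values():
        totals[kind] += count
    return totals


if __name__ == "__main__":
    print("new locations sampled:", sum(NEW_LOCATIONS.values()))
    for kind, count in tally(DROPPED_LOCATIONS).items():
        print(f"dropped, {kind}: {count}")
```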

This pretty much fulfils the objective of this work: we're able to save a lot
more variable locations through the register allocator because we don't have to
be so conservative about liveness. Plus, the default behaviour of all
optimisations now is to _drop_ a variable location, as opposed to the existing
situation where after we leave SSA form, all bets are off.

Another question is how much this costs in compile time: a clang-3.4 build
using instruction referencing on my otherwise idle machine usually tracks
within 2% of a normal build. This is IMO expected given the larger amount of
debugging information being produced, and I haven't closely studied the
performance of a whole build using instruction referencing yet, so it'll
probably get better. A more recent change to InstrRefBasedLDV has added a big
slowdown though, so I'm going to skip reporting any performance results for
now.

Current situation
=================
Some of this work has landed; I've got some patches up for review [4] that
implement the core parts. I also have a long tail of tweaks and
location-salvaging in a tree here [5] which just fleshes out more optimisation
passes and installs bugfixes. (Commits there are not written to be human
consumable, alas). There are no fatal flaws in the design as far as I'm aware,
although there are some annoyances (see "Limitations").

The biggest problem is that this all relies on a new LiveDebugValues
implementation that doesn't have sufficient test coverage yet, and is still
Somewhat Experimental (TM). Given the number of times an unpleasant performance
cliff has been found in VarLoc LiveDebugValues, it wants a long time to soak in
before being deployed.

Limitations
===========
Here's a non-exhaustive list of known problems. None of them are fatal IMO,
and they have only a small effect on variable availability:
 * Early tail duplication: like late tail duplication, this tears apart SSA
   information and can cause the same "Value" to be defined twice. This is
   solvable using the SSAUpdater utility, which early-taildup already uses.
 * Attaching a debug instruction number to a COPY instruction is highly
   undesirable because the COPY doesn't actually define a value, it just moves
   it between locations. At least one optimisation (X86 LEAtoMOV) transforms
   instructions into COPYs (LEA $rsp + 0 => COPY $rsp), which is unfortunate.
   This doesn't happen a lot though, and can be fixed by dropping a DBG_PHI
   of the COPYd register nearby. Plus it only happens post-regalloc, which
   makes it less of a problem.
 * Trivial def rematerialization: there's no pattern to rely on in how the
   register allocator rematerializes values, and so values can rematerialize
   in different registers dominating different parts of the CFG. It's hard to
   track the variable location after that, because it has multiple values in
   the eyes of InstrRefBasedLDV. My preference would be, seeing how these defs
   are effectively constants, to have the target describe such trivial defs
   in a DIExpression. That avoids having to track the location of a constant
   that we already know.
 * As mentioned in the "missing" variable locations list, gaps in lexical
   scopes can lead to locations not being propagated sufficiently far, a
   problem for both variable-location tracking solutions as documented in
   PR48091. However, using DBG_VALUEs to track variable locations can save a
   few of them because MachineSink can sink DBG_VALUEs over the scope gap,
   whereas instruction-referencing tries to rely on tracking debug
   use-before-defs which don't propagate across scope gaps. More on how to
   resolve this in PR48091.

Next Steps
==========
While this isn't ready for general use yet, it'd be great to get as much as
possible into llvm-12 behind the -Xclang
-fexperimental-debug-variable-locations flag. That eases the path to testing
for consumers, which gives a greater chance of finding worst-case slowdowns in
advance of instruction referencing being generally available.

There's a decent amount of stuff under "Limitations" above that I can address,
plus some performance profiling is still needed. I imagine the next best thing
to do is add support for GlobalISel and some non-X86 backends (certain
TargetInstrInfo hooks need to perform debug-info bookkeeping), which would make
this all more appetising.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
[2] https://github.com/jmorse/llvm-inst-ref-test-samples
[3] https://reviews.llvm.org/D86813
[4] https://reviews.llvm.org/D88898
[5] https://github.com/jmorse/llvm-project/commit/0a702b967927d888bd222806252783359fc74d57

--
Thanks,
Jeremy

David Blaikie via llvm-dev

2020-Nov-06 19:10 UTC

[llvm-dev] [DebugInfo] A value-tracking variable location update

Awesome to read how it's coming along - I'm mostly aside from the
debug location work, but had just one or two clarifying questions

On Fri, Nov 6, 2020 at 10:27 AM Jeremy Morse
<jeremy.morse.llvm at gmail.com> wrote:
> The first observation: a significant increase in the byte-coverage statistic,
> meaning that we're able to track variable locations for longer and across more
> code. This was one of the main aims of this work, having better tracking of
> the locations that we know. The increase of seven percentage points includes an
> additional two percentage points of entry-value locations. If we disable entry
> value production then the scope-bytes-covered statistic moves from 59% to 64%,
Was this meant to be "from 64% to 59%"?
How does that compare to the baseline no-entry-value number?

Could you give a quick summary of the distinction between "PC ranges
covered" and "total availability"?
> which is still a decent improvement.
>
> For the dropped locations:
>  * 8 false locations are dropped, they used to refer to the wrong value because
>    of a failure in register coalescing, see the body of [3].
Would these issues ^ show up/be testable with Dexter?

Jeremy Morse via llvm-dev

2020-Nov-06 19:29 UTC

[llvm-dev] [DebugInfo] A value-tracking variable location update

Hi David,

On Fri, Nov 6, 2020 at 7:10 PM David Blaikie <dblaikie at gmail.com> wrote:
> > The first observation: a significant increase in the byte-coverage statistic,
> > meaning that we're able to track variable locations for longer and across more
> > code. This was one of the main aims of this work, having better tracking of
> > the locations that we know. The increase of seven percentage points includes an
> > additional two percentage points of entry-value locations. If we disable entry
> > value production then the scope-bytes-covered statistic moves from 59% to 64%,
>
> Was this meant to be "from 64% to 59%"?
> How does that compare to the baseline no-entry-value number?
I guess that was ambiguous: I mean the baseline scope-bytes-covered
number with no entry values is 59% with LLVM master as it stands
today. With instruction referencing and no entry values, the
scope-bytes-covered number rises to 64%.
> Could you give a quick summary of the distinction between "PC ranges
> covered" and "total availability"?
"total availability" is the percentage of variables in the program
that have a DW_AT_location attached -- so a measure of how well we
preserve locations through the whole compiler.

"PC ranges covered" is, for each variable in each lexical scope,
dividing the number of program text bytes covered by the variable by
the number of program text bytes in the lexical scope. If a variable
location covers 10 bytes of a scope that is 20 bytes in size, then
that's 50% coverage. The headline "PC ranges covered" is the average
of all variables in the program -- better at measuring how long we can
track known locations. (Obviously there are upsides and downsides of
aggregating like this).
> > For the dropped locations:
> >  * 8 false locations are dropped, they used to refer to the wrong value because
> >    of a failure in register coalescing, see the body of [3].
>
> Would these issues ^ show up/be testable with Dexter?
Yes, although it's so deep in the compiler that it'd be tricky to
write a test that always hit that problem.
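
To make the two metrics described earlier in this reply concrete, here is a
minimal Python sketch that follows the wording above. It is not the
llvm-locstats implementation, which may aggregate differently; the Variable
type and function names are hypothetical, and counting a variable without a
location as 0% coverage in the average is an assumption.

```python
from typing import NamedTuple, Optional, Sequence


class Variable(NamedTuple):
    scope_bytes: int              # program text bytes in the lexical scope
    covered_bytes: Optional[int]  # bytes covered by the location, or None
                                  # if the variable has no DW_AT_location


def total_availability(variables: Sequence[Variable]) -> float:
    """Fraction of variables that have a location at all."""
    with_location = sum(1 for v in variables if v.covered_bytes is not None)
    return with_location / len(variables)


def coverage(v: Variable) -> float:
    """Fraction of the scope's bytes covered by the variable's location."""
    if v.covered_bytes is None or v.scope_bytes == 0:
        return 0.0  # assumption: no location counts as 0% coverage
    return v.covered_bytes / v.scope_bytes


def pc_ranges_covered(variables: Sequence[Variable]) -> float:
    """Average per-variable coverage, as described in the text above."""
    return sum(coverage(v) for v in variables) / len(variables)


if __name__ == "__main__":
    variables = [
        Variable(scope_bytes=20, covered_bytes=10),    # the 50% example above
        Variable(scope_bytes=40, covered_bytes=40),    # fully covered
        Variable(scope_bytes=30, covered_bytes=None),  # no location
    ]
    print(f"total availability: {total_availability(variables):.0%}")
    print(f"PC ranges covered:  {pc_ranges_covered(variables):.0%}")
```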

--
Thanks,
Jeremy
