thr3ads.net - llvm dev - [llvm-dev] [RFC] A value-tracking LiveDebugValues implementation [Jun 2020]

If this information is useful, please help other people find it:
Share via:

Jeremy Morse via llvm-dev

2020-Jun-18 15:35 UTC

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Hi debuginfo-cabal,

tl;dr: Let's please consider using a new implementation of LiveDebugValues
that produces richer information, might be slightly faster, but mostly will
support the instruction referencing and value tracking paradigm from my RFC [0]
rather than the location tracking that LiveDebugValues does today.

In that RFC, the main motivator is to treat variable locations a bit more like
SSA, having variable location metadata refer to the instructions that define a
value, and track the value moving around registers from there. This
prototype [1] is an implementation of the "track" mechanism, where we
produce
value numbers for every 'def' in a function, and track them being moved
around.
Variable locations are then resolved and extended by following where the value
numbers go.

This matches, to an extent, the "Register Data Flow" / RDF graph
analysis
that recently moved from the Hexagon backend to CodeGen. I had a look, and
it appears RDF computes much richer information than LiveDebugValues needs
(specifically, all uses of values) and allocates for each value number, which
is quite a bit of overhead. I decided it'd be simpler to make a new
LiveDebugValues implementation, as we've had performance problems with
LiveDebugValues in the past, and we don't need to track additional metadata
for each value, only number them uniquely.

There are few independent reasons to adopt this new implementation, and I'm
self-conscious about dropping a code-bomb on people. My questions for this
email are:
* Does the new implementation (described below) make sense, and
* If so, how would I go about landing this?

We could wait until there's a complete suite of patches for the instruction
referencing work (I've got a working prototype, but not production-quality).
On the other hand, this new LiveDebugValues implementation is probably ready
for review now, works independently, and is very much the hardest part of
instruction referencing, so it'd be nice to get it under-way earlier. My
preferred outcome would be adding it as a parallel implementation to the
current LiveDebugValues, enabled by a runtime switch, then start to land
the rest of the instruction referencing work around it. This reduces the
scope for further bit-rot, and as Vedant mentioned [2] this would all need
to sit under a runtime-switch for a transition period anyway.

# Overview

Today's LiveDebugValues (explained in the file docu-comment [3]) concerns
itself with variable /locations/ only, not the values that they contain. The
implementation is very good at its job: it detects when a location is
invalidated (i.e. clobbered), control flow joins are a simple matter of "Do
the predecessors agree on the location?", and not much more work is needed.

Switching to tracking variable values is more complicated. Following the
original proposal, for this pseudo-MIR code sequence:

$rax = somedef
DBG_INSTR_REF reference-to-somedef
$rbx = COPY killed $rax
; Variable location is $rax, then moves to $rbx

We need to track that:
a) $rax contains a value defined by the "somedef" instruction,
b) The DBG_INSTR_REF needs a location for the value somedef defines.
c) That value from "somedef" is copied to rbx,

The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet, so
I've written it to work with DBG_VALUEs too, to prove that it works just as
well as the current implementation. It deals with code sequences such as:

$rax = somedef
DBG_VALUE $rax
$rbx = COPY killed $rax
; Variable location is $rax, then moves to $rbx

In which the same tracking has to occur, but the DBG_VALUE "reads" a
value
number from $rax. As the variable locations are determined by the value number
that the variable has, it doesn't matter whether that was specified by a
DBG_VALUE or DBG_INSTR_REF. The tracking is sufficiently good to produce an
identical binary to the output of current LiveDebugValues (details below).

There are some unrelated benefits to consider: because the new implementation
tracks all locations that a value may be in, we can handle the following
problem, that the current implementation cannot:

entry:
$rax = somedef
DBG_VALUE $rax
SPILL_TO_STACK %somestackslot, $rax
loop:
[... some code...]
$rax = RESTORE_FROM_STACK %somestackslot
do_things_with_rax
brcc loop
exit:
ret

Here, there's a variable spilt to the stack, that's temporarily restored
in the loop. At the end of the "loop" block, the value has two
locations:
$rax and the stack slot. The current LiveDebugValues always follows the
restore from the stack: it decides the variable is on the stack at the end
of "entry", and in $rax at the end of "loop". Because these
locations don't
agree on entry to "loop", the location in "loop" is dropped
completely.
Having LiveDebugValues ignore or prefer the restore just changes what kind of
circumstances locations get dropped in: the underlying problem is that the
value can be in multiple places, but we can track only one of those places.

The new implementation has enough information to solve this, as it tracks all
values in all machine locations. Unfortunately it doesn't make much
difference
to the location statistics for clang RelWithDebInfo builds. An additional
benefit is that we would be better placed to produce saner locations lists [4].

In terms of performance, the new implementation takes on average just as long
as the current implementation. A full stage2 build of clang may be 0.1% slower,
but it's in the noise on my machine. All the algorithms are much more
localised
than the current implementation (see below), and I reckon there's still
scope
for further optimisation as it's easier to make decisions such as
"Don't try to
join these variable locations in this block, they're immediately re-stated
by
DBG_VALUEs".

# Implementation

The new implementation strictly does more work than the old one: it tracks every
value in the function, not just those that are associated with a variable
location. It also has to detect places where control flow merges and produces
PHI values, where the old implementation could just check that the live-out
variable location in predecessors agreed. To address this, I've separated
the
implementation into two smaller problems, that can be solved almost
independently and with much greater locality.

Both of the problems below are sort-of-like SSA construction (they have a set
of defs, and a large set of uses), but instead of calculating new PHI
placements, we have to infer where PHIs _were_ placed.

Problem 1: Machine value numbering. Every def in the function is given a
number, which is then condensed into a transfer function for each block,
specifying which registers (or spill locs) contain which values. It's
genuinely just a map of machine location to value. Using the same sort of
dataflow model that LiveDebugValues currently uses, values are propagated to
compute the live-ins and live-out values of each block, with one significant
modification: we assign a new value number when control flow merges different
values, essentially re-computing PHI nodes. This leads to some ungraceful
code: see "Ugly PHI stuff" below. This problem is kept localised by
only
considering the registers and spill slots that are accessed, making the
problem size proportionate to the product of the max working set of the
function and the number of blocks. The number of variables used is irrelevant.

Problem 2: Variable value assignment. By using the solution for the first
problem, of "what value number does each machine location contain" at
each
program point, every DBG_VALUE can be interpreted as assigning a value to a
variable, not a location. This gets turned into a transfer function too, and
we can then perform another dataflow analysis, propagating what _value_ a
variable has in a block, and computing the set of live-in variable values for
each block. A specific machine location can be picked after this problem is
solved, possibly from several options if multiple locations have the same
value. This isn't a completely clean abstraction: predecessors with
different
variable values that are in the same location should be propagated, but the
new implementation has to find and validate a PHI value number that represents
those merged values (see "More ugly PHI stuff" below).

I'm keeping problem 2 localised by propagating variable values for one
lexical
scope at a time, and only over the blocks that are in that scope. As well as
localising by turning the main problem into a large number of small problems,
this reduces the amount of work too. The current implementation filters out
variable locations that are out of lexical scope every time a block is
processed, which now no longer needs to happen. Variables in the outermost
scopes won't be re-processed when LiveDebugValues goes around an inner-scope
loop several times; likewise variables in inner scopes won't get re-analysed
when an outer scope variable needs another pass through the function. This also
means that all lexical scopes that cover a single block (or zero) require no
dataflow analysis, which should attenuate the effects of heavy inlining a bit.

The solution to these two problems (block live-in variable values and block
live-in machine location values) can then be combined in a final step through
the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any
location transfers within a block.

# Testing and validation

The new LiveDebugValues implementation produces an identical AMD64 clang-3.4
binary to the old implementation, based on d481e59863a (Feb 2020). Well,
almost -- it proved too difficult to completely emulate the old version,
instead with the following changes [5] applied to d481e59863a, the two
implementations produce an identical binary:
* Applied D67500 [6],
* Ignore identity copies,
* Force any newly created DBG_VALUEs to be inserted according to the order
in which DebugVariables appear in the input function. So that they're
consistent between the two implementations.
* Disabling all tracking of stack spills.

Ignoring the final point, I think this would demonstrate that the two
implementations are equivalent. Stack spills are tricky: because value
transfers within blocks are left until the final step of the new
implementation, it's hard to work out in advance when old LiveDebugValues
would have transferred a location. It was easier to just compare binaries
where spills hadn't been followed.

IMO, the non-spill (the vast majority) locations being identical should be
sufficient to show that the overall algorithm works, and dealing with spills
is a minor extension.

The tests pass, except for the ones covered by the limitations section, and
a few DBG_VALUE-change-order issues.

# Limitations and unimplemented features

There are a couple of things that I haven't gotten around to implementing
yet, for which the tests fail:

Overlapping fragments: this is trivial to do, but at the back of my job queue,

Entry values: much more interesting. As Djordje has pointed out elsewhere,
LiveDebugValues has been assuming that every DBG_VALUE assigns a new value
to a variable [7], which means if a DBG_VALUE re-states a variable location
when it doesn't have to, we risk dropping an entry value. This will be much
easier in this new model (I haven't had the time to do it yet), because the
machine value numbering means that each entry value has a specific number,
and we can easily identify when it's no longer available. This should just
be a matter of the in-block transfers tracking code identifying that a
variable should have the entry value; it not being any machine location
currently; and issuing an appropriate DBG_VALUE.

# Ugly PHI stuff

Above, I said that this problem is a lot closer to SSA construction than other
classic compiler problems. However, it's actually even weirder: instead of
having a set of uses and defs and computing the locations of PHIs: every
instruction with a DebugLoc in lexical scope is a use, and the PHIs are already
placed just not recorded. Thus the problems above are more about recovering
the locations of PHIs after they're eliminated.

This adds a lot of complexity to the lattice structure used for each variable
location compared to the current LiveDebugValues: there used to only ever be
one candidate location for each variable at every control flow merge, and over
the course of dataflow analysis this was found to be either true or false.
However because we're iteratively discovering new PHI locations as we
explore,
the new implementation's lattice consists of:
* A single candidate def (\top),
* A PHI value at every control flow merge on the path between between the def
and the current block,
* A PHI value at the current block (\bot).
Each of which potentially needs one iteration of dataflow to see whether the
join should produce that value.

This only truly needs exploring at loop heads, and the maximum number of
iterations required is the depth of loops in the function, which is the same as
the old LiveDebugValues implementation. However because all the PHI values
have to be discovered, that number of iterations is required more often than
the old implementation.

I wrote more about the lattice in the docu-comment in the new implementation
[8]. I can produce a worked example too, but this email has gotten a bit
wordy already.

# More ugly PHI stuff

The section above holds true for the variable values analysis too, where
instead of defs we have variable assignments at DBG_VALUEs. Rather than
discovering where PHIs happen and propagating that information, variable
value analysis has to do the same to discover what machine-value PHI would be
needed to merge a variable location from all predecessors, and then examine
whether such a PHI is actually available, and has the right inputs. (This is
annoyingly fiddly).

# Summary

This new implementation goes fast enough, is accurate enough, and appears
to be sound. That isn't a reason to replace the current implementation; but
eventually the instruction reference work would make use of this, so if
possible I'd like to start review on it, if that's acceptable to people.

Seeing how you've just read a lot of words, a motivating reminder: this path
can lead, slowly, to there being no debug instructions for variable locations!
And as described in my RFC sitrep-email, it can be more accurate than
location only tracking too.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] Branch is here:
https://github.com/jmorse/llvm-project/commits/ldvars-separation , tip
is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I
could open a PR containing the changes, but github mucks them up; best
to just look at the bare file [8].
[2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html
[3]
https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9
[4] https://bugs.llvm.org/show_bug.cgi?id=42834
[5]
https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0
four commits from here
[6] https://reviews.llvm.org/D67500 , I probably should push to land this
[7] It's not a bad view, and definitely something I've been preaching,
[8]
https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60

--
Thanks for reading this essay,
Jeremy

Adrian Prantl via llvm-dev

2020-Jun-18 16:51 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Jeremy,

thank you, that is quite a lot of work you put in there! Generally, speaking
there is nothing wrong with replacing the current LiveDebugValues pass with a
new implementation if the new implementation solves all the same problems, and
is either simpler or faster, or more accurate. We just need to be sure to judge
the "simpler" aspect only after all the ugly edge cases have been
addressed.
The LLVM way of staging is is to land the new implementation in trunk, and when
it's ready, make it the default, and then remove the old one. Unfortunately
we have a track record of getting stuck after step 1, because it's always
difficult to completely replace a matured implementation of anything.

I agree that tracking values instead of instructions is a very elegant way of
dealing with the output of regalloc and spilling and that it would seem to have
potential for longer live ranges for the individual debug values. I've
skimmed your implementation and it looks nice and well-documented.

I had two thoughts while read while reading the proposal, but reflecting on it,
I fell like these are not necessarily just problems with your proposal, but also
problems with current implementation. I'm going to include them here because
I already typed them out, but honestly they shouldn't affect our decision
:-)

1. I was wondering about potential downsides of tracking values. I noticed that
a surprising amount of developers like to modify variables in the debugger.
DWARF allows this when you have memory or register locations, as opposed to
(DW_OP) stack values. If we wanted to become better at this, tracking values
sounds potentially dangerous, since we can't know if the same value in a
different location has any correspondence with the variable any more. I suppose
that even with the new algorithm we could insert a hypothetical DBG_DECLARE at
the definition and a DBG_VALUE when the register is moved. I guess there is no
theoretical limit to supporting this with the value tracking approach.

2. When tracking values, do we keep track of the original DBG_VALUE's !dbg
location to know when we need to stop propagating? I was worried that with good
value tracking we might carry a stale value forward too far. But writing down
the example made me realize that this is more an issue for
HowToUpdateDebugInfo.rst than one just local toLiveDebugValues.

If we have the following unoptimized program, with !dbg !1 ~ line 1, etc:

%x = insn1, !dbg !1
DBG_VALUE "v", %x, !dbg !1
insn2, !dbg !2
%y = insn3, !dbg !3
DBG_VALUE "v", %y, !dbg !3
insn4, !dbg !4
%z = insn5, !dbg !5
DBG_VALUE "v", %z, !dbg !5

and it gets optimized to

%y = insn3, !dbg !3
DBG_VALUE "v", %y, !dbg !3
%x = insn1, !dbg !1
DBG_VALUE "v", %x, !dbg !1
insn2, !dbg !2
insn4, !dbg !4 ; now has wrong DBG_VALUE for "v"
%z = insn5, !dbg !5
DBG_VALUE "v", %z, !dbg !5

then insn4 has a stale value for the variable "v", because at location
!dbg !4 it should be %y and not %x. We can require all passes to insert a
DBG_VALUE undef after moving instructions, or we could start giving debug
intrinsics a range of !dbg locations, such as DBG_VALUE "v", %x, [!dbg
!1, dbg !3) and define an ordering of !dbg locations.

If we had such a facility: would it be possible to integrate this into the new
pass implementation?

-- adrian

Jeremy Morse via llvm-dev

2020-Jun-22 12:57 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Hi Adrian,
> quite a lot of work
Large amounts of credit should go to llvm-reduce, which was fundamental to
getting any of this done,
> I've skimmed your implementation and it looks nice and well-documented.
The thing that worries me is the over-complicated lattices -- I didn't
anticipate the problem, and there's a risk that it's more complex than
it
needs to be. As it sounds like getting this reviewed and landed is feasible,
I'll write about that (and a worked example) in the review.
> I was wondering about potential downsides of tracking values. I noticed
that
> a surprising amount of developers like to modify variables in the debugger
> [...]
This is really interesting, and as you say it's something that is already
happening today. Consider this:

  #include <stdbool.h>
  extern void ext(int);
  extern bool extb(int, int);
  void foo(int bar, int baz) {
    ext(bar);
    bool cont = true;
    while (cont) {
      cont = extb(bar, baz);
    }
  }

Compiled with a recent master -O2 -g and main() / other functions in another
file, we get this disassembly and location list for 'bar':

   0x0000000000400483 <+3>:     mov    %esi,%ebx
   0x0000000000400485 <+5>:     mov    %edi,%ebp
=> 0x0000000000400487 <+7>:     callq  0x4004d0 <ext>

DW_AT_location    (0x00000000:
  [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
  [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)

Stepping through the function, we stop on the call to 'ext', and can set
the
value of 'bar', but because there are two locations (and LiveDebugValues
picked
the long term register $ebp rather than $edi), you can set "bar" but
it has
no effect on the call to 'ext'.

AFAIUI, this is never a problem at -O0 because there's only ever one
location
for a variable. With optimisations, and without a way to describe multiple
locations for a variable in DWARF, I don't think it's generally
solvable.
Plus, if you had two variables that have been CSE'd  into being the same
value,
setting one will modify the other. These are all consequences of optimisations
changing the structure of the program.

The best guarantee that I imagine we could give, is that for any given block,
the variable location in that block is the location that instructions read
from. That way, if you modify the variable in a block, it's very likely that
the next few statements will read that modified variable. It's eminently
do-able with the new implementation, as location selection is the last thing
that happens and is done on a per-block basis, although it wouldn't be free
(in terms of performance cost). I think the "value is only read from one
location" idea is true of SelectionDAG, but it might fall apart with other
optimisations.
> When tracking values, do we keep track of the original DBG_VALUE's !dbg
> location to know when we need to stop propagating? [...] insn4 has a stale
> value for the variable "v"
I don't believe there's any relationship between DBG_VALUE locations and
!dbg
source locations right now. This is actually one of my pet peeves, that
variable locations and the line program don't necessarily line up into
something coherent. It's definitely something that leads to misleading
program
states being presented today; there are at least two bugs.llvm.org reports that
are caused by such problems. [Can't find them right now as bugzilla is
throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs
into that order, would be really useful for ensuring correctness
> If we had such a facility: would it be possible to integrate this into the
> new pass implementation?
In the simple case, easily: in your example, there would just be an extra
layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would
have a well defined value number. We could even have insn4 set the variable
location to %y's value if it's still available somewhere.

If variable locations are defined by source location however, this undermines
the meaning of how LiveDebugValues determines variable locations when control
flow merges: there wouldn't be a simple "variable location in
predecessor
block" any more. And the merged "live in" location wouldn't
necessarily mean
anything to the source locations in the block.

Stepping further back, we'd also lose one of the freebies that the current
design gives us: when variable values merge at a PHI node, but there's no
actual PHI instruction in the IR (because there's no subsequent IR use), we
don't create a "dbg.phi(...)" instruction, we just ignore it and
let
LiveDebugValues patch it up later, if a location can be recovered. If every
source location needed to be connected to a variable location record, we would
need to represent such debug-only PHIs much earlier in compilation
(or drop them).

On the other hand, what you're describing (plus the instruction referencing
work) is something that doesn't require debug instructions in the IR or MIR,
which is highly desirable. We could just store a set of
{instruction references, variable / fragment / expr, source locations} and
build a location list out of that. Definitely worth pursuing, and value
tracking would definitely enable such designs.

--
Thanks,
Jeremy

Vedant Kumar via llvm-dev

2020-Jun-25 21:33 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Thanks for sharing this RFC, Jeremy.
> On Jun 18, 2020, at 8:35 AM, Jeremy Morse <jeremy.morse.llvm at
gmail.com> wrote:
> 
> Hi debuginfo-cabal,
> 
> tl;dr: Let's please consider using a new implementation of
LiveDebugValues
> that produces richer information, might be slightly faster, but mostly will
> support the instruction referencing and value tracking paradigm from my RFC
[0]
> rather than the location tracking that LiveDebugValues does today.
> 
> In that RFC, the main motivator is to treat variable locations a bit more
like
> SSA, having variable location metadata refer to the instructions that
define a
> value, and track the value moving around registers from there. This
> prototype [1] is an implementation of the "track" mechanism,
where we produce
> value numbers for every 'def' in a function, and track them being
moved around.
> Variable locations are then resolved and extended by following where the
value
> numbers go.
Your earlier RFC sketches a plan for extending LiveDebugValues to handle
instruction-referencing DBG_VALUEs. From
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
<http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html> --
>> How then do we translate this new kind of machine location into
>> DWARF/CodeView variable locations, which need to know which register
>> to look in? The answer is: LiveDebugValues [3]. We already perform a
>> dataflow analysis in LiveDebugValues of where values "go"
after
>> they're defined: we can use that to take "defining
instructions" and
>> determine register / stack locations. We would need to track values
>> from the defining instruction up to the DBG_VALUE where that value
>> becomes a variable location, after which it's no different from the
>> LiveDebugValues analysis that we perform today.
I had interpreted this as meaning: introduce transfer functions to walk from the
referenced instruction until we reach the referencing DBG_VALUE, then create a
VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could
proceed as usual.

Reading this RFC, I get the sense that that has several drawbacks: it doesn’t
handle situations where the referenced instruction is in a different basic block
than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it
doesn’t track all the locations for a variable. Is that correct?
> This matches, to an extent, the "Register Data Flow" / RDF graph
analysis
> that recently moved from the Hexagon backend to CodeGen. I had a look, and
> it appears RDF computes much richer information than LiveDebugValues needs
> (specifically, all uses of values) and allocates for each value number,
which
> is quite a bit of overhead. I decided it'd be simpler to make a new
> LiveDebugValues implementation, as we've had performance problems with
> LiveDebugValues in the past, and we don't need to track additional
metadata
> for each value, only number them uniquely.
> 
> There are few independent reasons to adopt this new implementation, and
I'm
> self-conscious about dropping a code-bomb on people. My questions for this
> email are:
> * Does the new implementation (described below) make sense, and
I think your results speak for themselves :).
> * If so, how would I go about landing this?
One thing I’m not yet clear on: does your prototype essentially implement the
minimal extension to LiveDebugValues needed to handle instruction-referencing
DBG_VALUEs? If not, should it?, or should we take the opportunity to remove
limitations from the current LiveDebugValues algorithm?

As for how to land this, I propose:

- Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
- Split out common transfer functions and other helpers into
lib/CodeGen/LiveDebugValues/Common.{h,cpp}
- Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
- Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavor
> We could wait until there's a complete suite of patches for the
instruction
> referencing work (I've got a working prototype, but not
production-quality).
> On the other hand, this new LiveDebugValues implementation is probably
ready
> for review now, works independently, and is very much the hardest part of
> instruction referencing, so it'd be nice to get it under-way earlier.
My
> preferred outcome would be adding it as a parallel implementation to the
> current LiveDebugValues, enabled by a runtime switch, then start to land
> the rest of the instruction referencing work around it. This reduces the
> scope for further bit-rot, and as Vedant mentioned [2] this would all need
> to sit under a runtime-switch for a transition period anyway.
Yes, I’m in favor of landing work related to instruction-referencing DBG_VALUEs
in small-ish independent patches, as they become ready. I’d very much like to
keep the unmodified var-loc DBG_VALUE code paths intact and separated.
> 
> # Overview
> 
> Today's LiveDebugValues (explained in the file docu-comment [3])
concerns
> itself with variable /locations/ only, not the values that they contain.
The
> implementation is very good at its job: it detects when a location is
> invalidated (i.e. clobbered), control flow joins are a simple matter of
"Do
> the predecessors agree on the location?", and not much more work is
needed.
> 
> Switching to tracking variable values is more complicated. Following the
> original proposal, for this pseudo-MIR code sequence:
> 
>    $rax = somedef
>    DBG_INSTR_REF reference-to-somedef
>    $rbx = COPY killed $rax
>    ; Variable location is $rax, then moves to $rbx
> 
> We need to track that:
> a) $rax contains a value defined by the "somedef" instruction,
> b) The DBG_INSTR_REF needs a location for the value somedef defines.
> c) That value from "somedef" is copied to rbx,
> 
> The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet,
so
> I've written it to work with DBG_VALUEs too, to prove that it works
just as
> well as the current implementation. It deals with code sequences such as:
> 
>    $rax = somedef
>    DBG_VALUE $rax
>    $rbx = COPY killed $rax
>    ; Variable location is $rax, then moves to $rbx
> 
> In which the same tracking has to occur, but the DBG_VALUE
"reads" a value
> number from $rax. As the variable locations are determined by the value
number
> that the variable has, it doesn't matter whether that was specified by
a
> DBG_VALUE or DBG_INSTR_REF.  The tracking is sufficiently good to produce
an
> identical binary to the output of current LiveDebugValues (details below).
> 
> There are some unrelated benefits to consider: because the new
implementation
> tracks all locations that a value may be in, we can handle the following
> problem, that the current implementation cannot:
> 
>  entry:
>    $rax = somedef
>    DBG_VALUE $rax
>    SPILL_TO_STACK %somestackslot, $rax
>  loop:
>    [... some code...]
>    $rax = RESTORE_FROM_STACK %somestackslot
>    do_things_with_rax
>    brcc loop
>  exit:
>    ret
> 
> Here, there's a variable spilt to the stack, that's temporarily
restored
> in the loop. At the end of the "loop" block, the value has two
locations:
> $rax and the stack slot. The current LiveDebugValues always follows the
> restore from the stack: it decides the variable is on the stack at the end
> of "entry", and in $rax at the end of "loop". Because
these locations don't
> agree on entry to "loop", the location in "loop" is
dropped completely.
> Having LiveDebugValues ignore or prefer the restore just changes what kind
of
> circumstances locations get dropped in: the underlying problem is that the
> value can be in multiple places, but we can track only one of those places.
> 
> The new implementation has enough information to solve this, as it tracks
all
> values in all machine locations. Unfortunately it doesn't make much
difference
> to the location statistics for clang RelWithDebInfo builds. An additional
> benefit is that we would be better placed to produce saner locations lists
[4].
> 
> In terms of performance, the new implementation takes on average just as
long
> as the current implementation. A full stage2 build of clang may be 0.1%
slower,
> but it's in the noise on my machine. All the algorithms are much more
localised
> than the current implementation (see below), and I reckon there's still
scope
> for further optimisation as it's easier to make decisions such as
"Don't try to
> join these variable locations in this block, they're immediately
re-stated by
> DBG_VALUEs".
> 
> # Implementation
> 
> The new implementation strictly does more work than the old one: it tracks
every
> value in the function, not just those that are associated with a variable
> location. It also has to detect places where control flow merges and
produces
> PHI values, where the old implementation could just check that the live-out
> variable location in predecessors agreed. To address this, I've
separated the
> implementation into two smaller problems, that can be solved almost
> independently and with much greater locality.
> 
> Both of the problems below are sort-of-like SSA construction (they have a
set
> of defs, and a large set of uses), but instead of calculating new PHI
> placements, we have to infer where PHIs _were_ placed.
> 
> Problem 1: Machine value numbering. Every def in the function is given a
> number, which is then condensed into a transfer function for each block,
> specifying which registers (or spill locs) contain which values. It's
> genuinely just a map of machine location to value. Using the same sort of
> dataflow model that LiveDebugValues currently uses, values are propagated
to
> compute the live-ins and live-out values of each block, with one
significant
> modification: we assign a new value number when control flow merges
different
> values, essentially re-computing PHI nodes. This leads to some ungraceful
> code: see "Ugly PHI stuff" below. This problem is kept localised
by only
> considering the registers and spill slots that are accessed, making the
> problem size proportionate to the product of the max working set of the
> function and the number of blocks. The number of variables used is
irrelevant.
> 
> Problem 2: Variable value assignment. By using the solution for the first
> problem, of "what value number does each machine location
contain" at each
> program point, every DBG_VALUE can be interpreted as assigning a value to a
> variable, not a location. This gets turned into a transfer function too,
and
> we can then perform another dataflow analysis, propagating what _value_ a
> variable has in a block, and computing the set of live-in variable values
for
> each block. A specific machine location can be picked after this problem is
> solved, possibly from several options if multiple locations have the same
> value. This isn't a completely clean abstraction: predecessors with
different
> variable values that are in the same location should be propagated, but the
> new implementation has to find and validate a PHI value number that
represents
> those merged values (see "More ugly PHI stuff" below).
> 
> I'm keeping problem 2 localised by propagating variable values for one
lexical
> scope at a time, and only over the blocks that are in that scope. As well
as
> localising by turning the main problem into a large number of small
problems,
> this reduces the amount of work too. The current implementation filters out
> variable locations that are out of lexical scope every time a block is
> processed, which now no longer needs to happen. Variables in the outermost
> scopes won't be re-processed when LiveDebugValues goes around an
inner-scope
> loop several times; likewise variables in inner scopes won't get
re-analysed
> when an outer scope variable needs another pass through the function. This
also
> means that all lexical scopes that cover a single block (or zero) require
no
> dataflow analysis, which should attenuate the effects of heavy inlining a
bit.
> 
> The solution to these two problems (block live-in variable values and block
> live-in machine location values) can then be combined in a final step
through
> the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any
> location transfers within a block.
> 
> # Testing and validation
> 
> The new LiveDebugValues implementation produces an identical AMD64
clang-3.4
> binary to the old implementation, based on d481e59863a (Feb 2020). Well,
> almost -- it proved too difficult to completely emulate the old version,
> instead with the following changes [5] applied to d481e59863a, the two
> implementations produce an identical binary:
> * Applied D67500 [6],
> * Ignore identity copies,
> * Force any newly created DBG_VALUEs to be inserted according to the order
>   in which DebugVariables appear in the input function. So that they're
>   consistent between the two implementations.
> * Disabling all tracking of stack spills.
> 
> Ignoring the final point, I think this would demonstrate that the two
> implementations are equivalent. Stack spills are tricky: because value
> transfers within blocks are left until the final step of the new
> implementation, it's hard to work out in advance when old
LiveDebugValues
> would have transferred a location. It was easier to just compare binaries
> where spills hadn't been followed.
> 
> IMO, the non-spill (the vast majority) locations being identical should be
> sufficient to show that the overall algorithm works, and dealing with
spills
> is a minor extension.
> 
> The tests pass, except for the ones covered by the limitations section, and
> a few DBG_VALUE-change-order issues.
> 
> # Limitations and unimplemented features
> 
> There are a couple of things that I haven't gotten around to
implementing
> yet, for which the tests fail:
> 
> Overlapping fragments: this is trivial to do, but at the back of my job
queue,
> 
> Entry values: much more interesting. As Djordje has pointed out elsewhere,
> LiveDebugValues has been assuming that every DBG_VALUE assigns a new value
> to a variable [7], which means if a DBG_VALUE re-states a variable location
> when it doesn't have to, we risk dropping an entry value. This will be
much
> easier in this new model (I haven't had the time to do it yet), because
the
> machine value numbering means that each entry value has a specific number,
> and we can easily identify when it's no longer available. This should
just
> be a matter of the in-block transfers tracking code identifying that a
> variable should have the entry value; it not being any machine location
> currently; and issuing an appropriate DBG_VALUE.
> 
> # Ugly PHI stuff
> 
> Above, I said that this problem is a lot closer to SSA construction than
other
> classic compiler problems. However, it's actually even weirder: instead
of
> having a set of uses and defs and computing the locations of PHIs: every
> instruction with a DebugLoc in lexical scope is a use, and the PHIs are
already
> placed just not recorded. Thus the problems above are more about recovering
> the locations of PHIs after they're eliminated.
> 
> This adds a lot of complexity to the lattice structure used for each
variable
> location compared to the current LiveDebugValues: there used to only ever
be
> one candidate location for each variable at every control flow merge, and
over
> the course of dataflow analysis this was found to be either true or false.
> However because we're iteratively discovering new PHI locations as we
explore,
> the new implementation's lattice consists of:
> * A single candidate def (\top),
> * A PHI value at every control flow merge on the path between between the
def
>   and the current block,
> * A PHI value at the current block (\bot).
> Each of which potentially needs one iteration of dataflow to see whether
the
> join should produce that value.
> 
> This only truly needs exploring at loop heads, and the maximum number of
> iterations required is the depth of loops in the function, which is the
same as
> the old LiveDebugValues implementation. However because all the PHI values
> have to be discovered, that number of iterations is required more often
than
> the old implementation.
> 
> I wrote more about the lattice in the docu-comment in the new
implementation
> [8]. I can produce a worked example too, but this email has gotten a bit
> wordy already.
> 
> # More ugly PHI stuff
> 
> The section above holds true for the variable values analysis too, where
> instead of defs we have variable assignments at DBG_VALUEs. Rather than
> discovering where PHIs happen and propagating that information, variable
> value analysis has to do the same to discover what machine-value PHI would
be
> needed to merge a variable location from all predecessors, and then examine
> whether such a PHI is actually available, and has the right inputs. (This
is
> annoyingly fiddly).
> 
> # Summary
> 
> This new implementation goes fast enough, is accurate enough, and appears
> to be sound. That isn't a reason to replace the current implementation;
but
> eventually the instruction reference work would make use of this, so if
> possible I'd like to start review on it, if that's acceptable to
people.
> 
> Seeing how you've just read a lot of words, a motivating reminder: this
path
> can lead, slowly, to there being no debug instructions for variable
locations!
> And as described in my RFC sitrep-email, it can be more accurate than
> location only tracking too.
> 
> [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
> [1] Branch is here:
> https://github.com/jmorse/llvm-project/commits/ldvars-separation , tip
> is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I
> could open a PR containing the changes, but github mucks them up; best
> to just look at the bare file [8].
> [2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html
> [3]
https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9
> [4] https://bugs.llvm.org/show_bug.cgi?id=42834
> [5]
https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0
> four commits from here
> [6] https://reviews.llvm.org/D67500 , I probably should push to land this
> [7] It's not a bad view, and definitely something I've been
preaching,
> [8]
https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60
> 
> --
> Thanks for reading this essay,
> Jeremy

vedant
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200625/104a0664/attachment.html>

Jeremy Morse via llvm-dev

2020-Jun-26 12:11 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Hi Vedant,

On Thu, Jun 25, 2020 at 10:33 PM Vedant Kumar <vedant_kumar at apple.com>
wrote:> Your earlier RFC sketches a plan for extending LiveDebugValues to handle
instruction-referencing DBG_VALUEs. From
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html --
>
> How then do we translate this new kind of machine location into
> DWARF/CodeView variable locations, which need to know which register
> to look in? The answer is: LiveDebugValues [3]. We already perform a
> dataflow analysis in LiveDebugValues of where values "go" after
> they're defined: we can use that to take "defining
instructions" and
> determine register / stack locations. We would need to track values
> from the defining instruction up to the DBG_VALUE where that value
> becomes a variable location, after which it's no different from the
> LiveDebugValues analysis that we perform today.
>
> I had interpreted this as meaning: introduce transfer functions to walk
from the referenced instruction until we reach the referencing DBG_VALUE, then
create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm
could proceed as usual.
I think that was my meaning at the time; and it's still broadly true.
Unfortunately, it turns out that there can be pretty much an arbitrary
computation between the referenced instruction and the referencing
DBG_INSTR_REF. I ran across several scenarios where a referenced
instruction gets CSE'd / PRE'd / hoisted away from the referencing
DBG_INSTR_REF, to the extent that it's on the other side of a loop.
Once it's there, a dataflow algorithm is needed to work out whether
the value is live through the loop, or whether it gets clobbered
somewhere.

That's what the "machine value number" problem is solving. It
could
still plug into the VarLoc implementation if that makes an easier
transition, where a DBG_INSTR_REF is decomposed into a DBG_VALUE when
an instruction reference / value number / location are matched up. It
would make debug use-before-defs harder to deal with, though.

(It's worth noting at this point that the machine value number work
is, in a sense, doing LiveDebugVariables' job, by linking up a
register def with a use. Just, LiveDebugVariables has the register
allocator to help it do it).
> Reading this RFC, I get the sense that that has several drawbacks: it
doesn’t handle situations where the referenced instruction is in a different
basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is
missing!), and it doesn’t track all the locations for a variable. Is that
correct?
This might be my overly-defensive writing style working against me. I
believe those problems are solved, but the solutions are more
complicated than I wanted. For:
> it doesn’t handle situations where the referenced instruction is in a
different basic block than its referencing DBG_VALUE
This is solved by the machine value number dataflow; but as mentioned
requires a more complex lattice to infer where PHI values are
generated,
> and it doesn’t track all the locations for a variable
I believe the implementation correctly picks the _value_ for every
variable in each block, and picking a location is a final
post-processing stage.
> One thing I’m not yet clear on: does your prototype essentially implement
the minimal extension to LiveDebugValues needed to handle
instruction-referencing DBG_VALUEs? If not, should it?, or should we take the
opportunity to remove limitations from the current LiveDebugValues algorithm?
I have the code for this hanging around, but not in the tree I linked
to. It establishes a fairly trivial mapping between:
 * Instructions that are referred to, and
 * The machine value numbers that instruction produces.
I'd suggest that it's not useful without DBG_INSTR_REFs as we
wouldn't
be able to test it., and testing it means bringing in patches to
generate or parse DBG_INSTR_REFs, which I'd rather split up.
> As for how to land this, I propose:
>
> - Move LiveDebugValues.cpp to
lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp
> - Split out common transfer functions and other helpers into
lib/CodeGen/LiveDebugValues/Common.{h,cpp}
> - Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
> - Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE
flavor
Sounds good to me,
> Yes, I’m in favor of landing work related to instruction-referencing
DBG_VALUEs in small-ish independent patches, as they become ready. I’d very much
like to keep the unmodified var-loc DBG_VALUE code paths intact and separated.
--
Thanks,
Jeremy

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Jun 2020 - [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Apparently Analagous Threads