thr3ads.net - llvm dev - [llvm-dev] [RFC] A value-tracking LiveDebugValues implementation [Jun 2020]

If this information is useful, please help other people find it:
Share via:

Jeremy Morse via llvm-dev

2020-Jun-22 12:57 UTC

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

Hi Adrian,
> quite a lot of work
Large amounts of credit should go to llvm-reduce, which was fundamental to
getting any of this done,
> I've skimmed your implementation and it looks nice and well-documented.
The thing that worries me is the over-complicated lattices -- I didn't
anticipate the problem, and there's a risk that it's more complex than
it
needs to be. As it sounds like getting this reviewed and landed is feasible,
I'll write about that (and a worked example) in the review.
> I was wondering about potential downsides of tracking values. I noticed
that
> a surprising amount of developers like to modify variables in the debugger
> [...]
This is really interesting, and as you say it's something that is already
happening today. Consider this:

  #include <stdbool.h>
  extern void ext(int);
  extern bool extb(int, int);
  void foo(int bar, int baz) {
    ext(bar);
    bool cont = true;
    while (cont) {
      cont = extb(bar, baz);
    }
  }

Compiled with a recent master -O2 -g and main() / other functions in another
file, we get this disassembly and location list for 'bar':

   0x0000000000400483 <+3>:     mov    %esi,%ebx
   0x0000000000400485 <+5>:     mov    %edi,%ebp
=> 0x0000000000400487 <+7>:     callq  0x4004d0 <ext>

DW_AT_location    (0x00000000:
  [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
  [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)

Stepping through the function, we stop on the call to 'ext', and can set
the
value of 'bar', but because there are two locations (and LiveDebugValues
picked
the long term register $ebp rather than $edi), you can set "bar" but
it has
no effect on the call to 'ext'.

AFAIUI, this is never a problem at -O0 because there's only ever one
location
for a variable. With optimisations, and without a way to describe multiple
locations for a variable in DWARF, I don't think it's generally
solvable.
Plus, if you had two variables that have been CSE'd  into being the same
value,
setting one will modify the other. These are all consequences of optimisations
changing the structure of the program.

The best guarantee that I imagine we could give, is that for any given block,
the variable location in that block is the location that instructions read
from. That way, if you modify the variable in a block, it's very likely that
the next few statements will read that modified variable. It's eminently
do-able with the new implementation, as location selection is the last thing
that happens and is done on a per-block basis, although it wouldn't be free
(in terms of performance cost). I think the "value is only read from one
location" idea is true of SelectionDAG, but it might fall apart with other
optimisations.
> When tracking values, do we keep track of the original DBG_VALUE's !dbg
> location to know when we need to stop propagating? [...] insn4 has a stale
> value for the variable "v"
I don't believe there's any relationship between DBG_VALUE locations and
!dbg
source locations right now. This is actually one of my pet peeves, that
variable locations and the line program don't necessarily line up into
something coherent. It's definitely something that leads to misleading
program
states being presented today; there are at least two bugs.llvm.org reports that
are caused by such problems. [Can't find them right now as bugzilla is
throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs
into that order, would be really useful for ensuring correctness
> If we had such a facility: would it be possible to integrate this into the
> new pass implementation?
In the simple case, easily: in your example, there would just be an extra
layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would
have a well defined value number. We could even have insn4 set the variable
location to %y's value if it's still available somewhere.

If variable locations are defined by source location however, this undermines
the meaning of how LiveDebugValues determines variable locations when control
flow merges: there wouldn't be a simple "variable location in
predecessor
block" any more. And the merged "live in" location wouldn't
necessarily mean
anything to the source locations in the block.

Stepping further back, we'd also lose one of the freebies that the current
design gives us: when variable values merge at a PHI node, but there's no
actual PHI instruction in the IR (because there's no subsequent IR use), we
don't create a "dbg.phi(...)" instruction, we just ignore it and
let
LiveDebugValues patch it up later, if a location can be recovered. If every
source location needed to be connected to a variable location record, we would
need to represent such debug-only PHIs much earlier in compilation
(or drop them).

On the other hand, what you're describing (plus the instruction referencing
work) is something that doesn't require debug instructions in the IR or MIR,
which is highly desirable. We could just store a set of
{instruction references, variable / fragment / expr, source locations} and
build a location list out of that. Definitely worth pursuing, and value
tracking would definitely enable such designs.

--
Thanks,
Jeremy

Pavel Labath via llvm-dev

2020-Jun-23 06:58 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

On 22/06/2020 14:57, Jeremy Morse via llvm-dev wrote:> Hi Adrian,
> 
>> quite a lot of work
> 
> Large amounts of credit should go to llvm-reduce, which was fundamental to
> getting any of this done,
> 
>> I've skimmed your implementation and it looks nice and
well-documented.
> 
> The thing that worries me is the over-complicated lattices -- I didn't
> anticipate the problem, and there's a risk that it's more complex
than it
> needs to be. As it sounds like getting this reviewed and landed is
feasible,
> I'll write about that (and a worked example) in the review.
> 
>> I was wondering about potential downsides of tracking values. I noticed
that
>> a surprising amount of developers like to modify variables in the
debugger
>> [...]
> 
> This is really interesting, and as you say it's something that is
already
> happening today. Consider this:
> 
>   #include <stdbool.h>
>   extern void ext(int);
>   extern bool extb(int, int);
>   void foo(int bar, int baz) {
>     ext(bar);
>     bool cont = true;
>     while (cont) {
>       cont = extb(bar, baz);
>     }
>   }
> 
> Compiled with a recent master -O2 -g and main() / other functions in
another
> file, we get this disassembly and location list for 'bar':
> 
>    0x0000000000400483 <+3>:     mov    %esi,%ebx
>    0x0000000000400485 <+5>:     mov    %edi,%ebp
> => 0x0000000000400487 <+7>:     callq  0x4004d0 <ext>
> 
> DW_AT_location    (0x00000000:
>   [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
>   [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)
> 
> Stepping through the function, we stop on the call to 'ext', and
can set the
> value of 'bar', but because there are two locations (and
LiveDebugValues picked
> the long term register $ebp rather than $edi), you can set "bar"
but it has
> no effect on the call to 'ext'.
> 
> AFAIUI, this is never a problem at -O0 because there's only ever one
location
> for a variable. With optimisations, and without a way to describe multiple
> locations for a variable in DWARF, I don't think it's generally
solvable.
While, I don't want to give too much emphasis on being able to modify
variables in an optimized program (unoptimized programs are a different
story though), I feel obligated to jump in to say that DWARF already
does supports this scenario (Section 2.6.2 of DWARF5, page 43):
"""
The address ranges defined by the bounded location descriptions of a
location list may overlap. When they do, they describe a situation in
which an object exists simultaneously in more than one place.
"""

Now, I don't know of any debugger which actually does this, but in
theory a debugger could go and update all locations of a variable for a
given address instead of just the first one.

Changing only one of the CSE'd variables is obviously impossible, but
that is also something that could be mitigated by a very careful
debugger -- by checking whether any other variable happens to refer to
the same location and warning or aborting the modification.

cheers,
pavel

David Blaikie via llvm-dev

2020-Jun-23 15:38 UTC

head link

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

On Tue, Jun 23, 2020 at 5:38 AM Pavel Labath via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
> On 22/06/2020 14:57, Jeremy Morse via llvm-dev wrote:
> > Hi Adrian,
> >
> >> quite a lot of work
> >
> > Large amounts of credit should go to llvm-reduce, which was
fundamental to
> > getting any of this done,
> >
> >> I've skimmed your implementation and it looks nice and
well-documented.
> >
> > The thing that worries me is the over-complicated lattices -- I
didn't
> > anticipate the problem, and there's a risk that it's more
complex than it
> > needs to be. As it sounds like getting this reviewed and landed is
feasible,
> > I'll write about that (and a worked example) in the review.
> >
> >> I was wondering about potential downsides of tracking values. I
noticed that
> >> a surprising amount of developers like to modify variables in the
debugger
> >> [...]
> >
> > This is really interesting, and as you say it's something that is
already
> > happening today. Consider this:
> >
> >   #include <stdbool.h>
> >   extern void ext(int);
> >   extern bool extb(int, int);
> >   void foo(int bar, int baz) {
> >     ext(bar);
> >     bool cont = true;
> >     while (cont) {
> >       cont = extb(bar, baz);
> >     }
> >   }
> >
> > Compiled with a recent master -O2 -g and main() / other functions in
another
> > file, we get this disassembly and location list for 'bar':
> >
> >    0x0000000000400483 <+3>:     mov    %esi,%ebx
> >    0x0000000000400485 <+5>:     mov    %edi,%ebp
> > => 0x0000000000400487 <+7>:     callq  0x4004d0 <ext>
> >
> > DW_AT_location    (0x00000000:
> >   [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI
> >   [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP)
> >
> > Stepping through the function, we stop on the call to 'ext',
and can set the
> > value of 'bar', but because there are two locations (and
LiveDebugValues picked
> > the long term register $ebp rather than $edi), you can set
"bar" but it has
> > no effect on the call to 'ext'.
> >
> > AFAIUI, this is never a problem at -O0 because there's only ever
one location
> > for a variable. With optimisations, and without a way to describe
multiple
> > locations for a variable in DWARF, I don't think it's
generally solvable.
>
> While, I don't want to give too much emphasis on being able to modify
> variables in an optimized program (unoptimized programs are a different
> story though), I feel obligated to jump in to say that DWARF already
> does supports this scenario (Section 2.6.2 of DWARF5, page 43):
> """
> The address ranges defined by the bounded location descriptions of a
> location list may overlap. When they do, they describe a situation in
> which an object exists simultaneously in more than one place.
> """
Yup +1
> Now, I don't know of any debugger which actually does this, but in
> theory a debugger could go and update all locations of a variable for a
> given address instead of just the first one.
>
> Changing only one of the CSE'd variables is obviously impossible, but
> that is also something that could be mitigated by a very careful
> debugger -- by checking whether any other variable happens to refer to
> the same location and warning or aborting the modification.
That'd be tricky to do in practice & probably involve more than
checking other location descriptions - possible that the compiler
shared storage in some way (maybe not with another variable, maybe
with the return value or a parameter value) & modifying that storage
would produce a different return value/other changes in program
behavior that were unintended.

I'd thought DWARF had some, at least theoretical (like teh
simultaneous location support), mechanism to describe storage a
consumer could realistically mutate & get something like the
observable behavior as distinct from just the immutable value. But at
a glance I don't find any wording like that. In the absence of that,
yeah, it's basically "if you use a debugger to mutate state in an
optimized program... don't bet on the behavior being in any way
reliable/sensible/consistent".

- Dave

llvm dev - Jun 2020 - [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation

[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation