Jeremy Morse via llvm-dev
2020-Jun-22 12:57 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Hi Adrian,> quite a lot of workLarge amounts of credit should go to llvm-reduce, which was fundamental to getting any of this done,> I've skimmed your implementation and it looks nice and well-documented.The thing that worries me is the over-complicated lattices -- I didn't anticipate the problem, and there's a risk that it's more complex than it needs to be. As it sounds like getting this reviewed and landed is feasible, I'll write about that (and a worked example) in the review.> I was wondering about potential downsides of tracking values. I noticed that > a surprising amount of developers like to modify variables in the debugger > [...]This is really interesting, and as you say it's something that is already happening today. Consider this: #include <stdbool.h> extern void ext(int); extern bool extb(int, int); void foo(int bar, int baz) { ext(bar); bool cont = true; while (cont) { cont = extb(bar, baz); } } Compiled with a recent master -O2 -g and main() / other functions in another file, we get this disassembly and location list for 'bar': 0x0000000000400483 <+3>: mov %esi,%ebx 0x0000000000400485 <+5>: mov %edi,%ebp => 0x0000000000400487 <+7>: callq 0x4004d0 <ext> DW_AT_location (0x00000000: [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP) Stepping through the function, we stop on the call to 'ext', and can set the value of 'bar', but because there are two locations (and LiveDebugValues picked the long term register $ebp rather than $edi), you can set "bar" but it has no effect on the call to 'ext'. AFAIUI, this is never a problem at -O0 because there's only ever one location for a variable. With optimisations, and without a way to describe multiple locations for a variable in DWARF, I don't think it's generally solvable. Plus, if you had two variables that have been CSE'd into being the same value, setting one will modify the other. These are all consequences of optimisations changing the structure of the program. The best guarantee that I imagine we could give, is that for any given block, the variable location in that block is the location that instructions read from. That way, if you modify the variable in a block, it's very likely that the next few statements will read that modified variable. It's eminently do-able with the new implementation, as location selection is the last thing that happens and is done on a per-block basis, although it wouldn't be free (in terms of performance cost). I think the "value is only read from one location" idea is true of SelectionDAG, but it might fall apart with other optimisations.> When tracking values, do we keep track of the original DBG_VALUE's !dbg > location to know when we need to stop propagating? [...] insn4 has a stale > value for the variable "v"I don't believe there's any relationship between DBG_VALUE locations and !dbg source locations right now. This is actually one of my pet peeves, that variable locations and the line program don't necessarily line up into something coherent. It's definitely something that leads to misleading program states being presented today; there are at least two bugs.llvm.org reports that are caused by such problems. [Can't find them right now as bugzilla is throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs into that order, would be really useful for ensuring correctness> If we had such a facility: would it be possible to integrate this into the > new pass implementation?In the simple case, easily: in your example, there would just be an extra layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would have a well defined value number. We could even have insn4 set the variable location to %y's value if it's still available somewhere. If variable locations are defined by source location however, this undermines the meaning of how LiveDebugValues determines variable locations when control flow merges: there wouldn't be a simple "variable location in predecessor block" any more. And the merged "live in" location wouldn't necessarily mean anything to the source locations in the block. Stepping further back, we'd also lose one of the freebies that the current design gives us: when variable values merge at a PHI node, but there's no actual PHI instruction in the IR (because there's no subsequent IR use), we don't create a "dbg.phi(...)" instruction, we just ignore it and let LiveDebugValues patch it up later, if a location can be recovered. If every source location needed to be connected to a variable location record, we would need to represent such debug-only PHIs much earlier in compilation (or drop them). On the other hand, what you're describing (plus the instruction referencing work) is something that doesn't require debug instructions in the IR or MIR, which is highly desirable. We could just store a set of {instruction references, variable / fragment / expr, source locations} and build a location list out of that. Definitely worth pursuing, and value tracking would definitely enable such designs. -- Thanks, Jeremy
Pavel Labath via llvm-dev
2020-Jun-23 06:58 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
On 22/06/2020 14:57, Jeremy Morse via llvm-dev wrote:> Hi Adrian, > >> quite a lot of work > > Large amounts of credit should go to llvm-reduce, which was fundamental to > getting any of this done, > >> I've skimmed your implementation and it looks nice and well-documented. > > The thing that worries me is the over-complicated lattices -- I didn't > anticipate the problem, and there's a risk that it's more complex than it > needs to be. As it sounds like getting this reviewed and landed is feasible, > I'll write about that (and a worked example) in the review. > >> I was wondering about potential downsides of tracking values. I noticed that >> a surprising amount of developers like to modify variables in the debugger >> [...] > > This is really interesting, and as you say it's something that is already > happening today. Consider this: > > #include <stdbool.h> > extern void ext(int); > extern bool extb(int, int); > void foo(int bar, int baz) { > ext(bar); > bool cont = true; > while (cont) { > cont = extb(bar, baz); > } > } > > Compiled with a recent master -O2 -g and main() / other functions in another > file, we get this disassembly and location list for 'bar': > > 0x0000000000400483 <+3>: mov %esi,%ebx > 0x0000000000400485 <+5>: mov %edi,%ebp > => 0x0000000000400487 <+7>: callq 0x4004d0 <ext> > > DW_AT_location (0x00000000: > [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI > [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP) > > Stepping through the function, we stop on the call to 'ext', and can set the > value of 'bar', but because there are two locations (and LiveDebugValues picked > the long term register $ebp rather than $edi), you can set "bar" but it has > no effect on the call to 'ext'. > > AFAIUI, this is never a problem at -O0 because there's only ever one location > for a variable. With optimisations, and without a way to describe multiple > locations for a variable in DWARF, I don't think it's generally solvable.While, I don't want to give too much emphasis on being able to modify variables in an optimized program (unoptimized programs are a different story though), I feel obligated to jump in to say that DWARF already does supports this scenario (Section 2.6.2 of DWARF5, page 43): """ The address ranges defined by the bounded location descriptions of a location list may overlap. When they do, they describe a situation in which an object exists simultaneously in more than one place. """ Now, I don't know of any debugger which actually does this, but in theory a debugger could go and update all locations of a variable for a given address instead of just the first one. Changing only one of the CSE'd variables is obviously impossible, but that is also something that could be mitigated by a very careful debugger -- by checking whether any other variable happens to refer to the same location and warning or aborting the modification. cheers, pavel
David Blaikie via llvm-dev
2020-Jun-23 15:38 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
On Tue, Jun 23, 2020 at 5:38 AM Pavel Labath via llvm-dev <llvm-dev at lists.llvm.org> wrote:> > On 22/06/2020 14:57, Jeremy Morse via llvm-dev wrote: > > Hi Adrian, > > > >> quite a lot of work > > > > Large amounts of credit should go to llvm-reduce, which was fundamental to > > getting any of this done, > > > >> I've skimmed your implementation and it looks nice and well-documented. > > > > The thing that worries me is the over-complicated lattices -- I didn't > > anticipate the problem, and there's a risk that it's more complex than it > > needs to be. As it sounds like getting this reviewed and landed is feasible, > > I'll write about that (and a worked example) in the review. > > > >> I was wondering about potential downsides of tracking values. I noticed that > >> a surprising amount of developers like to modify variables in the debugger > >> [...] > > > > This is really interesting, and as you say it's something that is already > > happening today. Consider this: > > > > #include <stdbool.h> > > extern void ext(int); > > extern bool extb(int, int); > > void foo(int bar, int baz) { > > ext(bar); > > bool cont = true; > > while (cont) { > > cont = extb(bar, baz); > > } > > } > > > > Compiled with a recent master -O2 -g and main() / other functions in another > > file, we get this disassembly and location list for 'bar': > > > > 0x0000000000400483 <+3>: mov %esi,%ebx > > 0x0000000000400485 <+5>: mov %edi,%ebp > > => 0x0000000000400487 <+7>: callq 0x4004d0 <ext> > > > > DW_AT_location (0x00000000: > > [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI > > [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP) > > > > Stepping through the function, we stop on the call to 'ext', and can set the > > value of 'bar', but because there are two locations (and LiveDebugValues picked > > the long term register $ebp rather than $edi), you can set "bar" but it has > > no effect on the call to 'ext'. > > > > AFAIUI, this is never a problem at -O0 because there's only ever one location > > for a variable. With optimisations, and without a way to describe multiple > > locations for a variable in DWARF, I don't think it's generally solvable. > > While, I don't want to give too much emphasis on being able to modify > variables in an optimized program (unoptimized programs are a different > story though), I feel obligated to jump in to say that DWARF already > does supports this scenario (Section 2.6.2 of DWARF5, page 43): > """ > The address ranges defined by the bounded location descriptions of a > location list may overlap. When they do, they describe a situation in > which an object exists simultaneously in more than one place. > """Yup +1> Now, I don't know of any debugger which actually does this, but in > theory a debugger could go and update all locations of a variable for a > given address instead of just the first one. > > Changing only one of the CSE'd variables is obviously impossible, but > that is also something that could be mitigated by a very careful > debugger -- by checking whether any other variable happens to refer to > the same location and warning or aborting the modification.That'd be tricky to do in practice & probably involve more than checking other location descriptions - possible that the compiler shared storage in some way (maybe not with another variable, maybe with the return value or a parameter value) & modifying that storage would produce a different return value/other changes in program behavior that were unintended. I'd thought DWARF had some, at least theoretical (like teh simultaneous location support), mechanism to describe storage a consumer could realistically mutate & get something like the observable behavior as distinct from just the immutable value. But at a glance I don't find any wording like that. In the absence of that, yeah, it's basically "if you use a debugger to mutate state in an optimized program... don't bet on the behavior being in any way reliable/sensible/consistent". - Dave