Jeremy Morse via llvm-dev
2020-Jun-18 15:35 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Hi debuginfo-cabal, tl;dr: Let's please consider using a new implementation of LiveDebugValues that produces richer information, might be slightly faster, but mostly will support the instruction referencing and value tracking paradigm from my RFC [0] rather than the location tracking that LiveDebugValues does today. In that RFC, the main motivator is to treat variable locations a bit more like SSA, having variable location metadata refer to the instructions that define a value, and track the value moving around registers from there. This prototype [1] is an implementation of the "track" mechanism, where we produce value numbers for every 'def' in a function, and track them being moved around. Variable locations are then resolved and extended by following where the value numbers go. This matches, to an extent, the "Register Data Flow" / RDF graph analysis that recently moved from the Hexagon backend to CodeGen. I had a look, and it appears RDF computes much richer information than LiveDebugValues needs (specifically, all uses of values) and allocates for each value number, which is quite a bit of overhead. I decided it'd be simpler to make a new LiveDebugValues implementation, as we've had performance problems with LiveDebugValues in the past, and we don't need to track additional metadata for each value, only number them uniquely. There are few independent reasons to adopt this new implementation, and I'm self-conscious about dropping a code-bomb on people. My questions for this email are: * Does the new implementation (described below) make sense, and * If so, how would I go about landing this? We could wait until there's a complete suite of patches for the instruction referencing work (I've got a working prototype, but not production-quality). On the other hand, this new LiveDebugValues implementation is probably ready for review now, works independently, and is very much the hardest part of instruction referencing, so it'd be nice to get it under-way earlier. My preferred outcome would be adding it as a parallel implementation to the current LiveDebugValues, enabled by a runtime switch, then start to land the rest of the instruction referencing work around it. This reduces the scope for further bit-rot, and as Vedant mentioned [2] this would all need to sit under a runtime-switch for a transition period anyway. # Overview Today's LiveDebugValues (explained in the file docu-comment [3]) concerns itself with variable /locations/ only, not the values that they contain. The implementation is very good at its job: it detects when a location is invalidated (i.e. clobbered), control flow joins are a simple matter of "Do the predecessors agree on the location?", and not much more work is needed. Switching to tracking variable values is more complicated. Following the original proposal, for this pseudo-MIR code sequence: $rax = somedef DBG_INSTR_REF reference-to-somedef $rbx = COPY killed $rax ; Variable location is $rax, then moves to $rbx We need to track that: a) $rax contains a value defined by the "somedef" instruction, b) The DBG_INSTR_REF needs a location for the value somedef defines. c) That value from "somedef" is copied to rbx, The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet, so I've written it to work with DBG_VALUEs too, to prove that it works just as well as the current implementation. It deals with code sequences such as: $rax = somedef DBG_VALUE $rax $rbx = COPY killed $rax ; Variable location is $rax, then moves to $rbx In which the same tracking has to occur, but the DBG_VALUE "reads" a value number from $rax. As the variable locations are determined by the value number that the variable has, it doesn't matter whether that was specified by a DBG_VALUE or DBG_INSTR_REF. The tracking is sufficiently good to produce an identical binary to the output of current LiveDebugValues (details below). There are some unrelated benefits to consider: because the new implementation tracks all locations that a value may be in, we can handle the following problem, that the current implementation cannot: entry: $rax = somedef DBG_VALUE $rax SPILL_TO_STACK %somestackslot, $rax loop: [... some code...] $rax = RESTORE_FROM_STACK %somestackslot do_things_with_rax brcc loop exit: ret Here, there's a variable spilt to the stack, that's temporarily restored in the loop. At the end of the "loop" block, the value has two locations: $rax and the stack slot. The current LiveDebugValues always follows the restore from the stack: it decides the variable is on the stack at the end of "entry", and in $rax at the end of "loop". Because these locations don't agree on entry to "loop", the location in "loop" is dropped completely. Having LiveDebugValues ignore or prefer the restore just changes what kind of circumstances locations get dropped in: the underlying problem is that the value can be in multiple places, but we can track only one of those places. The new implementation has enough information to solve this, as it tracks all values in all machine locations. Unfortunately it doesn't make much difference to the location statistics for clang RelWithDebInfo builds. An additional benefit is that we would be better placed to produce saner locations lists [4]. In terms of performance, the new implementation takes on average just as long as the current implementation. A full stage2 build of clang may be 0.1% slower, but it's in the noise on my machine. All the algorithms are much more localised than the current implementation (see below), and I reckon there's still scope for further optimisation as it's easier to make decisions such as "Don't try to join these variable locations in this block, they're immediately re-stated by DBG_VALUEs". # Implementation The new implementation strictly does more work than the old one: it tracks every value in the function, not just those that are associated with a variable location. It also has to detect places where control flow merges and produces PHI values, where the old implementation could just check that the live-out variable location in predecessors agreed. To address this, I've separated the implementation into two smaller problems, that can be solved almost independently and with much greater locality. Both of the problems below are sort-of-like SSA construction (they have a set of defs, and a large set of uses), but instead of calculating new PHI placements, we have to infer where PHIs _were_ placed. Problem 1: Machine value numbering. Every def in the function is given a number, which is then condensed into a transfer function for each block, specifying which registers (or spill locs) contain which values. It's genuinely just a map of machine location to value. Using the same sort of dataflow model that LiveDebugValues currently uses, values are propagated to compute the live-ins and live-out values of each block, with one significant modification: we assign a new value number when control flow merges different values, essentially re-computing PHI nodes. This leads to some ungraceful code: see "Ugly PHI stuff" below. This problem is kept localised by only considering the registers and spill slots that are accessed, making the problem size proportionate to the product of the max working set of the function and the number of blocks. The number of variables used is irrelevant. Problem 2: Variable value assignment. By using the solution for the first problem, of "what value number does each machine location contain" at each program point, every DBG_VALUE can be interpreted as assigning a value to a variable, not a location. This gets turned into a transfer function too, and we can then perform another dataflow analysis, propagating what _value_ a variable has in a block, and computing the set of live-in variable values for each block. A specific machine location can be picked after this problem is solved, possibly from several options if multiple locations have the same value. This isn't a completely clean abstraction: predecessors with different variable values that are in the same location should be propagated, but the new implementation has to find and validate a PHI value number that represents those merged values (see "More ugly PHI stuff" below). I'm keeping problem 2 localised by propagating variable values for one lexical scope at a time, and only over the blocks that are in that scope. As well as localising by turning the main problem into a large number of small problems, this reduces the amount of work too. The current implementation filters out variable locations that are out of lexical scope every time a block is processed, which now no longer needs to happen. Variables in the outermost scopes won't be re-processed when LiveDebugValues goes around an inner-scope loop several times; likewise variables in inner scopes won't get re-analysed when an outer scope variable needs another pass through the function. This also means that all lexical scopes that cover a single block (or zero) require no dataflow analysis, which should attenuate the effects of heavy inlining a bit. The solution to these two problems (block live-in variable values and block live-in machine location values) can then be combined in a final step through the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any location transfers within a block. # Testing and validation The new LiveDebugValues implementation produces an identical AMD64 clang-3.4 binary to the old implementation, based on d481e59863a (Feb 2020). Well, almost -- it proved too difficult to completely emulate the old version, instead with the following changes [5] applied to d481e59863a, the two implementations produce an identical binary: * Applied D67500 [6], * Ignore identity copies, * Force any newly created DBG_VALUEs to be inserted according to the order in which DebugVariables appear in the input function. So that they're consistent between the two implementations. * Disabling all tracking of stack spills. Ignoring the final point, I think this would demonstrate that the two implementations are equivalent. Stack spills are tricky: because value transfers within blocks are left until the final step of the new implementation, it's hard to work out in advance when old LiveDebugValues would have transferred a location. It was easier to just compare binaries where spills hadn't been followed. IMO, the non-spill (the vast majority) locations being identical should be sufficient to show that the overall algorithm works, and dealing with spills is a minor extension. The tests pass, except for the ones covered by the limitations section, and a few DBG_VALUE-change-order issues. # Limitations and unimplemented features There are a couple of things that I haven't gotten around to implementing yet, for which the tests fail: Overlapping fragments: this is trivial to do, but at the back of my job queue, Entry values: much more interesting. As Djordje has pointed out elsewhere, LiveDebugValues has been assuming that every DBG_VALUE assigns a new value to a variable [7], which means if a DBG_VALUE re-states a variable location when it doesn't have to, we risk dropping an entry value. This will be much easier in this new model (I haven't had the time to do it yet), because the machine value numbering means that each entry value has a specific number, and we can easily identify when it's no longer available. This should just be a matter of the in-block transfers tracking code identifying that a variable should have the entry value; it not being any machine location currently; and issuing an appropriate DBG_VALUE. # Ugly PHI stuff Above, I said that this problem is a lot closer to SSA construction than other classic compiler problems. However, it's actually even weirder: instead of having a set of uses and defs and computing the locations of PHIs: every instruction with a DebugLoc in lexical scope is a use, and the PHIs are already placed just not recorded. Thus the problems above are more about recovering the locations of PHIs after they're eliminated. This adds a lot of complexity to the lattice structure used for each variable location compared to the current LiveDebugValues: there used to only ever be one candidate location for each variable at every control flow merge, and over the course of dataflow analysis this was found to be either true or false. However because we're iteratively discovering new PHI locations as we explore, the new implementation's lattice consists of: * A single candidate def (\top), * A PHI value at every control flow merge on the path between between the def and the current block, * A PHI value at the current block (\bot). Each of which potentially needs one iteration of dataflow to see whether the join should produce that value. This only truly needs exploring at loop heads, and the maximum number of iterations required is the depth of loops in the function, which is the same as the old LiveDebugValues implementation. However because all the PHI values have to be discovered, that number of iterations is required more often than the old implementation. I wrote more about the lattice in the docu-comment in the new implementation [8]. I can produce a worked example too, but this email has gotten a bit wordy already. # More ugly PHI stuff The section above holds true for the variable values analysis too, where instead of defs we have variable assignments at DBG_VALUEs. Rather than discovering where PHIs happen and propagating that information, variable value analysis has to do the same to discover what machine-value PHI would be needed to merge a variable location from all predecessors, and then examine whether such a PHI is actually available, and has the right inputs. (This is annoyingly fiddly). # Summary This new implementation goes fast enough, is accurate enough, and appears to be sound. That isn't a reason to replace the current implementation; but eventually the instruction reference work would make use of this, so if possible I'd like to start review on it, if that's acceptable to people. Seeing how you've just read a lot of words, a motivating reminder: this path can lead, slowly, to there being no debug instructions for variable locations! And as described in my RFC sitrep-email, it can be more accurate than location only tracking too. [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html [1] Branch is here: https://github.com/jmorse/llvm-project/commits/ldvars-separation , tip is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I could open a PR containing the changes, but github mucks them up; best to just look at the bare file [8]. [2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html [3] https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9 [4] https://bugs.llvm.org/show_bug.cgi?id=42834 [5] https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0 four commits from here [6] https://reviews.llvm.org/D67500 , I probably should push to land this [7] It's not a bad view, and definitely something I've been preaching, [8] https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60 -- Thanks for reading this essay, Jeremy
Adrian Prantl via llvm-dev
2020-Jun-18 16:51 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Jeremy, thank you, that is quite a lot of work you put in there! Generally, speaking there is nothing wrong with replacing the current LiveDebugValues pass with a new implementation if the new implementation solves all the same problems, and is either simpler or faster, or more accurate. We just need to be sure to judge the "simpler" aspect only after all the ugly edge cases have been addressed. The LLVM way of staging is is to land the new implementation in trunk, and when it's ready, make it the default, and then remove the old one. Unfortunately we have a track record of getting stuck after step 1, because it's always difficult to completely replace a matured implementation of anything. I agree that tracking values instead of instructions is a very elegant way of dealing with the output of regalloc and spilling and that it would seem to have potential for longer live ranges for the individual debug values. I've skimmed your implementation and it looks nice and well-documented. I had two thoughts while read while reading the proposal, but reflecting on it, I fell like these are not necessarily just problems with your proposal, but also problems with current implementation. I'm going to include them here because I already typed them out, but honestly they shouldn't affect our decision :-) 1. I was wondering about potential downsides of tracking values. I noticed that a surprising amount of developers like to modify variables in the debugger. DWARF allows this when you have memory or register locations, as opposed to (DW_OP) stack values. If we wanted to become better at this, tracking values sounds potentially dangerous, since we can't know if the same value in a different location has any correspondence with the variable any more. I suppose that even with the new algorithm we could insert a hypothetical DBG_DECLARE at the definition and a DBG_VALUE when the register is moved. I guess there is no theoretical limit to supporting this with the value tracking approach. 2. When tracking values, do we keep track of the original DBG_VALUE's !dbg location to know when we need to stop propagating? I was worried that with good value tracking we might carry a stale value forward too far. But writing down the example made me realize that this is more an issue for HowToUpdateDebugInfo.rst than one just local toLiveDebugValues. If we have the following unoptimized program, with !dbg !1 ~ line 1, etc: %x = insn1, !dbg !1 DBG_VALUE "v", %x, !dbg !1 insn2, !dbg !2 %y = insn3, !dbg !3 DBG_VALUE "v", %y, !dbg !3 insn4, !dbg !4 %z = insn5, !dbg !5 DBG_VALUE "v", %z, !dbg !5 and it gets optimized to %y = insn3, !dbg !3 DBG_VALUE "v", %y, !dbg !3 %x = insn1, !dbg !1 DBG_VALUE "v", %x, !dbg !1 insn2, !dbg !2 insn4, !dbg !4 ; now has wrong DBG_VALUE for "v" %z = insn5, !dbg !5 DBG_VALUE "v", %z, !dbg !5 then insn4 has a stale value for the variable "v", because at location !dbg !4 it should be %y and not %x. We can require all passes to insert a DBG_VALUE undef after moving instructions, or we could start giving debug intrinsics a range of !dbg locations, such as DBG_VALUE "v", %x, [!dbg !1, dbg !3) and define an ordering of !dbg locations. If we had such a facility: would it be possible to integrate this into the new pass implementation? -- adrian
Jeremy Morse via llvm-dev
2020-Jun-22 12:57 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Hi Adrian,> quite a lot of workLarge amounts of credit should go to llvm-reduce, which was fundamental to getting any of this done,> I've skimmed your implementation and it looks nice and well-documented.The thing that worries me is the over-complicated lattices -- I didn't anticipate the problem, and there's a risk that it's more complex than it needs to be. As it sounds like getting this reviewed and landed is feasible, I'll write about that (and a worked example) in the review.> I was wondering about potential downsides of tracking values. I noticed that > a surprising amount of developers like to modify variables in the debugger > [...]This is really interesting, and as you say it's something that is already happening today. Consider this: #include <stdbool.h> extern void ext(int); extern bool extb(int, int); void foo(int bar, int baz) { ext(bar); bool cont = true; while (cont) { cont = extb(bar, baz); } } Compiled with a recent master -O2 -g and main() / other functions in another file, we get this disassembly and location list for 'bar': 0x0000000000400483 <+3>: mov %esi,%ebx 0x0000000000400485 <+5>: mov %edi,%ebp => 0x0000000000400487 <+7>: callq 0x4004d0 <ext> DW_AT_location (0x00000000: [0x0000000000400480, 0x0000000000400487): DW_OP_reg5 RDI [0x0000000000400487, 0x00000000004004a3): DW_OP_reg6 RBP) Stepping through the function, we stop on the call to 'ext', and can set the value of 'bar', but because there are two locations (and LiveDebugValues picked the long term register $ebp rather than $edi), you can set "bar" but it has no effect on the call to 'ext'. AFAIUI, this is never a problem at -O0 because there's only ever one location for a variable. With optimisations, and without a way to describe multiple locations for a variable in DWARF, I don't think it's generally solvable. Plus, if you had two variables that have been CSE'd into being the same value, setting one will modify the other. These are all consequences of optimisations changing the structure of the program. The best guarantee that I imagine we could give, is that for any given block, the variable location in that block is the location that instructions read from. That way, if you modify the variable in a block, it's very likely that the next few statements will read that modified variable. It's eminently do-able with the new implementation, as location selection is the last thing that happens and is done on a per-block basis, although it wouldn't be free (in terms of performance cost). I think the "value is only read from one location" idea is true of SelectionDAG, but it might fall apart with other optimisations.> When tracking values, do we keep track of the original DBG_VALUE's !dbg > location to know when we need to stop propagating? [...] insn4 has a stale > value for the variable "v"I don't believe there's any relationship between DBG_VALUE locations and !dbg source locations right now. This is actually one of my pet peeves, that variable locations and the line program don't necessarily line up into something coherent. It's definitely something that leads to misleading program states being presented today; there are at least two bugs.llvm.org reports that are caused by such problems. [Can't find them right now as bugzilla is throwing errors]. Defining an order on !dbg locations, and building DBG_VALUEs into that order, would be really useful for ensuring correctness> If we had such a facility: would it be possible to integrate this into the > new pass implementation?In the simple case, easily: in your example, there would just be an extra layer of mapping from source location to DBG_VALUE; and each DBG_VALUE would have a well defined value number. We could even have insn4 set the variable location to %y's value if it's still available somewhere. If variable locations are defined by source location however, this undermines the meaning of how LiveDebugValues determines variable locations when control flow merges: there wouldn't be a simple "variable location in predecessor block" any more. And the merged "live in" location wouldn't necessarily mean anything to the source locations in the block. Stepping further back, we'd also lose one of the freebies that the current design gives us: when variable values merge at a PHI node, but there's no actual PHI instruction in the IR (because there's no subsequent IR use), we don't create a "dbg.phi(...)" instruction, we just ignore it and let LiveDebugValues patch it up later, if a location can be recovered. If every source location needed to be connected to a variable location record, we would need to represent such debug-only PHIs much earlier in compilation (or drop them). On the other hand, what you're describing (plus the instruction referencing work) is something that doesn't require debug instructions in the IR or MIR, which is highly desirable. We could just store a set of {instruction references, variable / fragment / expr, source locations} and build a location list out of that. Definitely worth pursuing, and value tracking would definitely enable such designs. -- Thanks, Jeremy
Vedant Kumar via llvm-dev
2020-Jun-25 21:33 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Thanks for sharing this RFC, Jeremy.> On Jun 18, 2020, at 8:35 AM, Jeremy Morse <jeremy.morse.llvm at gmail.com> wrote: > > Hi debuginfo-cabal, > > tl;dr: Let's please consider using a new implementation of LiveDebugValues > that produces richer information, might be slightly faster, but mostly will > support the instruction referencing and value tracking paradigm from my RFC [0] > rather than the location tracking that LiveDebugValues does today. > > In that RFC, the main motivator is to treat variable locations a bit more like > SSA, having variable location metadata refer to the instructions that define a > value, and track the value moving around registers from there. This > prototype [1] is an implementation of the "track" mechanism, where we produce > value numbers for every 'def' in a function, and track them being moved around. > Variable locations are then resolved and extended by following where the value > numbers go.Your earlier RFC sketches a plan for extending LiveDebugValues to handle instruction-referencing DBG_VALUEs. From http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html <http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html> -->> How then do we translate this new kind of machine location into >> DWARF/CodeView variable locations, which need to know which register >> to look in? The answer is: LiveDebugValues [3]. We already perform a >> dataflow analysis in LiveDebugValues of where values "go" after >> they're defined: we can use that to take "defining instructions" and >> determine register / stack locations. We would need to track values >> from the defining instruction up to the DBG_VALUE where that value >> becomes a variable location, after which it's no different from the >> LiveDebugValues analysis that we perform today.I had interpreted this as meaning: introduce transfer functions to walk from the referenced instruction until we reach the referencing DBG_VALUE, then create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could proceed as usual. Reading this RFC, I get the sense that that has several drawbacks: it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it doesn’t track all the locations for a variable. Is that correct?> This matches, to an extent, the "Register Data Flow" / RDF graph analysis > that recently moved from the Hexagon backend to CodeGen. I had a look, and > it appears RDF computes much richer information than LiveDebugValues needs > (specifically, all uses of values) and allocates for each value number, which > is quite a bit of overhead. I decided it'd be simpler to make a new > LiveDebugValues implementation, as we've had performance problems with > LiveDebugValues in the past, and we don't need to track additional metadata > for each value, only number them uniquely. > > There are few independent reasons to adopt this new implementation, and I'm > self-conscious about dropping a code-bomb on people. My questions for this > email are: > * Does the new implementation (described below) make sense, andI think your results speak for themselves :).> * If so, how would I go about landing this?One thing I’m not yet clear on: does your prototype essentially implement the minimal extension to LiveDebugValues needed to handle instruction-referencing DBG_VALUEs? If not, should it?, or should we take the opportunity to remove limitations from the current LiveDebugValues algorithm? As for how to land this, I propose: - Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp - Split out common transfer functions and other helpers into lib/CodeGen/LiveDebugValues/Common.{h,cpp} - Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp - Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavor> We could wait until there's a complete suite of patches for the instruction > referencing work (I've got a working prototype, but not production-quality). > On the other hand, this new LiveDebugValues implementation is probably ready > for review now, works independently, and is very much the hardest part of > instruction referencing, so it'd be nice to get it under-way earlier. My > preferred outcome would be adding it as a parallel implementation to the > current LiveDebugValues, enabled by a runtime switch, then start to land > the rest of the instruction referencing work around it. This reduces the > scope for further bit-rot, and as Vedant mentioned [2] this would all need > to sit under a runtime-switch for a transition period anyway.Yes, I’m in favor of landing work related to instruction-referencing DBG_VALUEs in small-ish independent patches, as they become ready. I’d very much like to keep the unmodified var-loc DBG_VALUE code paths intact and separated.> > # Overview > > Today's LiveDebugValues (explained in the file docu-comment [3]) concerns > itself with variable /locations/ only, not the values that they contain. The > implementation is very good at its job: it detects when a location is > invalidated (i.e. clobbered), control flow joins are a simple matter of "Do > the predecessors agree on the location?", and not much more work is needed. > > Switching to tracking variable values is more complicated. Following the > original proposal, for this pseudo-MIR code sequence: > > $rax = somedef > DBG_INSTR_REF reference-to-somedef > $rbx = COPY killed $rax > ; Variable location is $rax, then moves to $rbx > > We need to track that: > a) $rax contains a value defined by the "somedef" instruction, > b) The DBG_INSTR_REF needs a location for the value somedef defines. > c) That value from "somedef" is copied to rbx, > > The new LiveDebugValues implementation doesn't have DBG_INSTR_REF yet, so > I've written it to work with DBG_VALUEs too, to prove that it works just as > well as the current implementation. It deals with code sequences such as: > > $rax = somedef > DBG_VALUE $rax > $rbx = COPY killed $rax > ; Variable location is $rax, then moves to $rbx > > In which the same tracking has to occur, but the DBG_VALUE "reads" a value > number from $rax. As the variable locations are determined by the value number > that the variable has, it doesn't matter whether that was specified by a > DBG_VALUE or DBG_INSTR_REF. The tracking is sufficiently good to produce an > identical binary to the output of current LiveDebugValues (details below). > > There are some unrelated benefits to consider: because the new implementation > tracks all locations that a value may be in, we can handle the following > problem, that the current implementation cannot: > > entry: > $rax = somedef > DBG_VALUE $rax > SPILL_TO_STACK %somestackslot, $rax > loop: > [... some code...] > $rax = RESTORE_FROM_STACK %somestackslot > do_things_with_rax > brcc loop > exit: > ret > > Here, there's a variable spilt to the stack, that's temporarily restored > in the loop. At the end of the "loop" block, the value has two locations: > $rax and the stack slot. The current LiveDebugValues always follows the > restore from the stack: it decides the variable is on the stack at the end > of "entry", and in $rax at the end of "loop". Because these locations don't > agree on entry to "loop", the location in "loop" is dropped completely. > Having LiveDebugValues ignore or prefer the restore just changes what kind of > circumstances locations get dropped in: the underlying problem is that the > value can be in multiple places, but we can track only one of those places. > > The new implementation has enough information to solve this, as it tracks all > values in all machine locations. Unfortunately it doesn't make much difference > to the location statistics for clang RelWithDebInfo builds. An additional > benefit is that we would be better placed to produce saner locations lists [4]. > > In terms of performance, the new implementation takes on average just as long > as the current implementation. A full stage2 build of clang may be 0.1% slower, > but it's in the noise on my machine. All the algorithms are much more localised > than the current implementation (see below), and I reckon there's still scope > for further optimisation as it's easier to make decisions such as "Don't try to > join these variable locations in this block, they're immediately re-stated by > DBG_VALUEs". > > # Implementation > > The new implementation strictly does more work than the old one: it tracks every > value in the function, not just those that are associated with a variable > location. It also has to detect places where control flow merges and produces > PHI values, where the old implementation could just check that the live-out > variable location in predecessors agreed. To address this, I've separated the > implementation into two smaller problems, that can be solved almost > independently and with much greater locality. > > Both of the problems below are sort-of-like SSA construction (they have a set > of defs, and a large set of uses), but instead of calculating new PHI > placements, we have to infer where PHIs _were_ placed. > > Problem 1: Machine value numbering. Every def in the function is given a > number, which is then condensed into a transfer function for each block, > specifying which registers (or spill locs) contain which values. It's > genuinely just a map of machine location to value. Using the same sort of > dataflow model that LiveDebugValues currently uses, values are propagated to > compute the live-ins and live-out values of each block, with one significant > modification: we assign a new value number when control flow merges different > values, essentially re-computing PHI nodes. This leads to some ungraceful > code: see "Ugly PHI stuff" below. This problem is kept localised by only > considering the registers and spill slots that are accessed, making the > problem size proportionate to the product of the max working set of the > function and the number of blocks. The number of variables used is irrelevant. > > Problem 2: Variable value assignment. By using the solution for the first > problem, of "what value number does each machine location contain" at each > program point, every DBG_VALUE can be interpreted as assigning a value to a > variable, not a location. This gets turned into a transfer function too, and > we can then perform another dataflow analysis, propagating what _value_ a > variable has in a block, and computing the set of live-in variable values for > each block. A specific machine location can be picked after this problem is > solved, possibly from several options if multiple locations have the same > value. This isn't a completely clean abstraction: predecessors with different > variable values that are in the same location should be propagated, but the > new implementation has to find and validate a PHI value number that represents > those merged values (see "More ugly PHI stuff" below). > > I'm keeping problem 2 localised by propagating variable values for one lexical > scope at a time, and only over the blocks that are in that scope. As well as > localising by turning the main problem into a large number of small problems, > this reduces the amount of work too. The current implementation filters out > variable locations that are out of lexical scope every time a block is > processed, which now no longer needs to happen. Variables in the outermost > scopes won't be re-processed when LiveDebugValues goes around an inner-scope > loop several times; likewise variables in inner scopes won't get re-analysed > when an outer scope variable needs another pass through the function. This also > means that all lexical scopes that cover a single block (or zero) require no > dataflow analysis, which should attenuate the effects of heavy inlining a bit. > > The solution to these two problems (block live-in variable values and block > live-in machine location values) can then be combined in a final step through > the function, producing DBG_VALUEs for live-ins and DBG_VALUEs for any > location transfers within a block. > > # Testing and validation > > The new LiveDebugValues implementation produces an identical AMD64 clang-3.4 > binary to the old implementation, based on d481e59863a (Feb 2020). Well, > almost -- it proved too difficult to completely emulate the old version, > instead with the following changes [5] applied to d481e59863a, the two > implementations produce an identical binary: > * Applied D67500 [6], > * Ignore identity copies, > * Force any newly created DBG_VALUEs to be inserted according to the order > in which DebugVariables appear in the input function. So that they're > consistent between the two implementations. > * Disabling all tracking of stack spills. > > Ignoring the final point, I think this would demonstrate that the two > implementations are equivalent. Stack spills are tricky: because value > transfers within blocks are left until the final step of the new > implementation, it's hard to work out in advance when old LiveDebugValues > would have transferred a location. It was easier to just compare binaries > where spills hadn't been followed. > > IMO, the non-spill (the vast majority) locations being identical should be > sufficient to show that the overall algorithm works, and dealing with spills > is a minor extension. > > The tests pass, except for the ones covered by the limitations section, and > a few DBG_VALUE-change-order issues. > > # Limitations and unimplemented features > > There are a couple of things that I haven't gotten around to implementing > yet, for which the tests fail: > > Overlapping fragments: this is trivial to do, but at the back of my job queue, > > Entry values: much more interesting. As Djordje has pointed out elsewhere, > LiveDebugValues has been assuming that every DBG_VALUE assigns a new value > to a variable [7], which means if a DBG_VALUE re-states a variable location > when it doesn't have to, we risk dropping an entry value. This will be much > easier in this new model (I haven't had the time to do it yet), because the > machine value numbering means that each entry value has a specific number, > and we can easily identify when it's no longer available. This should just > be a matter of the in-block transfers tracking code identifying that a > variable should have the entry value; it not being any machine location > currently; and issuing an appropriate DBG_VALUE. > > # Ugly PHI stuff > > Above, I said that this problem is a lot closer to SSA construction than other > classic compiler problems. However, it's actually even weirder: instead of > having a set of uses and defs and computing the locations of PHIs: every > instruction with a DebugLoc in lexical scope is a use, and the PHIs are already > placed just not recorded. Thus the problems above are more about recovering > the locations of PHIs after they're eliminated. > > This adds a lot of complexity to the lattice structure used for each variable > location compared to the current LiveDebugValues: there used to only ever be > one candidate location for each variable at every control flow merge, and over > the course of dataflow analysis this was found to be either true or false. > However because we're iteratively discovering new PHI locations as we explore, > the new implementation's lattice consists of: > * A single candidate def (\top), > * A PHI value at every control flow merge on the path between between the def > and the current block, > * A PHI value at the current block (\bot). > Each of which potentially needs one iteration of dataflow to see whether the > join should produce that value. > > This only truly needs exploring at loop heads, and the maximum number of > iterations required is the depth of loops in the function, which is the same as > the old LiveDebugValues implementation. However because all the PHI values > have to be discovered, that number of iterations is required more often than > the old implementation. > > I wrote more about the lattice in the docu-comment in the new implementation > [8]. I can produce a worked example too, but this email has gotten a bit > wordy already. > > # More ugly PHI stuff > > The section above holds true for the variable values analysis too, where > instead of defs we have variable assignments at DBG_VALUEs. Rather than > discovering where PHIs happen and propagating that information, variable > value analysis has to do the same to discover what machine-value PHI would be > needed to merge a variable location from all predecessors, and then examine > whether such a PHI is actually available, and has the right inputs. (This is > annoyingly fiddly). > > # Summary > > This new implementation goes fast enough, is accurate enough, and appears > to be sound. That isn't a reason to replace the current implementation; but > eventually the instruction reference work would make use of this, so if > possible I'd like to start review on it, if that's acceptable to people. > > Seeing how you've just read a lot of words, a motivating reminder: this path > can lead, slowly, to there being no debug instructions for variable locations! > And as described in my RFC sitrep-email, it can be more accurate than > location only tracking too. > > [0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html > [1] Branch is here: > https://github.com/jmorse/llvm-project/commits/ldvars-separation , tip > is 11450def5bc592321ece2a64ea2a8cabbc614c39 at time of sending. I > could open a PR containing the changes, but github mucks them up; best > to just look at the bare file [8]. > [2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139469.html > [3] https://github.com/llvm/llvm-project/blob/3626eba11f2367da88ccb0280a34731243278034/llvm/lib/CodeGen/LiveDebugValues.cpp#L9 > [4] https://bugs.llvm.org/show_bug.cgi?id=42834 > [5] https://github.com/jmorse/llvm-project/commit/720876fec2a5113d52977d00e0e1f7d526b4c1f0 > four commits from here > [6] https://reviews.llvm.org/D67500 , I probably should push to land this > [7] It's not a bad view, and definitely something I've been preaching, > [8] https://github.com/jmorse/llvm-project/blob/11450def5bc592321ece2a64ea2a8cabbc614c39/llvm/lib/CodeGen/LiveDebugValues.cpp#L60 > > -- > Thanks for reading this essay, > Jeremyvedant -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200625/104a0664/attachment.html>
Jeremy Morse via llvm-dev
2020-Jun-26 12:11 UTC
[llvm-dev] [RFC] A value-tracking LiveDebugValues implementation
Hi Vedant, On Thu, Jun 25, 2020 at 10:33 PM Vedant Kumar <vedant_kumar at apple.com> wrote:> Your earlier RFC sketches a plan for extending LiveDebugValues to handle instruction-referencing DBG_VALUEs. From http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html -- > > How then do we translate this new kind of machine location into > DWARF/CodeView variable locations, which need to know which register > to look in? The answer is: LiveDebugValues [3]. We already perform a > dataflow analysis in LiveDebugValues of where values "go" after > they're defined: we can use that to take "defining instructions" and > determine register / stack locations. We would need to track values > from the defining instruction up to the DBG_VALUE where that value > becomes a variable location, after which it's no different from the > LiveDebugValues analysis that we perform today. > > I had interpreted this as meaning: introduce transfer functions to walk from the referenced instruction until we reach the referencing DBG_VALUE, then create a VarLoc. IIUC, at that point the unmodified LiveDebugValues algorithm could proceed as usual.I think that was my meaning at the time; and it's still broadly true. Unfortunately, it turns out that there can be pretty much an arbitrary computation between the referenced instruction and the referencing DBG_INSTR_REF. I ran across several scenarios where a referenced instruction gets CSE'd / PRE'd / hoisted away from the referencing DBG_INSTR_REF, to the extent that it's on the other side of a loop. Once it's there, a dataflow algorithm is needed to work out whether the value is live through the loop, or whether it gets clobbered somewhere. That's what the "machine value number" problem is solving. It could still plug into the VarLoc implementation if that makes an easier transition, where a DBG_INSTR_REF is decomposed into a DBG_VALUE when an instruction reference / value number / location are matched up. It would make debug use-before-defs harder to deal with, though. (It's worth noting at this point that the machine value number work is, in a sense, doing LiveDebugVariables' job, by linking up a register def with a use. Just, LiveDebugVariables has the register allocator to help it do it).> Reading this RFC, I get the sense that that has several drawbacks: it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUE (all the ‘ugly phi stuff’ is missing!), and it doesn’t track all the locations for a variable. Is that correct?This might be my overly-defensive writing style working against me. I believe those problems are solved, but the solutions are more complicated than I wanted. For:> it doesn’t handle situations where the referenced instruction is in a different basic block than its referencing DBG_VALUEThis is solved by the machine value number dataflow; but as mentioned requires a more complex lattice to infer where PHI values are generated,> and it doesn’t track all the locations for a variableI believe the implementation correctly picks the _value_ for every variable in each block, and picking a location is a final post-processing stage.> One thing I’m not yet clear on: does your prototype essentially implement the minimal extension to LiveDebugValues needed to handle instruction-referencing DBG_VALUEs? If not, should it?, or should we take the opportunity to remove limitations from the current LiveDebugValues algorithm?I have the code for this hanging around, but not in the tree I linked to. It establishes a fairly trivial mapping between: * Instructions that are referred to, and * The machine value numbers that instruction produces. I'd suggest that it's not useful without DBG_INSTR_REFs as we wouldn't be able to test it., and testing it means bringing in patches to generate or parse DBG_INSTR_REFs, which I'd rather split up.> As for how to land this, I propose: > > - Move LiveDebugValues.cpp to lib/CodeGen/LiveDebugValues/VarLocBasedImpl.cpp > - Split out common transfer functions and other helpers into lib/CodeGen/LiveDebugValues/Common.{h,cpp} > - Finally, add lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp > - Add a cl::opt like -instr-ref-dbg-values to allow selecting a DBG_VALUE flavorSounds good to me,> Yes, I’m in favor of landing work related to instruction-referencing DBG_VALUEs in small-ish independent patches, as they become ready. I’d very much like to keep the unmodified var-loc DBG_VALUE code paths intact and separated.-- Thanks, Jeremy