Jeremy Morse via llvm-dev
2020-Nov-06 18:26 UTC
[llvm-dev] [DebugInfo] A value-tracking variable location update
Hi debug-info folks,

Time for another update on the variable location "instruction referencing" implementation I've been doing; see this RFC [0, 1] for background. It's now at the point where I'd call it "done" (as far as software ever is), so it's a good time to look at what results it produces. Here are the scores-on-the-doors using llvm-locstats, on clang-3.4 RelWithDebInfo, first in "normal" mode and then with -Xclang -fexperimental-debug-variable-locations.

"normal":

================================================
 cov%          samples   percentage(~)
-------------------------------------------------
 0%             765406      22%
 (0%,10%)        45179       1%
 [10%,20%)       51699       1%
 [20%,30%)       52044       1%
 [30%,40%)       46905       1%
 [40%,50%)       48292       1%
 [50%,60%)       61342       1%
 [60%,70%)       58315       1%
 [70%,80%)       69848       2%
 [80%,90%)       81937       2%
 [90%,100%)     101384       2%
 100%          2032034      59%
================================================
 -the number of debug variables processed: 3414385
 -PC ranges covered: 61%
-------------------------------------------------
 -total availability: 64%
================================================

With instruction referencing:

================================================
 cov%          samples   percentage(~)
-------------------------------------------------
 0%             751201      21%
 (0%,10%)        40708       1%
 [10%,20%)       44909       1%
 [20%,30%)       47544       1%
 [30%,40%)       41630       1%
 [40%,50%)       42742       1%
 [50%,60%)       56692       1%
 [60%,70%)       53796       1%
 [70%,80%)       64476       1%
 [80%,90%)       73836       2%
 [90%,100%)      74423       2%
 100%          2123749      62%
================================================
 -the number of debug variables processed: 3415706
 -PC ranges covered: 68%
-------------------------------------------------
 -total availability: 64%
================================================

The first observation: a significant increase in the byte-coverage statistic, meaning that we're able to track variable locations for longer and across more code. This was one of the main aims of this work, having better tracking of the locations that we know.
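The cov% column buckets each variable by the fraction of its enclosing scope's bytes that its location covers (the per-variable ratio defined later in this thread). As a rough illustration of how such a table could be built, here is a minimal Python sketch. It is not llvm-locstats's implementation, and the (covered_bytes, scope_bytes) input pairs are an assumption standing in for data extracted from the DWARF.

```python
from collections import Counter

# Bucket labels, in the order the cov% column lists them in the tables above.
BUCKETS = (["0%", "(0%,10%)"]
           + [f"[{lo}%,{lo + 10}%)" for lo in range(10, 100, 10)]
           + ["100%"])


def bucket(covered_bytes: int, scope_bytes: int) -> str:
    """Map one variable's coverage (covered bytes / scope bytes) to a cov% bucket."""
    if covered_bytes <= 0:
        return "0%"
    if covered_bytes >= scope_bytes:
        return "100%"
    pct = 100 * covered_bytes // scope_bytes  # integer percent, 0..99 here
    if pct < 10:
        return "(0%,10%)"
    lo = pct // 10 * 10
    return f"[{lo}%,{lo + 10}%)"


def histogram(variables):
    """variables: iterable of (covered_bytes, scope_bytes), one pair per variable.

    Returns (bucket, samples, rounded percentage) rows in table order.
    """
    counts = Counter(bucket(c, s) for c, s in variables)
    total = sum(counts.values())
    if total == 0:
        return [(b, 0, 0) for b in BUCKETS]
    return [(b, counts[b], round(100 * counts[b] / total)) for b in BUCKETS]


# Four made-up variables: uncovered, half covered, fully covered, 5% covered.
for row in histogram([(0, 20), (10, 20), (20, 20), (5, 100)]):
    print(*row)
```

Integer division keeps the bucket boundaries exact (for example, 1 byte of a 10-byte scope lands in [10%,20%) rather than slipping below it through floating-point rounding).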
The seven-percentage-point increase in PC ranges covered (61% to 68%) includes an additional two percentage points of entry-value locations. If we disable entry value production, the scope-bytes-covered statistic is 59% with LLVM master as it stands today and 64% with instruction referencing, which is still a decent improvement.

The next observation is that the "total availability" of variables hasn't changed. This isn't the full story: if you give an absolute name to every variable with a location in the clang binary, there are 6949 dropped locations and 22564 completely new locations, meaning roughly 1% of all variables in the program have changed; it's just hidden by the statistics rounding. More detail on the nature of the changes is below. I was hoping for more false locations to be dropped; it's quite likely that there are many more false locations dropped within variables that have more than one value, which aren't readily reflected in these statistics.

A natural question is: are all these new locations wrong, and are the dropped locations only dropped because of bugs? To address that, I picked 20 new locations and 20 dropped locations at random and analysed why they happened. The input samples can be found here [2], along with an llvm-reduce'd version of each IR file. I confirmed the reason for each new/dropped location in both the reduced and the original file, as llvm-reducing them can alter the reason why something is dropped or not.

Of the new locations, we previously could not track the location because:
* 14 DBG_VALUEs come after the vreg operand is out of liveness, and are dropped by LiveDebugVariables.
* 2 DBG_VALUEs are out of liveness, and are dropped by RegisterCoalescing out of conservativeness.
* 2 DBG_VALUEs appear before their operand is defined. These are out of liveness; instruction referencing saves them by preserving debug use-before-defs.
* 2 DBG_VALUEs are out of liveness after a branch, but the value is live down the other branch path.
All of these locations can be tracked with instruction referencing because liveness is not a consideration, only availability in physical registers. 19 of the new locations were correct, while one tracked the right value but picked the wrong location for it, which I've now got a patch for.

For the dropped locations:
* 8 false locations are dropped; they used to refer to the wrong value because of a failure in register coalescing, see the body of [3].
* 3 locations are unnecessarily dropped when different subregisters are merged together in register coalescing.
* 3 locations are unnecessarily dropped due to conservative tracking of PHI values (the code in D86814; this can be fixed with more C++).
* 2 of the samples didn't actually have a dropped location; instead they preserved an undef debug instruction in early-taildup, and my scripts picked this up as dropping a location.
* 2 locations aren't tracked by InstrRefBasedLDV through a block that's out of scope, meaning the location never covers instructions that are in scope. VarLocBasedLDV is vulnerable to this too, but MachineSink can drop a DBG_VALUE on the far side of the scope gap, saving the location. See "Limitations" below.
* 2 locations are dropped during tail duplication: one in early-taildup, which I haven't tried to address yet (see "Limitations"), and one in late taildup, where a block containing only debug instructions isn't correctly duplicated.

To summarise: all the new locations found tracked the correct value and were not trackable by DBG_VALUE variable-location tracking, although there are some bugs in picking locations. Roughly half of the dropped locations (8 of the 18 genuine drops) are actual false locations; the remaining 10 are due to unimplemented or limited handling of optimisations in the instruction referencing code so far.

This pretty much fulfils the objective of this work: we're able to save a lot more variable locations through the register allocator because we don't have to be so conservative about liveness.
Plus, the default behaviour of all optimisations is now to _drop_ a variable location, as opposed to the existing situation where, after we leave SSA form, all bets are off.

Another question is how much this costs in compile time: on my otherwise idle machine, a clang-3.4 build using instruction referencing usually comes within 2% of the build time of a normal build. This is IMO expected given the larger amount of debugging information being produced, and I haven't closely studied the performance of a whole build using instruction referencing yet, so it'll probably get better. However, a more recent change to InstrRefBasedLDV has added a big slowdown, so I'm going to skip reporting any performance results for now.

Current situation
=================

Some of this work has landed, and I've got some patches up for review [4] that implement the core parts. I also have a long tail of tweaks and location-salvaging in a tree here [5], which just fleshes out support in more optimisation passes and installs bugfixes. (Commits there are not written to be human-consumable, alas.) There are no fatal flaws in the design as far as I'm aware, although there are some annoyances (see "Limitations").

The biggest problem is that this all relies on a new LiveDebugValues implementation that doesn't have sufficient test coverage yet, and is still Somewhat Experimental (TM). Given the number of times an unpleasant performance cliff has been found in VarLoc LiveDebugValues, it wants a long time to soak in before being deployed.

Limitations
===========

Here's a non-exhaustive list of known problems. None of them are fatal IMO, and each has a small effect on variable availability:
* Early tail duplication: like late tail duplication, this tears apart SSA information and can cause the same "Value" to be defined twice. This is solvable using the SSAUpdater utility, which early-taildup already uses.
* Attaching a debug instruction number to a COPY instruction is highly undesirable, because the COPY doesn't actually define a value; it just moves it between locations. At least one optimisation (X86 LEAtoMOV) transforms instructions into COPYs (LEA $rsp + 0 => COPY $rsp), which is unfortunate. This doesn't happen a lot though, and can be fixed by dropping a DBG_PHI of the COPY'd register nearby. Plus, it only happens post-regalloc, which makes it less of a problem.
* Trivial def rematerialization: there's no pattern to rely on in how the register allocator rematerializes values, so values can be rematerialized into different registers dominating different parts of the CFG. It's hard to track the variable location after that, because it has multiple values in the eyes of InstrRefBasedLDV. Seeing how these defs are effectively constants, my preference would be to have the target describe such trivial defs in a DIExpression. That avoids having to track the location of a constant that we already know.
* As mentioned in the list of dropped locations above, gaps in lexical scopes can lead to locations not being propagated sufficiently far, a problem for both variable-location tracking solutions, as documented in PR48091. However, using DBG_VALUEs to track variable locations can save a few of them, because MachineSink can sink DBG_VALUEs over the scope gap, whereas instruction referencing relies on tracking debug use-before-defs, which don't propagate across scope gaps. More on how to resolve this in PR48091.

Next Steps
==========

While this isn't ready for general use yet, it'd be great to get as much as possible into llvm-12 behind the -Xclang -fexperimental-debug-variable-locations flag. That eases the path to testing for consumers, which gives a greater chance of finding worst-case slowdowns in advance of instruction referencing being generally available.
There's a decent amount of stuff under "Limitations" above that I can address, and some performance profiling is still needed. I imagine the next best thing to do is to add support for GlobalISel and some non-X86 backends (certain TargetInstrInfo hooks need to perform debug-info bookkeeping), which would make this all more appetising.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
[1] http://lists.llvm.org/pipermail/llvm-dev/2020-June/142368.html
[2] https://github.com/jmorse/llvm-inst-ref-test-samples
[3] https://reviews.llvm.org/D86813
[4] https://reviews.llvm.org/D88898
[5] https://github.com/jmorse/llvm-project/commit/0a702b967927d888bd222806252783359fc74d57

--
Thanks,
Jeremy
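The dropped/new counts in the update above come from giving an absolute name to every variable with a location in each build and comparing the two sets. Below is a minimal sketch of that comparison and of the "roughly 1%" arithmetic. The "function::variable" naming scheme is hypothetical (the actual scripts aren't shown in this thread), and the denominator is assumed to be the 3415706 variables processed in the instruction-referencing run.

```python
def diff_locations(before: set[str], after: set[str]) -> tuple[set[str], set[str]]:
    """Compare named variables that have a location in two builds.

    Returns (dropped, new): names located only in `before`, and only in `after`.
    """
    dropped = before - after
    new = after - before
    return dropped, new


def changed_fraction(n_dropped: int, n_new: int, n_variables: int) -> float:
    """Fraction of all variables that gained or lost a location."""
    return (n_dropped + n_new) / n_variables


# Hypothetical names in a made-up "function::variable" scheme.
dropped, new = diff_locations({"main::argc", "main::argv", "foo::x"},
                              {"main::argc", "foo::x", "foo::y"})
print(sorted(dropped), sorted(new))

# Figures from the update: 6949 dropped, 22564 new, 3415706 variables.
print(f"{changed_fraction(6949, 22564, 3415706):.2%}")
```

The last line prints 0.86%, which is why the change disappears once llvm-locstats rounds its percentages to whole numbers.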
David Blaikie via llvm-dev
2020-Nov-06 19:10 UTC
[llvm-dev] [DebugInfo] A value-tracking variable location update
Awesome to read how it's coming along - I'm mostly aside from the debug location work, but had just one or two clarifying questions

On Fri, Nov 6, 2020 at 10:27 AM Jeremy Morse <jeremy.morse.llvm at gmail.com> wrote:
> The first observation: a significant increase in the byte-coverage statistic,
> meaning that we're able to track variable locations for longer and across more
> code. This was one of the main aims of this work, having better tracking of
> the locations that we know. The increase of seven percentage points includes an
> additional two percentage points of entry-value locations. If we disable entry
> value production then the scope-bytes-covered statistic moves from 59% to 64%,

Was this meant to be "from 64% to 59%"?
How does that compare to the baseline no-entry-value number?

Could you give a quick summary of the distinction between "PC ranges covered" and "total availability"?

> which is still a decent improvement.

> For the dropped locations:
> * 8 false locations are dropped, they used to refer to the wrong value because
> of a failure in register coalescing, see the body of [3].

Would these issues ^ show up/be testable with Dexter?
Jeremy Morse via llvm-dev
2020-Nov-06 19:29 UTC
[llvm-dev] [DebugInfo] A value-tracking variable location update
Hi David,

On Fri, Nov 6, 2020 at 7:10 PM David Blaikie <dblaikie at gmail.com> wrote:
> > The first observation: a significant increase in the byte-coverage statistic,
> > meaning that we're able to track variable locations for longer and across more
> > code. This was one of the main aims of this work, having better tracking of
> > the locations that we know. The increase of seven percentage points includes an
> > additional two percentage points of entry-value locations. If we disable entry
> > value production then the scope-bytes-covered statistic moves from 59% to 64%,
>
> Was this meant to be "from 64% to 59%"?
> How does that compare to the baseline no-entry-value number?

I guess that was ambiguous: I mean the baseline scope-bytes-covered number with no entry values is 59% with LLVM master as it stands today. With instruction referencing and no entry values, the scope-bytes-covered number rises to 64%.

> Could you give a quick summary of the distinction between "PC ranges
> covered" and "total availability"?

"total availability" is the percentage of variables in the program that have a DW_AT_location attached, so it's a measure of how well we preserve locations through the whole compiler.

"PC ranges covered" is computed, for each variable in each lexical scope, by dividing the number of program text bytes covered by the variable's location by the number of program text bytes in the lexical scope. If a variable location covers 10 bytes of a scope that is 20 bytes in size, then that's 50% coverage. The headline "PC ranges covered" is the average over all variables in the program, which is better at measuring how long we can track known locations. (Obviously there are upsides and downsides to aggregating like this.)

> > For the dropped locations:
> > * 8 false locations are dropped, they used to refer to the wrong value because
> > of a failure in register coalescing, see the body of [3].
>
> Would these issues ^ show up/be testable with Dexter?

Yes, although it's so deep in the compiler that it'd be tricky to write a test that always hit that problem.

--
Thanks,
Jeremy
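To make the distinction between the two metrics concrete, here is a minimal sketch that follows the description in the reply above, where "PC ranges covered" is an average of per-variable coverage ratios. The `Variable` record and its fields are hypothetical stand-ins for data read from the DWARF, and llvm-dwarfdump's own statistics code may aggregate differently.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    # Hypothetical per-variable record; real data would come from the DWARF.
    has_location: bool   # does the variable's DIE carry a DW_AT_location?
    covered_bytes: int   # program text bytes of the scope its location covers
    scope_bytes: int     # program text bytes in the enclosing lexical scope


def total_availability(variables: list[Variable]) -> float:
    """Fraction of variables that have any location at all."""
    if not variables:
        return 0.0
    return sum(v.has_location for v in variables) / len(variables)


def pc_ranges_covered(variables: list[Variable]) -> float:
    """Average, over all variables, of covered bytes / scope bytes."""
    ratios = [v.covered_bytes / v.scope_bytes for v in variables if v.scope_bytes > 0]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios)


# The 10-of-20-bytes example (50%), a fully covered variable,
# and a variable with no location at all.
vs = [Variable(True, 10, 20), Variable(True, 20, 20), Variable(False, 0, 40)]
print(total_availability(vs), pc_ranges_covered(vs))
```

Extending the first variable's location to all 20 bytes raises PC ranges covered from 0.5 to about 0.67, while total availability stays at 2/3. That is the same shape as the results in the update: coverage goes up while availability is unchanged.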