Tozer, Stephen via llvm-dev
2020-Sep-11 18:12 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> Can you elaborate what "direct" means? I'm having trouble understanding what the opposite (a non-exact value) would be.

Apologies, "exact" was a misleading/incorrect term. By direct, I mean that the expression computes the value of the variable, as opposed to its memory address, or the value that it points to. Within LLVM, where we don't have DW_OP_reg/DW_OP_breg but instead simply refer to a generic SSA value, this could mean either a register location or stack value.

> At the moment we don't make the lvalue/rvalue distinction in LLVM at all. We make an educated guess in AsmPrinter. But that's wrong and something we should strive to fix during this redesigning.

I think the opposite; I don't believe there's any reason we need to make the explicit lvalue/rvalue distinction until we're writing DWARF. To put it in more general terms, I think that the IR/MIR debug value instructions should only care about how the variable's value can be computed. Whether the result of that computation is an lvalue is unimportant within LLVM itself as far as I can tell, and is redundant when it can be computed from just the DIExpression and location operands.

> As stated above, I don't think we can trivially determine this, because (at least for dbg.values) this info was lost already in LLVM IR. Unless we say the dbg.declare / dbg.value distinction is what determines lvalues vs. rvalues.

With the proposed operator, it would be trivial to determine lvalue vs rvalue debug values with a set of rules (ignoring any fragment operator, which may appear at the end but does not affect the location type):

1. If the expression is empty, or any location arguments are $noreg => Empty
2. If the expression ends with DW_OP_implicit_ptr => Implicit pointer (rvalue)
3. If the expression ends with DW_OP_stack_value => Stack value (rvalue) // LLVM should produce LLVM_direct instead.
4. If the expression ends with DW_OP_LLVM_direct, then...
   4a. If the preceding expression is just DW_OP_LLVM_arg, 0 and the only location operand is a register => Register location (lvalue)
   4b. Otherwise => Stack value (rvalue)
5. Otherwise => Memory location (lvalue)

This covers all the expected cases without ambiguity and with almost no loss of expressiveness. I believe that the only expression that LLVM will not be able to produce like this is DW_OP_bregN, DW_OP_stack_value, due to the fact that when DW_OP_LLVM_direct is used, this would be written as a register location instead of a stack value. I don't think there are any cases where we would choose to emit a stack value location when we're able to produce a register location instead, so this shouldn't be a problem.
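As a rough illustration of the five rules above, here is a minimal Python sketch. It is not LLVM code: the tuple encoding of expressions and operands, the LocKind enum, and the classify() function are all assumptions made up for this example. Each expression element is a tuple of (opcode, args...), and a location operand is either ("reg", name), ("imm", value), or None to stand in for $noreg.

```python
from enum import Enum


class LocKind(Enum):
    EMPTY = "empty"
    IMPLICIT_POINTER = "implicit pointer (rvalue)"
    STACK_VALUE = "stack value (rvalue)"
    REGISTER = "register location (lvalue)"
    MEMORY = "memory location (lvalue)"


def classify(expr, operands):
    """Classify a debug value by the rules in the message (hypothetical encoding)."""
    ops = list(expr)
    # A trailing fragment operator does not affect the location type.
    if ops and ops[-1][0] == "DW_OP_LLVM_fragment":
        ops.pop()
    # Rule 1: empty expression, or any $noreg location argument.
    if not ops or any(operand is None for operand in operands):
        return LocKind.EMPTY
    last = ops[-1][0]
    # Rule 2.
    if last == "DW_OP_implicit_ptr":
        return LocKind.IMPLICIT_POINTER
    # Rule 3.
    if last == "DW_OP_stack_value":
        return LocKind.STACK_VALUE
    # Rule 4.
    if last == "DW_OP_LLVM_direct":
        single_register = len(operands) == 1 and operands[0][0] == "reg"
        # Rule 4a: nothing but "DW_OP_LLVM_arg, 0" before the direct operator.
        if ops[:-1] == [("DW_OP_LLVM_arg", 0)] and single_register:
            return LocKind.REGISTER
        # Rule 4b.
        return LocKind.STACK_VALUE
    # Rule 5.
    return LocKind.MEMORY
```

For example, classify([("DW_OP_LLVM_arg", 0), ("DW_OP_LLVM_direct",)], [("reg", "$rsp")]) falls under rule 4a, while adding a ("DW_OP_plus_uconst", 8) before the direct operator falls under rule 4b.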
Adrian Prantl via llvm-dev
2020-Sep-11 19:24 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> On Sep 11, 2020, at 11:12 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
>
> > Can you elaborate what "direct" means? I'm having trouble understanding what the opposite (a non-exact value) would be.
>
> Apologies, "exact" was a misleading/incorrect term. By direct, I mean that the expression computes the value of the variable, as opposed to its memory address, or the value that it points to.

That sounds to me to be the same concept that I am calling rvalue vs. lvalue. Do you agree, or is there some subtlety that I am missing?

> Within LLVM, where we don't have DW_OP_reg/DW_OP_breg but instead simply refer to a generic SSA value, this could mean either a register location or stack value.
>
> > At the moment we don't make the lvalue/rvalue distinction in LLVM at all. We make an educated guess in AsmPrinter. But that's wrong and something we should strive to fix during this redesigning.
>
> I think the opposite; I don't believe there's any reason we need to make the explicit lvalue/rvalue distinction until we're writing DWARF.

Here is an example of why I think that an optimization pass must have the ability to downgrade an lvalue to an rvalue:

    ; BEFORE
    %foo = i32 ...
    %mem = alloca i32
    call %llvm.dbg.declare(%mem, !DILocalVariable("x"))
    store i32* %mem, i32 %foo
    ...
    store i32* %mem, i32 0
    ...
    store i32* %mem, i32 %foo
    ...

    ; AFTER
    %foo = i32 ...
    %mem = alloca i32
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
    store i32* %mem, i32 %foo
    ; optimization eliminated the store of 0 to %mem and replaced all loads of %mem with "i32 0".
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_constu 0, DW_OP_stack_value))
    ...
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
    ...

The optimization eliminated the store of constant 0 to %mem and replaced all loads of %mem in the subsequent block with a constant "i32 0".
This means that we need to mark the first dbg.value (that would otherwise look like an lvalue) as an rvalue, because writing a new value to %mem there would not affect the code that is now hardcoded to use a constant 0 value for "x".

What do you think?

-- adrian

> To put it in more general terms, I think that the IR/MIR debug value instructions should only care about how the variable's value can be computed. Whether the result of that computation is an lvalue is unimportant within LLVM itself as far as I can tell, and is redundant when it can be computed from just the DIExpression and location operands.
>
> > As stated above, I don't think we can trivially determine this, because (at least for dbg.values) this info was lost already in LLVM IR. Unless we say the dbg.declare / dbg.value distinction is what determines lvalues vs. rvalues.
>
> With the proposed operator, it would be trivial to determine lvalue vs rvalue debug values with a set of rules (ignoring any fragment operator, which may appear at the end but does not affect the location type):
>
> 1. If the expression is empty, or any location arguments are $noreg => Empty
> 2. If the expression ends with DW_OP_implicit_ptr => Implicit pointer (rvalue)
> 3. If the expression ends with DW_OP_stack_value => Stack value (rvalue) // LLVM should produce LLVM_direct instead.
> 4. If the expression ends with DW_OP_LLVM_direct, then...
>    4a. If the preceding expression is just DW_OP_LLVM_arg, 0 and the only location operand is a register => Register location (lvalue)
>    4b. Otherwise => Stack value (rvalue)
> 5. Otherwise => Memory location (lvalue)
>
> This covers all the expected cases without ambiguity and with almost no loss of expressiveness. I believe that the only expression that LLVM will not be able to produce like this is DW_OP_bregN, DW_OP_stack_value, due to the fact that when DW_OP_LLVM_direct is used, this would be written as a register location instead of a stack value.
> I don't think there are any cases where we would choose to emit a stack value location when we're able to produce a register location instead, so this shouldn't be a problem.
Tozer, Stephen via llvm-dev
2020-Sep-15 17:56 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> That sounds to me to be the same concept that I am calling r-value vs. l-value. Do you agree, or is there some subtlety that I am missing?

I've been assuming that the l-value vs r-value distinction is analogous to C++: an l-value can be written to by the debugger, an r-value cannot. A memory location and a register location are both l-values, while implicit locations (stack value/implicit pointer) are r-values. Directness, on the other hand, I've been using to mean "the variable's value is equal to the value computed by the DIExpression". Applied to a DWARF expression, this description would only refer to a stack value. LLVM's DIExpressions are different, however, because we don't have DW_OP_reg or DW_OP_breg: we simply refer to the register and use context to determine which it should be. The best illustration of this is the table from [0], expanded on here with the location type:

    $noreg, (plus_uconst, 8)                  -> DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
    0,      (plus_uconst, 8)                  -> DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
    $noreg, (plus_uconst, 8, stackval)        -> DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
    0,      (plus_uconst, 8, stackval)        -> DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
    $noreg, (plus_uconst, 8, deref)           -> DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
    0,      (plus_uconst, 8, deref)           -> DW_OP_breg7 RSP+8, deref            Memory    l-value  Indirect
    $noreg, (plus_uconst, 8, deref, stackval) -> DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
    0,      (plus_uconst, 8, deref, stackval) -> DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
    $noreg, ()                                -> DW_OP_reg7 RSP                      Register  l-value  Direct
    0,      ()                                -> DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
    $noreg, (deref)                           -> DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
    0,      (deref)                           -> DW_OP_breg7 RSP+0, DW_OP_deref      Memory    l-value  Indirect

The point of using DW_OP_LLVM_direct is that it supplants the current directness flag, without the redundancy or unintuitive behaviour that the flag introduces.
I believe the only reason that the indirectness flag is necessary right now is to allow register locations to be emitted, by distinguishing the register location "$rsp, $noreg, () -> DW_OP_reg7 RSP" from the memory location "$rsp, 0, () -> DW_OP_breg7 RSP+0". Outside of this case the existing representation is sufficient for all other locations, and ideally the indirectness flag would have no effect (although unfortunately it does). The justification for DW_OP_LLVM_direct rests on the idea that we will generally choose to produce "DW_OP_reg7 RSP" instead of "DW_OP_breg7 RSP, DW_OP_stack_value". It doesn't prevent us from using DW_OP_stack_value instead if we have an exception, but I don't believe there are any. Using DW_OP_LLVM_direct instead of directness and stackval for the table above, we get this:

    (plus_uconst, 8)                     -> DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
    (plus_uconst, 8, LLVM_direct)        -> DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
    (plus_uconst, 8, deref)              -> DW_OP_breg7 RSP+8, deref            Memory    l-value  Indirect
    (plus_uconst, 8, deref, LLVM_direct) -> DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
    ()                                   -> DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
    (LLVM_direct)                        -> DW_OP_reg7 RSP                      Register  l-value  Direct
    (deref)                              -> DW_OP_breg7 RSP+0, deref            Memory    l-value  Indirect
    (deref, LLVM_direct)                 -> DW_OP_breg7 RSP+0, deref, stackval  Stack     r-value  Direct

Two of the examples in this table should be excluded from actual use: the two rows that end with "deref, LLVM_direct" shouldn't be produced within LLVM, because we can cancel the two operators out to give a memory location rather than producing a "deref, stackval" expression. This can be done in LLVM itself through the DIExpression interface, so that we don't hold DIExpressions in an incorrect intermediate state.
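To illustrate the cancellation described above, here is a small Python sketch. It is only a model of the idea, not the DIExpression interface: the function name cancel_deref_direct and the tuple encoding (one tuple per operator, in the same abbreviated notation as the table) are assumptions made up for this example.

```python
def cancel_deref_direct(ops):
    """Drop a trailing (deref, LLVM_direct) pair, leaving a memory location.

    `ops` is a list of (opcode, args...) tuples. A trailing fragment
    operator, if present, is preserved after the cancellation.
    """
    ops = list(ops)
    fragment = []
    # Set any fragment aside so that it does not hide the trailing pair.
    if ops and ops[-1][0] == "DW_OP_LLVM_fragment":
        fragment = [ops.pop()]
    if ops[-2:] == [("DW_OP_deref",), ("DW_OP_LLVM_direct",)]:
        ops = ops[:-2]
    return ops + fragment
```

Under this encoding, (plus_uconst, 8, deref, LLVM_direct) becomes (plus_uconst, 8) and (deref, LLVM_direct) becomes (), which are the memory-location rows of the table; expressions that do not end in the pair are returned unchanged.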
I'm currently operating under the belief that if a dbg.value can be an l-value, it always should be; if not, then we can use DW_OP_stack_value instead in all cases where we require a given dbg.value to be an r-value. To give an example of why having this could be more useful than just applying stack value, consider a hypothetical "DIExpression optimizer" pass applied to the following code:

    // `int a` is live...
    int b = a + 5;
    int c = b - 5;

If both b and c are optimized out and salvaged, then we end up with the following dbg.values:

    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_constu, 5, DW_OP_minus, DW_OP_LLVM_direct))
    ; DIExpressions optimized...
    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_LLVM_direct))

In this admittedly strange case, we start with b and c as l-values (before they are optimized out), they then become r-values due to optimization, and finally c is a valid l-value again. If we instead applied DW_OP_stack_value when we salvage, then c would not be recovered as an l-value. If we had DW_OP_implicit_ptr instead of DW_OP_LLVM_direct, then the result would be an r-value either way; likewise if we were referencing a memory location, the result would be an l-value regardless of how we modified it.

I suspect there may be disagreements over whether c should share an l-value location with a, since this means that a user could write to either c or a, and that doing so would assign to both of them. My personal belief is that even if it seems confusing, we shouldn't arbitrarily restrict write-access to variables on criteria that will not always be clear to a debug user; whether or not to apply such a restriction should be left to the debugger, rather than being baked into the information we produce for it.
Personal opinions aside, the other reason I'm taking this approach right now is that it matches LLVM's existing behaviour. If we have the source code:

    int a = ...
    int b = a;

We will produce the IR:

    call void @llvm.dbg.value(metadata i32 %a, metadata !13, metadata !DIExpression()), !dbg !15
    call void @llvm.dbg.value(metadata i32 %a, metadata !14, metadata !DIExpression()), !dbg !15

This IR will in turn produce register locations for both variables at the same location. Based on that, I believe that the current expected behaviour is that if two source variables map to the same actual location then they should share a DWARF location.

> Here is an example of why I think that an optimization pass must have the ability to downgrade an l-value to an r-value:
> ...
> The optimization eliminated the store of constant 0 to %mem and replaced all loads of %mem in the subsequent block with a constant "i32 0". This means that we need to mark the first dbg.value (that would otherwise look like an l-value) as an r-value, because writing a new value to %mem there would not affect the code that is now hardcoded to use a constant 0 value for x.

The behaviour seen by the user in this case is:

    ; BEFORE
    call %llvm.dbg.declare(%mem, !DILocalVariable("x"))
    store i32* %mem, i32 %foo ; "x" is set to %foo, and can be written to
    ...
    store i32* %mem, i32 0 ; "x" is set to 0, and can be written to
    ...
    store i32* %mem, i32 %foo ; "x" is set to %foo, and can be written to
    ...

    ; AFTER
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
    store i32* %mem, i32 %foo ; "x" is set to %foo, and can be written to
    ...
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_constu 0, DW_OP_stack_value)) ; "x" is set to 0, and is read-only
    ...
    call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref)) ; "x" is set to its value prior to 0, and can be written to
    ...
I don't think there's any issue with a variable being an r-value at some points in its live range and an l-value at others; in this case I think it's correct that the first dbg.value should be an l-value. Any write to x will not affect the code after the eliminated store, but even without optimizations x would be set to 0 (overriding any debugger assignment) at that point anyway. I do agree that the code produced may be slightly confusing to a user; this code likely maps to something along the lines of:

    int x = foo;
    ...
    x = 0;
    ...
    x = foo;

If the user breaks just after the first "x = foo" and assigns `x = 5` in the debugger, they will see the correct result for all subsequent uses of x until the next assignment to x. When they step over "x = 0", x has the value 0 (as expected) and it becomes read-only. Finally, after stepping over the next "x = foo", they will see that `x == 5`, which might not make a lot of sense when the user is expecting it to be assigned a different value. Even so, this information is a correct representation of the program state.

The alternative of making the first dbg.value an r-value is restrictive: if the initial assignment to x is at the top of a large function that the user is debugging, and the other two assignments occur at the very end, it would likely be frustrating to the user that they have no write-access to x throughout the function. Because of this I don't believe that it would be right to make the first dbg.value an r-value; I think again that choosing to apply these restrictions should be left to the debugger.

[0] https://bugs.llvm.org/show_bug.cgi?id=41675#c8