Tozer, Stephen via llvm-dev
2020-Sep-16 16:55 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values.
> Why do we need DW_OP_LLVM_direct when we already have DW_OP_LLVM_stack_value? Can you give an example of something that is definitely not a stack value, but direct?

The difference in definition is the intention: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any case where the value of the preceding expression is equal to an l-value. So for example:

DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP
DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0
DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value

Your point about the semantics of variable assignments in the debugger is useful; it clears up my misunderstandings. I believe that even with that in mind, LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If we don't have any reason to make a variable read-only other than that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex expression, but that doesn't need to be the case.

The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but doesn't need to be: there is a single register that holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. The optimizations taking place in this code are analogous to what would happen if we had "a = baz() + 4 - 4;", but because we don't figure out that "a = baz()" in a single pass, we pre-emptively assume that "a" must be an r-value.

To be able to emit an l-value we would first need the ability to optimize/simplify DIExpressions so that the expression becomes just (DW_OP_stack_value); this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, then we would be free to emit the register location (DW_OP_reg0 RAX).

// Compile with clang -O2 -g
int baz();
int bar2(int arg) {
  return arg * 4;
}
int bar() {
  return bar2(1);
}
int foo() {
  int a = baz() + bar() - 4;
  return a * 2;
}

; Eventually becomes the IR...
%call = call i32 @_Z3bazv(), !dbg !25
%call1 = call i32 @_Z3barv(), !dbg !26
%add = add nsw i32 %call, %call1, !dbg !27
%sub = sub nsw i32 %add, 4, !dbg !28
call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29
%mul = mul nsw i32 %sub, 2, !dbg !30
ret i32 %mul, !dbg !31

; Combine redundant instructions, "a" is salvaged...
%call = call i32 @_Z3bazv(), !dbg !25
%call1 = call i32 @_Z3barv(), !dbg !26
%add = add nsw i32 %call, %call1, !dbg !27
call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
%sub = shl i32 %add, 1, !dbg !29
%mul = add i32 %sub, -8, !dbg !29
ret i32 %mul, !dbg !30

; bar() is found to always return 4
%call = call i32 @_Z3bazv(), !dbg !14
%add = add nsw i32 %call, 4, !dbg !15
call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
%sub = shl i32 %add, 1, !dbg !17
%mul = add i32 %sub, -8, !dbg !17
ret i32 %mul, !dbg !18

; %add is unused, optimize out and salvage...
%call = call i32 @_Z3bazv(), !dbg !24
call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25
%add = shl i32 %call, 1, !dbg !26
ret i32 %add, !dbg !27

; Final DWARF location for "a":
DW_AT_location (0x00000000:
  [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)
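For readers who want to see what "optimize/simplify DIExpressions" could look like for simple arithmetic, here is a minimal sketch. It is not LLVM code: `ExprOp` and `foldConstantOffsets` are hypothetical names, operations are modelled as plain strings, and only constant additions and subtractions followed by an optional trailing DW_OP_stack_value or DW_OP_LLVM_direct are understood. It also ignores the 32-bit masking that shows up in the final DWARF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one DIExpression element.
struct ExprOp {
  std::string name;   // e.g. "DW_OP_plus_uconst"
  std::uint64_t arg;  // operand, or 0 for ops that take none
};

// Folds "plus_uconst N", "constu N, plus" and "constu N, minus" into a single
// net offset. Returns false if the expression contains anything else, in
// which case `out` is left empty and the caller should keep the original.
bool foldConstantOffsets(const std::vector<ExprOp> &in,
                         std::vector<ExprOp> &out) {
  out.clear();
  std::int64_t net = 0;
  std::vector<ExprOp> tail;  // trailing location-kind operator, if any
  for (std::size_t i = 0; i < in.size(); ++i) {
    const ExprOp &op = in[i];
    const bool hasNext = i + 1 < in.size();
    if (op.name == "DW_OP_plus_uconst") {
      net += static_cast<std::int64_t>(op.arg);
    } else if (op.name == "DW_OP_constu" && hasNext &&
               (in[i + 1].name == "DW_OP_plus" ||
                in[i + 1].name == "DW_OP_minus")) {
      const std::int64_t c = static_cast<std::int64_t>(op.arg);
      net += (in[i + 1].name == "DW_OP_plus") ? c : -c;
      ++i;  // consume the arithmetic operator as well
    } else if (!hasNext && (op.name == "DW_OP_stack_value" ||
                            op.name == "DW_OP_LLVM_direct")) {
      tail.push_back(op);
    } else {
      return false;  // unknown op: do not attempt to simplify
    }
  }
  if (net > 0) {
    out.push_back({"DW_OP_plus_uconst", static_cast<std::uint64_t>(net)});
  } else if (net < 0) {
    out.push_back({"DW_OP_constu", static_cast<std::uint64_t>(-net)});
    out.push_back({"DW_OP_minus", 0});
  }
  out.insert(out.end(), tail.begin(), tail.end());
  return true;
}
```

Applied to the last salvaged expression in the example (DW_OP_plus_uconst 4, DW_OP_constu 4, DW_OP_minus, DW_OP_stack_value), the sketch leaves only the trailing DW_OP_stack_value, which is the reduced form the message describes.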
Adrian Prantl via llvm-dev
2020-Oct-05 20:31 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> On Sep 16, 2020, at 9:55 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
>
> > That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values.
> > Why do we need DW_OP_LLVM_direct when we already have DW_OP_LLVM_stack_value? Can you give an example of something that is definitely not a stack value, but direct?
>
> The difference in definition is the intention: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any case where the value of the preceding expression is equal to an l-value. So for example:
>
> DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP
> DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0
> DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value
>
> Your point about the semantics of variable assignments in the debugger is useful, that clears up my misunderstandings. I believe that even with that in mind, LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If we don't have any reason to make a variable read-only other than that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex expression, but that doesn't need to be the case.

Great! It sounds like we reached mutual understanding :-)

> The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but doesn't need to be: there is a single register that holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. The optimizations taking place in this code are analogous to what would happen if we had "a = baz() + 4 - 4;", but because we don't figure out that "a = baz()" in a single pass, we pre-emptively assume that "a" must be an r-value.
>
> To be able to emit an l-value we would first need the ability to optimize/simplify DIExpressions so that the expression becomes just (DW_OP_stack_value) - this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, then we would be free to emit the register location (DW_OP_reg0 RAX).
>
> // Compile with clang -O2 -g
> int baz();
> int bar2(int arg) {
>   return arg * 4;
> }
> int bar() {
>   return bar2(1);
> }
> int foo() {
>   int a = baz() + bar() - 4;
>   return a * 2;
> }
>
> ; Eventually becomes the IR...
> %call = call i32 @_Z3bazv(), !dbg !25
> %call1 = call i32 @_Z3barv(), !dbg !26
> %add = add nsw i32 %call, %call1, !dbg !27
> %sub = sub nsw i32 %add, 4, !dbg !28
> call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29
> %mul = mul nsw i32 %sub, 2, !dbg !30
> ret i32 %mul, !dbg !31
>
> ; Combine redundant instructions, "a" is salvaged...
> %call = call i32 @_Z3bazv(), !dbg !25
> %call1 = call i32 @_Z3barv(), !dbg !26
> %add = add nsw i32 %call, %call1, !dbg !27
> call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
> %sub = shl i32 %add, 1, !dbg !29
> %mul = add i32 %sub, -8, !dbg !29
> ret i32 %mul, !dbg !30
>
> ; bar() is found to always return 4
> %call = call i32 @_Z3bazv(), !dbg !14
> %add = add nsw i32 %call, 4, !dbg !15
> call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
> %sub = shl i32 %add, 1, !dbg !17
> %mul = add i32 %sub, -8, !dbg !17
> ret i32 %mul, !dbg !18
>
> ; %add is unused, optimize out and salvage...
> %call = call i32 @_Z3bazv(), !dbg !24
> call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25
> %add = shl i32 %call, 1, !dbg !26
> ret i32 %add, !dbg !27
>
> ; Final DWARF location for "a":
> DW_AT_location (0x00000000:
>   [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)

So in this example, if we had DW_OP_LLVM_direct, we would salvage "a" as

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)

which would mean: "this is an l-value with some additional DWARF operations. The backend should either emit this as a DW_OP_stack_value, or if the DWARF expression turns out to be a no-op, drop the entire DIExpression and emit this as a register or memory location."

I can see how that could potentially be useful. I'm not sure how often we could practically make use of a situation like this, but I understand your motivation.

If we had DW_OP_LLVM_direct: what would be the semantics of

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)

versus

DIExpression(DW_OP_constu, 4, DW_OP_minus) ?

thanks,
adrian
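To make the "turns out to be a no-op" case concrete, the final DW_AT_location quoted in this message can be evaluated by hand. The sketch below hard-codes that one expression as straight-line arithmetic; `evalLocationForA` and `valueOfA` are hypothetical helper names, and the only assumption beyond the quoted DWARF is that "a" is a 32-bit int, as in the example program.

```cpp
#include <cstdint>

// Evaluates, step by step, the expression
//   DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and,
//   DW_OP_lit4, DW_OP_minus, DW_OP_stack_value
// for a given RAX contents, using 64-bit wrapping arithmetic.
std::uint64_t evalLocationForA(std::uint64_t rax) {
  std::uint64_t top = rax + 4;  // DW_OP_breg0 RAX+4
  top &= 0xffffffffull;         // DW_OP_constu 0xffffffff, DW_OP_and
  top -= 4;                     // DW_OP_lit4, DW_OP_minus
  return top;                   // DW_OP_stack_value: result is the value
}

// "a" is a 32-bit int, so only the low 32 bits of the result are its value.
std::uint32_t valueOfA(std::uint64_t rax) {
  return static_cast<std::uint32_t>(evalLocationForA(rax));
}
```

For any RAX contents, `valueOfA(rax)` equals the low 32 bits of RAX, because adding 4 and then subtracting 4 cancels modulo 2^32. The register therefore already holds the variable's value, which is the situation where emitting a plain register location would be possible.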
Tozer, Stephen via llvm-dev
2020-Oct-06 12:13 UTC
[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands
> I can see how that could potentially be useful. I'm not sure how often we could practically make use of a situation like this, but I understand your motivation.

Indeed, I don't expect us to cancel out DWARF expressions like that very often. Although that edge case is likely to be very rare, the _direct operator itself will appear very frequently, as it would be used for every DBG_VALUE that represents a register location. This allows us to represent register locations in a way that doesn't rely on flags outside of the DIExpression, doesn't require changes to be made to the flag/DIExpression if the register is RAUWd by a constant or other value, and has a clear definition that doesn't clash with anything in the DWARF spec. Supporting the no-op DIExpression reduction is unlikely to have a huge impact in itself, but having a "stack_value that could be an l-value" nicely rounds out the LLVM representation for debug values.

> If we had DW_OP_LLVM_direct: what would be the semantics of
>
> DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)
>
> versus
>
> DIExpression(DW_OP_constu, 4, DW_OP_minus) ?

Once we have the _direct operator, which will be used for all register locations and some implicit locations, we can safely say that any expression that isn't _direct, implicit, or empty will be a memory location. So for the first expression we would check to see if it could be emitted as a register location, and when that fails we emit a stack value:

DW_OP_breg7 RSP+0, DW_OP_constu 4, DW_OP_minus, DW_OP_stack_value

Since the second expression is not LLVM_direct, stack_value, implicit_ptr, or any other explicitly declared location type, it must be a memory location, so we emit:

DW_OP_breg7 RSP+0, DW_OP_constu 4, DW_OP_minus
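The emission rule described in this thread can be summarised as a small decision function. This is only a sketch of the proposal as stated in these messages, not of any existing LLVM code: `LocKind` and `classify` are hypothetical names, operations are modelled as bare strings without operands, other implicit location kinds such as implicit_ptr are not modelled, and the empty expression is reported as `Unspecified` because the thread does not define it here.

```cpp
#include <string>
#include <vector>

// Kind of DWARF location the backend would emit for a single-register
// DBG_VALUE under the proposed rules.
enum class LocKind { Register, Memory, StackValue, Unspecified };

LocKind classify(const std::vector<std::string> &ops) {
  if (ops.empty())
    return LocKind::Unspecified;  // not covered by this sketch
  const std::string &last = ops.back();
  if (last == "DW_OP_stack_value")
    return LocKind::StackValue;   // "this should never be an l-value"
  if (last == "DW_OP_LLVM_direct") {
    if (ops.size() == 1)
      return LocKind::Register;   // value is the register itself
    if (ops.size() == 2 && ops[0] == "DW_OP_deref")
      return LocKind::Memory;     // value lives at the register's address
    return LocKind::StackValue;   // no l-value found: fall back
  }
  return LocKind::Memory;         // no explicit location type: memory
}
```

With this model, the two expressions from the question classify as `StackValue` and `Memory` respectively, and the three DBG_VALUE examples from the start of the thread classify as `Register`, `Memory`, and `StackValue`.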