thr3ads.net - llvm dev - [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands [Oct 2020]

Tozer, Stephen via llvm-dev

2020-Sep-16 16:55 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values.
> Why do we need DW_OP_LLVM_direct when we already have DW_OP_stack_value? Can you give an example of something that is definitely not a stack value, but direct?
The difference between the two is one of intent: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", while DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value whenever the value computed by the preceding expression is exactly the value held in some l-value (a register or a memory location). For example:

  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP
  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0
  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value
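Here is a minimal C++ sketch of the lowering rule behind these three examples. It uses a hypothetical, simplified model: `OpKind`, `Op` and `lowerDirect` are illustrative names, not LLVM's API, and only DW_OP_deref, DW_OP_plus_uconst and DW_OP_LLVM_direct are modelled. Each case checks whether the computed value is exactly what some l-value holds. It falls back to DW_OP_stack_value only when no l-value holds it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of the DIExpression attached to a
// DBG_VALUE whose location operand is a register. Only the operators used
// in the examples above are modelled.
enum class OpKind { Deref, PlusUconst, LLVMDirect };

struct Op {
  OpKind kind;
  uint64_t arg = 0; // operand of DW_OP_plus_uconst; unused otherwise
};

// Lower "register + expression ending in DW_OP_LLVM_direct", preferring an
// l-value (register or memory location) and falling back to a stack value.
std::string lowerDirect(unsigned regNum, const std::string &regName,
                        std::vector<Op> ops) {
  assert(!ops.empty() && ops.back().kind == OpKind::LLVMDirect);
  ops.pop_back(); // drop the trailing DW_OP_LLVM_direct
  const std::string reg = std::to_string(regNum) + " " + regName;

  // The value is exactly the register's contents: register location.
  if (ops.empty())
    return "DW_OP_reg" + reg;

  // The value is whatever is stored at the address held in the register:
  // memory location.
  if (ops.size() == 1 && ops[0].kind == OpKind::Deref)
    return "DW_OP_breg" + reg + "+0";

  // The value is register + constant: no register or memory location holds
  // it, so it can only be described as a computed value.
  if (ops.size() == 1 && ops[0].kind == OpKind::PlusUconst)
    return "DW_OP_breg" + reg + "+" + std::to_string(ops[0].arg) +
           ", DW_OP_stack_value";

  return "<not modelled in this sketch>";
}
```

With register 7 (RSP), the three example expressions produce the three DWARF locations listed above.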

Your point about the semantics of variable assignments in the debugger is useful; it clears up my misunderstandings. Even with that in mind, I believe LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If our only reason for making a variable read-only is that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we produce a complex expression, but that doesn't need to be the case.

The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but it doesn't need to be: a single register holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. Since bar() always returns 4, the optimizations taking place in this code amount to "a = baz() + 4 - 4;", but because we don't figure out that "a = baz()" in a single pass, we pre-emptively assume that "a" must be an r-value.

To be able to emit an l-value, we would first need the ability to optimize/simplify DIExpressions so that the expression reduces to just (DW_OP_stack_value); this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, we would be free to emit the register location (DW_OP_reg0 RAX).

  // Compile with clang -O2 -g
  int baz();
  int bar2(int arg) {
    return arg * 4;
  }
  int bar() {
    return bar2(1);
  }
  int foo() {
    int a = baz() + bar() - 4;
    return a * 2;
  }

; Eventually becomes the IR...
  %call = call i32 @_Z3bazv(), !dbg !25
  %call1 = call i32 @_Z3barv(), !dbg !26
  %add = add nsw i32 %call, %call1, !dbg !27
  %sub = sub nsw i32 %add, 4, !dbg !28
  call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29
  %mul = mul nsw i32 %sub, 2, !dbg !30
  ret i32 %mul, !dbg !31

; Combine redundant instructions, "a" is salvaged...
  %call = call i32 @_Z3bazv(), !dbg !25
  %call1 = call i32 @_Z3barv(), !dbg !26
  %add = add nsw i32 %call, %call1, !dbg !27
  call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
  %sub = shl i32 %add, 1, !dbg !29
  %mul = add i32 %sub, -8, !dbg !29
  ret i32 %mul, !dbg !30

; bar() is found to always return 4
  %call = call i32 @_Z3bazv(), !dbg !14
  %add = add nsw i32 %call, 4, !dbg !15
  call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
  %sub = shl i32 %add, 1, !dbg !17
  %mul = add i32 %sub, -8, !dbg !17
  ret i32 %mul, !dbg !18

; %add is unused, optimize out and salvage...
  %call = call i32 @_Z3bazv(), !dbg !24
  call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25
  %add = shl i32 %call, 1, !dbg !26
  ret i32 %add, !dbg !27

; Final DWARF location for "a":
  DW_AT_location        (0x00000000:
      [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)
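To check the claim that "a" is exactly the value held in RAX, here is a small evaluator for this final location expression. It is a sketch that assumes 64-bit stack entries with wrapping arithmetic. `DwOp`, `DwInsn` and `evalDwarf` are hypothetical names, and the evaluator handles only the operators that appear in this location.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal DWARF stack machine covering only the operators in the location
// list entry above. Stack entries are 64-bit and arithmetic wraps.
enum class DwOp { BregRAX, Constu, And, Lit, Minus, StackValue };

struct DwInsn {
  DwOp op;
  uint64_t arg = 0; // offset for BregRAX, constant for Constu/Lit
};

// Evaluate `expr` with RAX holding `rax`; return the value on top of the stack.
uint64_t evalDwarf(const std::vector<DwInsn> &expr, uint64_t rax) {
  std::vector<uint64_t> stack;
  auto pop = [&stack] {
    assert(!stack.empty());
    uint64_t v = stack.back();
    stack.pop_back();
    return v;
  };
  for (const DwInsn &insn : expr) {
    switch (insn.op) {
    case DwOp::BregRAX: // DW_OP_breg0: push RAX + offset
      stack.push_back(rax + insn.arg);
      break;
    case DwOp::Constu: // DW_OP_constu and DW_OP_litN both push a constant
    case DwOp::Lit:
      stack.push_back(insn.arg);
      break;
    case DwOp::And: {
      uint64_t rhs = pop(), lhs = pop();
      stack.push_back(lhs & rhs);
      break;
    }
    case DwOp::Minus: { // second-from-top minus top
      uint64_t rhs = pop(), lhs = pop();
      stack.push_back(lhs - rhs);
      break;
    }
    case DwOp::StackValue: // marks the result as a value, not a location
      break;
    }
  }
  assert(!stack.empty());
  return stack.back();
}
```

Evaluated this way, `DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus` gives back RAX's value once truncated to the 32-bit `int`. For example, RAX = 10 yields 10. This is why a plain `DW_OP_reg0 RAX` would describe the same value.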


Adrian Prantl via llvm-dev

2020-Oct-05 20:31 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> On Sep 16, 2020, at 9:55 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
> 
> > That makes sense, and I think for "direct" values in your definition it is true that all direct values are r-values.
> > Why do we need DW_OP_LLVM_direct when we already have DW_OP_stack_value? Can you give an example of something that is definitely not a stack value, but direct?
> 
> The difference in definition is the intention: DW_OP_LLVM_direct means "we'd like this to be an l-value if possible", DW_OP_stack_value means "this should never be an l-value". Because of this, an expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any case where the value of the preceding expression is equal to an l-value. So for example:
> 
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) => DW_OP_reg7 RSP
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+0
>   DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4, DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value
> 
> Your point about the semantics of variable assignments in the debugger is useful, that clears up my misunderstandings. I believe that even with that in mind, LLVM_direct (or whatever name it takes) would be appropriate. If we recognize that a variable must be read-only to preserve those semantics, then we can use DW_OP_stack_value to ensure that it is always an r-value. If we don't have any reason to make a variable read-only other than that we can't *currently* find an l-value location for it, then we would use DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex expression, but that doesn't need to be the case.
Great! It sounds like we reached mutual understanding :-)
> 
> The code below is an example program where we may eventually be able to generate a valid l-value for the variable "a" in foo(), but can't without an alternative to DW_OP_stack_value. At the end of the example, "a" is an r-value, but doesn't need to be: there is a single register that holds its exact value, and an assignment to that register would have the same semantics as an equivalent assignment to "a" in the source. The optimizations taking place in this code are analogous to if we had "a = baz() + 4 - 4;", but because we don't figure out that "a = baz()" in a single pass, we pre-emptively assume that "a" must be an r-value.
> 
> To be able to emit an l-value we would first need the ability to optimize/simplify DIExpressions so that the expression becomes just (DW_OP_stack_value) - this wouldn't be particularly difficult to implement for simple arithmetic. Even with this improvement, the definition of DW_OP_stack_value explicitly forbids the expression from being a register location. If we instead used DW_OP_LLVM_direct, then we would be free to emit the register location (DW_OP_reg0 RAX).
> 
>   // Compile with clang -O2 -g
>   int baz();
>   int bar2(int arg) {
>     return arg * 4;
>   }
>   int bar() {
>     return bar2(1);
>   }
>   int foo() {
>     int a = baz() + bar() - 4;
>     return a * 2;
>   }
> 
> ; Eventually becomes the IR...
>   %call = call i32 @_Z3bazv(), !dbg !25
>   %call1 = call i32 @_Z3barv(), !dbg !26
>   %add = add nsw i32 %call, %call1, !dbg !27
>   %sub = sub nsw i32 %add, 4, !dbg !28
>   call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata !DIExpression()), !dbg !29
>   %mul = mul nsw i32 %sub, 2, !dbg !30
>   ret i32 %mul, !dbg !31
> 
> ; Combine redundant instructions, "a" is salvaged...
>   %call = call i32 @_Z3bazv(), !dbg !25
>   %call1 = call i32 @_Z3barv(), !dbg !26
>   %add = add nsw i32 %call, %call1, !dbg !27
>   call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
>   %sub = shl i32 %add, 1, !dbg !29
>   %mul = add i32 %sub, -8, !dbg !29
>   ret i32 %mul, !dbg !30
>
> ; bar() is found to always return 4
>   %call = call i32 @_Z3bazv(), !dbg !14
>   %add = add nsw i32 %call, 4, !dbg !15
>   call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata !DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
>   %sub = shl i32 %add, 1, !dbg !17
>   %mul = add i32 %sub, -8, !dbg !17
>   ret i32 %mul, !dbg !18
> 
> ; %add is unused, optimize out and salvage...
>   %call = call i32 @_Z3bazv(), !dbg !24
>   call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata !DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !25
>   %add = shl i32 %call, 1, !dbg !26
>   ret i32 %add, !dbg !27
> 
>   ; Final DWARF location for "a":
>   DW_AT_location        (0x00000000:
>       [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu 0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)
So in this example, if we had DW_OP_LLVM_direct, we would salvage "a" as

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct) ?

which would mean: "this is an l-value with some additional DWARF operations. The backend should either emit this as a DW_OP_stack_value or, if the DWARF expression turns out to be a no-op, drop the entire DIExpression and emit this as a register or memory location."
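Here is a rough sketch of that backend behaviour for the simplest case, where constant offsets are folded. It rests on several assumptions. The operand is RAX, and only DW_OP_plus_uconst and "DW_OP_constu N, DW_OP_minus" are folded. `netOffset` and `lowerOverRAX` are hypothetical helpers, not anything that exists in LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical subset of DIExpression operators.
enum class OpKind { PlusUconst, Constu, Minus, StackValue, LLVMDirect };

struct Op {
  OpKind kind;
  uint64_t arg = 0; // operand of PlusUconst / Constu
};

// Fold "add a constant" / "subtract a constant" runs into one net offset.
// Returns std::nullopt for anything else; no general simplification here.
std::optional<int64_t> netOffset(const std::vector<Op> &body) {
  int64_t net = 0;
  for (std::size_t i = 0; i < body.size(); ++i) {
    if (body[i].kind == OpKind::PlusUconst) {
      net += static_cast<int64_t>(body[i].arg);
    } else if (body[i].kind == OpKind::Constu && i + 1 < body.size() &&
               body[i + 1].kind == OpKind::Minus) {
      net -= static_cast<int64_t>(body[i].arg);
      ++i; // the DW_OP_minus has been consumed too
    } else {
      return std::nullopt;
    }
  }
  return net;
}

// Lower an expression over RAX (DWARF register 0) after folding. Overflow
// and the 32-bit masking seen in the final location above are ignored.
std::string lowerOverRAX(const std::vector<Op> &expr) {
  assert(!expr.empty());
  const OpKind last = expr.back().kind;
  assert(last == OpKind::StackValue || last == OpKind::LLVMDirect);
  const std::vector<Op> body(expr.begin(), expr.end() - 1);
  const std::optional<int64_t> off = netOffset(body);
  if (!off)
    return "<not modelled in this sketch>";
  // A no-op expression marked direct: drop it and emit the register itself.
  if (*off == 0 && last == OpKind::LLVMDirect)
    return "DW_OP_reg0 RAX";
  // Otherwise, or if marked stack_value, the value must be computed.
  return "DW_OP_breg0 RAX" + std::string(*off < 0 ? "" : "+") +
         std::to_string(*off) + ", DW_OP_stack_value";
}
```

Take the expression salvaged at the end of the example (DW_OP_plus_uconst 4, DW_OP_constu 4, DW_OP_minus). Its offsets cancel. If it ends with DW_OP_LLVM_direct, the sketch yields DW_OP_reg0 RAX. If it ends with DW_OP_stack_value, it still yields DW_OP_breg0 RAX+0, DW_OP_stack_value.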
I can see how that could potentially be useful. I'm not sure how often we
could practically make use of a situation like this, but I understand your
motivation.

If we had DW_OP_LLVM_direct: what would be the semantics of 

DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)

versus

DIExpression(DW_OP_constu, 4, DW_OP_minus) ?

thanks,
adrian

Tozer, Stephen via llvm-dev

2020-Oct-06 12:13 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> I can see how that could potentially be useful. I'm not sure how often we could practically make use of a situation like this, but I understand your motivation.
Indeed, I don't expect us to cancel out DWARF expressions like that very often. The _direct operator itself, however, will appear very frequently, since it would be used for every DBG_VALUE that represents a register location. This lets us represent register locations in a way that doesn't rely on flags outside of the DIExpression, doesn't require the flag or DIExpression to change if the register is RAUW'd (replaceAllUsesWith) by a constant or other value, and has a clear definition that doesn't clash with anything in the DWARF spec. Supporting the no-op DIExpression reduction is unlikely to have a huge impact in itself, but having a "stack_value that could be an l-value" nicely rounds out the LLVM representation of debug values.
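To illustrate the RAUW point, here is a sketch under a hypothetical model. `DbgValue`, `replaceOperand` and `lower` are illustrative names, not LLVM types, and the sketch needs C++17 for `std::variant`. The location kind lives in the expression's terminator. Swapping a register operand for a constant therefore needs no change to the expression, and lowering stays well defined either way.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical model: a debug value's location operand is a register or a
// constant, and its DIExpression is stored alongside it.
struct Reg {
  unsigned num;
  std::string name;
};
struct Const {
  uint64_t value;
};
using Operand = std::variant<Reg, Const>;

enum class OpKind { LLVMDirect, StackValue };

struct DbgValue {
  Operand operand;
  std::vector<OpKind> expr; // only the location-kind terminator is modelled
};

// RAUW-style replacement: swap the operand, leave the expression untouched.
void replaceOperand(DbgValue &dv, Operand replacement) {
  dv.operand = std::move(replacement);
}

std::string lower(const DbgValue &dv) {
  assert(dv.expr.size() == 1);
  if (const Reg *r = std::get_if<Reg>(&dv.operand)) {
    const std::string reg = std::to_string(r->num) + " " + r->name;
    if (dv.expr[0] == OpKind::LLVMDirect)
      return "DW_OP_reg" + reg; // the register is an l-value
    return "DW_OP_breg" + reg + "+0, DW_OP_stack_value"; // never an l-value
  }
  // A constant has no l-value, so either terminator yields a stack value.
  return "DW_OP_constu " + std::to_string(std::get<Const>(dv.operand).value) +
         ", DW_OP_stack_value";
}
```

In this model, a DbgValue on RAX with a DW_OP_LLVM_direct expression lowers to DW_OP_reg0 RAX. After the operand is replaced with the constant 42, the same expression lowers to DW_OP_constu 42, DW_OP_stack_value.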
> If we had DW_OP_LLVM_direct: what would be the semantics of
>
> DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_LLVM_direct)
>
> versus
>
> DIExpression(DW_OP_constu, 4, DW_OP_minus) ?
Once we have the _direct operator, which will be used for all register locations and some implicit locations, we can safely say that any expression that isn't _direct, implicit, or empty will be a memory location. So for the first expression we would check whether it could be emitted as a register location, and when that fails we would emit a stack value:

DW_OP_breg7 RSP+0, DW_OP_constu 4, DW_OP_minus, DW_OP_stack_value

The second expression is not LLVM_direct, stack_value, implicit_ptr, or any other explicitly declared location type, so it must be a memory location, and we emit:

DW_OP_breg7 RSP+0, DW_OP_constu 4, DW_OP_minus
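The classification described in this reply can be sketched as follows. The illustration makes several assumptions. `Terminator` and `lowerLocation` are hypothetical, and `body` holds already-printed DWARF operations. Before falling back to a stack value, the sketch tries only the register-location check.

```cpp
#include <cassert>
#include <string>
#include <vector>

// How the DIExpression ends, under the proposal described above.
enum class Terminator { LLVMDirect, StackValue, None };

// Lower "register + body + terminator" to a DWARF location description.
// `body` holds already-printed DWARF operations to keep the sketch short.
std::string lowerLocation(unsigned regNum, const std::string &regName,
                          const std::vector<std::string> &body,
                          Terminator term) {
  // The reply excludes empty expressions from the memory-location rule.
  assert(term != Terminator::None || !body.empty());
  const std::string reg = std::to_string(regNum) + " " + regName;

  // _direct with nothing to compute: the register itself is the l-value.
  if (term == Terminator::LLVMDirect && body.empty())
    return "DW_OP_reg" + reg;

  std::string out = "DW_OP_breg" + reg + "+0";
  for (const std::string &op : body)
    out += ", " + op;

  // _direct that cannot be a register location falls back to a stack value,
  // stack_value always is one; no terminator means a memory location.
  if (term != Terminator::None)
    out += ", DW_OP_stack_value";
  return out;
}
```

Call it with register 7 (RSP) and the body `DW_OP_constu 4, DW_OP_minus`. The `LLVMDirect` case then produces the first output above, and the `None` case produces the second.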