thr3ads.net - llvm dev - [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands [Sep 2020]

Tozer, Stephen via llvm-dev

2020-Sep-11 18:12 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> Can you elaborate what "direct" means? I'm having trouble understanding
> what the opposite (a non-exact value) would be.

Apologies, "exact" was a misleading/incorrect term. By direct, I mean
that the expression computes the value of the variable, as opposed to its memory
address, or the value that it points to. Within LLVM, where we don't have
DW_OP_reg/DW_OP_breg but instead simply refer to a generic SSA value, this could
mean either a register location or stack value.

> At the moment we don't make the lvalue/rvalue distinction in LLVM at all.
> We make an educated guess in AsmPrinter. But that's wrong and something we
> should strive to fix during this redesigning.

I think the opposite; I don't believe there's any reason we need to make
the explicit lvalue/rvalue distinction until we're writing DWARF. To put it
in more general terms, I think that the IR/MIR debug value instructions should
only care about how the variable's value can be computed. Whether the result
of that computation is an lvalue is unimportant within LLVM itself as far as I
can tell, and is redundant when it can be computed from just the DIExpression
and location operands.

> As stated above, I don't think we can trivially determine this, because
> (at least for dbg.values) this info was lost already in LLVM IR. Unless we
> say the dbg.declare / dbg.value distinction is what determines lvalues vs.
> rvalues.

With the proposed operator, it would be trivial to determine lvalue vs rvalue
debug values with a set of rules (ignoring any fragment operator, which may
appear at the end but does not affect the location type):

  1. If the expression is empty, or any location arguments are $noreg => Empty
  2. If the expression ends with DW_OP_implicit_ptr => Implicit pointer (rvalue)
  3. If the expression ends with DW_OP_stack_value => Stack value (rvalue)  // LLVM should produce LLVM_direct instead.
  4. If the expression ends with DW_OP_LLVM_direct, then...
    4a. If the preceding expression is just DW_OP_LLVM_arg, 0 and the only location operand is a register => Register location (lvalue)
    4b. Otherwise => Stack value (rvalue)
  5. Otherwise => Memory location (lvalue)

This covers all the expected cases without ambiguity and with almost no loss of
expressiveness. I believe that the only expression that LLVM will not be able to
produce this way is DW_OP_bregN, DW_OP_stack_value, due to the fact that when
DW_OP_LLVM_direct is used, this would be written as a register location instead
of a stack value. I don't think there are any cases where we would choose to
emit a stack value location when we're able to produce a register location
instead, so this shouldn't be a problem.
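
The rules above can be sketched in Python as follows. This is only an illustrative model, not LLVM code: the tuple encoding of expressions, the `LocationKind` enum, and the `is_register` helper (which assumes MIR-style register names starting with "$") are all hypothetical, and DW_OP_LLVM_direct is the operator proposed in this thread.

```python
from enum import Enum


class LocationKind(Enum):
    EMPTY = "empty"
    IMPLICIT_POINTER = "implicit pointer (r-value)"
    STACK_VALUE = "stack value (r-value)"
    REGISTER = "register location (l-value)"
    MEMORY = "memory location (l-value)"


NOREG = "$noreg"  # stand-in for an undefined location operand


def is_register(operand):
    # Hypothetical convention: register operands are spelled like "$rsp".
    return operand.startswith("$") and operand != NOREG


def classify(expr, operands):
    """expr: list of (opcode, *args) tuples; operands: list of operand names."""
    ops = list(expr)
    # A trailing fragment operator does not affect the location type.
    if ops and ops[-1][0] == "DW_OP_LLVM_fragment":
        ops.pop()
    # Rule 1: empty expression or any undefined operand.
    if not ops or NOREG in operands:
        return LocationKind.EMPTY
    last = ops[-1][0]
    # Rule 2.
    if last == "DW_OP_implicit_ptr":
        return LocationKind.IMPLICIT_POINTER
    # Rule 3.
    if last == "DW_OP_stack_value":
        return LocationKind.STACK_VALUE
    # Rule 4: register location only for a bare "arg 0" on a single register.
    if last == "DW_OP_LLVM_direct":
        only_arg0 = ops[:-1] == [("DW_OP_LLVM_arg", 0)]
        if only_arg0 and len(operands) == 1 and is_register(operands[0]):
            return LocationKind.REGISTER
        return LocationKind.STACK_VALUE
    # Rule 5.
    return LocationKind.MEMORY


print(classify([("DW_OP_LLVM_arg", 0), ("DW_OP_LLVM_direct",)], ["$rsp"]))
```

Note that the l-value/r-value result is derived purely from the expression and its operands, which is the point being argued: no separate flag needs to be stored.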


Adrian Prantl via llvm-dev

2020-Sep-11 19:24 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> On Sep 11, 2020, at 11:12 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
>
> > Can you elaborate what "direct" means? I'm having trouble understanding
> > what the opposite (a non-exact value) would be.
>
> Apologies, "exact" was a misleading/incorrect term. By direct, I mean that
> the expression computes the value of the variable, as opposed to its memory
> address, or the value that it points to.

That sounds to me to be the same concept that I am calling rvalue vs. lvalue. Do
you agree, or is there some subtlety that I am missing?

> Within LLVM, where we don't have DW_OP_reg/DW_OP_breg but instead simply
> refer to a generic SSA value, this could mean either a register location or
> stack value.
>
> > At the moment we don't make the lvalue/rvalue distinction in LLVM at all.
> > We make an educated guess in AsmPrinter. But that's wrong and something we
> > should strive to fix during this redesigning.
>
> I think the opposite; I don't believe there's any reason we need to make the
> explicit lvalue/rvalue distinction until we're writing DWARF.

Here is an example of why I think that an optimization pass must have the
ability to downgrade an lvalue to an rvalue:

; BEFORE
%foo = i32 ...
%mem = alloca i32
call %llvm.dbg.declare(%mem, !DILocalVariable("x"))
store i32* %mem, i32 %foo
...
store i32* %mem, i32 0
...
store i32* %mem, i32 %foo
...

; AFTER
%foo = i32 ...
%mem = alloca i32
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
store i32* %mem, i32 %foo
; optimization eliminated the store of 0 to %mem and replaced all loads of %mem with "i32 0".
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_constu 0, DW_OP_stack_value))
...
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
...

The optimization eliminated the store of constant 0 to %mem and replaced all
loads of %mem in the subsequent block with a constant "i32 0". This
means that we need to mark the first dbg.value (that would otherwise look like an
lvalue) as an rvalue, because writing a new value to %mem there would not affect
the code that is now hardcoded to use a constant 0 value for "x".

What do you think?

-- adrian

Tozer, Stephen via llvm-dev

2020-Sep-15 17:56 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> That sounds to me to be the same concept that I am calling r-value vs.
> l-value. Do you agree, or is there some subtlety that I am missing?

I've been assuming that the l-value vs r-value distinction is analogous to
C++: an l-value can be written to by the debugger, an r-value cannot. A memory
location and a register location are both l-values, while implicit locations
(stack value/implicit pointer) are r-values. Directness, on the other hand,
I've been using to mean "the variable's value is equal to the value
computed by the DIExpression". Applied to a DWARF expression, this
description would only refer to a stack value. LLVM's DIExpressions are
different, however, because we don't have DW_OP_reg or DW_OP_breg: we simply
refer to the register and use context to determine which it should be. The best
illustration of this is the example at [0], expanded on here with the location
type:

Flag    DIExpression                        DWARF output                        Location  Value    Directness
$noreg  (plus_uconst, 8)                    DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
0       (plus_uconst, 8)                    DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
$noreg  (plus_uconst, 8, stackval)          DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
0       (plus_uconst, 8, stackval)          DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
$noreg  (plus_uconst, 8, deref)             DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
0       (plus_uconst, 8, deref)             DW_OP_breg7 RSP+8, deref            Memory    l-value  Indirect
$noreg  (plus_uconst, 8, deref, stackval)   DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
0       (plus_uconst, 8, deref, stackval)   DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
$noreg  ()                                  DW_OP_reg7 RSP                      Register  l-value  Direct
0       ()                                  DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
$noreg  (deref)                             DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
0       (deref)                             DW_OP_breg7 RSP+0, DW_OP_deref      Memory    l-value  Indirect

The point of using DW_OP_LLVM_direct is that it supplants the current
indirectness flag, without the redundancy or unintuitive behaviour that the flag
introduces. I believe the only reason that the indirectness flag is necessary
right now is to allow register locations to be emitted, by distinguishing the
register location "$rsp, $noreg, () -> DW_OP_reg7 RSP" from the memory location
"$rsp, 0, () -> DW_OP_breg7 RSP+0". Outside of this case the
existing representation is sufficient for all other locations, and ideally the
indirectness flag would have no effect (although unfortunately it does). The
justification for DW_OP_LLVM_direct rests on the idea that we will generally
choose to produce "DW_OP_reg7 RSP" instead of "DW_OP_breg7 RSP,
DW_OP_stack_value". It doesn't prevent us from using DW_OP_stack_value
instead if we have an exception, but I don't believe there are any. Using
DW_OP_LLVM_direct instead of the indirectness flag and stackval for the table
above, we get this:

DIExpression                          DWARF output                        Location  Value    Directness
(plus_uconst, 8)                      DW_OP_breg7 RSP+8                   Memory    l-value  Indirect
(plus_uconst, 8, LLVM_direct)         DW_OP_breg7 RSP+8, stackval         Stack     r-value  Direct
(plus_uconst, 8, deref)               DW_OP_breg7 RSP+8, deref            Memory    l-value  Indirect
(plus_uconst, 8, deref, LLVM_direct)  DW_OP_breg7 RSP+8, deref, stackval  Stack     r-value  Direct
()                                    DW_OP_breg7 RSP+0                   Memory    l-value  Indirect
(LLVM_direct)                         DW_OP_reg7 RSP                      Register  l-value  Direct
(deref)                               DW_OP_breg7 RSP+0, deref            Memory    l-value  Indirect
(deref, LLVM_direct)                  DW_OP_breg7 RSP+0, deref, stackval  Stack     r-value  Direct

Two of the examples in this table should be excluded from actual use: the two
rows that end with "deref, LLVM_direct" shouldn't be produced
within LLVM, because we can cancel the two operators out to give a memory
location rather than producing a "deref, stackval" expression. This
can be done in LLVM itself through the DIExpression interface, so that we
don't hold DIExpressions in an incorrect intermediate state. I'm
currently operating under the belief that if a dbg.value can be an l-value, it
always should be; if not, then we can use DW_OP_stack_value instead in all cases
where we require a given dbg.value to be an r-value.
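
The cancellation of a trailing "deref, LLVM_direct" pair can be sketched in Python as below. This is a minimal model under stated assumptions: expressions are written as flat lists in the shorthand used in the tables (operator names as strings, operands as integers), and `cancel_deref_direct` is a hypothetical helper name, not part of the DIExpression interface.

```python
def cancel_deref_direct(expr):
    """Drop a trailing (deref, LLVM_direct) pair.

    "The value is what is stored at address A" (deref, LLVM_direct) describes
    the same variable as "the variable lives at address A", so the pair can be
    removed to leave a plain memory location.
    """
    ops = list(expr)
    if ops[-2:] == ["deref", "LLVM_direct"]:
        return ops[:-2]
    return ops


# (plus_uconst, 8, deref, LLVM_direct) becomes the memory location (plus_uconst, 8).
print(cancel_deref_direct(["plus_uconst", 8, "deref", "LLVM_direct"]))
```

Applied to the two excluded table rows, this yields "(plus_uconst, 8)" and "()", both of which are memory locations and therefore l-values.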

To give an example of why having this could be more useful than just applying
stack value, consider a hypothetical "DIExpression optimizer" pass
applied to the following code:

    // `int a` is live...
    int b = a + 5;
    int c = b - 5;

If both b and c are optimized out and salvaged, then we end up with the
following dbg.values:

    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_constu, 5, DW_OP_minus, DW_OP_LLVM_direct))
    ; DIExpressions optimized...
    @llvm.dbg.value(i32 %a, !"b", !DIExpression(DW_OP_plus_uconst, 5, DW_OP_LLVM_direct))
    @llvm.dbg.value(i32 %a, !"c", !DIExpression(DW_OP_LLVM_direct))

In this admittedly strange case, we start with b and c as l-values (before they
are optimized out), they then become r-values due to optimization, and finally c
is a valid l-value again. If we instead applied DW_OP_stack_value when we
salvage, then c would not be recovered as an l-value. If we had
DW_OP_implicit_ptr instead of DW_OP_LLVM_direct, then the result would be an
r-value either way; likewise if we were referencing a memory location, the
result would be an l-value regardless of how we modified it.
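
A toy version of the folding step in such a hypothetical "DIExpression optimizer" might look like the Python sketch below. No such pass is claimed to exist in LLVM; the tuple encoding and the name `fold_offsets` are assumptions made for illustration, and the sketch only handles a leading run of constant additions and subtractions applied to the location operand.

```python
def fold_offsets(expr):
    """Fold leading constant add/subtract operators into one net offset."""
    ops = list(expr)
    offset = 0
    i = 0
    while i < len(ops):
        op = ops[i]
        if op[0] == "DW_OP_plus_uconst":
            offset += op[1]
            i += 1
        elif (op[0] == "DW_OP_constu" and i + 1 < len(ops)
              and ops[i + 1][0] in ("DW_OP_plus", "DW_OP_minus")):
            # "constu N, plus" adds N; "constu N, minus" subtracts N.
            offset += op[1] if ops[i + 1][0] == "DW_OP_plus" else -op[1]
            i += 2
        else:
            break  # stop at the first operator that is not a constant offset
    if offset > 0:
        folded = [("DW_OP_plus_uconst", offset)]
    elif offset < 0:
        folded = [("DW_OP_constu", -offset), ("DW_OP_minus",)]
    else:
        folded = []
    return folded + ops[i:]


c_expr = [("DW_OP_plus_uconst", 5), ("DW_OP_constu", 5),
          ("DW_OP_minus",), ("DW_OP_LLVM_direct",)]
print(fold_offsets(c_expr))  # [('DW_OP_LLVM_direct',)]
```

Once the arithmetic folds away, only DW_OP_LLVM_direct remains, so c can be emitted as a register location again; with a trailing DW_OP_stack_value the folded result would still be a stack value.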

I suspect there may be disagreements over whether c should share an l-value
location with a, since this means that a user could write to either c or a, and
that doing so would assign to both of them. My personal belief is that even if
it seems confusing, we shouldn't arbitrarily restrict write-access to
variables on criteria that will not always be clear to a debug user; whether or
not to apply such a restriction should be left to the debugger, rather than
being baked into the information we produce for it. Personal opinions aside, the
other reason I'm taking this approach right now is that it matches
LLVM's existing behaviour. If we have the source code:

    int a = ...
    int b = a;

We will produce the IR:

    call void @llvm.dbg.value(metadata i32 %a, metadata !13, metadata !DIExpression()), !dbg !15
    call void @llvm.dbg.value(metadata i32 %a, metadata !14, metadata !DIExpression()), !dbg !15

This IR will in turn produce register locations for both variables at the same
location. Based on that, I believe that the current expected behaviour is that
if two source variables map to the same actual location then they should share a
DWARF location.

> Here is an example of why I think that an optimization pass must have the
> ability to downgrade an l-value to an r-value:
> ...
> The optimization eliminated the store of constant 0 to %mem and replaced
> all loads of %mem in the subsequent block with a constant "i32 0". This
> means that we need mark the first dbg.value (that would otherwise look like
> an l-value) as an r-value, because writing a new value to %mem there would
> not affect the code that is now hardcoded to use a constant 0 value for x.

The behaviour seen by the user in this case is:

; BEFORE
call %llvm.dbg.declare(%mem, !DILocalVariable("x"))
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...
store i32* %mem, i32 0
; "x" is set to 0, and can be written to
...
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...

; AFTER
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
store i32* %mem, i32 %foo
; "x" is set to %foo, and can be written to
...
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_constu 0, DW_OP_stack_value))
; "x" is set to 0, and is read-only
...
call %llvm.dbg.value(%mem, !DILocalVariable("x"), !DIExpression(DW_OP_deref))
; "x" is set to its value prior to 0, and can be written to
...

I don't think there's any issue with a variable being an r-value at some
points in its live range and an l-value at others; in this case I think it's
correct that the first dbg.value should be an l-value. Any write to x will not
affect the code after the eliminated store, but even without optimizations x
would be set to 0 (overriding any debugger assignment) at that point anyway. I
do agree that the code produced may be slightly confusing to a user; this code
likely maps to something along the lines of:

    int x = foo;
    ...
    x = 0;
    ...
    x = foo;

If the user breaks just after the first "x = foo" and assigns `x = 5`
in the debugger, they will see the correct result for all subsequent uses of x
until the next assignment to x. When they step over "x = 0", x has the
value 0 (as expected) and it becomes read-only. Finally after stepping over the
next "x = foo" they will see that `x == 5`, which might not make a lot
of sense when the user is expecting it to be assigned a different value. Even
so, this information is a correct representation of the program state.

The alternative of making the first dbg.value an r-value is restrictive - if the
initial assignment to x is at the top of a large function that the user is
debugging, and the other two assignments occur at the very end, it would likely
be frustrating to the user that they have no write-access to x throughout the
function. Because of this I don't believe that it would be right to make the
first dbg.value an r-value; I think again that choosing to apply these
restrictions should be left to the debugger.

[0] https://bugs.llvm.org/show_bug.cgi?id=41675#c8