thr3ads.net - llvm dev - [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands [Sep 2020]

If this information is useful, please help other people find it:
Share via:

Robinson, Paul via llvm-dev

2020-Sep-15 22:07 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

Hi Adrian & Stephen,

One thought here:

But — not all memory locations are l-values. If we have a DWARF location list
for variable "x" which points to a memory address for the first n
instructions and the switches to a constant for the remainder of the scope, the
memory address is not guaranteed to be an l-value, because writing the the
memory address cannot affect the later part of the function where the variable
is a constant.

A very interesting point.  I believe this is a concept that LLVM does not yet
understand?  that it needs to consider the l/r-value-ness of the variable
throughout the function, before deciding on the form that the
location-expressions will take.  I would not want to spend a lot of time parsing
expressions in order to make this kind of decision.

And a second thought here:

I suspect there may be disagreements over whether c should share an l-value
location with a, since this means that a user could write to either c or a, and
that doing so would assign to both of them. My personal belief is that even if
it seems confusing, we shouldn't arbitrarily restrict write-access to
variables on criteria that will not always be clear to a debug user; whether or
not to apply such a restriction should be left to the debugger, rather than
being baked into the information we produce for it.

Ah — you've thought about this already :-)

This is actually really important to me: My take is that we must always err on
the safe side. If the compiler cannot prove that it is safe to write to an
l-value (ie., that the semantics are the same as if the write were inserted into
the source code at that source location) it must downgrade the l-value to an
r-value. That's one thing I'm feeling really strongly about.
The reason behind this is that if we allow debug info that is only maybe correct
(or correct on some paths, or under some other circumstances that are opaque to
the user) — if we allow potentially incorrect debug info to leak into the
program, the user can not distinguish between information they can trust and
information that is bogus. That means all information that they see in the
debugger is potentially bogus. If the debugger is not trustworthy it's
worthless. That is feedback I keep getting particularly from kernel developers,
and I must say it resonates with me.

I should say that Stephen and I had a discussion a while back, where I said IMO
the compiler should emit debug info that correctly reflects the code as-it-is,
and not arbitrarily decide that ‘c’ and/or ‘a’ should not be writeable.  I
believe I also said this was arguable, but that I felt pretty strongly about it.

I have to say, it did not occur to me to consider emitting info that showed
*neither* ‘c’ nor ‘a’ to be writeable!  This is a kind of
protect-the-debugger-from-the-user perspective that I’ve not encountered before.
I would actually argue that showing both variables as writeable is in fact
correct (truthful), because it reflects the instructions as-produced; if you
modify one, you necessarily modify the other.  That’s what the code as-produced
does.
I accept that is not what the code as-originally-written does… and that it is
unlikely that the debugger would go to the trouble to notice that two variables
share a storage location and caution the user about that unexpected interaction.
(It’s hard enough to persuade the compiler to notice…)  Given that engineering
practicality, and that producing debug info that helps the user without
confusing/dismaying the user is a valuable goal, then I can agree that
describing both variables as r-values is more helpful.

In short, I agree with your conclusion, although not for your reasons. 😊
--paulr

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200915/683d28a4/attachment.html>

Adrian Prantl via llvm-dev

2020-Sep-16 00:39 UTC

head link

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> On Sep 15, 2020, at 3:07 PM, Robinson, Paul <paul.robinson at
sony.com> wrote:
> 
> Hi Adrian & Stephen,
>  
> One thought here:
>  
> But — not all memory locations are l-values. If we have a DWARF location
list for variable "x" which points to a memory address for the first n
instructions and the switches to a constant for the remainder of the scope, the
memory address is not guaranteed to be an l-value, because writing the the
memory address cannot affect the later part of the function where the variable
is a constant.
>  
> A very interesting point.  I believe this is a concept that LLVM does not
yet understand?  that it needs to consider the l/r-value-ness of the variable
throughout the function, before deciding on the form that the
location-expressions will take.  I would not want to spend a lot of time parsing
expressions in order to make this kind of decision.
That's right. Stephen correctly identified that we need some new concept at
the LLVM IR level. My hypothesis is that DW_OP_stack_value — if applied
correctly — is sufficient to represent this. Stephen is making a case that we
could get out more by having a separate flag but I'm not (yet?) convinced
that that is in the end-user's best interest. So if we can avoid having two
flags for confusingly similar semantics I would like to go with the simpler IR.

-- adrian
>  
> And a second thought here:
>  
> I suspect there may be disagreements over whether c should share an l-value
location with a, since this means that a user could write to either c or a, and
that doing so would assign to both of them. My personal belief is that even if
it seems confusing, we shouldn't arbitrarily restrict write-access to
variables on criteria that will not always be clear to a debug user; whether or
not to apply such a restriction should be left to the debugger, rather than
being baked into the information we produce for it.
>  
> Ah — you've thought about this already :-)
>  
> This is actually really important to me: My take is that we must always err
on the safe side. If the compiler cannot prove that it is safe to write to an
l-value (ie., that the semantics are the same as if the write were inserted into
the source code at that source location) it must downgrade the l-value to an
r-value. That's one thing I'm feeling really strongly about.
> The reason behind this is that if we allow debug info that is only maybe
correct (or correct on some paths, or under some other circumstances that are
opaque to the user) — if we allow potentially incorrect debug info to leak into
the program, the user can not distinguish between information they can trust and
information that is bogus. That means all information that they see in the
debugger is potentially bogus. If the debugger is not trustworthy it's
worthless. That is feedback I keep getting particularly from kernel developers,
and I must say it resonates with me.
>  
> I should say that Stephen and I had a discussion a while back, where I said
IMO the compiler should emit debug info that correctly reflects the code
as-it-is, and not arbitrarily decide that ‘c’ and/or ‘a’ should not be
writeable.  I believe I also said this was arguable, but that I felt pretty
strongly about it.
>  
> I have to say, it did not occur to me to consider emitting info that showed
*neither* ‘c’ nor ‘a’ to be writeable!  This is a kind of
protect-the-debugger-from-the-user perspective that I’ve not encountered before.
I would actually argue that showing both variables as writeable is in fact
correct (truthful), because it reflects the instructions as-produced; if you
modify one, you necessarily modify the other.  That’s what the code as-produced
does.
> I accept that is not what the code as-originally-written does… and that it
is unlikely that the debugger would go to the trouble to notice that two
variables share a storage location and caution the user about that unexpected
interaction.  (It’s hard enough to persuade the compiler to notice…)  Given that
engineering practicality, and that producing debug info that helps the user
without confusing/dismaying the user is a valuable goal, then I can agree that
describing both variables as r-values is more helpful.
>  
> In short, I agree with your conclusion, although not for your reasons. 😊
> --paulr
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200915/8da5a5d9/attachment.html>

Tozer, Stephen via llvm-dev

2020-Sep-16 16:55 UTC

head link

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> That makes sense, and I think for "direct" values in your
definition it is true that all direct values are r-values.
> Why do we need DW_OP_LLVM_direct when we already have
DW_OP_LLVM_stack_value? Can you give an example of something that is definitely
not a stack value, but direct?
The difference in definition is the intention: DW_OP_LLVM_direct means
"we'd like this to be an l-value if possible", DW_OP_stack_value
means "this should never be an l-value". Because of this, an
expression ending with DW_OP_LLVM_direct can be emitted as an l-value in any
case where the value of the preceding expression is equal to an l-value. So for
example:

  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_LLVM_direct) =>
DW_OP_reg7 RSP
  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_deref, DW_OP_LLVM_direct)
=> DW_OP_breg7 RSP+0
  DBG_VALUE $rsp, !"x", !DIExpression(DW_OP_plus_uconst, 4,
DW_OP_LLVM_direct) => DW_OP_breg7 RSP+4, DW_OP_stack_value

Your point about the semantics of variable assignments in the debugger is
useful, that clears up my misunderstandings. I believe that even with that in
mind, LLVM_direct (or whatever name it takes) would be appropriate. If we
recognize that a variable must be read-only to preserve those semantics, then we
can use DW_OP_stack_value to ensure that it is always an r-value. If we
don't have any reason to make a variable read-only other than that we
can't *currently* find an l-value location for it, then we would use
DW_OP_LLVM_direct. Right now we use DW_OP_stack_value whenever we make a complex
expression, but that doesn't need to be the case.

The code below is an example program where we may eventually be able to generate
a valid l-value for the variable "a" in foo(), but can't without
an alternative to DW_OP_stack_value. At the end of the example, "a" is
an r-value, but doesn't need to be: there is a single register that holds
its exact value, and an assignment to that register would have the same
semantics as an equivalent assignment to "a" in the source. The
optimizations taking place in this code are analogous to if we had "a =
bar() + 4 - 4;", but because we don't figure out that "a =
bar()" in a single pass, we pre-emptively assume that "a" must be
an r-value.

To be able to emit an l-value we would first need the ability to
optimize/simplify DIExpressions so that the expression becomes just
(DW_OP_stack_value) - this wouldn't be particularly difficult to implement
for simple arithmetic. Even with this improvement, the definition of
DW_OP_stack_value explicitly forbids the expression from being a register
location. If we instead used DW_OP_LLVM_direct, then we would be free to emit
the register location (DW_OP_reg0 RAX).

  // Compile with clang -O2 -g
  int baz();
  int bar2(int arg) {
    return arg * 4;
  }
  int bar() {
    return bar2(1);
  }
  int foo() {
    int a = baz() + bar() - 4;
    return a * 2;
  }

; Eventually becomes the IR...
  %call = call i32 @_Z3bazv(), !dbg !25
  %call1 = call i32 @_Z3barv(), !dbg !26
  %add = add nsw i32 %call, %call1, !dbg !27
  %sub = sub nsw i32 %add, 4, !dbg !28
  call void @llvm.dbg.value(metadata i32 %sub, metadata !24, metadata
!DIExpression()), !dbg !29
  %mul = mul nsw i32 %sub, 2, !dbg !30
  ret i32 %mul, !dbg !31

; Combine redundant instructions, "a" is salvaged...
  %call = call i32 @_Z3bazv(), !dbg !25
  %call1 = call i32 @_Z3barv(), !dbg !26
  %add = add nsw i32 %call, %call1, !dbg !27
  call void @llvm.dbg.value(metadata i32 %add, metadata !24, metadata
!DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !28
  %sub = shl i32 %add, 1, !dbg !29
  %mul = add i32 %sub, -8, !dbg !29
  ret i32 %mul, !dbg !30

; bar() is found to always return 4
  %call = call i32 @_Z3bazv(), !dbg !14
  %add = add nsw i32 %call, 4, !dbg !15
  call void @llvm.dbg.value(metadata i32 %add, metadata !13, metadata
!DIExpression(DW_OP_constu, 4, DW_OP_minus, DW_OP_stack_value)), !dbg !16
  %sub = shl i32 %add, 1, !dbg !17
  %mul = add i32 %sub, -8, !dbg !17
  ret i32 %mul, !dbg !18

; %add is unused, optimize out and salvage...
  %call = call i32 @_Z3bazv(), !dbg !24
  call void @llvm.dbg.value(metadata i32 %call, metadata !23, metadata
!DIExpression(DW_OP_plus_uconst, 4, DW_OP_constu, 4, DW_OP_minus,
DW_OP_stack_value)), !dbg !25
  %add = shl i32 %call, 1, !dbg !26
  ret i32 %add, !dbg !27

  ; Final DWARF location for "a":
  DW_AT_location        (0x00000000:
      [0x0000000000000029, 0x000000000000002b): DW_OP_breg0 RAX+4, DW_OP_constu
0xffffffff, DW_OP_and, DW_OP_lit4, DW_OP_minus, DW_OP_stack_value)


-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200916/1d35bbd8/attachment-0001.html>

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Sep 2020 - [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

Possibly Parallel Threads