thr3ads.net - llvm dev - [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands [Sep 2020]

Tozer, Stephen via llvm-dev

2020-Sep-02 14:01 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> I'm not sure this will work as stated here. Indirectness is (mostly)
orthogonal to DW_OP_stack_value. DW_OP_stack_value denotes that we reconstructed
the value of the variable, but it doesn't exist in the program ("The
DW_OP_stack_value operation specifies that the object does not exist in memory
but its value is nonetheless known"), for example, a constant value. I
think we want something like DW_OP_deref instead, at least for r-values. For
l-values (=variables a debugger could write to) we would need to have a
discriminator that declares the DBG_VALUE as a memory location (cf. DWARF5
chapter 2.6).
This is a tricky one. Right now, DIExpressions sort-of mimic DWARF, but with
differences that aren't always immediately clear. The reason why I chose
DW_OP_stack_value for the direct-value-case instead of using DW_OP_deref for the
indirect-value-case is that it is more like actual DWARF: a DWARF expression is
either empty, a register, a memory address, or an implicit location. The new
representation handles each of these faithfully to DWARF, except for being
unable to distinguish between the register and memory case for a single register
argument. In DWARF, the difference is that a register location uses `DW_OP_reg
N`, while any reference to a register's value in any other type of location
uses `DW_OP_breg N`. We cannot specify these in LLVM since we only generate
these operators at the end; previously, this was the job of the indirectness
flag.

Rather than reintroducing a flag just for this purpose however, I instead
propose that we treat this as a special (albeit common) case: we can use
`DW_OP_LLVM_arg 0, DW_OP_stack_value` with a single register operand. We already
reduce the DIExpression to specialized DWARF operators at the point of DWARF
emission, for example: `<Register N>, DW_OP_plus_uconst 5` becomes
`DW_OP_bregN RSP+5`. If a stack value expression that consists of only a single
register can always be converted to a register location, then this is a valid
transformation; I can't think of any cases where it wouldn't be, since if we get
a variable's value directly from a single register then it necessarily exists at
that location. The only exception would
be where, for one reason or another, we want DWARF to believe that the location
is implicit and thus cannot be written to; if such a case exists then it might
be suitable grounds to change the behaviour here.
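
To make the special case proposed in this message concrete, here is a minimal Python sketch of the emission step. The tuple encoding of operators and the name `emit_single_register` are illustrative assumptions, not LLVM's DwarfExpression API. It models only the two behaviours named above: folding `DW_OP_plus_uconst` into a `DW_OP_breg` offset, and turning a lone register followed by `DW_OP_stack_value` into a register location.

```python
# Hypothetical model of DWARF emission for a DBG_VALUE with one register operand.
# Operators are tuples: (name, operand...). This is not LLVM's real data model.
ARG0 = ("DW_OP_LLVM_arg", 0)
STACK_VALUE = ("DW_OP_stack_value",)


def emit_single_register(expr, reg):
    """Lower `expr` (a list of operator tuples) against the register `reg`."""
    # Proposed special case: the value is exactly the register's contents,
    # so describe it as a register location that a debugger may write to.
    if expr == [ARG0, STACK_VALUE]:
        return [("DW_OP_reg", reg)]

    out = []
    i = 0
    while i < len(expr):
        op = expr[i]
        if op == ARG0:
            offset = 0
            # Fold an immediately following constant offset into DW_OP_breg.
            if i + 1 < len(expr) and expr[i + 1][0] == "DW_OP_plus_uconst":
                offset = expr[i + 1][1]
                i += 1
            out.append(("DW_OP_breg", reg, offset))
        else:
            out.append(op)
        i += 1
    return out
```

Under this model, `[ARG0, STACK_VALUE]` lowers to a single `DW_OP_reg`, while any longer expression keeps its `DW_OP_stack_value` and reads the register through `DW_OP_breg`.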

This does have the potential to cause confusion to a reader unfamiliar with this
behaviour, but for a reader examining debug info in enough detail that the
removal of DW_OP_stack_value raises an eyebrow, I think simply noting the
behaviour in code comments and the documentation would be sufficient.
________________________________
From: Adrian Prantl <aprantl at apple.com>
Sent: 27 August 2020 17:42
To: Tozer, Stephen <stephen.tozer at sony.com>
Cc: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to
support multiple location operands


> On Aug 25, 2020, at 11:09 AM, Tozer, Stephen via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
>
> Currently there is a series of patches undergoing review[0] that seek to
enable the use of multiple IR/MIR values when describing a source variable's
location. The current plan for the MIR is to add a new instruction,
DBG_VALUE_LIST, that supports this functionality by having a variable number of
operands. It may be better however to simply replace the existing DBG_VALUE
behaviour entirely instead, and so I'm looking for any comments on this
change before pushing ahead with it.
Thank you for writing this up! I think this is generally a good idea.
>
> There are a few differences between the MIR instructions:
>
> Old: DBG_VALUE %x, $noreg, !DILocalVariable("x"), !DIExpression()
> New: DBG_VALUE !DILocalVariable("x"),
!DIExpression(DW_OP_LLVM_arg, 0, DW_OP_stack_value), %x
>
> 1) The "location" operand is moved to the end, as the instruction
is now variadic such that every operand after the DIExpression is a location
operand.
Sounds good.
> 2) The second operand which currently represents "Indirectness"
has been removed entirely, because this is now explicitly specified in the
DIExpression (see 4).
Sounds good, too. We may need to pay a cost for rewriting more DIExpressions,
but I don't see a way to make this work with multiple operands otherwise.
> 3) The DIExpression no longer implicitly treats the location operand as the
first element of the expression, instead each location must be explicitly
referenced in the expression using `DW_OP_LLVM_arg, N` for the Nth location
operand.
This is nice and consistent. If we are worried about the extra memory needed we
can still come up with a more efficient encoding of the common case, but the
DIExpression *interface* should present it like this.
> 4) The DIExpression itself must be explicit about whether it evaluates to
the location of a variable or its literal value, by using DW_OP_stack_value in
the latter case (instead of relying on the Indirectness flag, which is both
confusing and redundant[1]).
I'm not sure this will work as stated here. Indirectness is (mostly)
orthogonal to DW_OP_stack_value. DW_OP_stack_value denotes that we reconstructed
the value of the variable, but it doesn't exist in the program ("The
DW_OP_stack_value operation specifies that the object does not exist in memory
but its value is nonetheless known"), for example, a constant value. I
think we want something like DW_OP_deref instead, at least for r-values. For
l-values (=variables a debugger could write to) we would need to have a
discriminator that declares the DBG_VALUE as a memory location (cf. DWARF5
chapter 2.6).

I think this is going in the right direction; we just need to sort out that last
point!

thanks,
adrian

>
> I believe this is a strict improvement to the expressiveness and clarity of
DBG_VALUE. Although it increases the verbosity of simple expressions, such a
change is necessary to remove potential ambiguities in constant debug
expressions[2]. We will also be relying on the DIExpression to replace the
"Indirectness" flag, since it should now solely determine whether or
not a value is indirect; this brings us closer to the final DWARF
representation. One potential downside is that using DW_OP_stack_value for a
simple single-register DBG_VALUE (as in the example above) would currently lose
information, as it would output the DWARF expression `DW_OP_breg0 RSP+0,
DW_OP_stack_value` instead of the current output `DW_OP_reg0 RSP`. The former is
larger and gives less information, as both expressions evaluate to the same
value but only the latter gives a location for the variable that can be modified
by a debugger. This can be fixed with some pattern matching in the
DwarfExpression class to cover this specific (albeit common) case.
>
> The current approach for the IR is not to add a new instruction, but to add
a new metadata node that contains a list of IR value references (wrapped as
ValueAsMetadata) and use it as the first argument to dbg.value. There is no
syntactic incompatibility between this and the current dbg.value, and therefore
it is possible to support both simultaneously, but I believe it would be
unnecessarily complicated to maintain two separate forms of dbg.value. There is
no immediate plan to change dbg.declare and dbg.addr in the same way: there is
some value in the distinction between the intrinsics, the addresses do not use
constant values (and so avoid the ambiguity described in [2]), and there are few
(possibly no) cases where dbg.addr or dbg.declare intrinsics that use more than
one IR value would actually be produced: only salvageDebugInfo can produce
multi-value debug intrinsics, and debug address intrinsics usually use a
non-salvageable alloca as the location (I am currently unsure as to whether
non-alloca address intrinsics can or should be produced anywhere).
>
> Described here are the differences in the IR intrinsics:
>
> Old: @llvm.dbg.value(metadata i32 %x, metadata
!DILocalVariable("x"), metadata !DIExpression())
> New: @llvm.dbg.value(metadata !DIValueList(i32 %x), metadata
!DILocalVariable("x"), metadata !DIExpression(DW_OP_LLVM_arg, 0,
DW_OP_stack_value))
>
> 1) The location operand is changed from a single Value to a list of 0 or
more Values.
> 2) The DIExpression is modified in the same manner as in the MIR
instruction (see above).
>
> In summary, this is a notice of the intent to introduce these changes in
the patch described above. Currently the patches add these modified instructions
alongside the existing ones, but a total replacement would be a better outcome.
This is not a full RFC but is intended to ensure that this change doesn't
catch anyone by surprise and that there are no significant objections.
>
> [0] https://reviews.llvm.org/D82363
> [1] https://bugs.llvm.org/show_bug.cgi?id=41675#c8
> [2] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139441.html

Adrian Prantl via llvm-dev

2020-Sep-02 23:36 UTC

> On Sep 2, 2020, at 7:01 AM, Tozer, Stephen <stephen.tozer at
sony.com> wrote:
> 
>> > I'm not sure this will work as stated here. Indirectness is
(mostly) orthogonal to DW_OP_stack_value. DW_OP_stack_value denotes that we
reconstructed the value of the variable, but it doesn't exist in the program
("The DW_OP_stack_value operation specifies that the object does not exist
in memory but its value is nonetheless known"), for example, a constant
value. I think we want something like DW_OP_deref instead, at least for
r-values. For l-values (=variables a debugger could write to) we would need to
have a discriminator that declares the DBG_VALUE as a memory location (cf.
DWARF5 chapter 2.6).
> 
> This is a tricky one. Right now, DIExpressions sort-of mimic DWARF, but
with differences that aren't always immediately clear. The reason why I
chose DW_OP_stack_value for the direct-value-case instead of using DW_OP_deref
for the indirect-value-case is that it is more like actual DWARF: a DWARF
expression is either empty, a register, a memory address, or an implicit
location.
Yeah, because that decision can only be made much later in LLVM in
AsmPrinter/DwarfExpression.cpp.
> The new representation handles each of these faithfully to DWARF, except
for being unable to distinguish between the register and memory case for a
single register argument. In DWARF, the difference is that a register location
uses `DW_OP_reg N`, while any reference to a register's value in any other
type of location uses `DW_OP_breg N`. We cannot specify these in LLVM since we
only generate these operators at the end; previously, this was the job of the
indirectness flag.
In DWARF, DW_OP_reg(x) is a register l-value, all others can either be l-values
or r-values depending on whether there is a DW_OP_stack_value/DW_OP_implicit* at
the end.
> 
> Rather than reintroducing a flag just for this purpose however, I instead
propose that we treat this as a special (albeit common) case: we can use
`DW_OP_LLVM_arg 0, DW_OP_stack_value` with a single register operand.
I think it would be confusing to talk about registers at the LLVM IR /
DIExpression level. "SSA-Values"?
> We already reduce the DIExpression to specialized DWARF operators at the
point of DWARF emission, for example: `<Register N>, DW_OP_plus_uconst 5`
becomes `DW_OP_bregN RSP+5`. If in any case where a stack value expression
consists of only a single register it is valid to convert it to a register
location,
I don't think that's correct, because a DW_OP_stack_value is an rvalue.
But maybe I misunderstood what you were trying to say.
> then this should be a valid transformation; I can't think of any cases
where it wouldn't be, since if we get a variable's value directly from a
single register then it necessarily exists at that location. The only exception
would be where, for one reason or another, we want DWARF to believe that the
location is implicit and thus cannot be written to; if such a case exists then
it might be suitable grounds to change the behaviour here.
> 
> This does have the potential to cause confusion to a reader unfamiliar with
this behaviour, but for a reader examining debug info in enough detail that the
removal of DW_OP_stack_value raises an eyebrow, I think simply noting the
behaviour in code comments and the documentation would be sufficient.
We should start by defining what DW_OP_stack_value really means in LLVM debug
info metadata. I believe it should just mean "r-value".

-- adrian

Tozer, Stephen via llvm-dev

2020-Sep-04 10:00 UTC

> Yeah, because that decision can only be made much later in LLVM in
AsmPrinter/DwarfExpression.cpp.
> In DWARF, DW_OP_reg(x) is a register l-value, all others can either be
l-values or r-values depending on whether there is a
DW_OP_stack_value/DW_OP_implicit* at the end.
Yes, it might not be clear but that's what I'm trying to say. Out of the
non-empty DWARF locations, register and memory locations are l-values, implicit
locations are r-values. You can technically use DW_OP_breg in an l-value, but
not for register locations. This is why when we have a DBG_VALUE that has a
single register location operand with an otherwise empty DIExpression, we need
some indicator to determine whether we want to produce the register location
[DW_OP_reg] or the memory location [DW_OP_breg] (currently this indicator is the
indirectness flag).
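
As a rough illustration of the classification just described, the following Python sketch sorts a simple DWARF location description into empty, register, memory, or implicit, and reports whether it is an l-value or an r-value. The function name `classify_location` and the use of bare operator names (operands omitted) are assumptions made for this example only.

```python
# Hypothetical classifier over operator names only; operands are omitted.
IMPLICIT_OPS = {"DW_OP_stack_value", "DW_OP_implicit_value", "DW_OP_implicit_pointer"}


def classify_location(ops):
    """Return (kind, value_category) for one simple location description."""
    if not ops:
        return ("empty", None)
    # Implicit locations give a value but no place to write it back to.
    if ops[-1] in IMPLICIT_OPS:
        return ("implicit", "r-value")
    # A lone DW_OP_regN (or DW_OP_regx) names the register itself.
    if len(ops) == 1 and ops[0].startswith("DW_OP_reg"):
        return ("register", "l-value")
    # Anything else computes an address, which is a memory location.
    return ("memory", "l-value")
```

For example, `["DW_OP_reg0"]` is a register l-value, `["DW_OP_breg7"]` is a memory l-value, and `["DW_OP_breg7", "DW_OP_stack_value"]` is an implicit r-value. The first two differ only in the choice of operator, which is the distinction the indirectness flag currently carries.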
> I think it would be confusing to talk about registers at the LLVM IR /
DIExpression level. "SSA-Values"?
I think terminology is a bit difficult here because this work concerns both the
llvm.dbg.value intrinsic and the DBG_VALUE instruction, which operate on
different kinds of arguments. I think "location operands" is probably
the best description for them, since they are operands to a DIExpression which
is used to compute the variable location.
> I don't think that's correct, because a DW_OP_stack_value is an
rvalue. But maybe I misunderstood what you were trying to say.
> We should start by defining what DW_OP_stack_value really means in LLVM
debug info metadata. I believe it should just mean "r-value".
Having given it some more thought, I've changed my mind - I agree that we
shouldn't use DW_OP_stack_value in this case, because it would be changing
its meaning which is to explicitly declare the expression to be an implicit
location/r-value. My current line of thinking is that it would be better to
introduce a new operator, named DW_OP_LLVM_direct or something similar, which
has the meaning "the variable's exact value is produced by the
preceding expression", and would replace DW_OP_stack_value as it is
currently used within LLVM.

To summarise the logic behind using this operator: LLVM debug info does not need
to explicitly care about r-values or l-values before DWARF emission, only
whether we're describing a variable's memory location, a variable's
exact value, or some other implicit location (such as implicit_pointer). Whether
an expression is an r-value or l-value can be trivially determined at the end of
the pipeline (addMachineRegExpression already does this).

For an expression ending with DW_OP_LLVM_direct: if the preceding expression is
only a single register then we emit a register location, if the preceding
expression ends with DW_OP_deref then we can remove the deref and emit a memory
location, and otherwise we emit the expression with DW_OP_stack_value. In
expression syntax it would behave like an implicit operator, in that it can only
appear at the end of an expression and is incompatible with any implicit
operators, including DW_OP_stack_value.
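
The three lowering rules in the paragraph above can be sketched in Python as follows. This is only an illustration under stated assumptions: `DW_OP_LLVM_direct` is the tentative operator name suggested in this message, expressions are modelled as lists of strings, `lower_direct` is a hypothetical helper, and `DW_OP_LLVM_arg 0` is assumed to refer to a single register operand.

```python
# Hypothetical lowering of the tentative DW_OP_LLVM_direct operator.
# Expressions are lists of strings; "DW_OP_LLVM_arg 0" is assumed to be
# the single register operand of the DBG_VALUE.
DIRECT = "DW_OP_LLVM_direct"
ARG0 = "DW_OP_LLVM_arg 0"
IMPLICIT_OPS = {"DW_OP_stack_value", "DW_OP_implicit_value", "DW_OP_implicit_pointer"}


def lower_direct(expr):
    """Return (location_kind, remaining_ops) for an expression using DIRECT."""
    if DIRECT not in expr:
        return ("unchanged", list(expr))
    body = expr[:-1]
    # Like an implicit operator, it may only appear once, at the very end.
    if expr[-1] != DIRECT or DIRECT in body:
        raise ValueError("DW_OP_LLVM_direct may only appear once, at the end")
    if IMPLICIT_OPS & set(body):
        raise ValueError("DW_OP_LLVM_direct cannot be combined with implicit operators")
    # Rule 1: only a single register precedes it, so emit a register location.
    if body == [ARG0]:
        return ("register", body)
    # Rule 2: a trailing deref means the value lives in memory at that address.
    if body and body[-1] == "DW_OP_deref":
        return ("memory", body[:-1])
    # Rule 3: otherwise the value is computed, so emit it as a stack value.
    return ("implicit", body + ["DW_OP_stack_value"])
```

The value category never has to be stored in the expression itself: whether the result is an l-value or an r-value falls out of which rule matched.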

The alternative I see for this is using a flag or a new DIExpression operator
that explicitly declares a single register DBG_VALUE to be a register location,
while it would otherwise be treated as a memory location, and use stack_value
for all other cases. The main reason I prefer the "direct" operator is
that LLVM doesn't need to know whether a DIExpression results in an l-value
location or an r-value location; it only needs to know how to compute the
variable's location and then determine whether that computation resolves to
an l-value or r-value at the end. Maintaining two separate representations for
stack value locations and register locations when we don't need to is an
unnecessary burden, especially when it may be possible for a given
dbg.value/DBG_VALUE to switch back and forth between them.

