thr3ads.net - llvm dev - [llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands [Sep 2020]

Tozer, Stephen via llvm-dev

2020-Sep-04 10:00 UTC

[llvm-dev] [Debuginfo] Changing llvm.dbg.value and DBG_VALUE to support multiple location operands

> Yeah, because that decision can only be made much later in LLVM in
> AsmPrinter/DwarfExpression.cpp.
> In DWARF, DW_OP_reg(x) is a register l-value, all others can either be
> l-values or r-values depending on whether there is a
> DW_OP_stack_value/DW_OP_implicit* at the end.

Yes, it might not be clear but that's what I'm trying to say. Out of the
non-empty DWARF locations, register and memory locations are l-values, implicit
locations are r-values. You can technically use DW_OP_breg in an l-value, but
not for register locations. This is why when we have a DBG_VALUE that has a
single register location operand with an otherwise empty DIExpression, we need
some indicator to determine whether we want to produce the register location
[DW_OP_reg] or the memory location [DW_OP_breg] (currently this indicator is the
indirectness flag).
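
To make the register/memory ambiguity concrete, here is a minimal sketch of the two encodings a single-register debug value can lower to. The opcode values are the standard DWARF ones; the function name and the boolean `indirect` parameter are hypothetical stand-ins for the indirectness flag, not LLVM's actual API.

```python
DW_OP_reg0 = 0x50   # DW_OP_reg0..DW_OP_reg31 occupy 0x50..0x6f
DW_OP_breg0 = 0x70  # DW_OP_breg0..DW_OP_breg31 occupy 0x70..0x8f


def encode_single_register(dwarf_reg, indirect):
    """Encode a debug value with one register operand and an empty expression.

    `indirect` models the indirectness flag (an assumption of this sketch).
    """
    if not 0 <= dwarf_reg <= 31:
        raise ValueError("this sketch only handles DWARF registers 0..31")
    if indirect:
        # Memory location: the variable lives at the address held in the
        # register, at offset 0. SLEB128(0) is the single byte 0x00.
        return bytes([DW_OP_breg0 + dwarf_reg, 0x00])
    # Register location: the variable lives in the register itself.
    return bytes([DW_OP_reg0 + dwarf_reg])
```

The same operand yields two different location kinds, which is why some indicator beyond the operand itself is needed.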

> I think it would be confusing to talk about registers at the LLVM IR /
> DIExpression level. "SSA-Values"?

I think terminology is a bit difficult here because this work concerns both the
llvm.dbg.value intrinsic and the DBG_VALUE instruction, which operate on
different kinds of arguments. I think "location operands" is probably
the best description for them, since they are operands to a DIExpression which
is used to compute the variable location.

> I don't think that's correct, because a DW_OP_stack_value is an
> rvalue. But maybe I misunderstood what you were trying to say.
> We should start by defining what DW_OP_stack_value really means in LLVM
> debug info metadata. I believe it should just mean "r-value".

Having given it some more thought, I've changed my mind - I agree that we
shouldn't use DW_OP_stack_value in this case, because it would be changing
its meaning, which is to explicitly declare the expression to be an implicit
location/r-value. My current line of thinking is that it would be better to
introduce a new operator, named DW_OP_LLVM_direct or something similar, which
has the meaning "the variable's exact value is produced by the
preceding expression", and would replace DW_OP_stack_value as it is
currently used within LLVM.

To summarise the logic behind using this operator: LLVM debug info does not need
to explicitly care about r-values or l-values before DWARF emission, only
whether we're describing a variable's memory location, a variable's
exact value, or some other implicit location (such as implicit_pointer). Whether
an expression is an r-value or l-value can be trivially determined at the end of
the pipeline (addMachineRegExpression already does this).

For an expression ending with DW_OP_LLVM_direct: if the preceding expression is
only a single register then we emit a register location, if the preceding
expression ends with DW_OP_deref then we can remove the deref and emit a memory
location, and otherwise we emit the expression with DW_OP_stack_value. In
expression syntax it would behave like an implicit operator, in that it can only
appear at the end of an expression and is incompatible with any implicit
operators, including DW_OP_stack_value.
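
The lowering described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: expressions are modelled as lists of operator strings, `operand_is_register` is a hypothetical flag describing the single location operand, and `lower_direct` is not an existing LLVM function.

```python
def lower_direct(ops, operand_is_register):
    """Lower an expression ending in DW_OP_LLVM_direct to a DWARF location.

    Returns (kind, emitted_ops), where kind is "register", "memory" or
    "stack_value". All names here are hypothetical.
    """
    if not ops or ops[-1] != "DW_OP_LLVM_direct":
        raise ValueError("expression must end with DW_OP_LLVM_direct")
    body = list(ops[:-1])
    if not body and operand_is_register:
        # Only a single register: emit a register location.
        return ("register", [])
    if body and body[-1] == "DW_OP_deref":
        # The value is loaded from an address: drop the deref and describe
        # that address as a memory location instead.
        return ("memory", body[:-1])
    # Anything else is a computed value.
    return ("stack_value", body + ["DW_OP_stack_value"])
```

For example, `lower_direct(["DW_OP_plus_uconst 8", "DW_OP_deref", "DW_OP_LLVM_direct"], True)` returns a memory location whose expression is just the address computation.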

The alternative I see for this is using a flag or a new DIExpression operator
that explicitly declares a single register DBG_VALUE to be a register location,
while it would otherwise be treated as a memory location, and use stack_value
for all other cases. The main reason I prefer the "direct" operator is
that LLVM doesn't need to know whether a DIExpression results in an l-value
location or an r-value location; it only needs to know how to compute the
variable's location and then determine whether that computation resolves to
an l-value or r-value at the end. Maintaining two separate representations for
stack value locations and register locations when we don't need to is an
unnecessary burden, especially when it may be possible for a given
dbg.value/DBG_VALUE to switch back and forth between them.

Adrian Prantl via llvm-dev

2020-Sep-04 15:59 UTC

> On Sep 4, 2020, at 3:00 AM, Tozer, Stephen <stephen.tozer at sony.com> wrote:
>
> > Yeah, because that decision can only be made much later in LLVM in
> > AsmPrinter/DwarfExpression.cpp.
> > In DWARF, DW_OP_reg(x) is a register l-value, all others can either be
> > l-values or r-values depending on whether there is a
> > DW_OP_stack_value/DW_OP_implicit* at the end.
>
> Yes, it might not be clear but that's what I'm trying to say. Out of the
> non-empty DWARF locations, register and memory locations are l-values,
> implicit locations are r-values. You can technically use DW_OP_breg in an
> l-value, but not for register locations. This is why when we have a DBG_VALUE
> that has a single register location operand with an otherwise empty
> DIExpression, we need some indicator to determine whether we want to produce
> the register location [DW_OP_reg] or the memory location [DW_OP_breg]
> (currently this indicator is the indirectness flag).
> 
> > I think it would be confusing to talk about registers at the LLVM IR /
> > DIExpression level. "SSA-Values"?
>
> I think terminology is a bit difficult here because this work concerns both
> the llvm.dbg.value intrinsic and the DBG_VALUE instruction, which operate on
> different kinds of arguments. I think "location operands" is probably
> the best description for them, since they are operands to a DIExpression which
> is used to compute the variable location.
> 
> > I don't think that's correct, because a DW_OP_stack_value is
> > an rvalue. But maybe I misunderstood what you were trying to say.
> > We should start by defining what DW_OP_stack_value really means in
> > LLVM debug info metadata. I believe it should just mean "r-value".
>
> Having given it some more thought, I've changed my mind - I agree that
> we shouldn't use DW_OP_stack_value in this case, because it would be
> changing its meaning, which is to explicitly declare the expression to be an
> implicit location/r-value. My current line of thinking is that it would be
> better to introduce a new operator, named DW_OP_LLVM_direct or something
> similar, which has the meaning "the variable's exact value is produced
> by the preceding expression", and would replace DW_OP_stack_value as it is
> currently used within LLVM.

Can you elaborate on what "direct" means? I'm having trouble
understanding what the opposite (a non-exact value) would be.
> 
> To summarise the logic behind using this operator: LLVM debug info does not
> need to explicitly care about r-values or l-values before DWARF emission,

I don't think that statement is correct. Based on the semantics, LLVM IR
knows that a dbg.declare is an l-value: the debugger can write to it and the
value will be changed when continuing the program execution. It can also decide
that a "working copy" of the value, described by a dbg.value, is a
legit read-only representation of the variable, but can't be written to
because, e.g., the value exists in more than one place at once.

At the moment we don't make the lvalue/rvalue distinction in LLVM at all. We
make an educated guess in AsmPrinter. But that's wrong and something we
should strive to fix during this redesigning.

> only whether we're describing a variable's memory location, a
> variable's exact value, or some other implicit location (such as
> implicit_pointer). Whether an expression is an r-value or l-value can be
> trivially determined at the end of the pipeline (addMachineRegExpression
> already does this).

As stated above, I don't think we can trivially determine this, because (at
least for dbg.values) this info was lost already in LLVM IR. Unless we say the
dbg.declare / dbg.value distinction is what determines lvalues vs. rvalues.
> 
> For an expression ending with DW_OP_LLVM_direct: if the preceding
> expression is only a single register then we emit a register location, if the
> preceding expression ends with DW_OP_deref then we can remove the deref and
> emit a memory location, and otherwise we emit the expression with
> DW_OP_stack_value. In expression syntax it would behave like an implicit
> operator, in that it can only appear at the end of an expression and is
> incompatible with any implicit operators, including DW_OP_stack_value.
>
> The alternative I see for this is using a flag or a new DIExpression
> operator that explicitly declares a single register DBG_VALUE to be a register
> location, while it would otherwise be treated as a memory location, and use
> stack_value for all other cases. The main reason I prefer the "direct"
> operator is that LLVM doesn't need to know whether a DIExpression results in
> an l-value location or an r-value location; it only needs to know how to
> compute the variable's location and then determine whether that computation
> resolves to an l-value or r-value at the end. Maintaining two separate
> representations for stack value locations and register locations when we
> don't need to is an unnecessary burden, especially when it may be possible
> for a given dbg.value/DBG_VALUE to switch back and forth between them.

I do think that your insight that we need one (or more?) additional
discriminator of some kind is correct; we just need to find the right semantics
for it.

thanks,
adrian

Tozer, Stephen via llvm-dev

2020-Sep-11 18:12 UTC

> Can you elaborate what "direct" means? I'm having trouble
> understanding what the opposite (a non-exact value) would be.

Apologies, "exact" was a misleading/incorrect term. By direct, I mean
that the expression computes the value of the variable, as opposed to its memory
address, or the value that it points to. Within LLVM, where we don't have
DW_OP_reg/DW_OP_breg but instead simply refer to a generic SSA value, this could
mean either a register location or stack value.

> At the moment we don't make the lvalue/rvalue distinction in LLVM at
> all. We make an educated guess in AsmPrinter. But that's wrong and something
> we should strive to fix during this redesigning.

I think the opposite; I don't believe there's any reason we need to make
the explicit lvalue/rvalue distinction until we're writing DWARF. To put it
in more general terms, I think that the IR/MIR debug value instructions should
only care about how the variable's value can be computed. Whether the result
of that computation is an lvalue is unimportant within LLVM itself as far as I
can tell, and is redundant when it can be computed from just the DIExpression
and location operands.

> As stated above, I don't think we can trivially determine this, because
> (at least for dbg.values) this info was lost already in LLVM IR. Unless we say
> the dbg.declare / dbg.value distinction is what determines lvalues vs. rvalues.

With the proposed operator, it would be trivial to determine lvalue vs rvalue
debug values with a set of rules (ignoring any fragment operator, which may
appear at the end but does not affect the location type):

  1. If the expression is empty, or any location arguments are $noreg => Empty
  2. If the expression ends with DW_OP_implicit_pointer => Implicit pointer (rvalue)
  3. If the expression ends with DW_OP_stack_value => Stack value (rvalue)
     // LLVM should produce DW_OP_LLVM_direct instead.
  4. If the expression ends with DW_OP_LLVM_direct, then...
    4a. If the preceding expression is just DW_OP_LLVM_arg, 0 and the only
        location operand is a register => Register location (lvalue)
    4b. Otherwise => Stack value (rvalue)
  5. Otherwise => Memory location (lvalue)

This covers all the expected cases without ambiguity and with almost no loss of
expressiveness. I believe that the only expression that LLVM will not be able to
produce like this is DW_OP_bregN, DW_OP_stack_value, because when
DW_OP_LLVM_direct is used, this would be written as a register location instead
of a stack value. I don't think there are any cases where we would choose to
emit a stack value location when we're able to produce a register location
instead, so this shouldn't be a problem.
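
The rules above can be sketched as a small classifier. This is an illustration, not LLVM code: the tuple encoding of operators, the "$name" spelling of register operands, and the names `LocationKind`, `is_register` and `classify` are all assumptions made for the example, and the implicit pointer operator is spelled DW_OP_implicit_pointer as in DWARF 5.

```python
from enum import Enum


class LocationKind(Enum):
    EMPTY = "empty"
    IMPLICIT_POINTER = "implicit pointer (rvalue)"
    STACK_VALUE = "stack value (rvalue)"
    REGISTER = "register location (lvalue)"
    MEMORY = "memory location (lvalue)"


def is_register(operand):
    # Assumption of this sketch: registers are spelled "$name", as in MIR
    # dumps, and "$noreg" marks an undefined operand.
    return operand.startswith("$") and operand != "$noreg"


def classify(ops, operands):
    """Classify a debug value from its expression and location operands.

    ops: list of tuples with the operator name first, e.g. ("DW_OP_LLVM_arg", 0).
    operands: list of strings, e.g. ["$rax"].
    """
    # A trailing fragment operator does not affect the location type.
    if ops and ops[-1][0] == "DW_OP_LLVM_fragment":
        ops = ops[:-1]
    # Rule 1: empty expression or any undefined operand.
    if not ops or "$noreg" in operands:
        return LocationKind.EMPTY
    last = ops[-1][0]
    # Rule 2.
    if last == "DW_OP_implicit_pointer":
        return LocationKind.IMPLICIT_POINTER
    # Rule 3.
    if last == "DW_OP_stack_value":
        return LocationKind.STACK_VALUE
    # Rule 4.
    if last == "DW_OP_LLVM_direct":
        body = list(ops[:-1])
        only_arg0 = body == [("DW_OP_LLVM_arg", 0)]
        if only_arg0 and len(operands) == 1 and is_register(operands[0]):
            return LocationKind.REGISTER      # 4a
        return LocationKind.STACK_VALUE       # 4b
    # Rule 5.
    return LocationKind.MEMORY
```

Under these assumptions, `classify([("DW_OP_LLVM_arg", 0), ("DW_OP_LLVM_direct",)], ["$rax"])` gives a register location, while adding any further computation before DW_OP_LLVM_direct turns the result into a stack value.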
