thr3ads.net - llvm dev - [llvm-dev] Defining the DIExpression operator DW_OP_LLVM

Tozer, Stephen via llvm-dev

2020-Jul-07 15:28 UTC

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

Currently, two patches undergoing review wish to add a new operator for
LLVM's DIExpression: D70642[0], and D82363[1]. Though both of these cases
use the same name, the operators have different meanings: in the former, it has
the form `DW_OP_LLVM_argN` where N is in [0-7], and represents the Nth argument
of the containing intrinsic; its purpose is to enable the intrinsic
@llvm.dbg.derefval, which represents implicit pointers. In the latter, it has
the form `DW_OP_LLVM_arg, N`, where N >= 0, and represents the Nth value in a
variadic debug value, which is being added in the same patch. Functionally these
are quite similar; the main difference is that the latter picks the Nth value
from a subset of the intrinsic's operands, while the former picks the Nth
value from the entire containing intrinsic.
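
To make the distinction concrete, here is a small Python sketch. It is not LLVM
code: the operand lists, the function names, and the choice of which operands
count as "values" are all assumptions made for illustration. It models the
D70642-style operator as indexing the whole operand list of the containing
intrinsic, and the D82363-style operator as indexing only the list of location
values.

```python
# Illustrative model only: these structures are hypothetical, not LLVM's API.

# Operands of a hypothetical debug intrinsic, in order. In this sketch the
# variable and the expression are operands too, alongside the values.
intrinsic_operands = ["%a", "%b", "!var", "!expr"]

# The subset of operands that are location values (the variadic value list).
location_values = ["%a", "%b"]


def resolve_argN(n, operands):
    """D70642-style DW_OP_LLVM_argN: Nth operand of the whole intrinsic."""
    if not 0 <= n <= 7:
        raise ValueError("DW_OP_LLVM_argN only exists for N in [0, 7]")
    return operands[n]


def resolve_arg(n, values):
    """D82363-style `DW_OP_LLVM_arg, N`: Nth entry of the value list only."""
    if n < 0:
        raise ValueError("N must be non-negative")
    return values[n]
```

In this model `resolve_argN(2, intrinsic_operands)` reaches the variable, which
is not a machine value, whereas `resolve_arg(2, location_values)` is simply out
of range.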

My personal opinion (possibly biased as the author of D82363) is that the best
solution is to remove DW_OP_LLVM_argN from the implicit pointer work altogether.
The operator introduced in that patch is more general-purpose than necessary; it
is only used as DW_OP_LLVM_arg0 and in conjunction with @llvm.dbg.derefval, an
intrinsic added in that patch. It seems at a glance that the functionality could
be folded into the new intrinsic and operator pair by having them automatically
refer to arg0. While it's possible that this general-purpose operator could
have value in the future, I think that it is better to have specific operators
for each use case, rather than an operator that could refer to a machine value,
a local variable, or even the DIExpression that contains it. Also of note, the
main feature that this operator would be useful for implementing is the feature
that D82363 is adding: referencing multiple machine values in a debug value. The
difference is that the version of the operator in the more recent patch is
explicitly used to represent machine values; it cannot refer to anything else.

Finally, the implicit pointer patches haven't received an update in a while:
the last comment or update was on January 10th. It is possible at this point
that the work will not make it in for a very long time, and may be substantially
changed when it does - I imagine there will be very little cost associated with
using a different operator in that case.

I'm mainly looking to confirm that there are no objections to this, and also
ensure that this change doesn't sneak past anyone if it is accepted.

[0] https://reviews.llvm.org/D70642
[1] https://reviews.llvm.org/D82363

Adrian Prantl via llvm-dev

2020-Jul-07 20:20 UTC

Thanks for bringing this up!

To summarize my understanding: Neither of these operators is strictly necessary,
since the same effect can be achieved by implicitly pushing all operands of a
DBG_VALUE to the stack, followed by a combination of DW_OP_dup, DW_OP_pick,
DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting expressions can
get very long and unwieldy and it is easier to generate efficient DWARF from
DIExpressions that explicitly refer to arguments. DW_OP_LLVM_argN has the
advantage of using less memory, at the cost of limiting the number of arguments
to a hardcoded value.
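
As a rough illustration of the equivalence, the sketch below evaluates a*a + b
in both styles. The evaluator, the tuple encoding of operators, and the variable
names are assumptions made for this example; only the stack behaviour of
DW_OP_dup, DW_OP_over, DW_OP_pick, DW_OP_swap, DW_OP_plus and DW_OP_mul is meant
to follow the DWARF definitions.

```python
def evaluate(ops, args, implicit_push):
    """Evaluate a toy expression; the result is the top of the stack."""
    # Implicit style: every argument is pushed before the expression runs.
    stack = list(args) if implicit_push else []
    for op in ops:
        name = op[0]
        if name == "DW_OP_LLVM_arg":      # explicit reference to argument N
            stack.append(args[op[1]])
        elif name == "DW_OP_dup":         # copy the top entry
            stack.append(stack[-1])
        elif name == "DW_OP_over":        # copy the second entry from the top
            stack.append(stack[-2])
        elif name == "DW_OP_pick":        # copy entry at index N, 0 = top
            stack.append(stack[-1 - op[1]])
        elif name == "DW_OP_swap":        # swap the top two entries
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif name == "DW_OP_plus":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif name == "DW_OP_mul":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError(f"unsupported operator: {name}")
    return stack[-1]


args = [3, 4]  # a = 3, b = 4

# Implicit style: the stack starts as [a, b]; copies are made with stack ops.
implicit_form = [("DW_OP_over",), ("DW_OP_pick", 2), ("DW_OP_mul",), ("DW_OP_plus",)]

# Explicit style: each use of an argument names it directly.
explicit_form = [("DW_OP_LLVM_arg", 0), ("DW_OP_LLVM_arg", 0), ("DW_OP_mul",),
                 ("DW_OP_LLVM_arg", 1), ("DW_OP_plus",)]

print(evaluate(implicit_form, args, implicit_push=True))   # 13
print(evaluate(explicit_form, args, implicit_push=False))  # 13
```

Both forms produce the same value, but the implicit one can only be read by
replaying the stack in your head, while the explicit one states which argument
each step uses.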

By restricting their use to new intrinsics only (derefval and DBG_VALUE_VAR) we
can avoid any confusion about whether argument 0 is implicitly pushed to the
stack or not — in those new intrinsics, arguments always have to be referred to
explicitly. Personally, I'm leaning more towards DW_OP_LLVM_argN, because of
the space saving aspect.

Finally, to completely derail this discussion — once we have multi-arg support,
we may be able to simplify parts of the debug intrinsic handling by combining
multiple DW_OP_LLVM_fragment intrinsics describing the same variable into one
multi-argument expression with multiple DW_OP_LLVM_fragments.
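
A speculative sketch of what such a merge could look like follows. The thread
does not define a layout for a merged expression, so the element order used here
(an argument reference followed by its fragment), the tuple input format, and
the function name are all assumptions; the only detail taken from LLVM is that
DW_OP_LLVM_fragment carries an offset and a size in bits.

```python
def merge_fragments(fragments):
    """Combine per-fragment locations of one variable into a single
    (values, expression) pair.

    `fragments` is a list of (value, offset_bits, size_bits) tuples. The
    output layout is hypothetical, chosen only to illustrate the idea.
    """
    values = []
    expr = []
    # Emit fragments in order of their bit offset within the variable.
    for value, offset, size in sorted(fragments, key=lambda f: f[1]):
        expr += ["DW_OP_LLVM_arg", len(values),
                 "DW_OP_LLVM_fragment", offset, size]
        values.append(value)
    return values, expr


# Two 32-bit halves of a 64-bit variable, given out of order.
values, expr = merge_fragments([("%hi", 32, 32), ("%lo", 0, 32)])
```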

-- adrian

Tozer, Stephen via llvm-dev

2020-Jul-08 12:21 UTC

> To summarize my understanding: Neither of these operators is strictly
> necessary, since the same effect can be achieved by implicitly pushing all
> operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup,
> DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting
> expressions can get very long and unwieldy and it is easier to generate
> efficient DWARF from DIExpressions that explicitly refer to arguments.
> DW_OP_LLVM_argN has the advantage of using less memory, at the cost of
> limiting the number of arguments to a hardcoded value.

I'd also add that since `DW_OP_pick N` picks the Nth argument from the top
of the stack rather than the bottom, using it would require us to know how many
elements are on the stack at each point in the expression, and update the pick
depth if this changes. Also, for the sake of not bloating the final DWARF we
wouldn't want to actually push the args onto the stack first and then pick
them, so we'd basically be treating the final DW_OP_pick operators as
DW_OP_LLVM_arg either way.

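
The bookkeeping this implies can be sketched as follows. This is a hypothetical
lowering written for illustration, not anything from either patch: it assumes
all arguments are pushed in order before the expression runs, so argument N sits
N entries above the bottom, and it supports only a handful of operators whose
net stack effect is listed in a table.

```python
# Net change in stack depth for a few operators (illustrative subset).
STACK_DELTA = {
    "DW_OP_plus": -1,
    "DW_OP_minus": -1,
    "DW_OP_mul": -1,
    "DW_OP_dup": +1,
}


def lower_args_to_pick(ops, num_args):
    """Rewrite explicit argument references as DW_OP_pick.

    DW_OP_pick counts from the top of the stack, so the index for argument N
    is (current depth - 1 - N) and has to be recomputed at every use.
    """
    depth = num_args  # all arguments are assumed to be pushed up front
    lowered = []
    for op in ops:
        if op[0] == "DW_OP_LLVM_arg":
            lowered.append(("DW_OP_pick", depth - 1 - op[1]))
            depth += 1  # the pick itself pushes one entry
        else:
            lowered.append(op)
            depth += STACK_DELTA[op[0]]
    return lowered


# a*a + b with two arguments; note the two uses of argument 0.
explicit = [("DW_OP_LLVM_arg", 0), ("DW_OP_LLVM_arg", 0), ("DW_OP_mul",),
            ("DW_OP_LLVM_arg", 1), ("DW_OP_plus",)]
lowered = lower_args_to_pick(explicit, num_args=2)
# lowered == [("DW_OP_pick", 1), ("DW_OP_pick", 2), ("DW_OP_mul",),
#             ("DW_OP_pick", 1), ("DW_OP_plus",)]
```

The two references to argument 0 lower to `DW_OP_pick 1` and `DW_OP_pick 2`,
and the later reference to argument 1 also becomes `DW_OP_pick 1`: the index
says nothing about which argument is meant unless the depth is tracked.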
> By restricting their use to new intrinsics only (derefval and
> DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is
> implicitly pushed to the stack or not - in those new intrinsics, arguments
> always have to be referred to explicitly. Personally, I'm leaning more
> towards DW_OP_LLVM_argN, because of the space saving aspect.

I think what you're saying here is that we could have them both use the new
operator, assigning it a different meaning depending on the intrinsic it's
used in? If so, I think it's unnecessary - the derefval case does not (as
far as I understand it) need any value other than arg0, and will also always use
arg0; we lose no information by omitting it. Having the meaning of the operator
vary depending on the intrinsic that contains it is also something I think we
should avoid if possible; if derefval were changed to be variadic (not an
unreasonable possibility I think) and the first argument were passed as a
variadic argument then it'd make more sense.

As for the operator, I'm not attached to either implementation;
DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF
output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code as
we don't need any helper functions to extract N, or to generate one from a
given N value. The other question is whether we might actually need more than 8
arguments at some point; I suspect that for the vast majority of cases the
answer will be "no", but it's hard to be sure exactly what cases
may open up over time. Although...

> Finally, to completely derail this discussion - once we have multi-arg
> support, we may be able to simplify parts of the debug intrinsic handling by
> combining multiple DW_OP_LLVM_fragment intrinsics describing the same
> variable into one multi-argument expression with multiple
> DW_OP_LLVM_fragments.

This makes for a good example of how a future change might require us to support
more arguments in an intrinsic than initially seems necessary.
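
The trade-off between the two encodings, including the hardcoded limit, can be
sketched like this. The opcode numbers, constants, and helper names are invented
for the example and do not correspond to values in either patch; each list entry
stands for one element of the expression's element array.

```python
# Hypothetical opcode numbers, chosen only for this sketch.
DW_OP_LLVM_ARG0 = 0x1000  # argN form: N is folded into the opcode itself
MAX_ARGN = 8              # DW_OP_LLVM_arg0 .. DW_OP_LLVM_arg7
DW_OP_LLVM_ARG = 0x2000   # operand form: the opcode is followed by N


def encode_argN(n):
    """argN form: one element, but helpers are needed and N is capped."""
    if not 0 <= n < MAX_ARGN:
        raise ValueError(f"no DW_OP_LLVM_arg{n}: only {MAX_ARGN} opcodes exist")
    return [DW_OP_LLVM_ARG0 + n]


def decode_argN(elements, i):
    """Return (N, index of the next operator)."""
    return elements[i] - DW_OP_LLVM_ARG0, i + 1


def encode_arg(n):
    """Operand form: two elements, N is stored as-is with no upper bound."""
    if n < 0:
        raise ValueError("N must be non-negative")
    return [DW_OP_LLVM_ARG, n]


def decode_arg(elements, i):
    """Return (N, index of the next operator)."""
    return elements[i + 1], i + 2
```

Under these assumptions `encode_argN(3)` occupies one element and
`encode_arg(3)` two, while `encode_argN(8)` fails and `encode_arg(8)` does not.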
