thr3ads.net - llvm dev - [llvm-dev] Defining the DIExpression operator DW_OP_LLVM

If this information is useful, please help other people find it:
Share via:

Tozer, Stephen via llvm-dev

2020-Jul-08 12:21 UTC

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

> To summarize my understanding: Neither of these operators is strictly
necessary, since the same effect can be achieved by implicitly pushing all
operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup,
DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting
expressions can get very long and unwieldy and it is easier to generate
efficient DWARF from DIExpressions that explicitly refer to arguments.
DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting
the number of arguments to a hardcoded value.
I'd also add that since `DW_OP_pick N` picks the Nth argument from the top
of the stack rather than the bottom, using it would require us to know how many
elements are on the stack at each point in the expression, and update the pick
depth if this changes. Also, for the sake of not bloating the final DWARF we
wouldn't want to actually push the args onto the stack first and then pick
them, so we'd basically be treating the final DW_OP_pick operators as
DW_OP_LLVM_arg either way.
> By restricting their use to new intrinsics only (derefval and
DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly
pushed to the stack or not — in those new intrinsics, arguments always have to
be referred to explicitly. Personally, I'm leaning more towards
DW_OP_LLVM_argN, because of the space saving aspect.
I think what you're saying here is that we could have them both use the new
operator, assigning it a different meaning depending on the intrinsic it's
used in? If so, I think it's unnecessary - the derefval case does not (as
far as I understand it) need any value other than arg0, and will also always use
arg0; we lose no information by omitting it. Having the meaning of the operand
vary depending on the intrinsic that contains it is also something I think we
should avoid if possible; if derefval was changed to be variadic (not an
unreasonable possibility I think) and the first argument was passed as a
variadic argument then it'd make more sense.

As for the operator, I'm not attached to either implementation;
DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF
output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code as
we don't need any helper functions to extract N, or to generate one from a
given N value. The other question is whether we might actually need more than 8
arguments at some point; I suspect that for the vast majority of cases the
answer will be "no", but it's hard to be sure exactly what cases
may open up over time. Although...
> Finally, to completely derail this discussion — once we have multi-arg
support, we may be able to simplify parts of the debug intrinsic handling by
combining multiple DW_OP_LLVM_fragment intrinsics describing the same variable
into one multi-argument expression with multiple DW_OP_LLVM_fragments.
This makes for a good example of how a future change might require us to support
more arguments in an intrinsic than initially seems necessary.

Adrian Prantl via llvm-dev

2020-Jul-08 16:29 UTC

head link

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

> On Jul 8, 2020, at 5:21 AM, Tozer, Stephen <stephen.tozer at
sony.com> wrote:
> 
>> To summarize my understanding: Neither of these operators is strictly
necessary, since the same effect can be achieved by implicitly pushing all
operands of a DBG_VALUE to the stack, followed by a combination of DW_OP_dup,
DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the resulting
expressions can get very long and unwieldy and it is easier to generate
efficient DWARF from DIExpressions that explicitly refer to arguments.
DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting
the number of arguments to a hardcoded value.
> 
> I'd also add that since `DW_OP_pick N` picks the Nth argument from the
top of the stack rather than the bottom, using it would require us to know how
many elements are on the stack at each point in the expression, and update the
pick depth if this changes. Also, for the sake of not bloating the final DWARF
we wouldn't want to actually push the args onto the stack first and then
pick them, so we'd basically be treating the final DW_OP_pick operators as
DW_OP_LLVM_arg either way.
Agreed. I think everyone can agree that this operator is a useful addition.
> 
>> By restricting their use to new intrinsics only (derefval and
DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly
pushed to the stack or not — in those new intrinsics, arguments always have to
be referred to explicitly. Personally, I'm leaning more towards
DW_OP_LLVM_argN, because of the space saving aspect.
> 
> I think what you're saying here is that we could have them both use the
new operator, assigning it a different meaning depending on the intrinsic
it's used in? If so, I think it's unnecessary - the derefval case does
not (as far as I understand it) need any value other than arg0, and will also
always use arg0; we lose no information by omitting it. Having the meaning of
the operand vary depending on the intrinsic that contains it is also something I
think we should avoid if possible; if derefval was changed to be variadic (not
an unreasonable possibility I think) and the first argument was passed as a
variadic argument then it'd make more sense.
Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of all
DIExpressions referenced by debug intrinsics? From an ergonomics perspective it
would bother me if the first argument of a 2-argument debug intrinsic is pushed
implicitly and the second one isn't. However, transition dbg.value to a
first-arg-is-explicit form is difficult, if we don't want to break LTO and
bitcode compatibility. The cleanest way of doing this would be to create a
dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade
the old dbg.value.
> 
> As for the operator, I'm not attached to either implementation;
DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF
output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code as
we don't need any helper functions to extract N, or to generate one from a
given N value. The other question is whether we might actually need more than 8
arguments at some point; I suspect that for the vast majority of cases the
answer will be "no", but it's hard to be sure exactly what cases
may open up over time. Although...
Taking a step back, I think it is possible to separate the in-memory encoding
from the visual representation. There is no reason why we couldn't encode
DW_OP_LLVM_arg N in one 64-bit integer field. The DIExpression iterator can hide
this detail. More generally, we could encode *all* fields in DIExpressions as,
e.g, ULEB values in memory and hide that behind the iterator. With that
perspective I think DW_OP_LLVM_arg N is the way to go, and if we are really
bothered by the storage size, we can address this separately.

thanks,
adrian
> 
>> Finally, to completely derail this discussion — once we have multi-arg
support, we may be able to simplify parts of the debug intrinsic handling by
combining multiple DW_OP_LLVM_fragment intrinsics describing the same variable
into one multi-argument expression with multiple DW_OP_LLVM_fragments.
> 
> This makes for a good example of how a future change might require us to
support more arguments in an intrinsic than initially seems necessary.

David Blaikie via llvm-dev

2020-Jul-08 16:33 UTC

head link

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

On Wed, Jul 8, 2020 at 9:29 AM Adrian Prantl via llvm-dev
<llvm-dev at lists.llvm.org> wrote:>
>
>
> > On Jul 8, 2020, at 5:21 AM, Tozer, Stephen <stephen.tozer at
sony.com> wrote:
> >
> >> To summarize my understanding: Neither of these operators is
strictly necessary, since the same effect can be achieved by implicitly pushing
all operands of a DBG_VALUE to the stack, followed by a combination of
DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the
resulting expressions can get very long and unwieldy and it is easier to
generate efficient DWARF from DIExpressions that explicitly refer to arguments.
DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting
the number of arguments to a hardcoded value.
> >
> > I'd also add that since `DW_OP_pick N` picks the Nth argument from
the top of the stack rather than the bottom, using it would require us to know
how many elements are on the stack at each point in the expression, and update
the pick depth if this changes. Also, for the sake of not bloating the final
DWARF we wouldn't want to actually push the args onto the stack first and
then pick them, so we'd basically be treating the final DW_OP_pick operators
as DW_OP_LLVM_arg either way.
>
> Agreed. I think everyone can agree that this operator is a useful addition.
>
> >
> >> By restricting their use to new intrinsics only (derefval and
DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is implicitly
pushed to the stack or not — in those new intrinsics, arguments always have to
be referred to explicitly. Personally, I'm leaning more towards
DW_OP_LLVM_argN, because of the space saving aspect.
> >
> > I think what you're saying here is that we could have them both
use the new operator, assigning it a different meaning depending on the
intrinsic it's used in? If so, I think it's unnecessary - the derefval
case does not (as far as I understand it) need any value other than arg0, and
will also always use arg0; we lose no information by omitting it. Having the
meaning of the operand vary depending on the intrinsic that contains it is also
something I think we should avoid if possible; if derefval was changed to be
variadic (not an unreasonable possibility I think) and the first argument was
passed as a variadic argument then it'd make more sense.
>
> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of
all DIExpressions referenced by debug intrinsics? From an ergonomics perspective
it would bother me if the first argument of a 2-argument debug intrinsic is
pushed implicitly and the second one isn't. However, transition dbg.value to
a first-arg-is-explicit form is difficult, if we don't want to break LTO and
bitcode compatibility. The cleanest way of doing this would be to create a
dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade
the old dbg.value.
Not sure I'm quite following this bit - what are the particular
challenges of detecting that we're parsing old bitcode that relies on
the implicit style and upgrading it to the explicit style?
>
> >
> > As for the operator, I'm not attached to either implementation;
DW_OP_LLVM_argN saves memory during compilation (though the size of the DWARF
output is identical), while `DW_OP_LLVM_arg N` is easier to work with in code as
we don't need any helper functions to extract N, or to generate one from a
given N value. The other question is whether we might actually need more than 8
arguments at some point; I suspect that for the vast majority of cases the
answer will be "no", but it's hard to be sure exactly what cases
may open up over time. Although...
>
> Taking a step back, I think it is possible to separate the in-memory
encoding from the visual representation. There is no reason why we couldn't
encode DW_OP_LLVM_arg N in one 64-bit integer field. The DIExpression iterator
can hide this detail. More generally, we could encode *all* fields in
DIExpressions as, e.g, ULEB values in memory and hide that behind the iterator.
With that perspective I think DW_OP_LLVM_arg N is the way to go, and if we are
really bothered by the storage size, we can address this separately.
>
> thanks,
> adrian
>
> >
> >> Finally, to completely derail this discussion — once we have
multi-arg support, we may be able to simplify parts of the debug intrinsic
handling by combining multiple DW_OP_LLVM_fragment intrinsics describing the
same variable into one multi-argument expression with multiple
DW_OP_LLVM_fragments.
> >
> > This makes for a good example of how a future change might require us
to support more arguments in an intrinsic than initially seems necessary.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Tozer, Stephen via llvm-dev

2020-Jul-08 16:58 UTC

head link

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the beginning of
all DIExpressions referenced by debug intrinsics? From an ergonomics perspective
it would bother me if the first argument of a 2-argument debug intrinsic is
pushed implicitly and the second one isn't. However, transition dbg.value to
a first-arg-is-explicit form is difficult, if we don't want to break LTO and
bitcode compatibility. The cleanest way of doing this would be to create a
dbg.value.version-2.0 intrinsic that uses the explicit form, and auto-upgrade
the old dbg.value.
To clarify, the implicit DW_OP_LLVM_arg 0 is only for the non-variadic
intrinsics, i.e. dbg.value and dbg.derefval. The version-2.0 intrinsic is more
or less what I'm doing right now with dbg.value.list: a new version of
dbg.value that uses explicit arguments and can have more than 1 of them. This
means the intrinsic itself is used to distinguish between implicit and explicit,
and it's my opinion that it may at some point be suitable to completely
replace dbg.value. With that said, I haven't tried to do so in this stack of
patches, as I suspect that trying to replace the intrinsic may result in a
lengthy discussion that shouldn't need to block the introduction of the new
intrinsic.
> Taking a step back, I think it is possible to separate the in-memory
encoding from the visual representation. There is no reason why we couldn't
encode DW_OP_LLVM_arg N in one 64-bit integer field. The DIExpression iterator
can hide this detail. More generally, we could encode *all* fields in
DIExpressions as, e.g, ULEB values in memory and hide that behind the iterator.
With that perspective I think DW_OP_LLVM_arg N is the way to go, and if we are
really bothered by the storage size, we can address this separately.
Agreed, I'm fine with either representation.

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Jul 2020 - Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

Apparently Analagous Threads