thr3ads.net - llvm dev - [llvm-dev] Defining the DIExpression operator DW_OP_LLVM

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2020-Jul-08 16:53 UTC

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

On Wed, Jul 8, 2020 at 9:48 AM Adrian Prantl <aprantl at apple.com>
wrote:>
>
>
> > On Jul 8, 2020, at 9:33 AM, David Blaikie <dblaikie at
gmail.com> wrote:
> >
> > On Wed, Jul 8, 2020 at 9:29 AM Adrian Prantl via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >>> On Jul 8, 2020, at 5:21 AM, Tozer, Stephen <stephen.tozer
at sony.com> wrote:
> >>>
> >>>> To summarize my understanding: Neither of these operators
is strictly necessary, since the same effect can be achieved by implicitly
pushing all operands of a DBG_VALUE to the stack, followed by a combination of
DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over. However, the
resulting expressions can get very long and unwieldy and it is easier to
generate efficient DWARF from DIExpressions that explicitly refer to arguments.
DW_OP_LLVM_argN has the advantage of using less memory, at the cost of limiting
the number of arguments to a hardcoded value.
> >>>
> >>> I'd also add that since `DW_OP_pick N` picks the Nth
argument from the top of the stack rather than the bottom, using it would
require us to know how many elements are on the stack at each point in the
expression, and update the pick depth if this changes. Also, for the sake of not
bloating the final DWARF we wouldn't want to actually push the args onto the
stack first and then pick them, so we'd basically be treating the final
DW_OP_pick operators as DW_OP_LLVM_arg either way.
> >>
> >> Agreed. I think everyone can agree that this operator is a useful
addition.
> >>
> >>>
> >>>> By restricting their use to new intrinsics only (derefval
and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0 is
implicitly pushed to the stack or not — in those new intrinsics, arguments
always have to be referred to explicitly. Personally, I'm leaning more
towards DW_OP_LLVM_argN, because of the space saving aspect.
> >>>
> >>> I think what you're saying here is that we could have them
both use the new operator, assigning it a different meaning depending on the
intrinsic it's used in? If so, I think it's unnecessary - the derefval
case does not (as far as I understand it) need any value other than arg0, and
will also always use arg0; we lose no information by omitting it. Having the
meaning of the operand vary depending on the intrinsic that contains it is also
something I think we should avoid if possible; if derefval was changed to be
variadic (not an unreasonable possibility I think) and the first argument was
passed as a variadic argument then it'd make more sense.
> >>
> >> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the
beginning of all DIExpressions referenced by debug intrinsics? From an
ergonomics perspective it would bother me if the first argument of a 2-argument
debug intrinsic is pushed implicitly and the second one isn't. However,
transition dbg.value to a first-arg-is-explicit form is difficult, if we
don't want to break LTO and bitcode compatibility. The cleanest way of doing
this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit
form, and auto-upgrade the old dbg.value.
> >
> > Not sure I'm quite following this bit - what are the particular
> > challenges of detecting that we're parsing old bitcode that relies
on
> > the implicit style and upgrading it to the explicit style?
>
> To implement the "detecting that we're parsing old bitcode"
bit, we have to set a bit somewhere to distinguish between old and new format,
and I thought that somewhere could be the *name* of the intrinsic.
Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
guess not, not without reving bitcode itself, which isn't a thing
that'd be done for this - my mistake)

Yeah, if it was in the DICompileUnit hierarchy, I guess we could
address it with a "Debug Version" rev, but since it's an intrinsic
in
the IR it's either rev the IR itself (expensive) or use a new name.
> However, writing this made me realize that DIExpressions also appear
outside of debug intrinsics, namely in DIGlobalVariableExpression(). There we
run into this ambiguity already, since this form
>
>   @i = global i32, !dbg !0
>   !0 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr:
!DIExpression())
>
> implicitly pushes the first arg, but this form
>
> !0 = distinct !DICompileUnit(..., globals: !{!1}, ...)
> !1 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr:
!DIExpression(DW_OP_constu, 42, DW_OP_stack_value))
Ah.
> doesn't. So considering that, we may have to move the bit into the
expression itself (e.g., by adding a version number to the bitcode for
DIExpression).
Hrm, fair enough. Thanks for the context!

- Dave
>
> -- adrian
>
> >
> >>
> >>>
> >>> As for the operator, I'm not attached to either
implementation; DW_OP_LLVM_argN saves memory during compilation (though the size
of the DWARF output is identical), while `DW_OP_LLVM_arg N` is easier to work
with in code as we don't need any helper functions to extract N, or to
generate one from a given N value. The other question is whether we might
actually need more than 8 arguments at some point; I suspect that for the vast
majority of cases the answer will be "no", but it's hard to be
sure exactly what cases may open up over time. Although...
> >>
> >> Taking a step back, I think it is possible to separate the
in-memory encoding from the visual representation. There is no reason why we
couldn't encode DW_OP_LLVM_arg N in one 64-bit integer field. The
DIExpression iterator can hide this detail. More generally, we could encode
*all* fields in DIExpressions as, e.g, ULEB values in memory and hide that
behind the iterator. With that perspective I think DW_OP_LLVM_arg N is the way
to go, and if we are really bothered by the storage size, we can address this
separately.
> >>
> >> thanks,
> >> adrian
> >>
> >>>
> >>>> Finally, to completely derail this discussion — once we
have multi-arg support, we may be able to simplify parts of the debug intrinsic
handling by combining multiple DW_OP_LLVM_fragment intrinsics describing the
same variable into one multi-argument expression with multiple
DW_OP_LLVM_fragments.
> >>>
> >>> This makes for a good example of how a future change might
require us to support more arguments in an intrinsic than initially seems
necessary.
> >>
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Adrian Prantl via llvm-dev

2020-Jul-08 17:00 UTC

head link

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

> On Jul 8, 2020, at 9:53 AM, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> On Wed, Jul 8, 2020 at 9:48 AM Adrian Prantl <aprantl at apple.com>
wrote:
>> 
>> 
>> 
>>> On Jul 8, 2020, at 9:33 AM, David Blaikie <dblaikie at
gmail.com> wrote:
>>> 
>>> On Wed, Jul 8, 2020 at 9:29 AM Adrian Prantl via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jul 8, 2020, at 5:21 AM, Tozer, Stephen
<stephen.tozer at sony.com> wrote:
>>>>> 
>>>>>> To summarize my understanding: Neither of these
operators is strictly necessary, since the same effect can be achieved by
implicitly pushing all operands of a DBG_VALUE to the stack, followed by a
combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over.
However, the resulting expressions can get very long and unwieldy and it is
easier to generate efficient DWARF from DIExpressions that explicitly refer to
arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost
of limiting the number of arguments to a hardcoded value.
>>>>> 
>>>>> I'd also add that since `DW_OP_pick N` picks the Nth
argument from the top of the stack rather than the bottom, using it would
require us to know how many elements are on the stack at each point in the
expression, and update the pick depth if this changes. Also, for the sake of not
bloating the final DWARF we wouldn't want to actually push the args onto the
stack first and then pick them, so we'd basically be treating the final
DW_OP_pick operators as DW_OP_LLVM_arg either way.
>>>> 
>>>> Agreed. I think everyone can agree that this operator is a
useful addition.
>>>> 
>>>>> 
>>>>>> By restricting their use to new intrinsics only
(derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0
is implicitly pushed to the stack or not — in those new intrinsics, arguments
always have to be referred to explicitly. Personally, I'm leaning more
towards DW_OP_LLVM_argN, because of the space saving aspect.
>>>>> 
>>>>> I think what you're saying here is that we could have
them both use the new operator, assigning it a different meaning depending on
the intrinsic it's used in? If so, I think it's unnecessary - the
derefval case does not (as far as I understand it) need any value other than
arg0, and will also always use arg0; we lose no information by omitting it.
Having the meaning of the operand vary depending on the intrinsic that contains
it is also something I think we should avoid if possible; if derefval was
changed to be variadic (not an unreasonable possibility I think) and the first
argument was passed as a variadic argument then it'd make more sense.
>>>> 
>>>> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at the
beginning of all DIExpressions referenced by debug intrinsics? From an
ergonomics perspective it would bother me if the first argument of a 2-argument
debug intrinsic is pushed implicitly and the second one isn't. However,
transition dbg.value to a first-arg-is-explicit form is difficult, if we
don't want to break LTO and bitcode compatibility. The cleanest way of doing
this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit
form, and auto-upgrade the old dbg.value.
>>> 
>>> Not sure I'm quite following this bit - what are the particular
>>> challenges of detecting that we're parsing old bitcode that
relies on
>>> the implicit style and upgrading it to the explicit style?
>> 
>> To implement the "detecting that we're parsing old
bitcode" bit, we have to set a bit somewhere to distinguish between old and
new format, and I thought that somewhere could be the *name* of the intrinsic.
> 
> Why would we need that? Wouldn't it be in the bitcode elsewhere? (I
> guess not, not without reving bitcode itself, which isn't a thing
> that'd be done for this - my mistake)
> 
> Yeah, if it was in the DICompileUnit hierarchy, I guess we could
> address it with a "Debug Version" rev, but since it's an
intrinsic in
> the IR it's either rev the IR itself (expensive) or use a new name.
Debug info metadata bitcode changes are currently implemented on a node-by-node
basis. A global IR version would be a nightmare for anyone maintaining their own
stable branch, since it would make cherry-picking bugfixes that need bitcode
changes dangerous, and consequently the version number meaningless. Imagine your
branch was cut at version 1, upstream has bugfixes introducing version 2 and 3,
and you need to cherry-pick bugfix 3. What version number do you pick?

Instead, each DINode has either feature flags or a version number (I know, that
has the same general problem, but it's more localized!) in its bitcode
representation.

-- adrian
> 
>> However, writing this made me realize that DIExpressions also appear
outside of debug intrinsics, namely in DIGlobalVariableExpression(). There we
run into this ambiguity already, since this form
>> 
>>  @i = global i32, !dbg !0
>>  !0 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr:
!DIExpression())
>> 
>> implicitly pushes the first arg, but this form
>> 
>> !0 = distinct !DICompileUnit(..., globals: !{!1}, ...)
>> !1 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr:
!DIExpression(DW_OP_constu, 42, DW_OP_stack_value))
> 
> Ah.
> 
>> doesn't. So considering that, we may have to move the bit into the
expression itself (e.g., by adding a version number to the bitcode for
DIExpression).
> 
> Hrm, fair enough. Thanks for the context!
> 
> - Dave
> 
>> 
>> -- adrian
>> 
>>> 
>>>> 
>>>>> 
>>>>> As for the operator, I'm not attached to either
implementation; DW_OP_LLVM_argN saves memory during compilation (though the size
of the DWARF output is identical), while `DW_OP_LLVM_arg N` is easier to work
with in code as we don't need any helper functions to extract N, or to
generate one from a given N value. The other question is whether we might
actually need more than 8 arguments at some point; I suspect that for the vast
majority of cases the answer will be "no", but it's hard to be
sure exactly what cases may open up over time. Although...
>>>> 
>>>> Taking a step back, I think it is possible to separate the
in-memory encoding from the visual representation. There is no reason why we
couldn't encode DW_OP_LLVM_arg N in one 64-bit integer field. The
DIExpression iterator can hide this detail. More generally, we could encode
*all* fields in DIExpressions as, e.g, ULEB values in memory and hide that
behind the iterator. With that perspective I think DW_OP_LLVM_arg N is the way
to go, and if we are really bothered by the storage size, we can address this
separately.
>>>> 
>>>> thanks,
>>>> adrian
>>>> 
>>>>> 
>>>>>> Finally, to completely derail this discussion — once we
have multi-arg support, we may be able to simplify parts of the debug intrinsic
handling by combining multiple DW_OP_LLVM_fragment intrinsics describing the
same variable into one multi-argument expression with multiple
DW_OP_LLVM_fragments.
>>>>> 
>>>>> This makes for a good example of how a future change might
require us to support more arguments in an intrinsic than initially seems
necessary.
>>>> 
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

David Blaikie via llvm-dev

2020-Jul-08 17:49 UTC

head link

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

On Wed, Jul 8, 2020 at 10:00 AM Adrian Prantl <aprantl at apple.com>
wrote:>
>
>
> > On Jul 8, 2020, at 9:53 AM, David Blaikie <dblaikie at
gmail.com> wrote:
> >
> > On Wed, Jul 8, 2020 at 9:48 AM Adrian Prantl <aprantl at
apple.com> wrote:
> >>
> >>
> >>
> >>> On Jul 8, 2020, at 9:33 AM, David Blaikie <dblaikie at
gmail.com> wrote:
> >>>
> >>> On Wed, Jul 8, 2020 at 9:29 AM Adrian Prantl via llvm-dev
> >>> <llvm-dev at lists.llvm.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Jul 8, 2020, at 5:21 AM, Tozer, Stephen
<stephen.tozer at sony.com> wrote:
> >>>>>
> >>>>>> To summarize my understanding: Neither of these
operators is strictly necessary, since the same effect can be achieved by
implicitly pushing all operands of a DBG_VALUE to the stack, followed by a
combination of DW_OP_dup, DW_OP_pick, DW_OP_swap, DW_OP_rot, and DW_OP_over.
However, the resulting expressions can get very long and unwieldy and it is
easier to generate efficient DWARF from DIExpressions that explicitly refer to
arguments. DW_OP_LLVM_argN has the advantage of using less memory, at the cost
of limiting the number of arguments to a hardcoded value.
> >>>>>
> >>>>> I'd also add that since `DW_OP_pick N` picks the
Nth argument from the top of the stack rather than the bottom, using it would
require us to know how many elements are on the stack at each point in the
expression, and update the pick depth if this changes. Also, for the sake of not
bloating the final DWARF we wouldn't want to actually push the args onto the
stack first and then pick them, so we'd basically be treating the final
DW_OP_pick operators as DW_OP_LLVM_arg either way.
> >>>>
> >>>> Agreed. I think everyone can agree that this operator is a
useful addition.
> >>>>
> >>>>>
> >>>>>> By restricting their use to new intrinsics only
(derefval and DBG_VALUE_VAR) we can avoid any confusion about whether argument 0
is implicitly pushed to the stack or not — in those new intrinsics, arguments
always have to be referred to explicitly. Personally, I'm leaning more
towards DW_OP_LLVM_argN, because of the space saving aspect.
> >>>>>
> >>>>> I think what you're saying here is that we could
have them both use the new operator, assigning it a different meaning depending
on the intrinsic it's used in? If so, I think it's unnecessary - the
derefval case does not (as far as I understand it) need any value other than
arg0, and will also always use arg0; we lose no information by omitting it.
Having the meaning of the operand vary depending on the intrinsic that contains
it is also something I think we should avoid if possible; if derefval was
changed to be variadic (not an unreasonable possibility I think) and the first
argument was passed as a variadic argument then it'd make more sense.
> >>>>
> >>>> Is your intention to keep the implicit DW_OP_LLVM_arg 0 at
the beginning of all DIExpressions referenced by debug intrinsics? From an
ergonomics perspective it would bother me if the first argument of a 2-argument
debug intrinsic is pushed implicitly and the second one isn't. However,
transition dbg.value to a first-arg-is-explicit form is difficult, if we
don't want to break LTO and bitcode compatibility. The cleanest way of doing
this would be to create a dbg.value.version-2.0 intrinsic that uses the explicit
form, and auto-upgrade the old dbg.value.
> >>>
> >>> Not sure I'm quite following this bit - what are the
particular
> >>> challenges of detecting that we're parsing old bitcode
that relies on
> >>> the implicit style and upgrading it to the explicit style?
> >>
> >> To implement the "detecting that we're parsing old
bitcode" bit, we have to set a bit somewhere to distinguish between old and
new format, and I thought that somewhere could be the *name* of the intrinsic.
> >
> > Why would we need that? Wouldn't it be in the bitcode elsewhere?
(I
> > guess not, not without reving bitcode itself, which isn't a thing
> > that'd be done for this - my mistake)
> >
> > Yeah, if it was in the DICompileUnit hierarchy, I guess we could
> > address it with a "Debug Version" rev, but since it's an
intrinsic in
> > the IR it's either rev the IR itself (expensive) or use a new
name.
>
> Debug info metadata bitcode changes are currently implemented on a
node-by-node basis. A global IR version would be a nightmare for anyone
maintaining their own stable branch, since it would make cherry-picking bugfixes
that need bitcode changes dangerous, and consequently the version number
meaningless. Imagine your branch was cut at version 1, upstream has bugfixes
introducing version 2 and 3, and you need to cherry-pick bugfix 3. What version
number do you pick?
So the "Debug Info Version" module metadata ( !{i32 2, !"Debug
Info
Version", i32 3}) is essentially unused/saved for some really big
changes only? *nod*
>
> Instead, each DINode has either feature flags or a version number (I know,
that has the same general problem, but it's more localized!) in its bitcode
representation.
>
> -- adrian
>
> >
> >> However, writing this made me realize that DIExpressions also
appear outside of debug intrinsics, namely in DIGlobalVariableExpression().
There we run into this ambiguity already, since this form
> >>
> >>  @i = global i32, !dbg !0
> >>  !0 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...),
expr: !DIExpression())
> >>
> >> implicitly pushes the first arg, but this form
> >>
> >> !0 = distinct !DICompileUnit(..., globals: !{!1}, ...)
> >> !1 = !DIGlobalVariableExpression(var: !DIGlobalVariable(...),
expr: !DIExpression(DW_OP_constu, 42, DW_OP_stack_value))
> >
> > Ah.
> >
> >> doesn't. So considering that, we may have to move the bit into
the expression itself (e.g., by adding a version number to the bitcode for
DIExpression).
> >
> > Hrm, fair enough. Thanks for the context!
> >
> > - Dave
> >
> >>
> >> -- adrian
> >>
> >>>
> >>>>
> >>>>>
> >>>>> As for the operator, I'm not attached to either
implementation; DW_OP_LLVM_argN saves memory during compilation (though the size
of the DWARF output is identical), while `DW_OP_LLVM_arg N` is easier to work
with in code as we don't need any helper functions to extract N, or to
generate one from a given N value. The other question is whether we might
actually need more than 8 arguments at some point; I suspect that for the vast
majority of cases the answer will be "no", but it's hard to be
sure exactly what cases may open up over time. Although...
> >>>>
> >>>> Taking a step back, I think it is possible to separate the
in-memory encoding from the visual representation. There is no reason why we
couldn't encode DW_OP_LLVM_arg N in one 64-bit integer field. The
DIExpression iterator can hide this detail. More generally, we could encode
*all* fields in DIExpressions as, e.g, ULEB values in memory and hide that
behind the iterator. With that perspective I think DW_OP_LLVM_arg N is the way
to go, and if we are really bothered by the storage size, we can address this
separately.
> >>>>
> >>>> thanks,
> >>>> adrian
> >>>>
> >>>>>
> >>>>>> Finally, to completely derail this discussion —
once we have multi-arg support, we may be able to simplify parts of the debug
intrinsic handling by combining multiple DW_OP_LLVM_fragment intrinsics
describing the same variable into one multi-argument expression with multiple
DW_OP_LLVM_fragments.
> >>>>>
> >>>>> This makes for a good example of how a future change
might require us to support more arguments in an intrinsic than initially seems
necessary.
> >>>>
> >>>> _______________________________________________
> >>>> LLVM Developers mailing list
> >>>> llvm-dev at lists.llvm.org
> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

llvm dev - Jul 2020 - Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg

[llvm-dev] Defining the DIExpression operator DW_OP_LLVM_arg