thr3ads.net - llvm dev - [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value [Sep 2017]


Reid Kleckner via llvm-dev

2017-Sep-06 21:01 UTC

[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
> On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> LLVM SSA values obviously do not have an address that we can take and
>> they don’t live in registers, so neither the default memory location model
>> nor DW_OP_regN makes sense for LLVM’s dbg.value. We could hypothetically
>> repurpose DW_OP_stack_value to indicate that the SSA value passed to
>> llvm.dbg.value *is* the variable’s value, and if the expression lacks
>> DW_OP_stack_value, it must be the address of the value. However, that is
>> backwards incompatible and it seems like quite a stretch.
>>
>
> Seems like a stretch in what sense? The backwards incompatibility is
> certainly something to consider (though we went through that with
> DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
> like to better understand why it's not the path forward if there's some
> more detail about that aspect of the design choice here.
>
> I guess you described this already, but talking it through for
> myself/maybe others will find this useful:
>
> So since we don't have DW_OP_regN for LLVM registers, we could sort of
> assume the implicit first value on the stack is a pseudo-OP_regN of the
> LLVM SSA register.
>
Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
dbg.value and DBG_VALUE instructions are a register-like value that gets
pushed onto the expression stack. The DWARF asmprinter does some expression
folding to clean things up, but that's the model.
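To make the stack model concrete, here is a minimal C++ sketch of it. It is not LLVM code: `Op`, `Elem`, `Result`, `Memory` and `evaluate` are hypothetical names, and only three DWARF operations are modelled. The register-like operand is pushed first and the expression runs over it; as in DWARF, the result is the variable's value only if the expression ends in DW_OP_stack_value, otherwise the top of the stack is read as an address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for three DWARF operations (not LLVM's encoding).
enum class Op { Deref, PlusUconst, StackValue };

struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};

// Either the variable's value itself or the address where it lives.
struct Result {
  bool isValue;  // true: 'bits' is the value; false: 'bits' is an address
  uint64_t bits;
};

// Toy target memory: address -> 64-bit word.
using Memory = std::map<uint64_t, uint64_t>;

// Seeds the stack with the register-like operand (the LHS of dbg.value or
// DBG_VALUE), then runs the expression over it.
Result evaluate(uint64_t operand, const std::vector<Elem> &expr,
                const Memory &mem) {
  std::vector<uint64_t> stack{operand};
  for (const Elem &e : expr) {
    switch (e.op) {
    case Op::Deref:
      stack.back() = mem.at(stack.back()); // load a word from memory
      break;
    case Op::PlusUconst:
      stack.back() += e.arg;
      break;
    case Op::StackValue:
      assert(&e == &expr.back() && "DW_OP_stack_value must come last");
      return {true, stack.back()}; // the top of the stack *is* the value
    }
  }
  // No DW_OP_stack_value: DWARF reads the top of the stack as an address.
  return {false, stack.back()};
}
```

Note that today's IR convention reads an empty expression as "the operand is the value", which is exactly the reading this DWARF-style evaluator does not give it.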

> To support that, all existing uses would need no changes to match the
> DWARF model of registers being implicitly direct values.
>
> Code that wanted to describe the register as containing the memory address
> of the interesting thing would use DW_OP_stack_value to say "this location
> description that is a register is really an address you should follow to
> find the value, not a direct value itself"?
>
> But code that wanted to describe a variable as being 3 bytes ahead of a
> pointer in an LLVM SSA register would only have "plus 3" in the expression
> stack, since then it's no longer a direct value but is treated as a pointer
> to the value. I guess this is where the ambiguity would come in - currently
> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
> meant to describe reg value + 3 as being the immediate value of the
> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
> somewhere in the DWARF backend?)
>
Our model today is inconsistent. In LLVM IR today, the SSA value of the
dbg.value *is* the interesting value, it is not the address, and we
typically use empty DIExpressions. If the value is ultimately register
allocated and the DIExpression is empty, we will emit a DW_OP_regN location
expression. If the value is spilled, we usually don't need to append
DW_OP_stack_value because the location is now a memory location, which can
be described by DW_OP_[f]breg.
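A minimal sketch of the lowering just described, for a dbg.value with an empty DIExpression. `AllocatedLoc` and `lowerEmptyExpr` are hypothetical names, and the output is a readable string rather than encoded DWARF; the real decision is made inside LLVM's DWARF emitter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical summary of where register allocation left a dbg.value operand.
struct AllocatedLoc {
  bool spilled;        // true if the value ended up in a stack slot
  unsigned dwarfReg;   // DWARF register number, if not spilled
  int64_t frameOffset; // offset from the frame base, if spilled
};

// Location for a dbg.value whose DIExpression is empty: a register yields
// DW_OP_regN (implicitly the value), a spill slot yields DW_OP_fbreg (a
// memory location), so neither case needs DW_OP_stack_value.
std::string lowerEmptyExpr(const AllocatedLoc &loc) {
  if (loc.spilled)
    return "DW_OP_fbreg " + std::to_string(loc.frameOffset);
  // DW_OP_reg0..DW_OP_reg31 only; higher numbers would need DW_OP_regx.
  assert(loc.dwarfReg <= 31);
  return "DW_OP_reg" + std::to_string(loc.dwarfReg);
}
```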

Today, passes that want to add "plus 3" to a DIExpression go out of their
way to add DW_OP_stack_value to the DIExpression because the backend won't
do it for us, even though dbg.value normally describes the value, not an
address.
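The workaround described above, as a hedged sketch: `addConstantToValue` is a hypothetical helper, not an LLVM API, and expressions are modelled as a plain vector of simplified operations.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { Deref, PlusUconst, StackValue };
struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};
using Expr = std::vector<Elem>;

// Hypothetical pass helper: the variable's value is now (operand + c).
// dbg.value's operand is already the value, but the backend will not add
// DW_OP_stack_value on its own, so the helper must keep it at the end.
Expr addConstantToValue(Expr expr, uint64_t c) {
  if (!expr.empty() && expr.back().op == Op::StackValue)
    expr.pop_back(); // re-appended below, after the new offset
  expr.push_back({Op::PlusUconst, c});
  expr.push_back({Op::StackValue, 0});
  return expr;
}
```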

To explore the alternative DW_OP_stack_value model, here's how I'd go about
it:
1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
semantic change clear. It can express either an address or a value, depending
on the DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to
the DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
alone. The LHS of llvm.dbg.declare is already the address of the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
DBG_VALUEs are now expressed with a DIExpression that lacks
DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
with physical register operands. They just use DW_OP_regN, which is
implicitly a value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to
remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
appropriate.
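Steps 2, 3 and 6 above can be sketched as follows. Everything here is hypothetical: llvm.dbg.loc is only a proposed intrinsic, the types are simplified models rather than LLVM classes, and skipping the append when a dbg.value expression already ends in DW_OP_stack_value is an assumption the list does not spell out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { Deref, PlusUconst, StackValue };
struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};
using Expr = std::vector<Elem>;

enum class Intrinsic { DbgValue, DbgDeclare, DbgLoc };
struct DebugIntrinsic {
  Intrinsic kind;
  Expr expr;
};

static bool endsInStackValue(const Expr &e) {
  return !e.empty() && e.back().op == Op::StackValue;
}

// Steps 2 and 3: dbg.value described a value, so it gains DW_OP_stack_value;
// dbg.declare already described an address and keeps its expression as is.
DebugIntrinsic upgrade(DebugIntrinsic in) {
  if (in.kind == Intrinsic::DbgValue && !endsInStackValue(in.expr))
    in.expr.push_back({Op::StackValue, 0}); // skipped if already present (assumption)
  in.kind = Intrinsic::DbgLoc;
  return in;
}

// Step 6: the operand moved into a stack slot, so the new operand is the
// slot's address. A bare value becomes a plain memory location; anything
// else needs one extra load from the slot first.
Expr rewriteForSpill(Expr expr) {
  if (expr.size() == 1 && endsInStackValue(expr))
    return {}; // the value itself is in the slot: drop DW_OP_stack_value
  expr.insert(expr.begin(), Elem{Op::Deref, 0});
  return expr;
}
```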

This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF
location expression semantics, but it has a large migration cost.

---

I think part of the reason I wanted to move in the DW_OP_LLVM_memory
direction is that I originally wanted to add a memory offset operand to it.
Our actual use cases for complex DWARF expressions typically come from
things like safestack, ASan, and blocks. What these all have in common is
that they gather up a number of variables and move them off into a struct
in heap memory. This is very similar to what happens when we spill a
virtual register: instead of describing a register, we modify the
expression to load the value from some FP register with an offset. I think
the right representation for these transforms is basically a "chain of
loads". I was imagining that DW_OP_LLVM_memory with an offset would be that
load chain link.
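A rough sketch of what a "chain of loads" could look like, assuming each DW_OP_LLVM_memory link carries one offset and that consecutive links are separated by a load. DW_OP_LLVM_memory was only a proposal, so `LoadChain` and `lowerChain` are hypothetical, and the output uses simplified stand-ins for DW_OP_plus_uconst and DW_OP_deref.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Op { Deref, PlusUconst };
struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};

// Hypothetical load-chain form: entry i is the offset of one
// "DW_OP_LLVM_memory, offset" link. Add the offset; if another link
// follows, load a pointer from the resulting address and continue.
using LoadChain = std::vector<uint64_t>;

// Lowers a chain to DWARF-style operations. DW_OP_deref only ever appears
// between two offsets, and zero offsets are dropped. The final address is
// where the variable lives (a memory location, no DW_OP_stack_value).
std::vector<Elem> lowerChain(const LoadChain &chain) {
  assert(!chain.empty() && "an empty chain does not describe memory");
  std::vector<Elem> out;
  for (std::size_t i = 0; i < chain.size(); ++i) {
    if (chain[i] != 0)
      out.push_back({Op::PlusUconst, chain[i]});
    if (i + 1 < chain.size())
      out.push_back({Op::Deref, 0});
  }
  return out;
}
```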

The idea behind this representation is that it should make it easy for
spilling transforms to prepend a load chain onto the expression, rather
than them having to individually discover if DW_OP_deref is needed, or call
some common helper like DIExpression::prepend. It should always be valid to
push on a new load with an offset.
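Using the same hypothetical one-offset-per-link model, a minimal sketch of why prepending should always be valid: `resolve` follows a chain through a toy memory, and `prependLink` records that the operand itself was stored at some offset from a new base. Both names are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using LoadChain = std::vector<uint64_t>;     // one offset per hypothetical link
using Memory = std::map<uint64_t, uint64_t>; // toy memory: address -> word

// Address of the variable when the register-like operand holds 'base'.
uint64_t resolve(uint64_t base, const LoadChain &chain, const Memory &mem) {
  assert(!chain.empty() && "an empty chain describes a value, not memory");
  uint64_t addr = base;
  for (std::size_t i = 0; i < chain.size(); ++i) {
    addr += chain[i];
    if (i + 1 < chain.size())
      addr = mem.at(addr); // load the next pointer in the chain
  }
  return addr;
}

// The operand has been stored to memory at (newBase + offset), e.g. by a
// spill. Whatever the old chain was, one new link in front is enough;
// there is no separate decision about whether a deref is required.
LoadChain prependLink(const LoadChain &chain, uint64_t offset) {
  LoadChain out{offset};
  out.insert(out.end(), chain.begin(), chain.end());
  return out;
}
```

Prepending to an empty chain (a plain value) also works: `{16}` just says the value now lives 16 bytes past the new base.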

It also has the advantage that it will be easier to translate to CodeView
than arbitrary DWARF expressions, which we are currently canonicalizing
into a load chain and then attempting to emit.
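Going the other way, from DWARF-style operations back to a chain, is roughly what the CodeView path has to do. A hedged sketch, assuming the input contains only offsets, loads and DW_OP_stack_value; `toLoadChain` is a hypothetical name and this is not LLVM's actual CodeView emitter.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

enum class Op { Deref, PlusUconst, StackValue };
struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};
using LoadChain = std::vector<uint64_t>; // one offset per load-chain link

// Recovers a load chain from a sequence of offsets and derefs, folding
// consecutive offsets together. Returns std::nullopt for anything that is
// not a pure chain (here: DW_OP_stack_value), which an emitter would have
// to handle some other way.
std::optional<LoadChain> toLoadChain(const std::vector<Elem> &ops) {
  LoadChain chain{0};
  for (const Elem &e : ops) {
    switch (e.op) {
    case Op::PlusUconst:
      chain.back() += e.arg; // fold into the current link
      break;
    case Op::Deref:
      chain.push_back(0); // a load ends this link and starts the next
      break;
    case Op::StackValue:
      return std::nullopt;
    }
  }
  return chain;
}
```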

Does that make sense? I'm starting to feel like I should either pursue the
more ambitious load chain design, or consistently apply DW_OP_stack_value
to llvm.dbg.loc (alternative names welcome).

Robinson, Paul via llvm-dev

2017-Sep-06 23:26 UTC


[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

How about adding dbg.loc without eliminating dbg.value?  Then you can
distinguish the cases without abusing DW_OP_stack_value so horribly.
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Reid
Kleckner via llvm-dev
Sent: Wednesday, September 06, 2017 2:02 PM
To: David Blaikie
Cc: llvm-dev
Subject: Re: [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables
in memory with dbg.value

On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:
LLVM SSA values obviously do not have an address that we can take and they don’t
live in registers, so neither the default memory location model nor DW_OP_regN
makes sense for LLVM’s dbg.value. We could hypothetically repurpose
DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value *is*
the variable’s value, and if the expression lacks DW_OP_stack_value, it must be
the address of the value. However, that is backwards incompatible and it seems
like quite a stretch.

Seems like a stretch in what sense? The backwards incompatibility is certainly
something to consider (though we went through that with DW_OP_bit_piece too),
but this seems like the design I'd go to first so I'd like to better
understand why it's not the path forward if there's some more detail
about that aspect of the design choice here.

I guess you described this already, but talking it through for myself/maybe
others will find this useful:

So since we don't have DW_OP_regN for LLVM registers, we could sort of
assume the implicit first value on the stack is a pseudo-OP_regN of the LLVM SSA
register.

Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
dbg.value and DBG_VALUE instructions are a register-like value that gets pushed
onto the expression stack. The DWARF asmprinter does some expression folding to
clean things up, but that's the model.

To support that, all existing uses would need no changes to match the DWARF
model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory address of
the interesting thing would use DW_OP_stack_value to say "this location
description that is a register is really an address you should follow to find
the value, not a direct value itself"?

But code that wanted to describe a variable as being 3 bytes ahead of a pointer
in an LLVM SSA register would only have "plus 3" in the expression
stack, since then it's no longer a direct value but is treated as a pointer
to the value. I guess this is where the ambiguity would come in - currently how
does "plus 3" get interpreted when seen in LLVM IR, I guess that's
meant to describe reg value + 3 as being the immediate value of the variable?
(so it's implicitly OP_stack_value? & OP_stack_value is added somewhere
in the DWARF backend?)

Our model today is inconsistent. In LLVM IR today, the SSA value of the
dbg.value *is* the interesting value, it is not the address, and we typically
use empty DIExpressions. If the value is ultimately register allocated and the
DIExpression is empty, we will emit a DW_OP_regN location expression. If the
value is spilled, we usually don't need to append DW_OP_stack_value because
the location is now a memory location, which can be described by DW_OP_[f]breg.

Today, passes that want to add "plus 3" to a DIExpression go out of
their way to add DW_OP_stack_value to the DIExpression because the backend
won't do it for us, even though dbg.value normally describes the value, not
an address.

To explore the alternative DW_OP_stack_value model, here's how I'd go
about it:
1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the semantic
change clear. It can express either an address or a value, depending on the
DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to the
DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression alone.
The LHS of llvm.dbg.declare is already the address of the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect DBG_VALUEs
are now expressed with a DIExpression that lacks DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs with
physical register operands. They just use DW_OP_regN, which is implicitly a
value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to remove
DW_OP_stack_value from the DIExpression, or add DW_OP_deref as appropriate.

This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF
location expression semantics, but it has a large migration cost.

---

I think part of the reason I wanted to move in the DW_OP_LLVM_memory direction
is that I originally wanted to add a memory offset operand to it. Our actual use
cases for complex DWARF expressions typically come from things like safestack,
ASan, and blocks. What these all have in common is that they gather up a number
of variables and move them off into a struct in heap memory. This is very
similar to what happens when we spill a virtual register: instead of describing
a register, we modify the expression to load the value from some FP register
with an offset. I think the right representation for these transforms is
basically a "chain of loads". I was imagining that DW_OP_LLVM_memory
with an offset would be that load chain link.

The idea behind this representation is that it should make it easy for spilling
transforms to prepend a load chain onto the expression, rather than them having
to individually discover if DW_OP_deref is needed, or call some common helper
like DIExpression::prepend. It should always be valid to push on a new load with
an offset.

It also has the advantage that it will be easier to translate to CodeView than
arbitrary DWARF expressions, which we are currently canonicalizing into a load
chain and then attempting to emit.

Does that make sense? I'm starting to feel like I should either pursue the
more ambitious load chain design, or consistently apply DW_OP_stack_value to
llvm.dbg.loc (alternative names welcome).

David Blaikie via llvm-dev

2017-Sep-07 00:01 UTC


[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <rnk at google.com> wrote:
> On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> LLVM SSA values obviously do not have an address that we can take and
>>> they don’t live in registers, so neither the default memory location model
>>> nor DW_OP_regN makes sense for LLVM’s dbg.value. We could hypothetically
>>> repurpose DW_OP_stack_value to indicate that the SSA value passed to
>>> llvm.dbg.value *is* the variable’s value, and if the expression lacks
>>> DW_OP_stack_value, it must be the address of the value. However, that is
>>> backwards incompatible and it seems like quite a stretch.
>>>
>>
>> Seems like a stretch in what sense? The backwards incompatibility is
>> certainly something to consider (though we went through that with
>> DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
>> like to better understand why it's not the path forward if there's some
>> more detail about that aspect of the design choice here.
>>
>> I guess you described this already, but talking it through for
>> myself/maybe others will find this useful:
>>
>> So since we don't have DW_OP_regN for LLVM registers, we could sort of
>> assume the implicit first value on the stack is a pseudo-OP_regN of the
>> LLVM SSA register.
>>
>
> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
> dbg.value and DBG_VALUE instructions are a register-like value that gets
> pushed onto the expression stack. The DWARF asmprinter does some expression
> folding to clean things up, but that's the model.
>
>
>> To support that, all existing uses would need no changes to match the
>> DWARF model of registers being implicitly direct values.
>>
>> Code that wanted to describe the register as containing the memory
>> address of the interesting thing would use DW_OP_stack_value to say "this
>> location description that is a register is really an address you should
>> follow to find the value, not a direct value itself"?
>>
>> But code that wanted to describe a variable as being 3 bytes ahead of a
>> pointer in an LLVM SSA register would only have "plus 3" in the expression
>> stack, since then it's no longer a direct value but is treated as a pointer
>> to the value. I guess this is where the ambiguity would come in - currently
>> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
>> meant to describe reg value + 3 as being the immediate value of the
>> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
>> somewhere in the DWARF backend?)
>>
>
> Our model today is inconsistent.
>
Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's
some mismatch between the semantics, though I'm still having trouble
wrapping my head around it.

> In LLVM IR today, the SSA value of the dbg.value *is* the interesting
> value, it is not the address, and we typically use empty DIExpressions. If
> the value is ultimately register allocated and the DIExpression is empty,
> we will emit a DW_OP_regN location expression. If the value is spilled, we
> usually don't need to append DW_OP_stack_value because the location is now
> a memory location, which can be described by DW_OP_[f]breg.
>
> Today, passes that want to add "plus 3" to a DIExpression go out of their
> way to add DW_OP_stack_value to the DIExpression because the backend won't
> do it for us, even though dbg.value normally describes the value, not an
> address.
>
> To explore the alternative DW_OP_stack_value model, here's how I'd go
> about it:
> 1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
> semantic change clear. It can express either an address or a value, depending
> on the DIExpression.
> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value
> to the DIExpression argument of the intrinsic.
> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
> alone. The LHS of llvm.dbg.declare is already the address of the variable.
> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
> DBG_VALUEs are now expressed with a DIExpression that lacks
> DW_OP_stack_value at the end.
> 5. Teach our DWARF expression emitter to combine the new expressions as
> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
> with physical register operands. They just use DW_OP_regN, which is
> implicitly a value location.
> 6. Teach all passes that spill virtual registers used by DBG_VALUE to
> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
> appropriate.
>
> This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF
> location expression semantics, but it has a large migration cost.
>
> ---
>
> I think part of the reason I wanted to move in the DW_OP_LLVM_memory
> direction is that I originally wanted to add a memory offset operand to it.
> Our actual use cases for complex DWARF expressions typically come from
> things like safestack, ASan, and blocks. What these all have in common is
> that they gather up a number of variables and move them off into a struct
> in heap memory. This is very similar to what happens when we spill a
> virtual register: instead of describing a register, we modify the
> expression to load the value from some FP register with an offset. I think
> the right representation for these transforms is basically a "chain of
> loads".
>
Don't think I've got any mental model of what you mean by this phrase
('chain of loads') - could you provide an example or the like?

> I was imagining that DW_OP_LLVM_memory with an offset would be that load
> chain link.
>
> The idea behind this representation is that it should make it easy for
> spilling transforms to prepend a load chain onto the expression, rather
> than them having to individually discover if DW_OP_deref is needed, or call
> some common helper like DIExpression::prepend. It should always be valid to
> push on a new load with an offset.
>
When would that not be valid today/without LLVM_memory? Sorry, again - it's
all a bit fuzzy in my head.

There'd be some canonicalization opportunities, but not seeing the
correctness issues with being able to prepend onto the location list -
seems like that might be true with LLVM_memory too... maybe?

> It also has the advantage that it will be easier to translate to CodeView
> than arbitrary DWARF expressions, which we are currently canonicalizing
> into a load chain and then attempting to emit.
>
*nod* my worry is ending up with 3 different representations - DWARF,
CodeView, and the increasingly divergent IRDWARF (especially since it's
"sort of like DWARF" which makes the few divergences more
costly/difficult).

> Does that make sense? I'm starting to feel like I should either pursue the
> more ambitious load chain design,
>
What would that look like?

> or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names
> welcome).
>
Would have to think some more - maybe there's a way to avoid the rename?
But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd
match the new semantics better.

But really, your original proposal's probably OK/close enough to go ahead.
I don't feel that strongly.

- Dave

Reid Kleckner via llvm-dev

2017-Sep-07 00:37 UTC


[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

On Wed, Sep 6, 2017 at 5:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Wed, Sep 6, 2017 at 2:01 PM Reid Kleckner <rnk at google.com> wrote:
>
>> On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>> I guess you described this already, but talking it through for
>>> myself/maybe others will find this useful:
>>>
>>> So since we don't have DW_OP_regN for LLVM registers, we could sort of
>>> assume the implicit first value on the stack is a pseudo-OP_regN of the
>>> LLVM SSA register.
>>>
>>
>> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
>> dbg.value and DBG_VALUE instructions are a register-like value that gets
>> pushed onto the expression stack. The DWARF asmprinter does some expression
>> folding to clean things up, but that's the model.
>>
>>
>>> To support that, all existing uses would need no changes to match the
>>> DWARF model of registers being implicitly direct values.
>>>
>>> Code that wanted to describe the register as containing the memory
>>> address of the interesting thing would use DW_OP_stack_value to say "this
>>> location description that is a register is really an address you should
>>> follow to find the value, not a direct value itself"?
>>>
>>> But code that wanted to describe a variable as being 3 bytes ahead of a
>>> pointer in an LLVM SSA register would only have "plus 3" in the expression
>>> stack, since then it's no longer a direct value but is treated as a pointer
>>> to the value. I guess this is where the ambiguity would come in - currently
>>> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
>>> meant to describe reg value + 3 as being the immediate value of the
>>> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
>>> somewhere in the DWARF backend?)
>>>
>>
>> Our model today is inconsistent.
>>
>
> Inconsistent between what and what? LLVM and DWARF? Yeah, I guess there's
> some mismatch between the semantics, though I'm still having trouble
> wrapping my head around it.
>
I mean LLVM's model is internally inconsistent. We have bugs like the RVO
one that you filed (https://llvm.org/pr34513), where we forget if the debug
value is an address or a value.

>> In LLVM IR today, the SSA value of the dbg.value *is* the interesting
>> value, it is not the address, and we typically use empty DIExpressions. If
>> the value is ultimately register allocated and the DIExpression is empty,
>> we will emit a DW_OP_regN location expression. If the value is spilled, we
>> usually don't need to append DW_OP_stack_value because the location is now
>> a memory location, which can be described by DW_OP_[f]breg.
>>
>> Today, passes that want to add "plus 3" to a DIExpression go out of their
>> way to add DW_OP_stack_value to the DIExpression because the backend won't
>> do it for us, even though dbg.value normally describes the value, not an
>> address.
>>
>> To explore the alternative DW_OP_stack_value model, here's how I'd go
>> about it:
>> 1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
>> semantic change clear. It can express either an address or a value,
>> depending on the DIExpression.
>> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value
>> to the DIExpression argument of the intrinsic.
>> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
>> alone. The LHS of llvm.dbg.declare is already the address of the variable.
>> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
>> DBG_VALUEs are now expressed with a DIExpression that lacks
>> DW_OP_stack_value at the end.
>> 5. Teach our DWARF expression emitter to combine the new expressions as
>> necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
>> with physical register operands. They just use DW_OP_regN, which is
>> implicitly a value location.
>> 6. Teach all passes that spill virtual registers used by DBG_VALUE to
>> remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
>> appropriate.
>>
>> This should be equivalent to DW_OP_LLVM_memory, and more in line with
>> DWARF location expression semantics, but it has a large migration cost.
>>
>> ---
>>
>> I think part of the reason I wanted to move in the DW_OP_LLVM_memory
>> direction is that I originally wanted to add a memory offset operand to it.
>> Our actual use cases for complex DWARF expressions typically come from
>> things like safestack, ASan, and blocks. What these all have in common is
>> that they gather up a number of variables and move them off into a struct
>> in heap memory. This is very similar to what happens when we spill a
>> virtual register: instead of describing a register, we modify the
>> expression to load the value from some FP register with an offset. I think
>> the right representation for these transforms is basically a "chain of
>> loads".
>>
>
> Don't think I've got any mental model of what you mean by this phrase
> ('chain of loads') - could you provide an example or the like?
>
Suppose you have a captured variable with __block shared storage, and then
suppose you compile it with ASan and safestack, and then the safestack
pointer is spilled. To compute the value, the debugger starts from a
register, goes to an offset, and loads a pointer, repeating the process
until it finds the value. As we proceed through codegen, we effectively
build up the chain.

1. To implement __block in Clang, we use dbg.declare(%block_descriptor,
DW_OP_deref, DW_OP_plus_uconst $offset)
2. Assuming the block descriptor lived in an alloca (which it doesn't, but
assume it does for argument's sake), ASan will move that alloca onto the
heap to implement use-after-return detection. It will prepend "DW_OP_deref,
DW_OP_plus_uconst, $offset" to the DIExpression.
3. If ASan put its value in an alloca and safestack wanted to move that
alloca to the safe stack (again bear with me), it would do the same:
prepend deref+offset.
4. Finally, spilling the safe stack pointer to the control stack would mean
prepending deref+offset.
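The four steps can be modelled as repeated prepending onto a hypothetical load chain (one offset per link, with a load between consecutive links). This sketch follows the narrative literally, where each step stores the previous base in memory and adds one link; the offsets are invented, real DWARF opcode names are used only for readability, and `prependLink`, `toDwarf` and `buildExampleChain` are not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using LoadChain = std::vector<uint64_t>; // one offset per load-chain link

LoadChain prependLink(const LoadChain &chain, uint64_t offset) {
  LoadChain out{offset};
  out.insert(out.end(), chain.begin(), chain.end());
  return out;
}

// Renders a chain as DWARF opcode names: offsets separated by DW_OP_deref,
// with zero offsets omitted.
std::string toDwarf(const LoadChain &chain) {
  std::string s;
  auto emit = [&s](const std::string &op) {
    s += s.empty() ? op : ", " + op;
  };
  for (std::size_t i = 0; i < chain.size(); ++i) {
    if (chain[i] != 0)
      emit("DW_OP_plus_uconst " + std::to_string(chain[i]));
    if (i + 1 < chain.size())
      emit("DW_OP_deref");
  }
  return s;
}

// The four steps above; every offset is made up for illustration.
LoadChain buildExampleChain() {
  LoadChain c{0, 24};     // 1. __block: load the descriptor pointer, then +24
  c = prependLink(c, 32); // 2. ASan: the slot moves to a heap frame, +32
  c = prependLink(c, 8);  // 3. safestack: that slot moves to the safe stack, +8
  c = prependLink(c, 16); // 4. the safe stack pointer is spilled, +16
  return c;
}
```

Each transform touches only the front of the chain; in the flat DWARF form the same history shows up as an interleaving of offsets and DW_OP_deref that later passes would have to re-parse.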

This seems like a really common pattern. Right now this offsetting and
loading has to be expressed as separate location expression opcodes. The
DW_OP_deref opcode functions like a load sequencing operation that can only
appear between two offsets, although an offset could be zero, in which case
there would be no DW_OP_plus_uconst opcode. I'm suggesting we move to a
representation where the offset and the deref are one. Think "semicolon as
sequencing operator" vs. "semicolon as statement terminator".
When we use
DW_OP_LLVM_memory or DW_OP_deref, the variable must always live in memory,
we're always doing address calculation. We should try to make our
representation more closely match the set of things we actually want to do
and support.

That's kind of the gist of what I had in mind. I didn't think it was worth
it, which is why I pared the proposal down to just "the opposite of
DW_OP_stack_value".

>> I was imagining that DW_OP_LLVM_memory with an offset would be that load
>> chain link.
>>
>> The idea behind this representation is that it should make it easy for
>> spilling transforms to prepend a load chain onto the expression, rather
>> than them having to individually discover if DW_OP_deref is needed, or call
>> some common helper like DIExpression::prepend. It should always be valid to
>> push on a new load with an offset.
>>
>
> When would that not be valid today/without LLVM_memory? Sorry, again -
> it's all a bit fuzzy in my head.
>
> There'd be some canonicalization opportunities, but not seeing the
> correctness issues with being able to prepend onto the location list -
>  seems like that might be true with LLVM_memory too... maybe?
>
The correctness issue with today's prepending of offsets and deref is that
it's hard to know when to insert deref, because we don't know if an
expression describes an address or a value. We have code like this in
buildDbgValueForSpill:
  // If the DBG_VALUE already was a memory location, add an extra
  // DW_OP_deref. Otherwise just turning this from a register into a
  // memory/indirect location is sufficient.
  if (IsIndirect)
    Expr = DIExpression::prepend(Expr, DIExpression::WithDeref);

We modify DBG_VALUEs for spills in several other places in codegen and they
don't all correctly insert DW_OP_deref. The load chain representation
should make it easy to just modify the offset on the front or add a new
load depending on what's being done.
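A hedged model of the problem. It captures only the shape of the quoted logic, not LLVM's MachineInstr API, and it assumes the spilled DBG_VALUE is always marked indirect; `DbgValueModel`, `rewriteForSpill` and `buggyRewriteForSpill` are hypothetical names. The correct rewrite depends on a separate flag, and a site that ignores it still compiles but silently drops a load.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { Deref, PlusUconst, StackValue };
struct Elem {
  Op op;
  uint64_t arg; // operand for PlusUconst, ignored otherwise
};
using Expr = std::vector<Elem>;

// Simplified model of a DBG_VALUE: the expression plus the separate
// "indirect" flag that records whether the operand is an address.
struct DbgValueModel {
  bool isIndirect;
  Expr expr;
};

// Mirrors the quoted buildDbgValueForSpill logic: the result always refers
// to the stack slot as memory; if the input was already indirect, the slot
// holds an address, so one extra load is prepended.
DbgValueModel rewriteForSpill(const DbgValueModel &dv) {
  DbgValueModel out = dv;
  if (dv.isIndirect)
    out.expr.insert(out.expr.begin(), Elem{Op::Deref, 0});
  out.isIndirect = true;
  return out;
}

// A site that forgets the flag check: off by one level of indirection
// whenever the input was already indirect.
DbgValueModel buggyRewriteForSpill(const DbgValueModel &dv) {
  DbgValueModel out = dv;
  out.isIndirect = true;
  return out;
}
```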

>> It also has the advantage that it will be easier to translate to CodeView
>> than arbitrary DWARF expressions, which we are currently canonicalizing
>> into a load chain and then attempting to emit.
>>
>
> *nod* my worry is ending up with 3 different representations - DWARF,
> CodeView, and the increasingly divergent IRDWARF (especially since it's
> "sort of like DWARF" which makes the few divergences more costly/difficult).
>
>
>> Does that make sense? I'm starting to feel like I should either pursue
>> the more ambitious load chain design,
>>
>
> What would that look like?
>
Just `DW_OP_LLVM_memory, 8, DW_OP_LLVM_memory, 20, ...` in DIExpression
through IR. It's OK to insert more DWARF opcodes between the links, it's
just non-canonical if they are pointer offsetting opcodes that could be
folded into the memory opcode. The DWARF expression backend would fold it
into the same location expressions we have today.
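One way that canonicalization could work, as a sketch under stated assumptions: a DW_OP_LLVM_memory link is modelled as "add its offset, then continue through memory", so a pointer offset sitting directly before a link can be folded into it. `canonicalize` and the `Op` values are hypothetical; DW_OP_LLVM_memory never had a real encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified opcodes: a DWARF offset, the proposed DW_OP_LLVM_memory link
// (hypothetical), and any other opcode.
enum class Op { PlusUconst, LLVMMemory, Other };
struct Elem {
  Op op;
  uint64_t arg; // offset for PlusUconst and LLVMMemory
};

// Folds each run of DW_OP_plus_uconst that directly precedes a
// DW_OP_LLVM_memory link into that link's offset. Other opcodes between
// links are kept; trailing offsets after the last link are left alone.
std::vector<Elem> canonicalize(const std::vector<Elem> &in) {
  std::vector<Elem> out;
  for (Elem e : in) {
    if (e.op == Op::LLVMMemory) {
      while (!out.empty() && out.back().op == Op::PlusUconst) {
        e.arg += out.back().arg;
        out.pop_back();
      }
    }
    out.push_back(e);
  }
  return out;
}
```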

>> or consistently apply DW_OP_stack_value to llvm.dbg.loc (alternative names
>> welcome).
>>
>
> Would have to think some more - maybe there's a way to avoid the rename?
> But yeah, don't have a problem with llvm.dbg.loc - as you say/implied, it'd
> match the new semantics better.
>
I don't think so. =/ I think the rename is the only safe way to maintain
bitcode compatibility.

> But really, your original proposal's probably OK/close enough to go ahead.
> I don't feel that strongly.
>
Makes sense. That's basically where I ended up, but now I'm reconsidering
the dbg.loc+DW_OP_stack_value thing, to bring DIExpressions closer to DWARF.

Adrian Prantl via llvm-dev

2017-Sep-07 18:09 UTC


[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

> On Sep 6, 2017, at 2:01 PM, Reid Kleckner via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com
<mailto:dblaikie at gmail.com>> wrote:
> On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
> LLVM SSA values obviously do not have an address that we can take and they
> don’t live in registers, so neither the default memory location model nor
> DW_OP_regN makes sense for LLVM’s dbg.value. We could hypothetically repurpose
> DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value *is*
> the variable’s value, and if the expression lacks DW_OP_stack_value, it must be
> the address of the value. However, that is backwards incompatible and it seems
> like quite a stretch.
> 
> Seems like a stretch in what sense? The backwards incompatibility is
> certainly something to consider (though we went through that with
> DW_OP_bit_piece too), but this seems like the design I'd go to first so
> I'd like to better understand why it's not the path forward if there's
> some more detail about that aspect of the design choice here.
> 
> I guess you described this already, but talking it through for myself/maybe
> others will find this useful:
> 
> So since we don't have DW_OP_regN for LLVM registers, we could sort of
> assume the implicit first value on the stack is a pseudo-OP_regN of the LLVM
> SSA register.
> 
> Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
> dbg.value and DBG_VALUE instructions is a register-like value that gets pushed
> onto the expression stack. The DWARF asmprinter does some expression folding to
> clean things up, but that's the model.
>
> To support that, all existing uses would need no changes to match the DWARF
> model of registers being implicitly direct values.
> 
> Code that wanted to describe the register as containing the memory address
> of the interesting thing would use DW_OP_stack_value to say "this location
> description that is a register is really an address you should follow to find
> the value, not a direct value itself"?
> 
> But code that wanted to describe a variable as being 3 bytes ahead of a
> pointer in an LLVM SSA register would only have "plus 3" in the expression
> stack, since then it's no longer a direct value but is treated as a pointer to
> the value. I guess this is where the ambiguity would come in - currently how
> does "plus 3" get interpreted when seen in LLVM IR? I guess that's meant to
> describe reg value + 3 as being the immediate value of the variable? (so it's
> implicitly OP_stack_value? & OP_stack_value is added somewhere in the DWARF
> backend?)
> 
> Our model today is inconsistent. In LLVM IR today, the SSA value of the
> dbg.value *is* the interesting value, it is not the address, and we typically
> use empty DIExpressions. If the value is ultimately register allocated and the
> DIExpression is empty, we will emit a DW_OP_regN location expression. If the
> value is spilled, we usually don't need to append DW_OP_stack_value because
> the location is now a memory location, which can be described by DW_OP_[f]breg.

I wouldn't go as far as to call it inconsistent; it is just that an IR
dbg.value describing an SSA value has different semantics than a MIR DBG_VALUE
(because in MIR we can actually distinguish between it describing a register or
a register-indirect memory location).
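To make the two readings of "plus 3" concrete, here is a hedged Python sketch of a tiny DWARF-style stack evaluator. Following the model described above, it pushes the dbg.value/DBG_VALUE operand first; it then treats the result as an address unless the expression ends in `DW_OP_stack_value`. The `implicit_stack_value` flag models the IR convention where the dbg.value operand is the value itself. The function name, the small op subset, and the dictionary standing in for target memory are all assumptions for illustration, not LLVM code.

```python
PLUS = "DW_OP_plus_uconst"
DEREF = "DW_OP_deref"
STACK_VALUE = "DW_OP_stack_value"


def evaluate(expr, operand, memory, implicit_stack_value=False):
    """Evaluate a tiny DIExpression subset and return the variable's value.

    `operand` plays the role of the register-like LHS pushed onto the stack;
    `memory` is a dict standing in for target memory.
    """
    stack = [operand]
    is_value = implicit_stack_value
    for op in expr:
        if op[0] == PLUS:
            stack.append(stack.pop() + op[1])
        elif op[0] == DEREF:
            stack.append(memory[stack.pop()])
        elif op[0] == STACK_VALUE:
            is_value = True
        else:
            raise ValueError(f"unsupported opcode: {op[0]}")
    top = stack[-1]
    # Without DW_OP_stack_value the result is an address, so read through it.
    return top if is_value else memory[top]


memory = {0x1003: 42}
plus3 = [(PLUS, 3)]
print(evaluate(plus3, 0x1000, memory))                             # 42
print(evaluate(plus3, 0x1000, memory, implicit_stack_value=True))  # 4099
```

The same `plus 3` expression yields the variable stored at 0x1003 under DWARF memory-location semantics, but the number 0x1003 itself when the operand is implicitly a value, which is the ambiguity being discussed here.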
> 
> Today, passes that want to add "plus 3" to a DIExpression go out of their way
> to add DW_OP_stack_value to the DIExpression because the backend won't do it
> for us, even though dbg.value normally describes the value, not an address.

I agree that we are too fuzzy about the semantics of DW_OP_stack_value in
DIExpressions.

-- adrian
> To explore the alternative DW_OP_stack_value model, here's how I'd go about it:
> 1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
>    semantic change clear. It can express either an address or a value,
>    depending on the DIExpression.
> 2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to
>    the DIExpression argument of the intrinsic.
> 3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, and leave the DIExpression
>    alone. The LHS of llvm.dbg.declare is already the address of the variable.
> 4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
>    DBG_VALUEs are now expressed with a DIExpression that lacks
>    DW_OP_stack_value at the end.
> 5. Teach our DWARF expression emitter to combine the new expressions as
>    necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
>    with physical register operands. They just use DW_OP_regN, which is
>    implicitly a value location.
> 6. Teach all passes that spill virtual registers used by DBG_VALUE to
>    remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
>    appropriate.
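Steps 2 through 4 are mechanical enough to sketch. The Python below is a hypothetical model, not LLVM's actual auto-upgrader: intrinsic calls and DBG_VALUEs are plain data, `DbgCall`, `upgrade`, and `upgrade_machine_dbg_value` are invented names, and it assumes (beyond what the plan states) that a dbg.value whose expression already ends in DW_OP_stack_value should not get a second one.

```python
from dataclasses import dataclass, field

STACK_VALUE = ("DW_OP_stack_value",)


@dataclass
class DbgCall:
    """Hypothetical stand-in for an llvm.dbg.* intrinsic call."""
    intrinsic: str   # "llvm.dbg.value", "llvm.dbg.declare" or "llvm.dbg.loc"
    operand: str     # the SSA value or address, e.g. "%x"
    expr: list = field(default_factory=list)  # DIExpression as op tuples


def with_stack_value(expr):
    """Return a copy of expr ending in DW_OP_stack_value, without doubling it."""
    expr = list(expr)
    if not expr or expr[-1] != STACK_VALUE:
        expr.append(STACK_VALUE)
    return expr


def upgrade(call):
    if call.intrinsic == "llvm.dbg.value":
        # Step 2: the operand *is* the value, so make that explicit.
        return DbgCall("llvm.dbg.loc", call.operand, with_stack_value(call.expr))
    if call.intrinsic == "llvm.dbg.declare":
        # Step 3: the operand is already the address; keep the expression.
        return DbgCall("llvm.dbg.loc", call.operand, list(call.expr))
    return call


def upgrade_machine_dbg_value(reg, is_indirect, expr):
    """Step 4: drop DBG_VALUE's indirect operand and encode it in expr."""
    # Indirect: the register holds the variable's address, so the expression
    # stays a memory location. Direct: the register holds the value.
    return (reg, list(expr) if is_indirect else with_stack_value(expr))


print(upgrade(DbgCall("llvm.dbg.value", "%x")))
# DbgCall(intrinsic='llvm.dbg.loc', operand='%x', expr=[('DW_OP_stack_value',)])
```

Once the indirect flag lives in the expression, a single intrinsic and a single DBG_VALUE form can describe both cases, which is what makes step 1's rename meaningful.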
> 
> This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF
> location expression semantics, but it has a large migration cost.
> 
> ---
> 
> I think part of the reason I wanted to move in the DW_OP_LLVM_memory
> direction is that I originally wanted to add a memory offset operand to it.
> Our actual use cases for complex DWARF expressions typically come from things
> like safestack, ASan, and blocks. What these all have in common is that they
> gather up a number of variables and move them off into a struct in heap
> memory. This is very similar to what happens when we spill a virtual register:
> instead of describing a register, we modify the expression to load the value
> from some FP register with an offset. I think the right representation for
> these transforms is basically a "chain of loads". I was imagining that
> DW_OP_LLVM_memory with an offset would be that load chain link.
> 
> The idea behind this representation is that it should make it easy for
> spilling transforms to prepend a load chain onto the expression, rather than
> them having to individually discover if DW_OP_deref is needed, or call some
> common helper like DIExpression::prepend. It should always be valid to push on
> a new load with an offset.
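A rough Python sketch of the contrast being drawn: in the DW_OP_stack_value model (step 6 of the plan above), a spilling pass has to inspect the expression to decide between dropping DW_OP_stack_value and inserting DW_OP_deref, whereas under the load-chain idea it can always prepend one link. It assumes, as in the original RFC framing, that an expression without a trailing DW_OP_LLVM_memory describes a value, and that the frame register becomes the DBG_VALUE's new operand. The function names, the tuple encoding, and the offset-carrying link are illustrative assumptions, not the RFC's final design.

```python
# Hypothetical op-tuple encoding of DIExpressions.
DEREF = ("DW_OP_deref",)
STACK_VALUE = ("DW_OP_stack_value",)


def spill_stack_value_model(expr, slot_offset):
    """Step 6 of the DW_OP_stack_value plan: the vreg moved to [FP + slot_offset].

    The frame register becomes the DBG_VALUE's new operand, so the expression
    must first reach the stack slot, then decide what the slot means.
    """
    prefix = [("DW_OP_plus_uconst", slot_offset)] if slot_offset else []
    if expr == [STACK_VALUE]:
        # The vreg held the variable's value, so the slot *is* the variable:
        # drop DW_OP_stack_value and leave a memory location.
        return prefix
    # Otherwise reload the spilled vreg and apply the original expression.
    return prefix + [DEREF] + list(expr)


def spill_load_chain_model(expr, slot_offset):
    """Load-chain idea: prepend one link, whatever expr contains."""
    return [("DW_OP_LLVM_memory", slot_offset)] + list(expr)


print(spill_stack_value_model([("DW_OP_plus_uconst", 3), STACK_VALUE], 16))
print(spill_load_chain_model([("DW_OP_plus_uconst", 3)], 16))
```

The load-chain version never looks inside `expr`, which is the property argued for in the paragraph above; the case analysis moves into the single place that lowers links to DWARF.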
> 
> It also has the advantage that it will be easier to translate to CodeView
> than arbitrary DWARF expressions, which we are currently canonicalizing into a
> load chain and then attempting to emit.
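A hedged Python sketch of that canonicalization step, for the simple case of offset-and-load expressions: it recognizes `DW_OP_plus_uconst`/`DW_OP_deref` sequences in a memory location description and returns one offset per load-chain link, or `None` when the expression uses anything else, which is where an emitter would have to give up. The function name `to_load_chain` and the treatment of every input as a memory location (no `DW_OP_stack_value`) are assumptions; this is not the code LLVM's CodeView backend uses.

```python
def to_load_chain(expr):
    """Split an offset/deref DWARF memory-location expression into links.

    [plus a, deref, plus b] means "load the pointer at base+a; the variable
    lives at that pointer+b", i.e. the links [a, b].
    Returns None for anything this simple model cannot express.
    """
    links = []
    offset = 0
    for op in expr:
        if op[0] == "DW_OP_plus_uconst":
            offset += op[1]
        elif op[0] == "DW_OP_deref":
            links.append(offset)
            offset = 0
        else:
            # DW_OP_stack_value, arithmetic, etc. have no load-chain form here.
            return None
    links.append(offset)
    return links


print(to_load_chain([("DW_OP_plus_uconst", 8), ("DW_OP_deref",),
                     ("DW_OP_plus_uconst", 20)]))  # [8, 20]
```

Consecutive offsets collapse into a single link, so two equivalent DWARF spellings produce the same chain.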
> 
> Does that make sense? I'm starting to feel like I should either pursue the
> more ambitious load chain design, or consistently apply DW_OP_stack_value to
> llvm.dbg.loc (alternative names welcome).
