thr3ads.net - llvm dev - [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value [Sep 2017]

Reid Kleckner via llvm-dev

2017-Sep-05 20:00 UTC

[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

Debug info today handles two cases reasonably well:
1. At -O0, dbg.declare does a good job describing variables that live at
some known stack offset
2. With optimizations, variables promoted to SSA can be described with
dbg.value

This leaves behind a large hole in our optimized debug info: variables that
cannot be promoted, typically because they are address-taken. This is
https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and
heuristically inserts dbg.values where it can identify the value of the
variable in question. This prevents us from having misleading debug info,
but it throws away information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t
mix and match dbg.value with dbg.declare. If the backend sees a
dbg.declare, it accepts that information as more reliable and discards all
DBG_VALUE instructions associated with that variable. So, we need something
we can mix. We need a way to say, the variable lives in memory *at this
program point*, and it might live somewhere else later on. I propose that
we introduce DW_OP_LLVM_memory for this purpose, and then we transition
from dbg.declare to dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing
DWARF expression opcodes, but I implemented that in
https://reviews.llvm.org/D37311 and learned more about how DWARF
expressions work. When a debugger begins evaluating a DWARF expression, it
assumes that the resulting value will be a pointer to the variable in
memory. For a debugger, this makes sense, because debug builds put things
in memory and even after optimization many variables must be spilled. Only
the special DW_OP_regN and DW_OP_stack_value expression opcodes change the
location of the value from memory to register or stack value.
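To make the default-to-memory rule concrete, here is a toy sketch of how a debugger might classify the result of a location expression. It is not a real DWARF evaluator: the opcode spellings ("reg", "breg", "stack_value") and the function name classify_location are illustrative assumptions, and only a handful of operations are modelled.

```python
# Toy model of DWARF location descriptions (not a real DWARF evaluator).
# An expression is a list of (opcode, operand) tuples; the opcode spellings
# are illustrative stand-ins for the real DW_OP_* encodings.

def classify_location(expr, regs):
    """Return (kind, payload) for a simplified location expression."""
    # A lone register op: the variable lives in that register itself.
    if len(expr) == 1 and expr[0][0] == "reg":
        return ("register", expr[0][1])

    stack = []
    for op, arg in expr:
        if op == "breg":            # push register contents plus an offset
            reg, offset = arg
            stack.append(regs[reg] + offset)
        elif op == "const":         # push a constant
            stack.append(arg)
        elif op == "plus":          # add the top two stack entries
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
        elif op == "stack_value":   # top of stack IS the value, not an address
            return ("implicit", stack.pop())
        else:
            raise ValueError(f"unsupported op: {op}")
    # Default: whatever is left on top of the stack is the variable's address.
    return ("memory", stack.pop())


regs = {"rsp": 0x1000}
in_memory = classify_location([("breg", ("rsp", 12))], regs)
in_register = classify_location([("reg", "rax")], regs)
computed = classify_location(
    [("breg", ("rsp", 0)), ("const", 3), ("plus", None), ("stack_value", None)],
    regs,
)
```

Without a terminating register or stack-value operation, the result falls through to the memory case, which is the default described above.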

LLVM SSA values obviously do not have an address that we can take and they
don’t live in registers, so neither the default memory location model nor
DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
repurpose DW_OP_stack_value to indicate that the SSA value passed to
llvm.dbg.value *is* the variable’s value, and if the expression lacks
DW_OP_stack_value, it must be the address of the value. However, that is
backwards incompatible and it seems like quite a stretch.

DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It
would only be valid at the end of a DIExpression. The backend will always
remove it because the debugger will assume the variable lives in memory
unless it is told otherwise.
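A minimal sketch of that rule, assuming expressions are modelled as plain lists of opcode names. The helper lower_for_debugger is hypothetical and is not an LLVM function; it only illustrates "valid at the end, removed before emission".

```python
LLVM_MEMORY = "DW_OP_LLVM_memory"   # the proposed opcode; everything else here is hypothetical

def lower_for_debugger(ops):
    """Check the marker's placement and drop it before emission.

    Returns (ops_to_emit, variable_is_in_memory).
    """
    if LLVM_MEMORY in ops[:-1]:
        raise ValueError("DW_OP_LLVM_memory is only valid at the end of the expression")
    if ops and ops[-1] == LLVM_MEMORY:
        # The debugger already assumes a memory location, so nothing is emitted for the marker.
        return ops[:-1], True
    return list(ops), False


emitted, in_memory = lower_for_debugger([LLVM_MEMORY])
```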

For the original problem of improving optimized debug info while avoiding
inaccurate information in the presence of dead store elimination, consider
this C example:
  int x = 42;  // Can DSE
  dostuff(x); // Can propagate 42
  x = computation();  // Post-dominates `x = 42` store
  escape(&x);

We should be able to do this:
  int x; // eliminate `x = 42` store
  dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
  dostuff(42); // propagate 42
  dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
  x = computation();
  escape(&x);

Passes that delete stores would be responsible for checking if the store
destination is part of an alloca with associated dbg.value instructions.
They would emit a new dbg.value instruction for that variable with the
stored value, and clone the dbg.value instruction that puts the variable
back in memory before the killing store. If the store is dead because
variable lifetime is ending, the second dbg.value is unnecessary.
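The bookkeeping described above can be sketched on a toy instruction list. This is not LLVM's DSE: the tuple-based IR and the helper delete_dead_store are assumptions made for illustration, and the sketch creates the "back in memory" dbg.value rather than cloning an existing one.

```python
def delete_dead_store(instrs, dead_index, killing_index, var):
    """Delete the dead store at instrs[dead_index] and patch the debug info.

    instrs is a list of tuples such as ("store", "x", 42) or ("call", "f", arg).
    killing_index is the later store that makes the first one dead, or None
    if the store is dead only because the variable's lifetime is ending.
    """
    op, dest, stored = instrs[dead_index]
    if op != "store" or dest != var:
        raise ValueError("dead_index must point at a store to the variable")
    out = []
    for i, instr in enumerate(instrs):
        if i == dead_index:
            # The variable has a known value, but memory is no longer up to date.
            out.append(("dbg.value", var, stored, []))
            continue
        if i == killing_index:
            # From the killing store onward, memory is accurate again.
            out.append(("dbg.value", var, "&" + var, ["DW_OP_LLVM_memory"]))
        out.append(instr)
    return out


before = [
    ("store", "x", 42),
    ("call", "dostuff", 42),
    ("store", "x", "computation()"),
    ("call", "escape", "&x"),
]
after = delete_dead_store(before, dead_index=0, killing_index=2, var="x")
```

Passing killing_index=None models the lifetime-ending case, where only the first dbg.value is emitted.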

This will also allow us to fix debug info for px in this example:
  void __attribute__((optnone, noinline)) usevar(int *x) {}
  int main(int argc, char **argv) {
    int x = 42;
    int *px = &x;
    usevar(&x);
    if (argc) usevar(px);
  }

Today, we emit a location for px like `DW_OP_breg7 RSP+12`, which gives it
the incorrect value 42. This is because our DBG_VALUE instruction for px’s
location uses a frame index, which we assume is in memory. That assumption is
wrong here: px is not in memory; its value is a pointer to a stack object.
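The mix-up can be reproduced with a toy model of the stack. The RSP value and the two reader functions below are made up for illustration; the only thing taken from the example is that the slot at RSP+12 holds x.

```python
# Toy stack frame: the slot at RSP+12 holds x, whose value is 42.
RSP = 0x7FFC0000
memory = {RSP + 12: 42}

def read_as_memory_location(address):
    """Treat the expression result as the variable's address and load from it."""
    return memory[address]

def read_as_value(address):
    """Treat the expression result as the variable's value itself."""
    return address

slot = RSP + 12                                   # what `DW_OP_breg7 RSP+12` computes
px_shown_today = read_as_memory_location(slot)    # the contents of x, which is wrong for px
px_expected = read_as_value(slot)                 # the address of x, which is what px holds
```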

Please reply if you have any thoughts on this proposal. Adrian and I hashed
this out over Bugzilla, IRC, and in person, so it shouldn’t be too
surprising. Let me know if you want to be CC’d on the patches.

Alex Bradbury via llvm-dev

2017-Sep-06 10:24 UTC

On 5 September 2017 at 21:00, Reid Kleckner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Debug info today handles two cases reasonably well:
> 1. At -O0, dbg.declare does a good job describing variables that live at
> some known stack offset
> 2. With optimizations, variables promoted to SSA can be described with
> dbg.value
>
> This leaves behind a large hole in our optimized debug info: variables that
> cannot be promoted, typically because they are address-taken. This is
> https://llvm.org/pr34136, and this RFC is mostly about addressing that.
>
> The status today is that instcombine removes all dbg.declares and
> heuristically inserts dbg.values where it can identify the value of the
> variable in question. This prevents us from having misleading debug info,
> but it throws away information about the variable’s location in memory.

Hi Reid, thanks for writing such a clear summary of the problem that
this RFC addresses. I was wondering if there is any sort of
methodology for quantifying the "quality" of debug information? e.g.
the gap between the current debug information and 'ideal' debug info?

Thanks,

Alex

David Blaikie via llvm-dev

2017-Sep-06 17:01 UTC

On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Debug info today handles two cases reasonably well:
> 1. At -O0, dbg.declare does a good job describing variables that live at
> some known stack offset
> 2. With optimizations, variables promoted to SSA can be described with
> dbg.value
>
> This leaves behind a large hole in our optimized debug info: variables
> that cannot be promoted, typically because they are address-taken. This is
> https://llvm.org/pr34136, and this RFC is mostly about addressing that.
>
> The status today is that instcombine removes all dbg.declares and
> heuristically inserts dbg.values where it can identify the value of the
> variable in question. This prevents us from having misleading debug info,
> but it throws away information about the variable’s location in memory.
>
> Part of the reason that instcombine discards dbg.declares is that we can’t
> mix and match dbg.value with dbg.declare. If the backend sees a
> dbg.declare, it accepts that information as more reliable and discards all
> DBG_VALUE instructions associated with that variable. So, we need something
> we can mix. We need a way to say, the variable lives in memory *at this
> program point*, and it might live somewhere else later on. I propose that
> we introduce DW_OP_LLVM_memory for this purpose, and then we transition
> from dbg.declare to dbg.value+DW_OP_LLVM_memory.
>
> Initially I believed that DW_OP_deref was the way to say this with
> existing DWARF expression opcodes, but I implemented that in
> https://reviews.llvm.org/D37311 and learned more about how DWARF
> expressions work. When a debugger begins evaluating a DWARF expression, it
> assumes that the resulting value will be a pointer to the variable in
> memory. For a debugger, this makes sense, because debug builds put things
> in memory and even after optimization many variables must be spilled. Only
> the special DW_OP_regN and DW_OP_stack_value expression opcodes change the
> location of the value from memory to register or stack value.
>
> LLVM SSA values obviously do not have an address that we can take and they
> don’t live in registers, so neither the default memory location model nor
> DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
> repurpose DW_OP_stack_value to indicate that the SSA value passed to
> llvm.dbg.value *is* the variable’s value, and if the expression lacks
> DW_OP_stack_value, it must be a the address of the value. However, that is
> backwards incompatible and it seems like quite a stretch.
>
Seems like a stretch in what sense? The backwards incompatibility is
certainly something to consider (though we went through that with
DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
like to better understand why it's not the path forward if there's some
more detail about that aspect of the design choice here.

I guess you described this already, but talking it through for myself/maybe
others will find this useful:

So since we don't have DW_OP_regN for LLVM registers, we could sort of
assume the implicit first value on the stack is a pseudo-OP_regN of the
LLVM SSA register.

To support that, all existing uses would need no changes to match the DWARF
model of registers being implicitly direct values.

Code that wanted to describe the register as containing the memory address
of the interesting thing would use DW_OP_stack_value to say "this location
description that is a register is really an address you should follow to
find the value, not a direct value itself"?

But code that wanted to describe a variable as being 3 bytes ahead of a
pointer in an LLVM SSA register would only have "plus 3" in the expression
stack, since then it's no longer a direct value but is treated as a pointer
to the value. I guess this is where the ambiguity would come in - currently
how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
meant to describe reg value + 3 as being the immediate value of the
variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
somewhere in the DWARF backend?)

Thanks,
- Dave


Robinson, Paul via llvm-dev

2017-Sep-06 17:18 UTC

> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Alex
> Bradbury via llvm-dev
> Sent: Wednesday, September 06, 2017 3:25 AM
> To: Reid Kleckner
> Cc: llvm-dev
> Subject: Re: [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe
> variables in memory with dbg.value
>
> On 5 September 2017 at 21:00, Reid Kleckner via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Debug info today handles two cases reasonably well:
> > 1. At -O0, dbg.declare does a good job describing variables that live at
> > some known stack offset
> > 2. With optimizations, variables promoted to SSA can be described with
> > dbg.value
> >
> > This leaves behind a large hole in our optimized debug info: variables that
> > cannot be promoted, typically because they are address-taken. This is
> > https://llvm.org/pr34136, and this RFC is mostly about addressing that.
> >
> > The status today is that instcombine removes all dbg.declares and
> > heuristically inserts dbg.values where it can identify the value of the
> > variable in question. This prevents us from having misleading debug info,
> > but it throws away information about the variable’s location in memory.
>
> Hi Reid, thanks for writing such a clear summary of the problem that
> this RFC addresses. I was wondering if there is any sort of
> methodology for quantifying the "quality" of debug information? e.g.
> the gap between the current debug information and 'ideal' debug info?

Hi Alex,

The usual oracle for execution-related debug info is "what you get at -O0."
More abstractly, you would like to be able to stop at any source statement,
and have any in-scope variable's value available for display (and possibly
for modification).

At my previous employer we built a tool that compared the set of breakpoint
locations at -O0 to -O1/-O2, and also compared the ranges where a variable
had a defined location to the range of code for its lexical scope.  This
gives you computable metrics such that you can readily tell when things have
gotten "better" or "worse" compared to some previous compiler version (or
competitor compiler).
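As a rough illustration of what such metrics could look like (this is not the tool described above, and both function names are hypothetical), the two comparisons can be written as simple ratios over breakpoint line sets and address ranges:

```python
def breakpoint_retention(lines_o0, lines_opt):
    """Fraction of the -O0 breakpoint lines that are still available when optimized."""
    baseline = set(lines_o0)
    return len(baseline & set(lines_opt)) / len(baseline)

def location_coverage(scope, location_ranges):
    """Fraction of a lexical scope's code range in which the variable has a location.

    scope and each entry of location_ranges are half-open (start, end) address pairs.
    """
    scope_start, scope_end = scope
    covered = 0
    cursor = scope_start
    for start, end in sorted(location_ranges):
        start = max(start, cursor)   # clip to the scope and skip bytes already counted
        end = min(end, scope_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (scope_end - scope_start)


retention = breakpoint_retention([1, 2, 3, 4], [1, 3, 9])
full = location_coverage((0x100, 0x200), [(0x100, 0x200)])
partial = location_coverage((0x100, 0x200), [(0x100, 0x140), (0x120, 0x180)])
```

Tracking these two numbers across compiler versions would give the "better or worse" signal mentioned above, under the assumption that -O0 is the reference.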

There's no open-source tool that does this at the moment, that I'm aware of,
although we are working on publishing our DIVA tool (see our lightning talk
at EuroLLVM earlier this year).
--paulr

Robinson, Paul via llvm-dev

2017-Sep-06 17:50 UTC

It's worth remembering that there are two syntactically similar but
semantically different kinds of "expression" in DWARF.
A DWARF expression computes a value; if the available value is a pointer, you
add DW_OP_deref to express the pointed-to value.  A DWARF location expression
computes a location, and adds various operators to express locations that a
(value) expression cannot, such as DW_OP_regx.  You also have DW_OP_stack_value
to say "just kidding, this location expression is a value expression."
So, whether we want to start throwing around deref or stack_value or regx
(implicit or explicit) really depends on whether we are going to be using value
expressions or location expressions.  Let's not start mixing them up, it
will just make the discussion more confusing.
--paulr


Reid Kleckner via llvm-dev

2017-Sep-06 21:01 UTC

On Wed, Sep 6, 2017 at 10:01 AM, David Blaikie <dblaikie at gmail.com> wrote:
> On Tue, Sep 5, 2017 at 1:00 PM Reid Kleckner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> LLVM SSA values obviously do not have an address that we can take and
>> they don’t live in registers, so neither the default memory location model
>> nor DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically
>> repurpose DW_OP_stack_value to indicate that the SSA value passed to
>> llvm.dbg.value *is* the variable’s value, and if the expression lacks
>> DW_OP_stack_value, it must be a the address of the value. However, that is
>> backwards incompatible and it seems like quite a stretch.
>>
>
> Seems like a stretch in what sense? The backwards incompatibility is
> certainly something to consider (though we went through that with
> DW_OP_bit_piece too), but this seems like the design I'd go to first so I'd
> like to better understand why it's not the path forward if there's some
> more detail about that aspect of the design choice here.
>
> I guess you described this already, but talking it through for
> myself/maybe others will find this useful:
>
> So since we don't have DW_OP_regN for LLVM registers, we could sort of
> assume the implicit first value on the stack is a pseudo-OP_regN of the
> LLVM SSA register.
>
Yep, that's how we use DIExpressions in both IR and MIR: The LHS of the
dbg.value and DBG_VALUE instructions is a register-like value that gets
pushed onto the expression stack. The DWARF asmprinter does some expression
folding to clean things up, but that's the model.

> To support that, all existing uses would need no changes to match the
> DWARF model of registers being implicitly direct values.
>
> Code that wanted to describe the register as containing the memory address
> of the interesting thing would use DW_OP_stack_value to say "this location
> description that is a register is really an address you should follow to
> find the value, not a direct value itself"?
>
> But code that wanted to describe a variable as being 3 bytes ahead of a
> pointer in an LLVM SSA register would only have "plus 3" in the expression
> stack, since then it's no longer a direct value but is treated as a pointer
> to the value. I guess this is where the ambiguity would come in - currently
> how does "plus 3" get interpreted when seen in LLVM IR, I guess that's
> meant to describe reg value + 3 as being the immediate value of the
> variable? (so it's implicitly OP_stack_value? & OP_stack_value is added
> somewhere in the DWARF backend?)
>
Our model today is inconsistent. In LLVM IR today, the SSA value of the
dbg.value *is* the interesting value, it is not the address, and we
typically use empty DIExpressions. If the value is ultimately register
allocated and the DIExpression is empty, we will emit a DW_OP_regN location
expression. If the value is spilled, we usually don't need to append
DW_OP_stack_value because the location is now a memory location, which can
be described by DW_OP_[f]breg.

Today, passes that want to add "plus 3" to a DIExpression go out of their
way to add DW_OP_stack_value to the DIExpression because the backend won't
do it for us, even though dbg.value normally describes the value, not an
address.

To explore the alternative DW_OP_stack_value model, here's how I'd go about
it:
1. Replace llvm.dbg.value with a new intrinsic, llvm.dbg.loc, to make the
semantic change clear. It can express either an address or a value, depending
on the DIExpression.
2. Auto-upgrade llvm.dbg.value to llvm.dbg.loc. Append DW_OP_stack_value to
the DIExpression argument of the intrinsic.
3. Auto-upgrade llvm.dbg.declare to llvm.dbg.loc, leave the DIExpression
alone. The LHS of llvm.dbg.declare is already the address of the variable.
4. Eliminate the second operand of DBG_VALUE MachineInstrs. Indirect
DBG_VALUES are now expressed with a DIExpression that lacks
DW_OP_stack_value at the end.
5. Teach our DWARF expression emitter to combine the new expressions as
necessary. In particular, we can elide DW_OP_stack_value for DBG_VALUEs
with physical register operands. They just use DW_OP_regN, which is
implicitly a value location.
6. Teach all passes that spill virtual registers used by DBG_VALUE to
remove DW_OP_stack_value from the DIExpression, or add DW_OP_deref as
appropriate.

This should be equivalent to DW_OP_LLVM_memory, and more in line with DWARF
location expression semantics, but it has a large migration cost.
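Steps 2 and 3 of the list above amount to a small rewrite rule. The sketch below models intrinsics as tuples and expressions as lists of opcode names; auto_upgrade is a hypothetical helper, and llvm.dbg.loc is only the placeholder name used in this thread.

```python
STACK_VALUE = "DW_OP_stack_value"

def auto_upgrade(intrinsic, operand, expr):
    """Map an old debug intrinsic onto the hypothetical llvm.dbg.loc."""
    if intrinsic == "llvm.dbg.value":
        # Old dbg.value operands are values, so mark the expression as a value.
        return ("llvm.dbg.loc", operand, list(expr) + [STACK_VALUE])
    if intrinsic == "llvm.dbg.declare":
        # Old dbg.declare operands are already addresses; leave the expression alone.
        return ("llvm.dbg.loc", operand, list(expr))
    raise ValueError(f"not a debug intrinsic: {intrinsic}")


upgraded_value = auto_upgrade("llvm.dbg.value", "%v", [])
upgraded_declare = auto_upgrade("llvm.dbg.declare", "%x.addr", [])
```

After the upgrade, the presence or absence of a trailing DW_OP_stack_value is the only thing that distinguishes "operand is the value" from "operand is the address".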

---

I think part of the reason I wanted to move in the DW_OP_LLVM_memory
direction is that I originally wanted to add a memory offset operand to it.
Our actual use cases for complex DWARF expressions typically come from
things like safestack, ASan, and blocks. What these all have in common is
that they gather up a number of variables and move them off into a struct
in heap memory. This is very similar to what happens when we spill a
virtual register: instead of describing a register, we modify the
expression to load the value from some FP register with an offset. I think
the right representation for these transforms is basically a "chain of
loads". I was imagining that DW_OP_LLVM_memory with an offset would be that
load chain link.

The idea behind this representation is that it should make it easy for
spilling transforms to prepend a load chain onto the expression, rather
than them having to individually discover if DW_OP_deref is needed, or call
some common helper like DIExpression::prepend. It should always be valid to
push on a new load with an offset.
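One possible reading of the load chain idea, sketched with made-up names. The thread does not define an encoding, so the list-of-offsets representation and both helpers below are assumptions: each link adds an offset to the current address, and every link except the last is followed by a pointer load.

```python
def prepend_load(chain, offset):
    """Return a new chain that first goes through a pointer stored at base + offset."""
    return [offset] + list(chain)

def resolve(chain, base, memory):
    """Follow the chain from a base address and return the variable's address."""
    address = base + chain[0]
    for offset in chain[1:]:
        address = memory[address] + offset   # load a pointer, then apply the next offset
    return address


# A variable at offset 16 inside a heap struct whose pointer lives in a register.
struct_ptr = 0x5000
in_register = resolve([16], struct_ptr, memory={})

# The register is spilled to FP+8: the spiller just prepends one more link.
FP = 0x1000
memory = {FP + 8: struct_ptr}
after_spill = resolve(prepend_load([16], 8), FP, memory)
```

Both calls resolve to the same address, which is the property that makes prepending a link safe for a spilling transform in this model.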

It also has the advantage that it will be easier to translate to CodeView
than arbitrary DWARF expressions, which we are currently canonicalizing
into a load chain and then attempting to emit.

Does that make sense? I'm starting to feel like I should either pursue the
more ambitious load chain design, or consistently apply DW_OP_stack_value
to llvm.dbg.loc (alternative names welcome).

Adrian Prantl via llvm-dev

2017-Sep-07 17:59 UTC

Sorry for the delay in replying to this, I've been out sick for the last
couple of days.

> On Sep 5, 2017, at 1:00 PM, Reid Kleckner via llvm-dev <llvm-dev at
> lists.llvm.org> wrote:
>
> Debug info today handles two cases reasonably well:
> 1. At -O0, dbg.declare does a good job describing variables that live at
> some known stack offset
> 2. With optimizations, variables promoted to SSA can be described with
> dbg.value
>
> This leaves behind a large hole in our optimized debug info: variables that
> cannot be promoted, typically because they are address-taken. This is
> https://llvm.org/pr34136, and this RFC is mostly about addressing that.
>
> The status today is that instcombine removes all dbg.declares and
> heuristically inserts dbg.values where it can identify the value of the
> variable in question. This prevents us from having misleading debug info,
> but it throws away information about the variable’s location in memory.
>
> Part of the reason that instcombine discards dbg.declares is that we can’t
> mix and match dbg.value with dbg.declare. If the backend sees a
> dbg.declare, it accepts that information as more reliable and discards all
> DBG_VALUE instructions associated with that variable. So, we need something
> we can mix. We need a way to say, the variable lives in memory *at this
> program point*, and it might live somewhere else later on. I propose that
> we introduce DW_OP_LLVM_memory for this purpose, and then we transition
> from dbg.declare to dbg.value+DW_OP_LLVM_memory.
>
> Initially I believed that DW_OP_deref was the way to say this with existing
> DWARF expression opcodes, but I implemented that in
> https://reviews.llvm.org/D37311 and learned more about how DWARF
> expressions work. When a debugger begins evaluating a DWARF expression, it
> assumes that the resulting value will be a pointer to the variable in
> memory.

That is an oversimplification. DWARF distinguishes at least three different
kinds of locations: register locations (for when a variable is in a particular
register and only there, i.e., the debugger may write to the register to modify
the variable's value; think K&R C register variables), memory locations
(which behave as you describe and allow the debugger to write to memory), and
implicit locations (DW_OP_stack_value, constants, ..., which the debugger
cannot write to in order to modify the variable). In LLVM we don't distinguish
these cases as much as we should.
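
As a rough illustration of the three kinds of locations described above, here is a minimal C sketch. The enum and function names are hypothetical and are not part of LLVM or of any DWARF library; the only thing it models is whether a debugger could write to the variable through a given kind of location.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not an LLVM or DWARF API. */
enum loc_kind {
    LOC_REGISTER, /* e.g. DW_OP_regN: the variable lives only in a register */
    LOC_MEMORY,   /* the expression result is the variable's address */
    LOC_IMPLICIT  /* e.g. DW_OP_stack_value or a constant: a value with no storage */
};

/* Returns true if a debugger could modify the variable through this location. */
static bool loc_is_writable(enum loc_kind kind) {
    switch (kind) {
    case LOC_REGISTER:
    case LOC_MEMORY:
        return true;   /* there is a register or an address to write to */
    case LOC_IMPLICIT:
        return false;  /* only a computed value exists, nothing to write to */
    }
    assert(0 && "unknown location kind");
    return false;
}
```

Calling `loc_is_writable(LOC_IMPLICIT)` returns false, while the register and memory kinds return true, which is the distinction drawn above.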
> For a debugger, this makes sense, because debug builds put things in memory
and even after optimization many variables must be spilled. Only the special
DW_OP_regN and DW_OP_stack_value expression opcodes change the location of the
value from memory to register or stack value.
See above.
> LLVM SSA values obviously do not have an address that we can take and they
don’t live in registers, so neither the default memory location model nor
DW_OP_regN make sense for LLVM’s dbg.value. We could hypothetically repurpose
DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value *is*
the variable’s value, and if the expression lacks DW_OP_stack_value, it must be
the address of the value. However, that is backwards incompatible and it seems
like quite a stretch.
I don't think we should burden ourselves with backwards compatibility too
much for this. If we have a good solution, I'm fine with having an upgrade
that "loses" all incompatible dbg.value intrinsics once.
More to your point, in DWARF, DW_OP_stack_value means something different (see
above) and I'd prefer to use the DWARF semantics for operators in
DIExpressions and introduce custom DW_OP_LLVM_* ones where the semantics differ.
> DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It
would only be valid at the end of a DIExpression. The backend will always remove
it because the debugger will assume the variable lives in memory unless it is
told otherwise.
> 
> For the original problem of improving optimized debug info while avoiding
inaccurate information in the presence of dead store elimination, consider this
C example:
>   int x = 42;  // Can DSE
>   dostuff(x); // Can propagate 42
>   x = computation();  // Post-dominates `x = 42` store
>   escape(&x);
> 
> We should be able to do this:
>   int x; // eliminate `x = 42` store
>   dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
>   dostuff(42); // propagate 42
>   dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
>   x = computation();
>   escape(&x);
> 
> Passes that delete stores would be responsible for checking if the store
destination is part of an alloca with associated dbg.value instructions. They
would emit a new dbg.value instruction for that variable with the stored value,
and clone the dbg.value instruction that puts the variable back in memory before
the killing store. If the store is dead because variable lifetime is ending, the
second dbg.value is unnecessary.
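
The idea that a variable's location changes from one program point to the next can be sketched as a small lookup over ordered records. This is a toy model in C, not LLVM code: the struct, the function, and the program-point numbering (1 for the constant record, 3 for the return to memory) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of dbg.value records; not an LLVM data structure. */
enum dbg_kind { DBG_CONSTANT, DBG_IN_MEMORY };

struct dbg_value {
    int point;          /* program point where this record takes effect */
    enum dbg_kind kind;
    long constant;      /* meaningful only for DBG_CONSTANT */
};

/* Returns the record in effect at `point`: the last record at or before it.
   Records must be sorted by ascending program point.
   Returns NULL if no record applies yet. */
static const struct dbg_value *location_at(const struct dbg_value *recs,
                                           size_t n, int point) {
    const struct dbg_value *current = NULL;
    for (size_t i = 0; i < n; ++i) {
        assert(i == 0 || recs[i - 1].point <= recs[i].point);
        if (recs[i].point > point)
            break;
        current = &recs[i];
    }
    return current;
}

/* Records for the x example: the constant 42 once the store is eliminated,
   then back in memory just before the killing store. */
static const struct dbg_value x_records[] = {
    { 1, DBG_CONSTANT, 42 },
    { 3, DBG_IN_MEMORY, 0 },
};
```

With this numbering, a query at point 2 (the `dostuff(42)` call) finds the constant record, and a query at point 4 (the `x = computation()` store) finds the in-memory record.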
> 
> This will also allow us to fix debug info for px in this example:
>   void __attribute__((optnone, noinline)) usevar(int *x) {}
>   int main(int argc, char **argv) {
>     int x = 42;
>     int *px = &x;
>     usevar(&x);
>     if (argc) usevar(px);
>   }
> 
> Today, we emit a location for px like `DW_OP_breg7 RSP+12`, which gives it
the incorrect value 42. This is because our DBG_VALUE instruction for px’s
location uses a frame index, which we assume is in memory. This is not the case:
px is not in memory; its value is a stack object pointer.
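
The difference between the two readings of a frame-relative expression can be shown with a small simulation. This is a sketch under simplifying assumptions: the frame layout, the 8-byte slots, the offset of 8 (rather than the RSP+12 quoted above), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_SLOTS 4

/* Simulated stack frame: 8-byte slots at rsp, rsp+8, rsp+16, ... */
struct frame {
    uint64_t rsp;
    uint64_t slots[FRAME_SLOTS];
};

enum interp {
    AS_MEMORY_LOCATION, /* the computed address is where the variable lives */
    AS_STACK_VALUE      /* the computed address is itself the variable's value */
};

/* Evaluates "RSP + offset" and interprets the result in one of two ways. */
static uint64_t eval_breg_rsp(const struct frame *f, uint64_t offset,
                              enum interp how) {
    assert(offset % 8 == 0 && offset / 8 < FRAME_SLOTS);
    uint64_t addr = f->rsp + offset;
    if (how == AS_STACK_VALUE)
        return addr;              /* correct for px: a pointer to x */
    return f->slots[offset / 8];  /* loads x itself, which is wrong for px */
}
```

With x stored as 42 in the slot at offset 8 and a simulated RSP of 0x7000, the memory-location reading yields 42, while the stack-value reading yields 0x7008, the address of x, which is what px actually holds.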
> 
> Please reply if you have any thoughts on this proposal. Adrian and I hashed
this out over Bugzilla, IRC, and in person, so it shouldn’t be too surprising.
Let me know if you want to be CC’d on the patches.
I'm going to read through all the other replies now, before commenting on
the concrete proposal.

thanks for writing this up,
Adrian

Adrian Prantl via llvm-dev

2017-Sep-07 18:02 UTC


[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

> On Sep 6, 2017, at 3:24 AM, Alex Bradbury via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On 5 September 2017 at 21:00, Reid Kleckner via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Debug info today handles two cases reasonably well:
>> 1. At -O0, dbg.declare does a good job describing variables that live at
>> some known stack offset
>> 2. With optimizations, variables promoted to SSA can be described with
>> dbg.value
>> 
>> This leaves behind a large hole in our optimized debug info: variables that
>> cannot be promoted, typically because they are address-taken. This is
>> https://llvm.org/pr34136, and this RFC is mostly about addressing that.
>> 
>> The status today is that instcombine removes all dbg.declares and
>> heuristically inserts dbg.values where it can identify the value of the
>> variable in question. This prevents us from having misleading debug info,
>> but it throws away information about the variable’s location in memory.
> 
> Hi Reid, thanks for writing such a clear summary of the problem that
> this RFC addresses. I was wondering if there is any sort of
> methodology for quantifying the "quality" of debug information? e.g.
> the gap between the current debug information and 'ideal' debug info?
It is difficult. One thing we can do is measure the delta between the debug info
quality of two compiler versions (https://reviews.llvm.org/D36627 adds an option
to llvm-dwarfdump to collect various metrics to that end).

-- adrian

