thr3ads.net - llvm dev - [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value [Sep 2017]

Robinson, Paul via llvm-dev

2017-Sep-05 23:26 UTC

[llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in memory with dbg.value

Hi Reid,

Thanks for taking this on, I'm very pleased to see improvements related to
debug info for optimized code.  (You can cc me on code reviews, although I'm
sure a lot of the patches will be in areas I am not very familiar with.)

While I have a really good handle on the DWARF standard, and have done a bunch
of work with the type stuff, my understanding of IR mechanics is pretty naïve,
so I'd appreciate any explanations that help me understand why the following
might be really lame.

In optimized code, for things like the address-taken case, does the alloca
survive?  Assuming it does, can we attach the DIVariable metadata to the alloca
instead of having a separate dbg.declare?  (It has always seemed to me that this
would make some things a lot simpler, as you don't have to troll around
looking for that other instruction, use-lists aren't special cased for debug
info instructions, and probably other things.)

If a memory-homed variable retains its alloca and the alloca retains its
metadata, then it seems like it should be straightforward to produce that memory
address as the default location for the variable.

And if we're in the habit of looking at metadata on normal instructions for
DIVariables instead of having dbg.value instructions, then maybe we don't
need dbg.value either.

Thanks,
--paulr

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Reid
Kleckner via llvm-dev
Sent: Tuesday, September 05, 2017 1:00 PM
To: llvm-dev
Subject: [llvm-dev] RFC: Introduce DW_OP_LLVM_memory to describe variables in
memory with dbg.value

Debug info today handles two cases reasonably well:
1. At -O0, dbg.declare does a good job describing variables that live at some
known stack offset
2. With optimizations, variables promoted to SSA can be described with dbg.value

This leaves behind a large hole in our optimized debug info: variables that
cannot be promoted, typically because they are address-taken. This is
https://llvm.org/pr34136, and this RFC is mostly about addressing that.

The status today is that instcombine removes all dbg.declares and heuristically
inserts dbg.values where it can identify the value of the variable in question.
This prevents us from having misleading debug info, but it throws away
information about the variable’s location in memory.

Part of the reason that instcombine discards dbg.declares is that we can’t mix
and match dbg.value with dbg.declare. If the backend sees a dbg.declare, it
accepts that information as more reliable and discards all DBG_VALUE
instructions associated with that variable. So, we need something we can mix. We
need a way to say, the variable lives in memory *at this program point*, and it
might live somewhere else later on. I propose that we introduce
DW_OP_LLVM_memory for this purpose, and then we transition from dbg.declare to
dbg.value+DW_OP_LLVM_memory.

Initially I believed that DW_OP_deref was the way to say this with existing
DWARF expression opcodes, but I implemented that in
https://reviews.llvm.org/D37311 and learned more about how DWARF expressions
work. When a debugger begins evaluating a DWARF expression, it assumes that the
resulting value will be a pointer to the variable in memory. For a debugger,
this makes sense, because debug builds put things in memory and even after
optimization many variables must be spilled. Only the special DW_OP_regN and
DW_OP_stack_value expression opcodes change the location of the value from
memory to register or stack value.

LLVM SSA values obviously do not have an address that we can take and they don’t
live in registers, so neither the default memory location model nor DW_OP_regN
make sense for LLVM’s dbg.value. We could hypothetically repurpose
DW_OP_stack_value to indicate that the SSA value passed to llvm.dbg.value *is*
the variable’s value, and if the expression lacks DW_OP_stack_value, it must be
the address of the value. However, that is backwards incompatible and it seems
like quite a stretch.

DW_OP_LLVM_memory would be very similar to DW_OP_stack_value, though. It would
only be valid at the end of a DIExpression. The backend will always remove it
because the debugger will assume the variable lives in memory unless it is told
otherwise.

For the original problem of improving optimized debug info while avoiding
inaccurate information in the presence of dead store elimination, consider this
C example:
  int x = 42;  // Can DSE
  dostuff(x); // Can propagate 42
  x = computation();  // Post-dominates `x = 42` store
  escape(&x);

We should be able to do this:
  int x; // eliminate `x = 42` store
  dbg.value(!x, 42, !DIExpression()) // mark x as the constant 42 in debug info
  dostuff(42); // propagate 42
  dbg.value(!x, &x, !DIExpression(DW_OP_LLVM_memory)) // x is in memory again
  x = computation();
  escape(&x);

Passes that delete stores would be responsible for checking if the store
destination is part of an alloca with associated dbg.value instructions. They
would emit a new dbg.value instruction for that variable with the stored value,
and clone the dbg.value instruction that puts the variable back in memory before
the killing store. If the store is dead because variable lifetime is ending, the
second dbg.value is unnecessary.
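
The bookkeeping described in this paragraph can be sketched on a toy instruction list. Everything here is hypothetical: the tuple-based IR, the `delete_dead_store` helper, and names such as `x.slot` and `%r` are invented for illustration and are not LLVM APIs. `DW_OP_LLVM_memory` is the opcode proposed in this RFC. The sketch always re-homes the variable before the killing store and does not model the case where the store is dead because the variable's lifetime is ending.

```python
MEMORY = "DW_OP_LLVM_memory"  # opcode proposed in the RFC; not an existing DWARF op


def delete_dead_store(insts, dead, killer):
    """Drop insts[dead] and patch the debug info around it.

    insts is a list of tuples in a made-up IR:
      ("dbg.value", var, operand, expr), ("store", slot, value), ("call", name, args)
    dead and killer are indices of the dead store and the store that kills it.
    """
    op, slot, value = insts[dead]
    assert op == "store" and insts[killer][0] == "store"
    # The dbg.value that says "this variable lives in memory at `slot`".
    home = next(
        inst for inst in insts
        if inst[0] == "dbg.value" and inst[2] == ("addr", slot) and MEMORY in inst[3]
    )
    out = []
    for index, inst in enumerate(insts):
        if index == dead:
            # Replace the store with a dbg.value carrying the stored value.
            out.append(("dbg.value", home[1], value, ()))
            continue
        if index == killer:
            # Re-home the variable in memory just before the killing store.
            out.append(home)
        out.append(inst)
    return out


before = [
    ("dbg.value", "x", ("addr", "x.slot"), (MEMORY,)),  # x lives in memory
    ("store", "x.slot", ("const", 42)),                 # x = 42 (dead)
    ("call", "dostuff", (("const", 42),)),              # 42 already propagated
    ("call", "computation", ()),
    ("store", "x.slot", ("ssa", "%r")),                 # x = computation()
    ("call", "escape", (("addr", "x.slot"),)),
]
after = delete_dead_store(before, dead=1, killer=4)
for inst in after:
    print(inst)
```

Between the two dbg.value records the variable is described as the constant 42; from the second one onward it is described as living in its stack slot again.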

This will also allow us to fix debug info for px in this example:
  void __attribute__((optnone, noinline)) usevar(int *x) {}
  int main(int argc, char **argv) {
    int x = 42;
    int *px = &x;
    usevar(&x);
    if (argc) usevar(px);
  }

Today, we emit a location for px like `DW_OP_breg7 RSP+12`, which gives it the
incorrect value 42. This is because our DBG_VALUE instruction for px’s location
uses a frame index, which we assume is in memory. That assumption is wrong here:
px is not in memory; its value is a stack object pointer.
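
The mix-up comes down to how one computed number is interpreted. A minimal sketch, assuming a hypothetical frame in which RSP is 0x7FFC0000 and x sits at RSP+12 (both numbers, the `memory` dictionary, and the helper names are invented for illustration):

```python
# Hypothetical frame state; the addresses are invented for illustration.
RSP = 0x7FFC0000
memory = {RSP + 12: 42}  # stack slot of x, currently holding 42


def read_as_memory_location(address):
    """Treat the expression result as the variable's address and load from it."""
    return memory[address]


def read_as_stack_value(address):
    """Treat the expression result as the variable's value."""
    return address


result = RSP + 12  # what `DW_OP_breg7 RSP+12` computes

print(read_as_memory_location(result))   # x's contents, wrongly reported for px
print(hex(read_as_stack_value(result)))  # the pointer that px actually holds
```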

Please reply if you have any thoughts on this proposal. Adrian and I hashed this
out over Bugzilla, IRC, and in person, so it shouldn’t be too surprising. Let me
know if you want to be CC’d on the patches.

Reid Kleckner via llvm-dev

2017-Sep-06 00:04 UTC

On Tue, Sep 5, 2017 at 4:26 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
> While I have a really good handle on the DWARF standard, and have done a
> bunch of work with the type stuff, my understanding of IR mechanics is
> pretty naïve, so I'd appreciate any explanations that help me understand
> why the following might be really lame.
>
> In optimized code, for things like the address-taken case, does the alloca
> survive?  Assuming it does, can we attach the DIVariable metadata to the
> alloca instead of having a separate dbg.declare?  (It has always seemed to
> me that this would make some things a lot simpler, as you don't have to
> troll around looking for that other instruction, use-lists aren't special
> cased for debug info instructions, and probably other things.)
>
> If a memory-homed variable retains its alloca and the alloca retains its
> metadata, then it seems like it should be straightforward to produce that
> memory address as the default location for the variable.

I think if I were redesigning LLVM, I would go even further and merge
DILocalVariable and alloca. =) Functions should really just have "variables"
that live in memory and can be accessed from any basic block. After SSA
promotion, if a variable has no uses that require it to live in memory, we
simply wouldn't allocate space for it. I definitely don't plan to do that,
though.

In today's LLVM, dbg.declare does serve one useful purpose: it marks the
point of declaration of the variable. We can use it to power
DW_AT_start_scope, so that users won't see uninitialized variables that are
in scope in this example:
  int x = f();
  // break here, 'info locals' prints a garbage y because it is in scope
  int y = f();

If we don't think we'll ever do DW_AT_start_scope, then yes, we could
probably use variable attachments instead of dbg.declare. But I think we
want to go the other direction and standardize on dbg.value.

> And if we're in the habit of looking at metadata on normal instructions
> for DIVariables instead of having dbg.value instructions, then maybe we
> don't need dbg.value either.
>
We definitely need something like dbg.value. For variables that can be
fully promoted to SSA values, we need dbg.value to record in the debug info
that a source-level assignment occurred at this particular program point.
mem2reg completely erases the assigning instruction, so we need some kind
of placeholder. For variables that cannot be fully promoted, passes like
DSE should make an effort to record in the debug info that an assignment
occurred even if the store was deleted.

It's that concept of a "program point" that I don't think we can replace
with instruction metadata attachments. Today's LLVM instructions move
around too much to represent that.

Thanks for reading!

Robinson, Paul via llvm-dev

2017-Sep-06 18:05 UTC

Hi Reid,
Good point about dbg.value being a marker for the source-level semantic of an
assignment, even when the value is computed at some distinctly different point
in the execution and the store is erased.  My previous compiler was not
SSA-based and we had something to hang the debug info on, even if no actual
store occurred.  In LLVM IR you do need this sort of "artificial use" marker.
Thanks!
--paulr
