thr3ads.net - llvm dev - [llvm-dev] Notes from dbg.value coffee chat [Oct 2020]

Reid Kleckner via llvm-dev

2020-Oct-08 19:07 UTC

[llvm-dev] Notes from dbg.value coffee chat

I chatted with Jeremy Morse, Orlando, and Stephen Tozer at the dev meeting,
and wanted to summarize the conversation for our benefit, and to share it
with others. I aim to be brief, so I apologize if these notes aren't as
helpful as they could be to folks who weren't present.

Three project ideas, by priority:
1. Address https://llvm.org/pr34136, improving quality of variable location
info for variables in memory, by getting the frontend to pre-annotate
assignments.
2. Prototype "stop points", or statement markers in the instruction
stream. Use Dexter or other tools to measure potential improvements in
stepping behavior, and consider productionizing.
3. Move all debug info intrinsics out of the instruction stream by adding a
new all-powerful instruction (maybe dbg_point, dbg_label?) that essentially
multiplexes one llvm::Instruction into multiple debug instructions.
4. (bonus) Old idea, low-priority: Prototype a mode that models stop points
as having side effects. Start the function by escaping all non-temporary
local variables. Variable values should be accurate and writable at all
statement start PCs.

Idea 1: Local variables in memory

Have clang emit dbg.value instructions directly after every assignment. We
discussed the similarities of this idea to the idea of "Key Instructions"
from Caroline Tice's thesis, but I can't claim this idea is totally
faithful to it. For assignments to memory locations that are not local
variables (*p = v, this->m = v), replace the local variable metadata
argument with the value of the store destination, using ValueAsMetadata.
Standard cleanup passes (instcombine, inliner?) should rewrite dbg.values
whose memory destination points into an alloca so that they refer to the
local variable corresponding to that alloca. This allows passes that delete stores other
than mem2reg (DSE, Instcombine, GVN, anything using MemorySSA) to not worry
about producing dbg.values because they already exist: the frontend has
provided them. This was the fundamental reason why lowerDbgDeclare is
called in Instcombine, so we can remove that, keep the dbg.declare
instructions or something equivalent, and greatly expand the range over
which the variable is known to live in stack memory. Variables which never
participate in dead store elimination (hopefully many) are more likely to
be entirely described by a memory location, and to not need a DWARF
location list. They will be writable as well.

Idea 2: Stop points

This is an old idea: LLVM used to have stop point intrinsics before it had
debug location instruction attachments. Given the new goals around
profiling accuracy that we've declared for location information, perhaps we
should reconsider the merits of the old design. We should introduce a new
intrinsic, perhaps dbg.stmt, that functions similarly to dbg.value in that
it produces no value, remains in the instruction stream, and is not removed
by standard dead code elimination. This could be lowered
down to power the .loc is_stmt bit in the DWARF line tables. We could also
have a mode where the *only* information used to fill in the line tables
comes from these instructions. Some data flow passes would be required to
propagate the current location into blocks during codegen, similar to some
of the existing debug value passes.
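
A minimal sketch of the kind of data flow pass described above, written as a toy Python model rather than LLVM code. The ("stmt", line) marker stands in for the proposed dbg.stmt, and the block names, the tuple encoding, and the function names are all hypothetical. It propagates the "current statement" forward through a CFG and marks a block's incoming location as unknown when its predecessors disagree.

```python
# Toy instruction: ("stmt", line) is a hypothetical dbg.stmt marker,
# ("op", name) is any other instruction.

UNVISITED = object()  # no information has reached this point yet
CONFLICT = None       # predecessors disagree, or nothing is known (think "line 0")


def meet(a, b):
    """Combine the locations flowing in from two predecessors."""
    if a is UNVISITED:
        return b
    if b is UNVISITED:
        return a
    return a if a == b else CONFLICT


def block_out(instrs, loc_in):
    """The last stmt marker in the block wins; otherwise loc_in passes through."""
    loc = loc_in
    for kind, payload in instrs:
        if kind == "stmt":
            loc = payload
    return loc


def propagate_stmt_locations(blocks, preds, entry):
    """Return the statement location in effect at the start of each block.

    blocks: dict mapping block name -> list of toy instructions
    preds:  dict mapping block name -> list of predecessor block names
    """
    loc_in = {b: UNVISITED for b in blocks}
    loc_out = {b: UNVISITED for b in blocks}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in blocks:
            # Nothing is known on function entry.
            new_in = CONFLICT if b == entry else UNVISITED
            for p in preds.get(b, []):
                new_in = meet(new_in, loc_out[p])
            new_out = block_out(blocks[b], new_in)
            if new_in != loc_in[b] or new_out != loc_out[b]:
                loc_in[b], loc_out[b] = new_in, new_out
                changed = True
    return loc_in
```

In a diamond-shaped CFG where only one arm contains its own marker, the join block ends up with an unknown incoming location, which is the case a real pass would have to resolve (for example by requiring a marker at the join).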

Idea 3: dbg_point

This is a representational change that is mostly meant to make LLVM more
efficient. I don't have data, but we believe that runs of dbg.value
instructions slow down passes because they must be iterated over during
optimization. We also believe that they are memory inefficient. A new
representation would address that by allowing us to coalesce multiple
logically distinct dbg.value operations into one llvm::Instruction. This
instruction could be extended to contain all types of debug info
instructions: dbg.label, dbg.value, dbg.declare, dbg.stmt, or anything else.
Having just watched the MLIR tutorial, it reminds me of MLIR regions.
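
To make the coalescing idea concrete, here is a small sketch in Python. It is a toy model, not LLVM's representation: the DbgValue, Op, and DbgPoint classes and the coalesce_debug_runs function are hypothetical names. Each run of adjacent dbg.value records is folded into a single container entry, so a pass walking the stream visits one element per run.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DbgValue:
    """Toy stand-in for one dbg.value: variable name and the value it takes."""
    variable: str
    value: str


@dataclass(frozen=True)
class Op:
    """Toy stand-in for any non-debug instruction."""
    text: str


@dataclass(frozen=True)
class DbgPoint:
    """Hypothetical container holding several debug records in one stream slot."""
    records: Tuple[DbgValue, ...]


def coalesce_debug_runs(stream):
    """Fold each run of adjacent DbgValue entries into a single DbgPoint."""
    out = []
    run = []
    for inst in stream:
        if isinstance(inst, DbgValue):
            run.append(inst)
            continue
        if run:
            out.append(DbgPoint(tuple(run)))
            run = []
        out.append(inst)
    if run:  # flush a run that ends the stream
        out.append(DbgPoint(tuple(run)))
    return out
```

The order of records inside each DbgPoint is preserved, since the relative order of dbg.values for the same variable matters.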

Idea 4: Not much to say

---

That's all. I'm sure there was more that I missed, and these ideas are
perhaps still a bit hare-brained, but maybe the wider community will have
some input.

Thanks,
Reid

Cazalet-Hyams, Orlando via llvm-dev

2020-Oct-09 16:38 UTC

[llvm-dev] Notes from dbg.value coffee chat

Hi Reid,

Thanks for sharing this. I plan to work on improving debug-info for variables
living in memory as my next "project" so I am very interested to hear what
others have to say about "Idea 1".

There is one part of the idea that confuses me. You say we could "keep the
dbg.declare instructions", but I don't see where dbg.declare instructions - at
least with their current semantics - fit into the design if the frontend is
emitting dbg.values after every assignment. Could you please expand on this part
a little?

Thanks,
Orlando

Reid Kleckner via llvm-dev

2020-Oct-09 18:12 UTC

[llvm-dev] Notes from dbg.value coffee chat

On Fri, Oct 9, 2020 at 9:38 AM Cazalet-Hyams, Orlando <
orlando.hyams at sony.com> wrote:
> Hi Reid,
>
> Thanks for sharing this. I plan to work on improving debug-info for
> variables living in memory as my next "project" so I am very interested
> to hear what others have to say about "Idea 1".
>
> There is one part of the idea that confuses me. You say we could "keep
> the dbg.declare instructions", but I don't see where dbg.declare
> instructions - at least with their current semantics - fit into the
> design if the frontend is emitting dbg.values after every assignment.
> Could you please expand on this part a little?
>
I think what I meant is that we need to keep the association between the
alloca and the local variable somewhere. The current implementation of
dbg.declare is not what we want: when we have one, it overrides any
dbg.values, and the alloca is used as the variable location for the full
scope. We do still need an association between alloca and variable+scope
somewhere, though. The dbg.values in the design as we
discussed it would not contain the alloca, but the SSA value which is
stored to the alloca. Maybe if we augmented the dbg.values with the store
destination, we could get by without tracking that information separately.

The main design goal here was to have the variable location information be
correct even when DSE occurs, without updating every pass that deletes
stores, because there are many. Thinking about it today, I'm not sure this
design is complete yet. Even if the frontend effectively emits two stores
for every assignment, a real store, and dbg.value, the backend needs to
determine if the real store survived optimization. If it did, then the
variable value lives in memory. If it did not, then the variable value is
the value which would've been stored, if it is available at this program
point, or if it is available somewhere nearby. Maybe that's acceptable, but
it seems difficult. However, maybe that's similar to what we already do for
dbg.value uses that precede definitions.
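
The decision the backend would face can be sketched as follows. This is a toy Python model under stated assumptions, not an LLVM interface: it assumes each assignment carries an id linking the real store to its dbg.value, and the Assignment type, the classify_locations function, and the result strings are hypothetical.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    """One source-level assignment as the frontend might annotate it (hypothetical)."""
    variable: str  # source variable name
    store_id: int  # identifies the real store emitted for this assignment
    value: str     # SSA value the store would have written


def classify_locations(assignments, surviving_stores, available_values):
    """Pick a location kind for the variable after each assignment.

    surviving_stores: ids of stores that are still present after optimization
    available_values: SSA values still available at the relevant program point
    """
    locations = {}
    for a in assignments:
        if a.store_id in surviving_stores:
            # The store survived, so the variable's value lives in memory.
            locations[a.store_id] = "memory"
        elif a.value in available_values:
            # The store was deleted, but the stored value can still be described.
            locations[a.store_id] = "value:" + a.value
        else:
            # Neither the store nor the value survived.
            locations[a.store_id] = "unavailable"
    return locations
```

The sketch leaves out the "available somewhere nearby" case, which is where the difficulty mentioned above would show up.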

The alternative to this design would be to say that stores to static
allocas are special, and cannot be deleted without updating debug info.
This is basically https://llvm.org/pr34136#c25. Thinking about it again
today, maybe this approach is feasible. We could build tooling to audit for
passes that delete these special "assigning stores". Maybe that's better
than bending over backwards just to make it easy for the optimizers.
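
One possible shape for such an audit, again as a toy Python sketch rather than real tooling: it assumes every assigning store carries an id, and that a pass which deletes one is expected to leave a dbg.value with the same id behind. The tuple encoding and the find_unreported_deletions name are hypothetical.

```python
# Toy instruction: (opcode, assignment_id). "store" is an assigning store to a
# static alloca; "dbg.value" is a debug record for the same assignment.


def find_unreported_deletions(before, after):
    """Return ids of assigning stores a pass deleted without updating debug info.

    before/after: instruction lists for one function, before and after the pass.
    """
    stores_before = {i for op, i in before if op == "store"}
    stores_after = {i for op, i in after if op == "store"}
    dbg_after = {i for op, i in after if op == "dbg.value"}
    deleted = stores_before - stores_after
    # A deletion is fine if a dbg.value for the same assignment remains.
    return deleted - dbg_after
```

Running a check like this around each pass in a test pipeline would point at the passes that need updating.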