thr3ads.net - llvm dev - [llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Jeremy Morse via llvm-dev

2020-Feb-25 18:28 UTC

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

Hi Vedant, thanks for the detailed response,

On Tue, Feb 25, 2020 at 7:23 AM Vedant Kumar <vedant_kumar at apple.com>
wrote:> > Finally, being forced to always specify both the machine location and
> > the program location at the same time (in a single DBG_VALUE)
> > introduces un-necessary burdens. In MachineSink, when we sink between
> > blocks an instruction that defines a vreg, we chose to sink DBG_VALUE
> > instructions referring to that vreg too to avoid losing the variable
> > location. This un-necessarily risks re-ordering assignments,
>
> So under the proposed scheme, would the dbg value not be sunk? Ah, so then
you’d get the dbg use before def scenario, which you argue has some nice
properties below.
Indeed -- I'm aware of previous efforts to suppress dbg-use-before-def
in the past, but I'm not aware of why it's not supported (it does
/feel/ wrong at the very least). Perhaps the reason why was because it
couldn't be described post-regalloc.

(As an aside, this would move us in the direction of making it easier
to change LLVM-IR instructions, but harder to determine what a
variable location is, through the middle stages of the compiler).
> > I'd like to suggest that we can represent variable locations in
the
> > codegen backend / MIR with three things:
> > * The instruction that defines the value of the variable,
> > * The operand of that instruction into which the value is written,
> > * The position in the instruction stream where the assignment of this
> > value to the variable occurs
> >
> > That's effectively modifying a machine location from being a
{v,}reg,
> > into being a "defining instruction" and operand. This is
closer to the
> > LLVM-IR form of a machine location, where the SSA Value and its
> > computation are synonymous. Exactly how this is represented in-memory
> > and in-printed-MIR I haven't thought a lot about; probably by
> > attaching metadata to instructions and having DBG_VALUE use a metadata
> > operand rather than referring to a vreg. Specifying machine locations
> > like this would have the following benefits:
> > * Both DBG_VALUEs and defining instructions are independent and can
> > be moved within the function without loss of information, and without
> > needing to consider so much context,
> > * Likewise, vregs can be rewritten / merged / deleted without the
> > need to update any debug metadata. Only instruction deletion /
> > morphing would need some sort of change,
>
> For deletion, that makes sense, we might consider introducing an MI level
salvageDI method to help with that.
>
> For replacement/morphing, wouldn’t MachineInstr::RAUW be able to update the
new-style debug uses?
I guess it would, on the assumption that a machineinstr having its
operands changed doesn't actually change the value(s) that it
computes.
> > The three instruction scheduling passes would become significantly
> > easier to deal with: they would only have to replace DBG_VALUE
> > instructions in the correct order, not worry about their operands.
> > Various debug facilities in SimpleRegisterCoalescing, MachineSink, and
> > large amounts of LiveDebugVariables would become redundant, as we
> > wouldn't need to maintain a register location through
optimisations.
>
> I view this as a very large potential upside. LiveDebugVariables is
incredibly complex and is a huge compile-time bear. Defining away large chunks
of its job would be really nice.
>
> Would this mean that an equivalence class in LiveDebugVariables would
consist exclusively of UserValues referring to the same variable (as opposed to
including UserValues that share a vreg as well)?
It believe so -- although I can't say I have a complete understanding
of LiveDebugVariables' equivalence classes. That aspect of an
equivalence class could be replaced with the identity of the defining
instruction instead, but, I think the interval-splitting that
LiveDebugVariables does wouldn't be necessary any more and so
computing equivalence wouldn't be necessary either.
> > How then do we translate this new kind of machine location into
> > DWARF/CodeView variable locations, which need to know which register
> > to look in? The answer is: LiveDebugValues [3]. We already perform a
> > dataflow analysis in LiveDebugValues of where values "go"
after
> > they're defined: we can use that to take "defining
instructions" and
> > determine register / stack locations. We would need to track values
> > from the defining instruction up to the DBG_VALUE where that value
> > becomes a variable location, after which it's no different from
the
> > LiveDebugValues analysis that we perform today. LiveDebugValues'
> > ability to track values through stack spills and restores would become
> > a critical feature (it isn't today), as we would no longer
generate
> > stack locations during register allocation.
>
> Do you expect that handling for the current and new-style DBG_VALUEs could
coexist in LiveDebugValues? Could that be done by e.g. introducing a new debug
instr MI (DBG_INSTR_REF)?
I believe it should be relatively straightforwards: we would have an
additional kind of location-record (i.e. VarLoc) that, instead of
identifying a _variable_, identified the instruction/operand that
computed a value. Handling a DBG_INSTR_REF as you suggest would mean
looking up such a VarLoc by instruction/operand to find its register,
then creating a register location VarLoc for the variable as we do
today. Propagating the location of an instruction/operand VarLoc would
happen in exactly the same way as variable locations today, it's just
following a value as it's moved around registers.
> > Limitations
> >
> > The largest problem with this idea is that not all variable values are
> > defined by instructions: PHIs are values that are defined by control
> > flow. To deal with this pre-regalloc, we could move LiveDebugVariables
> > to run before phi-elimination.
>
> I don’t really follow this. Are you suggesting stripping out debug values
before phi elim, and replacing them after virtregrewrite? How would the
re-inserted debug instrs refer to values produced by phis?
>
> > My understanding is that the register
> > allocation phase of LLVM starts there and ends after virtregrewriter,
> > and it'd be legitimate to say "we do special things for these
passes".
> > After regalloc however, there would need to be some way of specifying
> > a block and a register, where entry to the block defines a variable
> > value in that register. This isn't pretty; but IMO is the
> > representation closest to the truth.
>
> Oh, you answer this here. So LiveDebugVariables would need to figure out,
for each debug-use-of-phi, 1) which block to put it in and 2) which
register+variable gets defined. This is different enough from the new-style
debug instr to potentially warrant its own instruction (DBG_PHI?).
Yes and no; I think 1) should be un-necessary, or at least not be
difficult in any way. The DBG_PHIs would refer to a block and
register, identifying the PHIs value at a single program location, the
start of the block where it is defined. This is (98% certain) easy to
identify in virtregrewriter with a location query, and avoids
considering live ranges and interval splitting. We wouldn't need to
try and know where the PHI value is later in the block or in other
blocks: the same LiveDebugValues mechanism as DBG_INSTR_REF would be
able to determine the PHI values register location at the position of
a later DBG_PHI instruction.

(Again, I've put little thought into how that would be represented in
memory or MIR).
> > I feel like this would be a better way of representing variable
> > locations in the codegen backend; my fear is that this is a lot of
> > work, and I don't know what appetite there is for change amongst
other
> > interested parties. Thus I'd be interested in any kind of feedback
as
> > to whether a) this is a good idea, b) whether this category of change
> > is what people want, and c) whether this is seen as being achievable.
>
> Imho this is a good idea and I’d like to see something like this happen.
For it to ”really happen” we (Apple) would probably need a way to transition to
the new representation incrementally (e.g. to toggle a flag to get back the old
representation). I’m not yet sure about what all that would really entail.
That's great to hear! When you say toggling a flag, is that a flag for
a whole compilation, or returning to the old representation at
specific passes? I don't think it'd be too difficult to have two
implementations in the codebase and pick one to use, switching between
the two mid-flight is a lot harder of course.
> > Being able to introduce this change incrementally presents some
> > challenges: while the way of representing variable locations described
> > above is more expressive than the current way, converting between one
> > and the other requires running the LiveDebugValues analysis,
>
> This sounds like you’re planning to change the existing DBG_VALUE
instruction. Have you considered keeping it — unmodified — and introducing new
debug instructions for the new semantics? The migration story for that seems a
lot simpler.
I hadn't thought about keeping it; you're right, it'd be a lot
easier
(and a lot more testable) to do it that way. That also lines up with
having a toggle flag to allow incremental transition.
> If we can delete large parts of live debug variables this will be worth it
imho. But perhaps part of the plan here should include splitting up live debug
values into smaller files to simplify it?
Splitting up LiveDebugValues sounds like a good plan, entry value
development has added a certain amount of complexity to it too.

Thanks for the feedback!

--
Thanks,
Jeremy

Vedant Kumar via llvm-dev

2020-Feb-26 18:49 UTC

head link

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

> On Feb 25, 2020, at 10:28 AM, Jeremy Morse via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> Hi Vedant, thanks for the detailed response,
> 
> On Tue, Feb 25, 2020 at 7:23 AM Vedant Kumar <vedant_kumar at
apple.com> wrote:
>>> Finally, being forced to always specify both the machine location
and
>>> the program location at the same time (in a single DBG_VALUE)
>>> introduces un-necessary burdens. In MachineSink, when we sink
between
>>> blocks an instruction that defines a vreg, we chose to sink
DBG_VALUE
>>> instructions referring to that vreg too to avoid losing the
variable
>>> location. This un-necessarily risks re-ordering assignments,
>> 
>> So under the proposed scheme, would the dbg value not be sunk? Ah, so
then you’d get the dbg use before def scenario, which you argue has some nice
properties below.
> 
> Indeed -- I'm aware of previous efforts to suppress dbg-use-before-def
> in the past, but I'm not aware of why it's not supported (it does
> /feel/ wrong at the very least). Perhaps the reason why was because it
> couldn't be described post-regalloc.
> 
> (As an aside, this would move us in the direction of making it easier
> to change LLVM-IR instructions, but harder to determine what a
> variable location is, through the middle stages of the compiler).
> 
>>> I'd like to suggest that we can represent variable locations in
the
>>> codegen backend / MIR with three things:
>>> * The instruction that defines the value of the variable,
>>> * The operand of that instruction into which the value is written,
>>> * The position in the instruction stream where the assignment of
this
>>> value to the variable occurs
>>> 
>>> That's effectively modifying a machine location from being a
{v,}reg,
>>> into being a "defining instruction" and operand. This is
closer to the
>>> LLVM-IR form of a machine location, where the SSA Value and its
>>> computation are synonymous. Exactly how this is represented
in-memory
>>> and in-printed-MIR I haven't thought a lot about; probably by
>>> attaching metadata to instructions and having DBG_VALUE use a
metadata
>>> operand rather than referring to a vreg. Specifying machine
locations
>>> like this would have the following benefits:
>>> * Both DBG_VALUEs and defining instructions are independent and can
>>> be moved within the function without loss of information, and
without
>>> needing to consider so much context,
>>> * Likewise, vregs can be rewritten / merged / deleted without the
>>> need to update any debug metadata. Only instruction deletion /
>>> morphing would need some sort of change,
>> 
>> For deletion, that makes sense, we might consider introducing an MI
level salvageDI method to help with that.
>> 
>> For replacement/morphing, wouldn’t MachineInstr::RAUW be able to update
the new-style debug uses?
> 
> I guess it would, on the assumption that a machineinstr having its
> operands changed doesn't actually change the value(s) that it
> computes.
> 
>>> The three instruction scheduling passes would become significantly
>>> easier to deal with: they would only have to replace DBG_VALUE
>>> instructions in the correct order, not worry about their operands.
>>> Various debug facilities in SimpleRegisterCoalescing, MachineSink,
and
>>> large amounts of LiveDebugVariables would become redundant, as we
>>> wouldn't need to maintain a register location through
optimisations.
>> 
>> I view this as a very large potential upside. LiveDebugVariables is
incredibly complex and is a huge compile-time bear. Defining away large chunks
of its job would be really nice.
>> 
>> Would this mean that an equivalence class in LiveDebugVariables would
consist exclusively of UserValues referring to the same variable (as opposed to
including UserValues that share a vreg as well)?
> 
> It believe so -- although I can't say I have a complete understanding
> of LiveDebugVariables' equivalence classes. That aspect of an
> equivalence class could be replaced with the identity of the defining
> instruction instead, but, I think the interval-splitting that
> LiveDebugVariables does wouldn't be necessary any more and so
> computing equivalence wouldn't be necessary either.
This sounds really great.
> 
>>> How then do we translate this new kind of machine location into
>>> DWARF/CodeView variable locations, which need to know which
register
>>> to look in? The answer is: LiveDebugValues [3]. We already perform
a
>>> dataflow analysis in LiveDebugValues of where values "go"
after
>>> they're defined: we can use that to take "defining
instructions" and
>>> determine register / stack locations. We would need to track values
>>> from the defining instruction up to the DBG_VALUE where that value
>>> becomes a variable location, after which it's no different from
the
>>> LiveDebugValues analysis that we perform today.
LiveDebugValues'
>>> ability to track values through stack spills and restores would
become
>>> a critical feature (it isn't today), as we would no longer
generate
>>> stack locations during register allocation.
>> 
>> Do you expect that handling for the current and new-style DBG_VALUEs
could coexist in LiveDebugValues? Could that be done by e.g. introducing a new
debug instr MI (DBG_INSTR_REF)?
> 
> I believe it should be relatively straightforwards: we would have an
> additional kind of location-record (i.e. VarLoc) that, instead of
> identifying a _variable_, identified the instruction/operand that
> computed a value. Handling a DBG_INSTR_REF as you suggest would mean
> looking up such a VarLoc by instruction/operand to find its register,
> then creating a register location VarLoc for the variable as we do
> today. Propagating the location of an instruction/operand VarLoc would
> happen in exactly the same way as variable locations today, it's just
> following a value as it's moved around registers.
I see, so the required addition to LiveDebugValues is basically about
reconstituting VarLocs from new-style debug values? And otherwise, the basic
range extension algorithm is unchanged?
> 
>>> Limitations
>>> 
>>> The largest problem with this idea is that not all variable values
are
>>> defined by instructions: PHIs are values that are defined by
control
>>> flow. To deal with this pre-regalloc, we could move
LiveDebugVariables
>>> to run before phi-elimination.
>> 
>> I don’t really follow this. Are you suggesting stripping out debug
values before phi elim, and replacing them after virtregrewrite? How would the
re-inserted debug instrs refer to values produced by phis?
>> 
>>> My understanding is that the register
>>> allocation phase of LLVM starts there and ends after
virtregrewriter,
>>> and it'd be legitimate to say "we do special things for
these passes".
>>> After regalloc however, there would need to be some way of
specifying
>>> a block and a register, where entry to the block defines a variable
>>> value in that register. This isn't pretty; but IMO is the
>>> representation closest to the truth.
>> 
>> Oh, you answer this here. So LiveDebugVariables would need to figure
out, for each debug-use-of-phi, 1) which block to put it in and 2) which
register+variable gets defined. This is different enough from the new-style
debug instr to potentially warrant its own instruction (DBG_PHI?).
> 
> Yes and no; I think 1) should be un-necessary, or at least not be
> difficult in any way. The DBG_PHIs would refer to a block and
> register, identifying the PHIs value at a single program location, the
> start of the block where it is defined. This is (98% certain) easy to
> identify in virtregrewriter with a location query, and avoids
> considering live ranges and interval splitting. We wouldn't need to
> try and know where the PHI value is later in the block or in other
> blocks: the same LiveDebugValues mechanism as DBG_INSTR_REF would be
> able to determine the PHI values register location at the position of
> a later DBG_PHI instruction.
> 
> (Again, I've put little thought into how that would be represented in
> memory or MIR).
> 
>>> I feel like this would be a better way of representing variable
>>> locations in the codegen backend; my fear is that this is a lot of
>>> work, and I don't know what appetite there is for change
amongst other
>>> interested parties. Thus I'd be interested in any kind of
feedback as
>>> to whether a) this is a good idea, b) whether this category of
change
>>> is what people want, and c) whether this is seen as being
achievable.
>> 
>> Imho this is a good idea and I’d like to see something like this
happen. For it to ”really happen” we (Apple) would probably need a way to
transition to the new representation incrementally (e.g. to toggle a flag to get
back the old representation). I’m not yet sure about what all that would really
entail.
> 
> That's great to hear! When you say toggling a flag, is that a flag for
> a whole compilation, or returning to the old representation at
> specific passes? I don't think it'd be too difficult to have two
> implementations in the codebase and pick one to use, switching between
> the two mid-flight is a lot harder of course.
I'm thinking of a flag for the whole compilation, like -Xclang <use the
new debug instructions>. I don't think we'd need to switch between
representations mid-flight.
> 
>>> Being able to introduce this change incrementally presents some
>>> challenges: while the way of representing variable locations
described
>>> above is more expressive than the current way, converting between
one
>>> and the other requires running the LiveDebugValues analysis,
>> 
>> This sounds like you’re planning to change the existing DBG_VALUE
instruction. Have you considered keeping it — unmodified — and introducing new
debug instructions for the new semantics? The migration story for that seems a
lot simpler.
> 
> I hadn't thought about keeping it; you're right, it'd be a lot
easier
> (and a lot more testable) to do it that way. That also lines up with
> having a toggle flag to allow incremental transition.
> 
>> If we can delete large parts of live debug variables this will be worth
it imho. But perhaps part of the plan here should include splitting up live
debug values into smaller files to simplify it?
> 
> Splitting up LiveDebugValues sounds like a good plan, entry value
> development has added a certain amount of complexity to it too.
> 
> Thanks for the feedback!
> 
> --
> Thanks,
> Jeremy
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<stitching together a response to a sub-thread with Adrian>
>> What about constants and memory locations?
> 
> Good question, and something I've not done too much thinking about. As
> far as I'm aware, memory references today are all register based, with
> the DIExpression expressing any memory operations. That should
> translate to the proposed model naturally: memory locations would be
> instruction references with a suitable DIExpression qualifying the
> value.

For stack slots, I guess we'd mostly still rely on the MachineFunction
side-table.

We do sometimes refer to stack locations using a register-based description,
like:

  DBG_VALUE $rsp, 0, ![[SVAR]], !DIExpression(DW_OP_constu, 8, DW_OP_minus,
DW_OP_plus_uconst, 1)

Under the proposed new model, we could change this to a DBG_INSTR_REF pointing
to the instruction which creates the memory location. I think this is how
it's done in IR. An alternative might be to have the DBG_INSTR_REF point to
the instruction that completes the store of the variable into the memory
location. That doesn't sound as good to me. It doesn't seem like
it'd be simple to work out the location from the instruction (e.g. if
we're storing into an offset).
> Constants are trickier; it's probably easiest to keep DBG_VALUE
> instructions to describe constants. This shouldn't be limiting at all,
> as constant-valued locations aren't tied to specific program locations
> in the same way register locations are.
Seems reasonable to me.

vedant

Jeremy Morse via llvm-dev

2020-Mar-02 18:05 UTC

head link

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

Hi Vedant,

On Wed, Feb 26, 2020 at 6:49 PM Vedant Kumar <vedant_kumar at apple.com>
wrote:> >> Do you expect that handling for the current and new-style
DBG_VALUEs could coexist in LiveDebugValues? Could that be done by e.g.
introducing a new debug instr MI (DBG_INSTR_REF)?
> >
> > I believe it should be relatively straightforwards: we would have an
> > additional kind of location-record (i.e. VarLoc) that, instead of
> > identifying a _variable_, identified the instruction/operand that
> > computed a value. Handling a DBG_INSTR_REF as you suggest would mean
> > looking up such a VarLoc by instruction/operand to find its register,
> > then creating a register location VarLoc for the variable as we do
> > today. Propagating the location of an instruction/operand VarLoc would
> > happen in exactly the same way as variable locations today, it's
just
> > following a value as it's moved around registers.
>
> I see, so the required addition to LiveDebugValues is basically about
reconstituting VarLocs from new-style debug values? And otherwise, the basic
range extension algorithm is unchanged?
Correct -- the dataflow analysis shouldn't require any kind of
algorithmic change, instead we're altering the position at which
values get tracked to be specified by a defining instruction, rather
than a DBG_VALUE. We're also adding two different flavours of
location:
 * Values that are not yet variable locations
 * Variable locations that don't yet have their value defined
(use-before-def)
However I don't believe these two new flavours alter the algorithm,
they're just making different use of the information it produces. (96%
certain).
> <stitching together a response to a sub-thread with Adrian>
>
> >> What about constants and memory locations?
> >
> > Good question, and something I've not done too much thinking
about. As
> > far as I'm aware, memory references today are all register based,
with
> > the DIExpression expressing any memory operations. That should
> > translate to the proposed model naturally: memory locations would be
> > instruction references with a suitable DIExpression qualifying the
> > value.
>
> For stack slots, I guess we'd mostly still rely on the MachineFunction
side-table.
>
> We do sometimes refer to stack locations using a register-based
description, like:
>
>   DBG_VALUE $rsp, 0, ![[SVAR]], !DIExpression(DW_OP_constu, 8, DW_OP_minus,
DW_OP_plus_uconst, 1)
>
> Under the proposed new model, we could change this to a DBG_INSTR_REF
pointing to the instruction which creates the memory location. I think this is
how it's done in IR. An alternative might be to have the DBG_INSTR_REF point
to the instruction that completes the store of the variable into the memory
location. That doesn't sound as good to me. It doesn't seem like
it'd be simple to work out the location from the instruction (e.g. if
we're storing into an offset).
Hmmmm, this raises some interesting extra questions -- for example, I
don't believe there are instructions creating memory locations until
the PrologEpilog pass runs. An easy solution would be just to retain
DBG_VALUEs referring to frame-indexs, but that replicates some of the
flaws in referring to registers (multiple defs), although not to the
same extent. Offsets from the stack pointer aren't determined until
PrologEpilog too. Slightly more design work is probably needed here.

~

Reid wrote:> Basically, I think this is a great idea.
Much appreciated!,
> Debug values effectively represent the original program location of an
assignment (or PHI, but let's ignore that for now) and the assigned value.
Transforms should focus on preserving information about the assigned value, not
the original program location of the assignment. Transforms should already know
how to calculate the assigned value in terms of other values, but they
shouldn't have to know anything about variable assignments, control
dependence, etc.
Strong agree; the more of this we can design out of maintaining
variable location information the better.

~

Seeing how feedback has been positive, I'll hack up a prototype
sometime soon to allow for experimentation. I think one of the bigger
risks in the whole idea is having all variable locations hinge on
LiveDebugValues' ability to track values: if there's some loss of
information about instructions by the time we reach LiveDebugValues,
it might be impossible to track or guarantee many variable locations.
We would also be increasing the number of locations that it has to
follow (through the new flavours of VarLocs); I'm not too concerned
about this, as I don't believe there would be many more locations than
we track today.

--
Thanks,
Jeremy

llvm dev - Feb 2020 - [RFC] DebugInfo: A different way of specifying variable locations post-isel

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel

[llvm-dev] [RFC] DebugInfo: A different way of specifying variable locations post-isel