thr3ads.net - llvm dev - [LLVMdev] First-class debug info IR: MDLocation [Nov 2014]

If this information is useful, please help other people find it:
Share via:

Duncan P. N. Exon Smith

2014-Oct-27 17:49 UTC

[LLVMdev] First-class debug info IR: MDLocation

> On 2014-Oct-27, at 00:58, Chandler Carruth <chandlerc at google.com>
wrote:
> 
> I haven't been able to follow all of the thread that got us here but
your patch below has distilled the result enough for me to at least ask
questions.
Always takes a patch to draw people in :).
> I'm sorry of some of the justification is buried in the thread and
I'm just making you repeat it, but I suspect I'm not the only one that
would benefit from the rationale being summarized here.
> 
> On Fri, Oct 24, 2014 at 4:16 PM, Duncan P. N. Exon Smith <dexonsmith at
apple.com> wrote:
> Using `Value` instead of `MDNode`
> ================================> 
> A number of APIs expect `MDNode` -- previously, the only referenceable
> type of metadata -- but this patch (and the ones that will follow) have
> referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> APIs such as `Instruction::getMetadata()` and
> `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
> 
> To me, this change is a red flag
This bothers me too -- which is why I highlighted it -- but I don't
see any alternatives.  It seems like a natural fallout of the rest of
the proposal.
> and points out a bit of a lie in the subject line: this is not actually
first-class debug-info IR. This is just making debug info become special
metadata with special encoding properties.
How special does it have to be to be labeled "first-class"?

IMO, the label makes sense here:  custom C++ type, bitcode, assembly,
uniquing, and ownership.  Doesn't seem any less "special" than,
say,
`AddInst`, but I don't really care what we call it.

Have I missed your point?  (Are you suggesting metadata is inherently
second-class?  How so?)
> Note, I'm actually ok with us having special metadata that has special
encoding properties. But if we're going that route, I don't think that
there is anything "debug info" centric about it, and it shouldn't
be described as such.
The infrastructure won't be debug info centric, but the scope of the
project certainly is.

I'll be making "first-class" types for each type of debug info we
use,
and changing their schema, ownership, and uniquing along the way.

I'm not going to touch any other metadata, although certainly if we
find that (e.g.) profile metadata has become a source of pain, we
could customize it in the future.
> I also think the relationship of MDUser, MDNode, and MDString need to be
clarified a great deal. Why doesn't getMetadata return an 'MDUser*'
for example?
In the class hierarchy:

  - Value -> MDString
  - Value -> MDNode
  - Value -> NamedMDNode
  - Value -> User -> MDUser -> ...

I named `MDUser` by its relationship with `User`, but maybe a better
name is `CustomMD`?  (Any suggestions?)

Here's a breakdown:

  - `MDString` is an arbitrary string that can be used as an operand.
    It's owned by an `LLVMContext` and treated like a constant.

  - `MDNode` is a generic node with arbitrary operands.  It can
    itself be used as an operand, and it can be attached to
    `Instruction`s.  Its operands don't "know" that it's using
them.
    It's owned by an `LLVMContext` and treated like a constant.

  - `NamedMDNode` is a generic node with arbitrary operands.  It cannot
    be used as an operand, and its operands must all be `MDNode` (or
    `MDUser`/`CustomMD`).  It's owned by a `Module`.

  - `MDUser`/`CustomMD` is a parent class for specific types of
    metadata.  They can be used as operands and attached to
    `Instruction`s.  Their metadata operands "know" that they're
being
    used, but they may have handles to non-metadata (which don't know).
    Their ownership is customized per subclass.

Also relevant (and kind of implied by the above): `Instruction` can
have arbitrary metadata attached to it, but it must be an `MDNode` (or
an `MDUser`/`CustomMD`).
> It feels as though you really want to sink the current functionality of
MDNode down to some subclass of a more generic metadata IR type? Maybe I'm
misunderstanding?
IMO, the overlap in functionality between `User` and `MDNode` precludes
having a (sane) inheritance relationship between `MDNode` and `MDUser`/
`CustomMD`.

Both `User` and `MDNode` implement support for an arbitrary number of
operands, but using completely incompatible mechanisms.

Since `MDNode` can reference an arbitrary number of arbitrary `Value`s,
it uses a subclass of `CallbackVH` called `MDNodeOperand` that costs
32B (x86-64).  Moreover, RAUW is expensive.

`MDUser`/`CustomMD` inherits from `User` so that its subclasses can
leverage the use-list infrastructure (8B per operand, fast RAUW).
> I also have to ask because I can't currently see it: what does debug
info being metadata buy us?
I suppose it buys us:

  - the guarantee that debug info doesn't modify optimizations or code
    generation, and

  - the flexibility for optimizations to ignore/drop it when they're
    not smart enough to update it.
> How much code is simplified by that, and at what cost?
I think that's hard to quantify.

I suppose the obvious alternative is to rewrite debug info from the
ground up, without using metadata at all.  I considered this, but
didn't find a compelling argument for it.  Main arguments against it:
it would be harder to implement incrementally, and it would increase
the amount of non-code IR.

Moreover, once we have specific subclasses and bitcode support for
debug info types, moving away from metadata (or even the `Value`
hierarchy entirely) would be an incremental step.

Do you have any specific alternatives mind?

Eric Christopher

2014-Nov-05 23:47 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

Hi Duncan,

Here's a bit of a long delayed reply, I've been thinking about this for
a
bit now and have a bit (I think) of a more firm idea on how I see this.

> Always takes a patch to draw people in :).
>
>Agreed. I've been looking at the patches as they've been going in. I
think
it's mostly making me hate the Value->User hierarchy.

> > Using `Value` instead of `MDNode`
> > ================================> >
> > A number of APIs expect `MDNode` -- previously, the only referenceable
> > type of metadata -- but this patch (and the ones that will follow)
have
> > referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> > APIs such as `Instruction::getMetadata()` and
> > `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
> >
> > To me, this change is a red flag
>
> This bothers me too -- which is why I highlighted it -- but I don't
> see any alternatives.  It seems like a natural fallout of the rest of
> the proposal.
>
>This hurts a lot. It's by far the worst part of this currently.

I think ideally I'd almost rather see Metadata not inherit from Value.
Metadata doesn't need to be typed, it describes typed things. It isn't
or
doesn't produce a value, it's just a side structure that we find worth
serializing. A container of some bits that can reference IR when necessary.

The problematic thing is the User-ness of Metadata. It definitely wants to
point to Values/Users, it definitely can use a list of where it's used for
things like RAUW - which we still use for temporary MDNodes right now. I
think that aspect is a good thing, but it doesn't necessarily need to be
blazingly fast.

> `MDUser`/`CustomMD` inherits from `User` so that its subclasses can
> leverage the use-list infrastructure (8B per operand, fast RAUW).
>
> > I also have to ask because I can't currently see it: what does
debug
> info being metadata buy us?
>
> I suppose it buys us:
>
>   - the guarantee that debug info doesn't modify optimizations or code
>     generation, and
>
>   - the flexibility for optimizations to ignore/drop it when they're
>     not smart enough to update it.
>
Agreed. I think this is important.

> > How much code is simplified by that, and at what cost?
>
> I think that's hard to quantify.
>
>Quite a bit IMO. If we could get rid of dbg_declare and dbg_value it'd be
even more. Ideally, as we've discussed in the past, these could be metadata
on instructions or <things that turn into lexical blocks>. Some work would
need to be done to handle this, but we'd get rid of the instructions out of
the IR.

> I suppose the obvious alternative is to rewrite debug info from the
> ground up, without using metadata at all.  I considered this, but
> didn't find a compelling argument for it.  Main arguments against it:
> it would be harder to implement incrementally, and it would increase
> the amount of non-code IR.
>
> Moreover, once we have specific subclasses and bitcode support for
> debug info types, moving away from metadata (or even the `Value`
> hierarchy entirely) would be an incremental step.
>
> Do you have any specific alternatives mind?
>
How about the above discussion? I think it might end up being a bit more
work in the short term, but it moves metadata to a separate hierarchy. I
don't think a fast RAUW is too important (call it once per type - I think
we RAUW much more on instructions/constants/etc and so speed is an
advantage there). Module already has a separate set of iterators for named
metadata so that side of things isn't too bad.

Thoughts?

-eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20141105/8eea9b2a/attachment.html>

Duncan P. N. Exon Smith

2014-Nov-06 02:47 UTC

head link

[LLVMdev] First-class debug info IR: MDLocation

> On 2014-Nov-05, at 15:47, Eric Christopher <echristo at gmail.com>
wrote:
> 
> Hi Duncan,
> 
> Here's a bit of a long delayed reply, I've been thinking about this
for a bit now and have a bit (I think) of a more firm idea on how I see this.
>  
> Always takes a patch to draw people in :).
> 
> 
> Agreed. I've been looking at the patches as they've been going in.
I think it's mostly making me hate the Value->User hierarchy.
>  
> > Using `Value` instead of `MDNode`
> > ================================> >
> > A number of APIs expect `MDNode` -- previously, the only referenceable
> > type of metadata -- but this patch (and the ones that will follow)
have
> > referenceable metadata that *do not* inherit from `MDNode`.  Metadata
> > APIs such as `Instruction::getMetadata()` and
> > `NamedMDNode::getOperand()` need to return non-`MDNode` metadata.
> >
> > To me, this change is a red flag
> 
> This bothers me too -- which is why I highlighted it -- but I don't
> see any alternatives.  It seems like a natural fallout of the rest of
> the proposal.
> 
> 
> This hurts a lot. It's by far the worst part of this currently.
(FWIW, I think I've made all the `MDNode => Value` changes, so I
don't
expect any further ugliness going forward.)
> I think ideally I'd almost rather see Metadata not inherit from Value.
Metadata doesn't need to be typed, it describes typed things. It isn't
or doesn't produce a value, it's just a side structure that we find
worth serializing. A container of some bits that can reference IR when
necessary.
> 
> The problematic thing is the User-ness of Metadata. It definitely wants to
point to Values/Users, it definitely can use a list of where it's used for
things like RAUW - which we still use for temporary MDNodes right now. I think
that aspect is a good thing, but it doesn't necessarily need to be blazingly
fast.
I think this is an interesting direction.  More below.

Side note: currently RAUW happens *a lot* in debug info during
parsing and linking, but my plan for debug-info specific classes and
"fixed" ownership designs it away (so it's a moot point).
> `MDUser`/`CustomMD` inherits from `User` so that its subclasses can
> leverage the use-list infrastructure (8B per operand, fast RAUW).
> 
> > I also have to ask because I can't currently see it: what does
debug info being metadata buy us?
> 
> I suppose it buys us:
> 
>   - the guarantee that debug info doesn't modify optimizations or code
>     generation, and
> 
>   - the flexibility for optimizations to ignore/drop it when they're
>     not smart enough to update it.
> 
> Agreed. I think this is important.
>  
> > How much code is simplified by that, and at what cost?
> 
> I think that's hard to quantify.
> 
> 
> Quite a bit IMO. If we could get rid of dbg_declare and dbg_value it'd
be even more. Ideally, as we've discussed in the past, these could be
metadata on instructions or <things that turn into lexical blocks>. Some
work would need to be done to handle this, but we'd get rid of the
instructions out of the IR.
I've thought through how to get rid of these, via !dbg.value metadata
attachments.  I don't think it would be too difficult.  However, since
dbg.value sometimes needs to reference function arguments, it requires
new IR support.

As a straw man, I was thinking something ugly like this for the
assembly:

    ; Single metadata attachment to %arg.
    void @foo(i64 %arg !dbg.value !23) {
      ret void
    }

    ; Multiple attachments to %arg.
    void @foo(i64 %arg {!dbg.value !23, !other.attachment !24}) {
      ret void
    }

I think there are non-trivial implications for SDAG and MI of removing
the intrinsics, but I haven't really dug into it yet.
>  
> I suppose the obvious alternative is to rewrite debug info from the
> ground up, without using metadata at all.  I considered this, but
> didn't find a compelling argument for it.  Main arguments against it:
> it would be harder to implement incrementally, and it would increase
> the amount of non-code IR.
> 
> Moreover, once we have specific subclasses and bitcode support for
> debug info types, moving away from metadata (or even the `Value`
> hierarchy entirely) would be an incremental step.
> 
> Do you have any specific alternatives mind?
> 
> How about the above discussion? I think it might end up being a bit more
work in the short term, but it moves metadata to a separate hierarchy. I
don't think a fast RAUW is too important (call it once per type - I think we
RAUW much more on instructions/constants/etc and so speed is an advantage
there). Module already has a separate set of iterators for named metadata so
that side of things isn't too bad.
> 
> Thoughts?
This goal SGTM.  I think splitting from `Value` hierarchy designs away a
lot of problems for metadata.  I hadn't previously considered separating
`MDNode` from the `Value` hierarchy (I was thinking of just splitting
out debug info), but maybe I wasn't thinking big enough.

I have a few questions/concerns:

 1. This seems to imply dropping support for RAUW on general metadata,
    only supporting it for specific cases like `MDNode::getTemporary()`.
    Is that a good idea?

 2. This also implies dropping support for
    metadata-as-intrinsic-operands.  Is that a good idea?

 3. This is gated on replacing the debug info intrinsics with metadata
    attachments.  Is that the right sequence for staging?

Putting those questions aside, splitting metadata from `Value` seems
just as easy/hard after establishing a debug info hierarchy as before --
i.e., the assembly syntax and ownership work I've outlined is still a
reasonable incremental step.

How do you feel about moving forward roughly as I'd planned, and we can
work out design issues around cutting-the-Value-cord in parallel?

llvm dev - Nov 2014 - [LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation

[LLVMdev] First-class debug info IR: MDLocation