Duncan P. N. Exon Smith
2014-Nov-10 18:17 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
> On 2014-Nov-10, at 08:30, Chris Lattner <clattner at apple.com> wrote: > >> On Nov 9, 2014, at 5:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> In response to my recent commits (e.g., [1]) that changed API from >> `MDNode` to `Value`, Eric had a really interesting idea [2] -- split >> metadata entirely from the `Value` hierarchy, and drop general support >> for metadata RAUW. > > Wow, this never occurred to me, but in hindsight seems like obviously the right direction.Yup, good on Eric here.>> Here is what we'd lose: >> >> 1. No more arbitrary RAUW of metadata. >> >> While we'd keep support for RAUW of temporary MDNodes for use as >> forward declarations (when parsing assembly or constructing cyclic >> graphs), drop RAUW support for all other metadata. > > This seems fine to me, a corollary of this should be that MDNodes never “move around” in memory due to late uniquing etc, which means that TrackingVH shouldn’t be necessary, right? This should make all frontends a lot more memory efficient because they can just use raw pointers to MDNodes everywhere.Almost. Two caveats: 1. Handles should generally be `MDRef` instead of `Metadata*`, since I threw in reference-counting semantics so that no-longer-referenced metadata cleans itself up. 2. If a handle might point to a forward reference -- i.e., a `TempMDNode` in this patch -- it should use `TrackingMDRef`. When the forward reference gets resolved, it updates all of its tracking references. Nevertheless, `sizeof(MDRef) == sizeof(TrackingMDRef) == sizeof(void*)`. Frontends *only* pay memory overhead for the currently-unresolved forward references. (Maybe `MDNodeFwdRef` is a better name than `TempMDNode`?)> >> 2. No more function-local metadata. > > This was a great idea that never mattered, I’m happy to drop it for the greater good :-) >Awesome.> +class Metadata { > +private: > + LLVMContext &Context; > > Out of curiosity, why do Metadata nodes all need to carry around a context? This seems like memory bloat that would be great to avoid if possible.There are two uses for the context: - RAUW. Metadata that can RAUW (`TempMDNode` and `MetadataAsValue`) need a context. Other metadata need access to a context when their operands get RAUW'ed (so they can re-unique themselves), but this could be passed in as an argument, so they don't need their own. - Reference counting. My sketch uses reference counting for all the nodes, so they all need a context to delete themselves when their last reference gets dropped. Customized ownership of debug info metadata will allow us to get the context from parent pointers, so we might be able to remove this down the road (depending on whether the `Metadata` subclass is uniqued).> +template <class T> class TypedMDRef : public MDRef { > > Is this a safe way to model this, should it use private inheritance instead? Covariance seems like it would break this: specifically something could pass off a typed MDRef as an MDRef&, then the client could reassign something of the wrong type into it.Good point.> + MDString(LLVMContext *Context) : Metadata(*Context, MDStringKind) { > + assert(Context && "Expected non-null context"); > ... > + static MDStringRef get(LLVMContext &Context, StringRef String); > > You have reference/pointer disagreement, I’d recommend taking LLVMContext& to the ctor for uniformity (and remove the assert).This is a workaround for `StringMapEntry`'s imperfect forwarding -- it passes the `InitVal` constructor argument by value. I'll fix the problem at its source and clean this up.> +class ReplaceableMetadataImpl { > > I’m not following your algorithm intentions well yet, but this seems like a really heavy-weight implementation, given that this is a common base class for a number of other important things.Unfortunately I think this weight will be hard to optimize out (although I'm sure it could be a little smaller). In the `Value` hierarchy, you pay memory for RAUW at the site of each `Use`. This makes sense, since most values can be RAUW'ed. Since very little metadata needs to be RAUW'ed, we don't want to pay memory at every use-site. Instead, we pay the cost inside the (hopefully) few instances that support RAUW. The core assumption is that there are relatively few replaceable metadata instances. The only subclasses are `ValueAsMetadata` and `TempMDNode`. How many of these are live? - There will be at most one `ValueAsMetadata` instance for each metadata operand that points at a `Value`. My data from a couple of months ago showed that there are very few of these. - `TempMDNodes` are used as forward references. You only pay their cost until the forward reference gets resolved (i.e., deleted). The main case I'm worried about here are references to `ConstantInt`s, which are used pretty heavily outside of debug info (e.g., in `!prof` attachments). However, if that shows up in a profile, we can bypass `Value`s entirely by creating a new `MDIntArray` subclass (or something).> +class ValueAsMetadata : public Metadata, ReplaceableMetadataImpl { > > I think that ValueAsMetadata is a great thing to have - it is essential to refer to global variables etc. That said, I think that it will eventually be worthwhile to have metadata versions of integers and other common things to reduce memory. This should come in a later pass of course.Yup, agree entirely.> > I don’t follow the point of > +class UniquableMDNode : public MDNode { > > What would subclass it? What is the use-case? Won’t MDStrings continue to be uniqued?I think you've just been misled by my terrible name. This sketch splits the "temporary" concept for `MDNode` into a separate subclass, so that non-forward-reference `MDNode`s don't have to pay for RAUW overhead. The class hierarchy I envision looks something like this: Metadata MDNode TempMDNode // MDNodeFwdRef? UniquableMDNode // GenericMDNode? DINode // Is this layer useful? DILocation DIScope DIType DIBasicType DICompositeType ... DISubprogram ... DICompileUnit ... MDString ValueAsMetadata `UniquableMDNode` is a leaf-class that behaves like the current `MDNode` (when it's not a temporary). I called it "uniquable" because, unlike `TempMDNode`, it is uniqued by default (although you can opt-out of it, and uniquing might be dropped). Maybe a better name is `GenericMDNode`? Also, off-topic, but after sketching out the imagined hierarchy above, I'm not sure `DINode` is particularly useful.
Pete Cooper
2014-Nov-10 19:05 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
> On Nov 10, 2014, at 10:17 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: > > The class hierarchy I envision looks something like this: > > Metadata > MDNode > TempMDNode // MDNodeFwdRef? > UniquableMDNode // GenericMDNode? > DINode // Is this layer useful? > DILocation > DIScope > DIType > DIBasicType > DICompositeType > ... > DISubprogram > ... > DICompileUnit > ... > MDString > ValueAsMetadataWhy not just make the Debug classes totally first class, i.e., the root of their own hierarchy? Given that you’re removing local metadata, fixing up dbgvalue, and debug locations are already stored on instructions in a special way, I don’t see why we need to even represent debug info as metadata at all. Debug info is already being treated specially in parsing, printing, storage, etc. Seems like having Module contain a pointer to its CompileUnit(s) would move a huge amount of this stuff completely out of metadata. Then you can still have your own rules for uniquing if you want them. Pete -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141110/ad0b0630/attachment.html>
Duncan P. N. Exon Smith
2014-Nov-10 19:57 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
> On 2014-Nov-10, at 11:05, Pete Cooper <peter_cooper at apple.com> wrote: > > >> On Nov 10, 2014, at 10:17 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote: >> >> The class hierarchy I envision looks something like this: >> >> Metadata >> MDNode >> TempMDNode // MDNodeFwdRef? >> UniquableMDNode // GenericMDNode? >> DINode // Is this layer useful? >> DILocation >> DIScope >> DIType >> DIBasicType >> DICompositeType >> ... >> DISubprogram >> ... >> DICompileUnit >> ... >> MDString >> ValueAsMetadata > Why not just make the Debug classes totally first class, i.e., the root of their own hierarchy? Given that you’re removing local metadata, fixing up dbgvalue, and debug locations are already stored on instructions in a special way, I don’t see why we need to even represent debug info as metadata at all.Two reasons jump out: 1. It's serving as a nice forcing function for cleaning up the rest of metadata. 2. I suspect the rules for "how to deal with debug info" would look pretty similar to those for "how to deal with metadata".> Debug info is already being treated specially in parsing, printing, storage, etc. Seems like having Module contain a pointer to its CompileUnit(s) would move a huge amount of this stuff completely out of metadata. Then you can still have your own rules for uniquing if you want them.Yup, that's the plan, except for the "out of metadata" part. Once the hierarchy is fixed I'll create custom types for debug info, whose ownership (and uniquing where appropriate) I'll customize. I agree that a module should own its compile unit(s) directly. If there's a compelling reason to separate debug info entirely, the planned work is IMO the right incremental step to get there. We need to customize the ownership first. But cleaning up metadata and customizing debug info ownership (etc.) could be sufficient.
Eric Christopher
2014-Nov-10 20:08 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
On Mon Nov 10 2014 at 11:08:04 AM Pete Cooper <peter_cooper at apple.com> wrote:> On Nov 10, 2014, at 10:17 AM, Duncan P. N. Exon Smith < > dexonsmith at apple.com> wrote: > > The class hierarchy I envision looks something like this: > > Metadata > MDNode > TempMDNode // MDNodeFwdRef? > UniquableMDNode // GenericMDNode? > DINode // Is this layer useful? > DILocation > DIScope > DIType > DIBasicType > DICompositeType > ... > DISubprogram > ... > DICompileUnit > ... > MDString > ValueAsMetadata > > Why not just make the Debug classes totally first class, i.e., the root of > their own hierarchy? Given that you’re removing local metadata, fixing up > dbgvalue, and debug locations are already stored on instructions in a > special way, I don’t see why we need to even represent debug info as > metadata at all. > >Because it fits the general idea of metadata. The same ideas behind how to deal with debug info is the same as how to deal with metadata.> Debug info is already being treated specially in parsing, printing, > storage, etc. Seems like having Module contain a pointer to its > CompileUnit(s) would move a huge amount of this stuff completely out of > metadata. Then you can still have your own rules for uniquing if you want > them. >I disagree with this in that I don't think it's a necessary part of the design. With these changes there's no compelling argument to separate out debug info at all. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141110/e9077026/attachment.html>
Eric Christopher
2014-Nov-10 20:28 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
> > > > I’m not following your algorithm intentions well yet, but this seems > like a really heavy-weight implementation, given that this is a common base > class for a number of other important things. > > Unfortunately I think this weight will be hard to optimize out (although > I'm sure it could be a little smaller). > > In the `Value` hierarchy, you pay memory for RAUW at the site of each > `Use`. This makes sense, since most values can be RAUW'ed. > > Since very little metadata needs to be RAUW'ed, we don't want to pay > memory at every use-site. Instead, we pay the cost inside the > (hopefully) few instances that support RAUW. >Agreed. I don't see us needing to RAUW debug info outside of llvm-link and the forward references. For other metadata uses RAUW doesn't seem to be used at all (at least a quick glance didn't show TempMDNodes anywhere).> > The core assumption is that there are relatively few replaceable > metadata instances. The only subclasses are `ValueAsMetadata` and > `TempMDNode`. How many of these are live? > > - There will be at most one `ValueAsMetadata` instance for each > metadata operand that points at a `Value`. My data from a couple of > months ago showed that there are very few of these. > > - `TempMDNodes` are used as forward references. You only pay their > cost until the forward reference gets resolved (i.e., deleted). > > The main case I'm worried about here are references to `ConstantInt`s, > which are used pretty heavily outside of debug info (e.g., in `!prof` > attachments). However, if that shows up in a profile, we can bypass > `Value`s entirely by creating a new `MDIntArray` subclass (or > something). > >This makes sense.> > The class hierarchy I envision looks something like this: > > Metadata > MDNode > TempMDNode // MDNodeFwdRef? > UniquableMDNode // GenericMDNode? > DINode // Is this layer useful? > DILocation > DIScope > DIType > DIBasicType > DICompositeType > ... > DISubprogram > ... > DICompileUnit >FWIW technically a compile unit is a scope so would probably need to be under that. Otherwise I'm missing a change rationale (I haven't read the full patch yet, I apologize if it's clear from the patch).> ... > MDString > ValueAsMetadata > > `UniquableMDNode` is a leaf-class that behaves like the current `MDNode` > (when it's not a temporary). I called it "uniquable" because, unlike > `TempMDNode`, it is uniqued by default (although you can opt-out of it, > and uniquing might be dropped). > > Maybe a better name is `GenericMDNode`? > > Also, off-topic, but after sketching out the imagined hierarchy above, > I'm not sure `DINode` is particularly useful. >Possibly? Might be from the "organizing all of the metadata types" level? -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141110/904545fc/attachment.html>
Duncan P. N. Exon Smith
2014-Nov-10 20:58 UTC
[LLVMdev] [RFC] Separating Metadata from the Value hierarchy
> On 2014-Nov-10, at 12:28, Eric Christopher <echristo at gmail.com> wrote: > > The class hierarchy I envision looks something like this: > > Metadata > MDNode > TempMDNode // MDNodeFwdRef? > UniquableMDNode // GenericMDNode? > DINode // Is this layer useful? > DILocation > DIScope > DIType > DIBasicType > DICompositeType > ... > DISubprogram > ... > DICompileUnit > > FWIW technically a compile unit is a scope so would probably need to be under that. Otherwise I'm missing a change rationale (I haven't read the full patch yet, I apologize if it's clear from the patch).My mistake, just thinking out loud. (None of this debug info stuff is in the patch, it'll just come out naturally when I get back to the debug info proposals.)