thr3ads.net - llvm dev - [LLVMdev] RFC: Metadata attachments to function definitions [Apr 2015]

If this information is useful, please help other people find it:
Share via:

Duncan P. N. Exon Smith

2015-Apr-15 05:59 UTC

[LLVMdev] RFC: Metadata attachments to function definitions

> On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>> `Function` definitions should support `MDNode` attachments, with a
>> similar syntax to instructions:
>> 
>>    define void @foo() nounwind !attach !0 {
>>      unreachable
>>    }
>>    !0 = !{}
>> 
>> Attachments wouldn't be allowed on declarations, just definitions.
>> 
>> There are two open problems this can help with:
>> 
>> 1. For PGO, we need somewhere to attach the function entry count.
>>    Attaching to the function definition is a simple solution.
> 
> No comment - can't say I know anything about that.
> 
>> 
>>        define void @foo() !prof !0 {
>>          unreachable
>>        }
>>        !0 = !{i32 987}
>> 
>> 2. In debug info, we repeatedly build up a map from `Function` to the
>>    canonical `MDSubrogram` for it.
> 
> Sounds great - I'd imagine this working somewhat like the way I've
> made implicit special members & other non-standard members of class
> types work in the debug info metadata, which is to say that the
> children reference the parent, but the parent doesn't reference the
> children (in this case, that would mean things like comdat folding in
> LTO would 'just work' - whichever function we picked would keep its
> debug info and attachment to its CU - the other CU would just appear
> not to have that function anymore - we might need a special case for
> comdat folding where one copy has debug info and another copy doesn't,
> make sure we move the metadata over if we're picking one without debug
> info).
> 
> Though that will mean that whenever we want to walk all the
> subprograms of a CU, we'd have to build it by walking all the
> Functions in a module - it might be worth checking if/when/where we
> "need" to do that - I suspect it's rare and maybe can be made
entirely
> unnecessary.
I *think* we only do this once, and it's (ironically) to create the
"Subprogram -> CompileUnit" map.

That highlights a problem that my proposal doesn't solve on its own.
While this proposal creates a "Function -> Subprogram" map, there
still
isn't a "Subprogram -> CompileUnit" map -- that would still
(for now)
be stored implicitly via `MDCompileUnit::getSubprograms()`.

Why isn't this map encoded in the `scope:` chain?  Because many of the
`scope:` chains currently terminate at `MDFile`s or `null` instead of
`MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
to be identical.  (I have a vague plan for fixing this, too: (1) move
ownership of Metadata to the Module (so metadata isn't leaked by
`lto_module_dispose()`), (2) add a "StringRef -> MDType" map to the
Module (only for types with an ODR-style UUID), (3) delete the concept
of `MDString`-based type refs and update lib/Linker to rely on the
"StringRef -> MDType" map in the destination Module, (4) make all
`scope:` chains terminate at an `MDCompileUnit` and drop "scope"-ness
of `MDFile`, and (5) finally drop the `subprograms:` field from
`MDCompileUnit`.  But I'm not confident about step 4 yet.)
> 
>> Keeping this mapping accurate takes
>>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
>>    expensive to compute and maintain.  Attaching it directly to the
>>    `Function` designs away the problem.
>> 
>>        define void @foo() !dbg !0 {
>>          unreachable
>>        }
>>        !0 = !MDSubprogram(name: "foo", function: void ()*
@foo)
>> 
>> Thoughts?
>> 
>> Moving onto implementation, I'd provide the same generic API that
>> `Instruction` has, and wouldn't bother with the "fast
path" API for
>> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since there
are
>> fewer functions than instructions, we can afford to store the
>> attachments directly on the `Function` instead of off in the
>> `LLVMContext`.
>> 
>> It's not clear to me just how precious memory is in `Function`;
IIRC
>> it's sitting at 168B right now for x86-64.  IMO, a
`SmallVector<..., 1>`
>> -- cost of 64B -- seems fine.  I'll start with this if I don't
hear any
>> objections; we can optimize later if necessary.
>> 
>> Otherwise, I could hack together a custom vector-like object with the
>> "small string" optimization.
> 
> Not important, but I'm missing something: what're you picturing
that
> would be different from/better than SmallVector?
> 
Data storage would be:

    struct Data {
      struct value_type {
        unsigned Tag;
        TrackingMDNodeRef MD;
      };

      unsigned Capacity;
      union {
        unsigned LargeSize;
        unsigned SmallTag;
      } Unsigned;

      AlignedCharArrayUnion<
          value_type *,     // LargeArray
          TrackingMDNodeRef // SmallMD
        > Pointer;
    };
    static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
                  "Wrong size");

Two advantages over `SmallVector<value_type, 1>`:

  - 32-bits each for size and capacity (instead of 64-bits).
  - Domain knowledge of `value_type` allows aggressive unions.

Can't provide as much API as `std::vector<>`, but the API for metadata
attachments is small and opaque:

    bool hasMetadata(unsigned Tag) const;
    MDNode *getMetadata(unsigned Tag) const;
    void setMetadata(unsigned Tag, const MDNode *MD);
    void getAllMetadata(
        SmallVectorImpl<std::pair<unsigned, const MDNode *>>
&MDs) const;
>>  Cost would be 16B per `Function`, with the
>> same malloc/heap costs as `SmallVector<..., 1>`.  But I
haven't seen a
>> profile that suggests this would be worth the complexity...
>> 
>> Any opinions?
>>

Reid Kleckner

2015-Apr-15 15:59 UTC

head link

[LLVMdev] RFC: Metadata attachments to function definitions

Conventional wisdom, which is often wrong, says that bytes in Function are
precious. In the past we've bent over backwards to put bits on Function and
have DenseMaps in LLVMContexts and Modules. I think this is probably the
*wrong* approach for debug info, which, when it's being used, is used
everywhere.

Have you considered something like TinyPtrVector, for a cost of one pointer
when it's not being used? It has higher access costs than SmallVector, but
when there's exactly one piece of metadata (the subprogram), it's pretty
good.

On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> > On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> `Function` definitions should support `MDNode` attachments, with a
> >> similar syntax to instructions:
> >>
> >>    define void @foo() nounwind !attach !0 {
> >>      unreachable
> >>    }
> >>    !0 = !{}
> >>
> >> Attachments wouldn't be allowed on declarations, just
definitions.
> >>
> >> There are two open problems this can help with:
> >>
> >> 1. For PGO, we need somewhere to attach the function entry count.
> >>    Attaching to the function definition is a simple solution.
> >
> > No comment - can't say I know anything about that.
> >
> >>
> >>        define void @foo() !prof !0 {
> >>          unreachable
> >>        }
> >>        !0 = !{i32 987}
> >>
> >> 2. In debug info, we repeatedly build up a map from `Function` to
the
> >>    canonical `MDSubrogram` for it.
> >
> > Sounds great - I'd imagine this working somewhat like the way
I've
> > made implicit special members & other non-standard members of
class
> > types work in the debug info metadata, which is to say that the
> > children reference the parent, but the parent doesn't reference
the
> > children (in this case, that would mean things like comdat folding in
> > LTO would 'just work' - whichever function we picked would
keep its
> > debug info and attachment to its CU - the other CU would just appear
> > not to have that function anymore - we might need a special case for
> > comdat folding where one copy has debug info and another copy
doesn't,
> > make sure we move the metadata over if we're picking one without
debug
> > info).
> >
> > Though that will mean that whenever we want to walk all the
> > subprograms of a CU, we'd have to build it by walking all the
> > Functions in a module - it might be worth checking if/when/where we
> > "need" to do that - I suspect it's rare and maybe can be
made entirely
> > unnecessary.
>
> I *think* we only do this once, and it's (ironically) to create the
> "Subprogram -> CompileUnit" map.
>
> That highlights a problem that my proposal doesn't solve on its own.
> While this proposal creates a "Function -> Subprogram" map,
there still
> isn't a "Subprogram -> CompileUnit" map -- that would
still (for now)
> be stored implicitly via `MDCompileUnit::getSubprograms()`.
>
> Why isn't this map encoded in the `scope:` chain?  Because many of the
> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> to be identical.  (I have a vague plan for fixing this, too: (1) move
> ownership of Metadata to the Module (so metadata isn't leaked by
> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map
to the
> Module (only for types with an ODR-style UUID), (3) delete the concept
> of `MDString`-based type refs and update lib/Linker to rely on the
> "StringRef -> MDType" map in the destination Module, (4) make
all
> `scope:` chains terminate at an `MDCompileUnit` and drop
"scope"-ness
> of `MDFile`, and (5) finally drop the `subprograms:` field from
> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
>
> >
> >> Keeping this mapping accurate takes
> >>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>    expensive to compute and maintain.  Attaching it directly to
the
> >>    `Function` designs away the problem.
> >>
> >>        define void @foo() !dbg !0 {
> >>          unreachable
> >>        }
> >>        !0 = !MDSubprogram(name: "foo", function: void
()* @foo)
> >>
> >> Thoughts?
> >>
> >> Moving onto implementation, I'd provide the same generic API
that
> >> `Instruction` has, and wouldn't bother with the "fast
path" API for
> >> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since
there are
> >> fewer functions than instructions, we can afford to store the
> >> attachments directly on the `Function` instead of off in the
> >> `LLVMContext`.
> >>
> >> It's not clear to me just how precious memory is in
`Function`; IIRC
> >> it's sitting at 168B right now for x86-64.  IMO, a
`SmallVector<..., 1>`
> >> -- cost of 64B -- seems fine.  I'll start with this if I
don't hear any
> >> objections; we can optimize later if necessary.
> >>
> >> Otherwise, I could hack together a custom vector-like object with
the
> >> "small string" optimization.
> >
> > Not important, but I'm missing something: what're you
picturing that
> > would be different from/better than SmallVector?
> >
>
> Data storage would be:
>
>     struct Data {
>       struct value_type {
>         unsigned Tag;
>         TrackingMDNodeRef MD;
>       };
>
>       unsigned Capacity;
>       union {
>         unsigned LargeSize;
>         unsigned SmallTag;
>       } Unsigned;
>
>       AlignedCharArrayUnion<
>           value_type *,     // LargeArray
>           TrackingMDNodeRef // SmallMD
>         > Pointer;
>     };
>     static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
>                   "Wrong size");
>
> Two advantages over `SmallVector<value_type, 1>`:
>
>   - 32-bits each for size and capacity (instead of 64-bits).
>   - Domain knowledge of `value_type` allows aggressive unions.
>
> Can't provide as much API as `std::vector<>`, but the API for
metadata
> attachments is small and opaque:
>
>     bool hasMetadata(unsigned Tag) const;
>     MDNode *getMetadata(unsigned Tag) const;
>     void setMetadata(unsigned Tag, const MDNode *MD);
>     void getAllMetadata(
>         SmallVectorImpl<std::pair<unsigned, const MDNode *>>
&MDs) const;
>
> >>  Cost would be 16B per `Function`, with the
> >> same malloc/heap costs as `SmallVector<..., 1>`.  But I
haven't seen a
> >> profile that suggests this would be worth the complexity...
> >>
> >> Any opinions?
> >>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150415/cec52832/attachment.html>

Duncan P. N. Exon Smith

2015-Apr-15 16:46 UTC

head link

[LLVMdev] RFC: Metadata attachments to function definitions

> On 2015-Apr-15, at 08:59, Reid Kleckner <rnk at google.com> wrote:
> 
> Conventional wisdom, which is often wrong, says that bytes in Function are
precious. In the past we've bent over backwards to put bits on Function and
have DenseMaps in LLVMContexts and Modules. I think this is probably the *wrong*
approach for debug info, which, when it's being used, is used everywhere.
Right: if we can afford a `DebugLoc` on `Instruction`, I figure we can
afford a couple of pointers on `Function`.

But maybe a `SmallVector<..., 1>` is too much?  (BTW, it would only be
40B, not 64B, although my math error is probably irrelevant.)

I'm happy to defer to conventional wisdom here if anyone wants me to
(and TBH, I've only been looking at profiles that include debug info,
so maybe `sizeof(Function)` matters with -g0).
> Have you considered something like TinyPtrVector, for a cost of one pointer
when it's not being used? It has higher access costs than SmallVector, but
when there's exactly one piece of metadata (the subprogram), it's pretty
good.
Interesting -- I didn't know we had this!  Two reasons it can't be used
directly:

  - Here, the element type is actually a `std::pair<>`, with `.first`
    being a tag for the type of attachment (really this is a low-mem
    map from tag to pointer).
  - The pointer type needs to have `MetadataTracking` support so that
    LLParser, BitcodeReader, and DIBuilder can assign a temporary
    attachment that later gets RAUW'ed.

However, if I'm going to roll a custom type anyway, I might as well go
full hog:

  - Make the "large" vector `std::pair<unsigned,
TrackingMDNodeRef>`.
  - Reserve the "small" vector for the `!dbg` tag.
  - Add custom hooks for `MetadataTracking` for the "small" version,
    allowing a raw `MDSubprogram*` instead of a `TrackingMDNodeRef`.

This would squeeze it all into 8B, but require the "large" vector
whenever we have a non-`!dbg` attachment (even if there's just one
attachment).

Another possible opportunity here:  bringing it down to pointer size
means we could share the implementation with `Instruction` (not that
I've seen instruction attachments show up in any sort of profile...).
> 
> On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <dexonsmith at
apple.com> wrote:
> 
> > On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> `Function` definitions should support `MDNode` attachments, with a
> >> similar syntax to instructions:
> >>
> >>    define void @foo() nounwind !attach !0 {
> >>      unreachable
> >>    }
> >>    !0 = !{}
> >>
> >> Attachments wouldn't be allowed on declarations, just
definitions.
> >>
> >> There are two open problems this can help with:
> >>
> >> 1. For PGO, we need somewhere to attach the function entry count.
> >>    Attaching to the function definition is a simple solution.
> >
> > No comment - can't say I know anything about that.
> >
> >>
> >>        define void @foo() !prof !0 {
> >>          unreachable
> >>        }
> >>        !0 = !{i32 987}
> >>
> >> 2. In debug info, we repeatedly build up a map from `Function` to
the
> >>    canonical `MDSubrogram` for it.
> >
> > Sounds great - I'd imagine this working somewhat like the way
I've
> > made implicit special members & other non-standard members of
class
> > types work in the debug info metadata, which is to say that the
> > children reference the parent, but the parent doesn't reference
the
> > children (in this case, that would mean things like comdat folding in
> > LTO would 'just work' - whichever function we picked would
keep its
> > debug info and attachment to its CU - the other CU would just appear
> > not to have that function anymore - we might need a special case for
> > comdat folding where one copy has debug info and another copy
doesn't,
> > make sure we move the metadata over if we're picking one without
debug
> > info).
> >
> > Though that will mean that whenever we want to walk all the
> > subprograms of a CU, we'd have to build it by walking all the
> > Functions in a module - it might be worth checking if/when/where we
> > "need" to do that - I suspect it's rare and maybe can be
made entirely
> > unnecessary.
> 
> I *think* we only do this once, and it's (ironically) to create the
> "Subprogram -> CompileUnit" map.
> 
> That highlights a problem that my proposal doesn't solve on its own.
> While this proposal creates a "Function -> Subprogram" map,
there still
> isn't a "Subprogram -> CompileUnit" map -- that would
still (for now)
> be stored implicitly via `MDCompileUnit::getSubprograms()`.
> 
> Why isn't this map encoded in the `scope:` chain?  Because many of the
> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> to be identical.  (I have a vague plan for fixing this, too: (1) move
> ownership of Metadata to the Module (so metadata isn't leaked by
> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map
to the
> Module (only for types with an ODR-style UUID), (3) delete the concept
> of `MDString`-based type refs and update lib/Linker to rely on the
> "StringRef -> MDType" map in the destination Module, (4) make
all
> `scope:` chains terminate at an `MDCompileUnit` and drop
"scope"-ness
> of `MDFile`, and (5) finally drop the `subprograms:` field from
> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
> 
> >
> >> Keeping this mapping accurate takes
> >>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>    expensive to compute and maintain.  Attaching it directly to
the
> >>    `Function` designs away the problem.
> >>
> >>        define void @foo() !dbg !0 {
> >>          unreachable
> >>        }
> >>        !0 = !MDSubprogram(name: "foo", function: void
()* @foo)
> >>
> >> Thoughts?
> >>
> >> Moving onto implementation, I'd provide the same generic API
that
> >> `Instruction` has, and wouldn't bother with the "fast
path" API for
> >> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since
there are
> >> fewer functions than instructions, we can afford to store the
> >> attachments directly on the `Function` instead of off in the
> >> `LLVMContext`.
> >>
> >> It's not clear to me just how precious memory is in
`Function`; IIRC
> >> it's sitting at 168B right now for x86-64.  IMO, a
`SmallVector<..., 1>`
> >> -- cost of 64B
In case anyone's trying to figure out where I got 64B from:

 1. 3 * sizeof(void *) + sizeof(std::pair<unsigned, void *>)
 2. 3 * 8B + 16B
 3. 48B !? + 16B
 4. 64B

There's an error in line 3 :/.  The cost would actually be 40B.  Not sure
if it's a relevant difference.
> -- seems fine.  I'll start with this if I don't hear any
> >> objections; we can optimize later if necessary.
> >>
> >> Otherwise, I could hack together a custom vector-like object with
the
> >> "small string" optimization.
> >
> > Not important, but I'm missing something: what're you
picturing that
> > would be different from/better than SmallVector?
> >
> 
> Data storage would be:
> 
>     struct Data {
>       struct value_type {
>         unsigned Tag;
>         TrackingMDNodeRef MD;
>       };
> 
>       unsigned Capacity;
>       union {
>         unsigned LargeSize;
>         unsigned SmallTag;
>       } Unsigned;
> 
>       AlignedCharArrayUnion<
>           value_type *,     // LargeArray
>           TrackingMDNodeRef // SmallMD
>         > Pointer;
>     };
>     static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
>                   "Wrong size");
> 
> Two advantages over `SmallVector<value_type, 1>`:
> 
>   - 32-bits each for size and capacity (instead of 64-bits).
>   - Domain knowledge of `value_type` allows aggressive unions.
> 
> Can't provide as much API as `std::vector<>`, but the API for
metadata
> attachments is small and opaque:
> 
>     bool hasMetadata(unsigned Tag) const;
>     MDNode *getMetadata(unsigned Tag) const;
>     void setMetadata(unsigned Tag, const MDNode *MD);
>     void getAllMetadata(
>         SmallVectorImpl<std::pair<unsigned, const MDNode *>>
&MDs) const;
> 
> >>  Cost would be 16B per `Function`, with the
> >> same malloc/heap costs as `SmallVector<..., 1>`.  But I
haven't seen a
> >> profile that suggests this would be worth the complexity...
> >>
> >> Any opinions?
> >>
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

David Blaikie

2015-Apr-15 17:06 UTC

head link

[LLVMdev] RFC: Metadata attachments to function definitions

On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:
>
> > On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> `Function` definitions should support `MDNode` attachments, with a
> >> similar syntax to instructions:
> >>
> >>    define void @foo() nounwind !attach !0 {
> >>      unreachable
> >>    }
> >>    !0 = !{}
> >>
> >> Attachments wouldn't be allowed on declarations, just
definitions.
> >>
> >> There are two open problems this can help with:
> >>
> >> 1. For PGO, we need somewhere to attach the function entry count.
> >>    Attaching to the function definition is a simple solution.
> >
> > No comment - can't say I know anything about that.
> >
> >>
> >>        define void @foo() !prof !0 {
> >>          unreachable
> >>        }
> >>        !0 = !{i32 987}
> >>
> >> 2. In debug info, we repeatedly build up a map from `Function` to
the
> >>    canonical `MDSubrogram` for it.
> >
> > Sounds great - I'd imagine this working somewhat like the way
I've
> > made implicit special members & other non-standard members of
class
> > types work in the debug info metadata, which is to say that the
> > children reference the parent, but the parent doesn't reference
the
> > children (in this case, that would mean things like comdat folding in
> > LTO would 'just work' - whichever function we picked would
keep its
> > debug info and attachment to its CU - the other CU would just appear
> > not to have that function anymore - we might need a special case for
> > comdat folding where one copy has debug info and another copy
doesn't,
> > make sure we move the metadata over if we're picking one without
debug
> > info).
> >
> > Though that will mean that whenever we want to walk all the
> > subprograms of a CU, we'd have to build it by walking all the
> > Functions in a module - it might be worth checking if/when/where we
> > "need" to do that - I suspect it's rare and maybe can be
made entirely
> > unnecessary.
>
> I *think* we only do this once, and it's (ironically) to create the
> "Subprogram -> CompileUnit" map.
>
Right, I thought that might be the case.

> That highlights a problem that my proposal doesn't solve on its own.
> While this proposal creates a "Function -> Subprogram" map,
there still
> isn't a "Subprogram -> CompileUnit" map -- that would
still (for now)
> be stored implicitly via `MDCompileUnit::getSubprograms()`.
>
I guess this is (also?) what I was thinking about.

> Why isn't this map encoded in the `scope:` chain?  Because many of the
> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> to be identical.

Ah, right.

(side note: sometimes need to end in MDFile or we might need an equivalent
of DILexicalBlockFile for the CU - to switch files within the same CU
(things defined in headers, etc))

>   (I have a vague plan for fixing this, too: (1) move
> ownership of Metadata to the Module (so metadata isn't leaked by
> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map
to the
> Module (only for types with an ODR-style UUID), (3) delete the concept
> of `MDString`-based type refs and update lib/Linker to rely on the
> "StringRef -> MDType" map in the destination Module, (4) make
all
> `scope:` chains terminate at an `MDCompileUnit` and drop
"scope"-ness
> of `MDFile`, and (5) finally drop the `subprograms:` field from
> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
>
Sounds plausible.

(side note: any plans to do the scope-based textual IR that was thrown
around during the prototype stages? It'd be another readability (&
writability) improvement to not have to manually walk all the scope chains.

Anyway, maybe after most/all the functionality improvements are made we
could do a maintainability pass & see whether we could get to a practically
writable/readable format... I'm still not sure how likely that is, given
the fact that debug info necessarily /describes/ the source the user wrote,
so it's always going to be more verbose, but maybe it's achievable)

>
> >
> >> Keeping this mapping accurate takes
> >>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>    expensive to compute and maintain.  Attaching it directly to
the
> >>    `Function` designs away the problem.
> >>
> >>        define void @foo() !dbg !0 {
> >>          unreachable
> >>        }
> >>        !0 = !MDSubprogram(name: "foo", function: void
()* @foo)
> >>
> >> Thoughts?
> >>
> >> Moving onto implementation, I'd provide the same generic API
that
> >> `Instruction` has, and wouldn't bother with the "fast
path" API for
> >> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since
there are
> >> fewer functions than instructions, we can afford to store the
> >> attachments directly on the `Function` instead of off in the
> >> `LLVMContext`.
> >>
> >> It's not clear to me just how precious memory is in
`Function`; IIRC
> >> it's sitting at 168B right now for x86-64.  IMO, a
`SmallVector<..., 1>`
> >> -- cost of 64B -- seems fine.  I'll start with this if I
don't hear any
> >> objections; we can optimize later if necessary.
> >>
> >> Otherwise, I could hack together a custom vector-like object with
the
> >> "small string" optimization.
> >
> > Not important, but I'm missing something: what're you
picturing that
> > would be different from/better than SmallVector?
> >
>
> Data storage would be:
>
>     struct Data {
>       struct value_type {
>         unsigned Tag;
>         TrackingMDNodeRef MD;
>       };
>
>       unsigned Capacity;
>       union {
>         unsigned LargeSize;
>         unsigned SmallTag;
>       } Unsigned;
>
>       AlignedCharArrayUnion<
>           value_type *,     // LargeArray
>           TrackingMDNodeRef // SmallMD
>         > Pointer;
>     };
>     static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
>                   "Wrong size");
>
> Two advantages over `SmallVector<value_type, 1>`:
>
>   - 32-bits each for size and capacity (instead of 64-bits).
>   - Domain knowledge of `value_type` allows aggressive unions.
>
> Can't provide as much API as `std::vector<>`, but the API for
metadata
> attachments is small and opaque:
>
>     bool hasMetadata(unsigned Tag) const;
>     MDNode *getMetadata(unsigned Tag) const;
>     void setMetadata(unsigned Tag, const MDNode *MD);
>     void getAllMetadata(
>         SmallVectorImpl<std::pair<unsigned, const MDNode *>>
&MDs) const;
>
> >>  Cost would be 16B per `Function`, with the
> >> same malloc/heap costs as `SmallVector<..., 1>`.  But I
haven't seen a
> >> profile that suggests this would be worth the complexity...
> >>
> >> Any opinions?
> >>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150415/ab9af7c7/attachment.html>

Duncan P. N. Exon Smith

2015-Apr-18 02:15 UTC

head link

[LLVMdev] RFC: Metadata attachments to function definitions

> On 2015 Apr 15, at 10:06, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> 
> 
> On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <dexonsmith at
apple.com> wrote:
> 
> > On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com>
wrote:
> >
> > On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> `Function` definitions should support `MDNode` attachments, with a
> >> similar syntax to instructions:
> >>
> >>    define void @foo() nounwind !attach !0 {
> >>      unreachable
> >>    }
> >>    !0 = !{}
> >>
> >> Attachments wouldn't be allowed on declarations, just
definitions.
> >>
> >> There are two open problems this can help with:
> >>
> >> 1. For PGO, we need somewhere to attach the function entry count.
> >>    Attaching to the function definition is a simple solution.
> >
> > No comment - can't say I know anything about that.
> >
> >>
> >>        define void @foo() !prof !0 {
> >>          unreachable
> >>        }
> >>        !0 = !{i32 987}
> >>
> >> 2. In debug info, we repeatedly build up a map from `Function` to
the
> >>    canonical `MDSubrogram` for it.
> >
> > Sounds great - I'd imagine this working somewhat like the way
I've
> > made implicit special members & other non-standard members of
class
> > types work in the debug info metadata, which is to say that the
> > children reference the parent, but the parent doesn't reference
the
> > children (in this case, that would mean things like comdat folding in
> > LTO would 'just work' - whichever function we picked would
keep its
> > debug info and attachment to its CU - the other CU would just appear
> > not to have that function anymore - we might need a special case for
> > comdat folding where one copy has debug info and another copy
doesn't,
> > make sure we move the metadata over if we're picking one without
debug
> > info).
> >
> > Though that will mean that whenever we want to walk all the
> > subprograms of a CU, we'd have to build it by walking all the
> > Functions in a module - it might be worth checking if/when/where we
> > "need" to do that - I suspect it's rare and maybe can be
made entirely
> > unnecessary.
> 
> I *think* we only do this once, and it's (ironically) to create the
> "Subprogram -> CompileUnit" map.
> 
> Right, I thought that might be the case.
>  
> That highlights a problem that my proposal doesn't solve on its own.
> While this proposal creates a "Function -> Subprogram" map,
there still
> isn't a "Subprogram -> CompileUnit" map -- that would
still (for now)
> be stored implicitly via `MDCompileUnit::getSubprograms()`.
> 
> I guess this is (also?) what I was thinking about.
>  
> Why isn't this map encoded in the `scope:` chain?  Because many of the
> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> to be identical.
> 
> Ah, right.
> 
> (side note: sometimes need to end in MDFile or we might need an equivalent
of DILexicalBlockFile for the CU - to switch files within the same CU (things
defined in headers, etc))
Ah, okay.  I thought we could just replace them with pointers to the
compile unit.  Something like `DIFileScope` with `scope:` and
`file:` fields would probably work?  (Which means I shouldn't have
merged the two types of "file" nodes when I introduced the new
hierarchy.  Boo.)
>  
>   (I have a vague plan for fixing this, too: (1) move
> ownership of Metadata to the Module (so metadata isn't leaked by
> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map
to the
> Module (only for types with an ODR-style UUID), (3) delete the concept
> of `MDString`-based type refs and update lib/Linker to rely on the
> "StringRef -> MDType" map in the destination Module, (4) make
all
> `scope:` chains terminate at an `MDCompileUnit` and drop
"scope"-ness
> of `MDFile`, and (5) finally drop the `subprograms:` field from
> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
> 
> Sounds plausible.
> 
> (side note: any plans to do the scope-based textual IR that was thrown
around during the prototype stages? It'd be another readability (&
writability) improvement to not have to manually walk all the scope chains.
Vague plans, but yes.  Doing it the way I want is blocked on
removing type refs (maybe among other things?).
> Anyway, maybe after most/all the functionality improvements are made we
could do a maintainability pass & see whether we could get to a practically
writable/readable format... I'm still not sure how likely that is, given the
fact that debug info necessarily /describes/ the source the user wrote, so
it's always going to be more verbose, but maybe it's achievable)
Yeah, I think this is a great idea.
>  
> 
> >
> >> Keeping this mapping accurate takes
> >>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>    expensive to compute and maintain.  Attaching it directly to
the
> >>    `Function` designs away the problem.
> >>
> >>        define void @foo() !dbg !0 {
> >>          unreachable
> >>        }
> >>        !0 = !MDSubprogram(name: "foo", function: void
()* @foo)
> >>
> >> Thoughts?
> >>
> >> Moving onto implementation, I'd provide the same generic API
that
> >> `Instruction` has, and wouldn't bother with the "fast
path" API for
> >> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since
there are
> >> fewer functions than instructions, we can afford to store the
> >> attachments directly on the `Function` instead of off in the
> >> `LLVMContext`.
> >>
> >> It's not clear to me just how precious memory is in
`Function`; IIRC
> >> it's sitting at 168B right now for x86-64.  IMO, a
`SmallVector<..., 1>`
> >> -- cost of 64B -- seems fine.  I'll start with this if I
don't hear any
> >> objections; we can optimize later if necessary.
> >>
> >> Otherwise, I could hack together a custom vector-like object with
the
> >> "small string" optimization.
> >
> > Not important, but I'm missing something: what're you
picturing that
> > would be different from/better than SmallVector?
> >
> 
> Data storage would be:
> 
>     struct Data {
>       struct value_type {
>         unsigned Tag;
>         TrackingMDNodeRef MD;
>       };
> 
>       unsigned Capacity;
>       union {
>         unsigned LargeSize;
>         unsigned SmallTag;
>       } Unsigned;
> 
>       AlignedCharArrayUnion<
>           value_type *,     // LargeArray
>           TrackingMDNodeRef // SmallMD
>         > Pointer;
>     };
>     static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
>                   "Wrong size");
> 
> Two advantages over `SmallVector<value_type, 1>`:
> 
>   - 32-bits each for size and capacity (instead of 64-bits).
>   - Domain knowledge of `value_type` allows aggressive unions.
> 
> Can't provide as much API as `std::vector<>`, but the API for
metadata
> attachments is small and opaque:
> 
>     bool hasMetadata(unsigned Tag) const;
>     MDNode *getMetadata(unsigned Tag) const;
>     void setMetadata(unsigned Tag, const MDNode *MD);
>     void getAllMetadata(
>         SmallVectorImpl<std::pair<unsigned, const MDNode *>>
&MDs) const;
> 
> >>  Cost would be 16B per `Function`, with the
> >> same malloc/heap costs as `SmallVector<..., 1>`.  But I
haven't seen a
> >> profile that suggests this would be worth the complexity...
> >>
> >> Any opinions?
> >>

llvm dev - Apr 2015 - [LLVMdev] RFC: Metadata attachments to function definitions

[LLVMdev] RFC: Metadata attachments to function definitions

[LLVMdev] RFC: Metadata attachments to function definitions

[LLVMdev] RFC: Metadata attachments to function definitions

[LLVMdev] RFC: Metadata attachments to function definitions

[LLVMdev] RFC: Metadata attachments to function definitions