thr3ads.net - llvm dev - [LLVMdev] [Debug Info + LTO] Type Uniquing for C types? [Oct 2013]


Manman Ren

2013-Oct-11 19:01 UTC

[LLVMdev] [Debug Info + LTO] Type Uniquing for C types?

On Fri, Oct 11, 2013 at 11:48 AM, Eric Christopher <echristo at gmail.com> wrote:
> > With C++'s ODR, we are able to unique C++ types by using type
> > identifiers to refer to types. Type identifiers are generated by the
> > C++ mangler. What about languages without ODR? Should we unique C
> > types as well?
>
> We can, but the identifier will need to be constructed on, likely, a
> language dependent basis to ensure uniqueness.
>
> > One solution for C types is to generate a cross-CU unique identifier
> > for C types. And before linking, we update all type identifiers in a
> > source module with the corresponding hash of the C types, then linking
> > can continue as usual.
>
> Yes.
>
> > This requires clang to generate a cross-CU unique identifier for C
> > types (one simple scheme is using an identifier that is unique within
> > the CU and concatenating the CU's file name). And it also requires
> > hashing of C types at DebugInfo IR level. We can add an API such as
> > updateTypeIdentifiers(Module *), linker can call it right before
> > linking in a source module.
>
> I think the easiest design you'll get for uniquing C types that are
> named the same thing (i.e. type defined in a .h file) is to use the
> name of the struct combined with the file (and possibly line/column)
> as an identifier.

Since we don't have ODR, we may have macros defined differently for a
struct in a .h file, thus having two versions of the struct from two
different CUs. It seems that we can't assume structs with the same name
and defined in the same file/line/column are the same.
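
To make the macro problem concrete, here is a minimal C sketch. It is an illustration only: `CONFIG_H_BODY`, `CONFIG_EXTRA_MEMBERS`, and the function names are hypothetical, and the two "translation units" are simulated with block-scope struct definitions inside a single file. The same header text yields two different layouts for `struct config`, even though a name + file + line identifier would be identical for both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the body of a shared header, "config.h".
 * CONFIG_EXTRA_MEMBERS plays the role of a macro that each translation
 * unit may define differently before including the header. */
#define CONFIG_H_BODY struct config { int id; CONFIG_EXTRA_MEMBERS }

/* Simulated translation unit A: the macro expands to nothing. */
size_t config_size_in_tu_a(void) {
#define CONFIG_EXTRA_MEMBERS
    CONFIG_H_BODY;                 /* struct config { int id; } */
    return sizeof(struct config);
#undef CONFIG_EXTRA_MEMBERS
}

/* Simulated translation unit B: the macro adds a member. */
size_t config_size_in_tu_b(void) {
#define CONFIG_EXTRA_MEMBERS double trace[4];
    CONFIG_H_BODY;                 /* struct config { int id; double trace[4]; } */
    return sizeof(struct config);
#undef CONFIG_EXTRA_MEMBERS
}

/* Same tag, same "file", same "line", yet the layouts disagree. */
void check_layouts_disagree(void) {
    assert(config_size_in_tu_a() < config_size_in_tu_b());
}
```

An identifier built only from the struct name and source location would merge these two types, which is why the thread moves on to hashing the structure itself.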

> If you want to unify by structure then you'll need
> to do something the equivalent to the type hashing that we're
> implementing in the back end, but that'll be more difficult to
> construct via the front end - it may be possible though.
>
Hashing the types can happen either at the front end or at IR level. That
is our first design choice :)

I think we should try not to hash the types for non-LTO builds at the
front end or at IR level, since it does not give us any benefit given
that we are hashing them at the back end.

One advantage of hashing at IR level is that we can just hash the MDNodes
that affect the type MDNode; at the front end, the AST contains more
information and should be harder to hash.
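
A rough sketch of what "hash only the nodes that affect the type node" could look like, written in C. Everything here is an assumption made for illustration: `struct type_node` is a simplified stand-in and is not LLVM's MDNode, the hash is plain 64-bit FNV-1a rather than whatever the back end uses, and the graph is assumed to be acyclic (real type graphs can contain cycles through pointers, which would need extra handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_OPERANDS = 4 };

/* Hypothetical, simplified stand-in for a debug-info metadata node.
 * It is NOT LLVM's MDNode; it only models "a node with operands". */
struct type_node {
    const char *kind;   /* e.g. "struct", "member", "base" */
    const char *name;   /* NULL for unnamed entities */
    uint64_t bits;      /* size in bits, or bit-field width */
    size_t num_operands;
    const struct type_node *operands[MAX_OPERANDS];
};

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* One FNV-1a step: xor in the byte, then multiply by the prime. */
static uint64_t mix_byte(uint64_t h, unsigned char b) {
    return (h ^ b) * FNV_PRIME;
}

/* Mix a 64-bit value byte by byte, low byte first. */
static uint64_t mix_u64(uint64_t h, uint64_t v) {
    for (int i = 0; i < 8; ++i)
        h = mix_byte(h, (unsigned char)(v >> (8 * i)));
    return h;
}

/* Mix an optional string; a presence flag and a terminator keep
 * NULL, "", and adjacent strings from colliding trivially. */
static uint64_t mix_str(uint64_t h, const char *s) {
    h = mix_byte(h, s != NULL);
    if (s != NULL)
        for (; *s != '\0'; ++s)
            h = mix_byte(h, (unsigned char)*s);
    return mix_byte(h, 0);
}

/* Hash a node together with everything reachable through its operands.
 * Nodes that the type does not reference never contribute. */
uint64_t hash_type_node(const struct type_node *n) {
    assert(n != NULL && n->num_operands <= MAX_OPERANDS);
    uint64_t h = FNV_OFFSET;
    h = mix_str(h, n->kind);
    h = mix_str(h, n->name);
    h = mix_u64(h, n->bits);
    h = mix_u64(h, (uint64_t)n->num_operands);
    for (size_t i = 0; i < n->num_operands; ++i)
        h = mix_u64(h, hash_type_node(n->operands[i]));
    return h;
}
```

Two structurally identical node trees built in different modules hash to the same value, so the hash could serve as the cross-CU type identifier discussed above; a change in any reachable member changes the hash.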

Thanks,
Manman

>
> -eric

Eric Christopher

2013-Oct-11 19:07 UTC


> Since we don't have ODR, we may have macros defined differently for a
> struct in a .h file, thus having two versions of the struct from two
> different CUs. It seems that we can't assume structs with the same name
> and defined in the same file/line/column are the same.

Ah right sorry, I remember this. Also, macros are evil, just ask the
modules guys :)

> Hashing the types can happen either at the front end or at IR level.
> That is our first design choice :)

Sorta :)

> I think we should try not to hash the types for non-LTO builds at the
> front end or at IR level, since it does not give us any benefit given
> that we are hashing them at the back end.
>
> One advantage of hashing at IR level is that we can just hash the
> MDNodes that affect the type MDNode; at the front end, the AST contains
> more information and should be harder to hash.

It depends upon the goals. If the goal is to make debug information
post-link smaller then just using the type hashing machinery for
structs will be sufficient. However, if it's to save space during an
LTO link then we'll want to do it in the front end.

Doug: Have a link for how you do the C type merging for modules?

-eric

Manman Ren

2013-Oct-11 19:19 UTC


On Fri, Oct 11, 2013 at 12:07 PM, Eric Christopher <echristo at gmail.com> wrote:
> > Since we don't have ODR, we may have macros defined differently for a
> > struct in a .h file, thus having two versions of the struct from two
> > different CUs. It seems that we can't assume structs with the same
> > name and defined in the same file/line/column are the same.
>
> Ah right sorry, I remember this. Also, macros are evil, just ask the
> modules guys :)
>
> > Hashing the types can happen either at the front end or at IR level.
> > That is our first design choice :)
>
> Sorta :)
>
> > I think we should try not to hash the types for non-LTO builds at the
> > front end or at IR level, since it does not give us any benefit given
> > that we are hashing them at the back end.
> >
> > One advantage of hashing at IR level is that we can just hash the
> > MDNodes that affect the type MDNode; at the front end, the AST
> > contains more information and should be harder to hash.
>
> It depends upon the goals. If the goal is to make debug information
> post-link smaller then just using the type hashing machinery for
> structs will be sufficient.

By "the type hashing machinery for structs", are you referring to the
type hashing at the back end?

> However, if it's to save space during an
> LTO link then we'll want to do it in the front end.
>
Yes, my purpose here is to save memory by reducing the number of MDNodes
(and the number of DIEs) generated in an LTO build.
Type hashing at the DIE level can reduce the DWARF size.

Manman

>
> Doug: Have a link for how you do the C type merging for modules?
>
> -eric

Douglas Gregor

2013-Oct-14 22:49 UTC


On Oct 11, 2013, at 12:07 PM, Eric Christopher <echristo at gmail.com> wrote:
>> Since we don't have ODR, we may have macros defined differently for a
>> struct in a .h file, thus having two versions of the struct from two
>> different CUs. It seems that we can't assume structs with the same name
>> and defined in the same file/line/column are the same.
>
> Ah right sorry, I remember this. Also, macros are evil, just ask the
> modules guys :)
>
>> Hashing the types can happen either at the front end or at IR level.
>> That is our first design choice :)
>
> Sorta :)
>
>> I think we should try not to hash the types for non-LTO builds at the
>> front end or at IR level, since it does not give us any benefit given
>> that we are hashing them at the back end.
>>
>> One advantage of hashing at IR level is that we can just hash the
>> MDNodes that affect the type MDNode; at the front end, the AST contains
>> more information and should be harder to hash.
>
> It depends upon the goals. If the goal is to make debug information
> post-link smaller then just using the type hashing machinery for
> structs will be sufficient. However, if it's to save space during an
> LTO link then we'll want to do it in the front end.
>
> Doug: Have a link for how you do the C type merging for modules?

Modules foists the C++ one definition rule on C/Objective-C so that it can avoid
performing type merging, so we can’t look there.

C doesn’t have a one definition rule per se. The cross-translation-unit
compatibility rules are in 6.2.7 of the C standard, and they boil down to
structural equality:

Moreover, two structure, union, or enumerated types declared in separate
translation units are compatible if their tags and members satisfy the following
requirements: If one is declared with a tag, the other shall be declared with
the same tag. If both are complete types, then the following additional
requirements apply: there shall be a one-to-one correspondence between their
members such that each pair of corresponding members are declared with
compatible types, and such that if one member of a corresponding pair is
declared with a name, the other member is declared with the same name. For two
structures, corresponding members shall be declared in the same order. For two
structures or unions, corresponding bit-fields shall have the same widths. For
two enumerations, corresponding members shall have the same values.
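
The struct part of the rule quoted above can be sketched as a small C check. This is a simplification under stated assumptions, not how clang or LLVM do it: `struct member_desc` and `struct struct_desc` are hypothetical descriptions of complete struct types, and member type compatibility is reduced to comparing spelled type names, whereas the standard requires compatible types (which is recursive).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one struct member. */
struct member_desc {
    const char *name;     /* NULL for an unnamed member */
    const char *type;     /* spelled type; string equality is a simplification */
    unsigned bit_width;   /* 0 when the member is not a bit-field */
};

/* Hypothetical description of a complete struct type. */
struct struct_desc {
    const char *tag;      /* NULL when declared without a tag */
    size_t num_members;
    const struct member_desc *members;
};

/* Both absent, or both present and equal. */
static bool same_optional_string(const char *a, const char *b) {
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Structural compatibility for two complete struct types declared in
 * separate translation units, following the quoted wording: same tag,
 * one-to-one members in the same order, same member names, compatible
 * member types, and equal bit-field widths. */
bool structs_compatible(const struct struct_desc *a,
                        const struct struct_desc *b) {
    assert(a != NULL && b != NULL);
    if (!same_optional_string(a->tag, b->tag))
        return false;
    if (a->num_members != b->num_members)
        return false;
    for (size_t i = 0; i < a->num_members; ++i) {
        const struct member_desc *ma = &a->members[i];
        const struct member_desc *mb = &b->members[i];
        if (!same_optional_string(ma->name, mb->name))
            return false;
        if (strcmp(ma->type, mb->type) != 0)
            return false;
        if (ma->bit_width != mb->bit_width)
            return false;
    }
    return true;
}
```

Under this rule, reordering members or changing a bit-field width makes two same-named structs distinct types, which is the case a name-based identifier cannot detect.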

	- Doug
