thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm [Apr 2016]

Eric Christopher via llvm-dev

2016-Mar-30 06:50 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:
> Skipping a serialization and doing something clever about LTO uniquing
> sounds awesome.  I'm guessing you achieve this by extracting types out of
> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
> paste together?  Reading between the lines a bit here.

Pretty much, yes.

> Can you share data about how much "pure" types dominate the size of debug
> info?  Or at least the current metadata scheme?  (Channeling Sean Silva
> here: show me the data!)  Does this hold for C as well as C++?

They're huge. It's ridiculous. Take a look at the size of the metadata and
then the size of the stuff we put in there versus DWARF.

And yes, it also trivially holds for C.

> Not much discussion of data objects and code objects (other than concrete
> subprograms), is that because they basically aren't changing?  Still
> defined in the metadata and still managed/emitted by the back-end?

Yep. A way of looking at it is more that it is related to things in the IR
and so needs IR to represent it.

> Please say something about types (which you're thinking of as a front-end
> thing) defined within scopes (which it looks like you're thinking of as a
> back-end thing).  Not seeing how to get the scoping right.

The basic idea is to have non-defining declarations hold the types and be the
abstract origin for the concrete function? Honestly, I wish they were type
unitable at the moment, but that might be something to look into. That's the
current plan, at least. This will make some debug info a little bit larger,
but only for things like nested types where we need to throw in an extra
declaration (i.e. the same sorts of places that type units make things
larger).

At any rate, the first thing is to get the APIs split anyhow.

-eric

Robinson, Paul via llvm-dev

2016-Mar-30 18:31 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Probably don't want to package the lumps into actual type units before the
backend can figure out what's going on; referencing a type unit is clumsy
and inefficient in both space and time.  Partial units are probably a better fit
in many cases, especially for non-file-scope types.  Also, LTO would probably
want to bias against type units; again, that feels like a decision better made
by the backend than the frontend.
Just a thought.
--paulr

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Eric
Christopher via cfe-dev
Sent: Tuesday, March 29, 2016 6:01 PM
To: Clang Dev; llvm-dev
Subject: [cfe-dev] RFC: Up front type information generation in clang and llvm

Hi All,

This is something that's been talked about for some time and it's
probably time to propose it.

The "We" in this document is everyone on the cc line plus me.

Please go ahead and take a look.

Thanks!

-eric


Objective (and TL;DR)
================
Migrate debug type information generation from the backends to the front end.

This will enable:
1. Separation of concerns and maintainability: LLVM shouldn’t have to know about
C preprocessor macros, Obj-C properties, or extensive details about debug
information binary formats.
2. Performance: Skipping a serialization should speed up normal compilations.
3. Memory usage: The DI metadata structures are smaller than they were, but are
still fairly large and pointer heavy.

Motivation
=======
Currently, types in LLVM debug info are described by the DIType class hierarchy.
This hierarchy evolved organically from a more flexible sea-of-nodes
representation into what it is today - a large, only somewhat format neutral
representation of debug types. Making this more format neutral will only
increase the memory use - and for no reason as type information is static (or
nearly so). Debug formats already have a memory-efficient serialization (their
own binary format), so we should support a front end emitting type information
with sufficient representation to allow the backend to emit debug information
based on the more normal IR features: functions, scopes, variables, etc.

Scope/Impact
==========
This is going to involve large scale changes across both LLVM and clang. This
will also affect any out-of-tree front ends; however, we expect the impact to be
on the order of a large API change rather than needing massive infrastructure
changes.

Related work
=========
This is related to the efforts to support CodeView in LLVM and clang as well as
efforts to reduce overall memory consumption when compiling with debug
information enabled, in particular efforts to prune LTO memory usage.


Concerns
=======

We need a good story for transitioning all the debug info testcases in the
backend without giving up coverage and/or readability. David believes he has a
plan here.

Proposal
======
Short version
-----------------

1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
2. Split the clang CGDebugInfo API into Types and Line Table to match.
3. Add an LLVM DWARF emission library similar to the existing CodeView one.
4. Migrate the Types API into a clang internal API taking clang AST structures
and use the LLVM binary emission libraries to produce type information.
5. Remove the old binary emission from LLVM.


Questions/Thoughts/Elaboration
-------------------------------------------

Splitting the DIBuilder API
~~~~~~~~~~~~~~~~~~~~
Will DISubprogram be part of both?
   * We should split it in two: Full declarations with type and a slimmed down
version with an abstract origin.

How will we reference types in the DWARF blob?
   * ODR types can be referenced by name
   * Non-ODR types by full DWARF hash
   * Each type can be a pair (tuple) of identifier (DITypeRef today) and blob.
   * For DWARF versions before 4 we can emit each type as a unit (but not as a
DWARF Type Unit) and use references and module relocations for the offsets.
(See below)
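
As an illustration of the identifier-plus-blob pairing listed above, here is a
minimal C++ sketch. Nothing in it is an existing LLVM API: TypeBlob,
makeTypeBlob and the "odr:"/"hash:" prefixes are hypothetical names, and FNV-1a
merely stands in for whatever full DWARF hash would actually be used.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "pair (tuple) of identifier and blob".
struct TypeBlob {
  std::string id;              // name for ODR types, content hash otherwise
  std::vector<uint8_t> bytes;  // opaque, pre-encoded type description
};

// FNV-1a over the blob bytes; only a placeholder for a real DWARF type hash.
uint64_t fnv1a(const std::vector<uint8_t> &bytes) {
  uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
  for (uint8_t b : bytes) {
    h ^= b;
    h *= 1099511628211ull;               // FNV-1a 64-bit prime
  }
  return h;
}

// Pick the identifier: the ODR name when there is one, else a content hash.
TypeBlob makeTypeBlob(const std::string &odrName, std::vector<uint8_t> bytes) {
  assert(!bytes.empty() && "a type blob needs some encoded content");
  std::string id = odrName.empty() ? "hash:" + std::to_string(fnv1a(bytes))
                                   : "odr:" + odrName;
  return TypeBlob{std::move(id), std::move(bytes)};
}
```

Two non-ODR types with byte-identical encodings get the same identifier here,
which is the property a linker would need in order to unique them without
decoding the blob.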

How will we handle references in DWARF2 or global relocations for non-type
template parameters?
   * We can use a “relocation” metadata as part of the format.
   * Representable as a tuple that holds the DIType and the offset within the
DIBlob at which to write the final relocation/offset for the reference at
emission time.
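
One way to picture the relocation tuple: a record naming the referenced type
plus the byte offset inside the referring blob, patched once final offsets are
known. The sketch below is only an illustration under stated assumptions:
TypeReloc and applyRelocs are hypothetical names, and the 4-byte little-endian
reference width is chosen purely for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical "relocation" record: which type is referenced, and where in
// the referring blob the final offset has to be written.
struct TypeReloc {
  std::string targetId;     // identifier of the referenced type
  std::size_t patchOffset;  // byte position inside the blob to overwrite
};

// Write each target's final offset into the blob as a 4-byte little-endian
// value. Throws std::out_of_range if a target has no final offset.
void applyRelocs(std::vector<uint8_t> &blob,
                 const std::vector<TypeReloc> &relocs,
                 const std::unordered_map<std::string, uint32_t> &finalOffsets) {
  for (const TypeReloc &r : relocs) {
    assert(r.patchOffset + 4 <= blob.size() && "patch must fit in the blob");
    uint32_t value = finalOffsets.at(r.targetId);
    for (int i = 0; i < 4; ++i)
      blob[r.patchOffset + i] = static_cast<uint8_t>(value >> (8 * i));
  }
}
```

The blob stays opaque to whoever applies the relocations: only the listed
offsets are touched, so no DWARF parsing is needed at this step.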

Why break up the types at all?
   * To enable linking and type uniquing for LTO that is not debug-format
aware and won’t be huge in size. We break up the types so we don’t need to
parse debug information to link two modules together efficiently.
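
To make the uniquing point concrete, here is a rough sketch of linking two
modules' type tables purely by identifier, never decoding the blobs. TypeTable
and linkTypeTables are hypothetical names, and keeping the first blob seen for
an identifier is just one possible policy, assumed here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Blob = std::vector<uint8_t>;
// Hypothetical per-module table: type identifier -> opaque encoded type.
using TypeTable = std::map<std::string, Blob>;

// Merge `src` into `dst`, keeping the blob already in `dst` when both tables
// define the same identifier. Returns how many incoming types were dropped
// as duplicates.
std::size_t linkTypeTables(TypeTable &dst, const TypeTable &src) {
  assert(&dst != &src && "merging a table into itself is pointless");
  std::size_t duplicates = 0;
  for (const auto &entry : src) {
    bool inserted = dst.emplace(entry.first, entry.second).second;
    if (!inserted)
      ++duplicates;
  }
  return duplicates;
}
```

The merge cost depends on the number of types and the identifier lengths, not
on how large or how complicated each encoded type is.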

Any other concerns there?
   * Debug information without type units might be slightly larger in this
scheme due to parents being duplicated (declarations and abstract origin, not
full parents). It may be possible to extend dsymutil/etc to merge all siblings
into a common parent. Open question for better ways to solve this.

How should we handle DWARF5/Apple Accelerator Tables?
   * Thoughts:
   * We can parse the DWARF in the back end and generate them.
   * We can emit in the front end for the base case of non-LTO (with help from
the backend for relocation aspects).
   * We can use dsymutil on LTO debug information to generate them.

Why isn’t this a more detailed spec?
   * Mostly because we’ve thought about the issues, but we can’t plan for
everything during implementation.


Future work
----------------

Not contained as part of this, but an obvious future direction is that the
Module linker could grow support for debug-aware linking. Then we can have all
of the type information for a single translation unit in a single blob and use
the debug-aware linking to handle merging types.

David Blaikie via llvm-dev

2016-Apr-01 02:11 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
>
> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <
> Paul_Robinson at playstation.sony.com> wrote:
>
>> Skipping a serialization and doing something clever about LTO uniquing
>> sounds awesome.  I'm guessing you achieve this by extracting types out of
>> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
>> paste together?  Reading between the lines a bit here.
>>
> Pretty much, yes.
>
>> Can you share data about how much "pure" types dominate the size of debug
>> info?  Or at least the current metadata scheme?  (Channeling Sean Silva
>> here: show me the data!)  Does this hold for C as well as C++?
>>
> They're huge. It's ridiculous. Take a look at the size of the metadata and
> then the size of the stuff we put in there versus dwarf.
>
Because numbers are nice to have, I modified Clang to generate every type
as 'int' (patch attached - I may've screwed some things up) and then compiled
llvm-tblgen's object files with -flto (I would've used all of clang, but I
don't have the LTO plugin set up, so I couldn't get past tblgen).

Without debug info: 24 MB of bitcode files
With debug info: 77 MB
With debug info, but no types: 46 MB

So... 59% is pure type descriptions, for this particular test at least.
These are the pure ones, the same things we put in type units. I didn't even
remove the injected declarations, so if you compile example programs with
this patch you'll find that the DW_TAG_base_type for "int" has a child for
every member function declaration that's defined (even used inline functions)
in this translation unit. Clang would be a larger/more representative sample.

I confirmed that both with and without types there were the same number
(48542) of subprogram definitions, and that without types there were no
instances of DICompositeType (both of these were confirmed with
xargs/llvm-dis/grep, nothing fancy).
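
For anyone wanting to redo the arithmetic on another code base, here is a
small sketch of the calculation. It reads the figures as 24 MB without debug
info and 77 MB with it, which is the only reading consistent with the 59%
result; the function name typeShareOfDebugInfo is made up for this example.

```cpp
#include <cassert>

// Fraction of the debug-info bytes that is attributable to type descriptions.
// All three sizes must use the same unit (here: MB of bitcode).
double typeShareOfDebugInfo(double noDebug, double fullDebug,
                            double debugNoTypes) {
  assert(fullDebug > noDebug && "debug info must add size");
  double debugInfo = fullDebug - noDebug;       // everything debug info adds
  double typesOnly = fullDebug - debugNoTypes;  // what vanishes with the types
  return typesOnly / debugInfo;
}
```

With the tblgen numbers, typeShareOfDebugInfo(24, 77, 46) is 31/53, a little
over 58%, which rounds to the 59% quoted above when the complementary 41% is
rounded down first.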



-------------- next part --------------
diff --git lib/CodeGen/CGDebugInfo.cpp lib/CodeGen/CGDebugInfo.cpp
index e300ed5..7974873 100644
--- lib/CodeGen/CGDebugInfo.cpp
+++ lib/CodeGen/CGDebugInfo.cpp
@@ -1509,16 +1509,6 @@ void CGDebugInfo::completeRequiredType(const RecordDecl *RD) {
 }
 
 void CGDebugInfo::completeClassData(const RecordDecl *RD) {
-  if (DebugKind <= codegenoptions::DebugLineTablesOnly)
-    return;
-  QualType Ty = CGM.getContext().getRecordType(RD);
-  void *TyPtr = Ty.getAsOpaquePtr();
-  auto I = TypeCache.find(TyPtr);
-  if (I != TypeCache.end() && !cast<llvm::DIType>(I->second)->isForwardDecl())
-    return;
-  llvm::DIType *Res = CreateTypeDefinition(Ty->castAs<RecordType>());
-  assert(!Res->isForwardDecl());
-  TypeCache[TyPtr].reset(Res);
 }
 
 static bool hasExplicitMemberDefinition(CXXRecordDecl::method_iterator I,
@@ -2169,6 +2159,9 @@ llvm::DIType *CGDebugInfo::getTypeOrNull(QualType Ty) {
   // Unwrap the type as needed for debug information.
   Ty = UnwrapTypeForDebugInfo(Ty, CGM.getContext());
 
+  if (Ty->getTypeClass() != Type::FunctionProto && Ty->getTypeClass() != Type::FunctionNoProto)
+    Ty = CGM.getContext().IntTy;
+
   auto it = TypeCache.find(Ty.getAsOpaquePtr());
   if (it != TypeCache.end()) {
     // Verify that the debug info still exists.
@@ -2197,6 +2190,9 @@ llvm::DIType *CGDebugInfo::getOrCreateType(QualType Ty, llvm::DIFile *Unit) {
   // Unwrap the type as needed for debug information.
   Ty = UnwrapTypeForDebugInfo(Ty, CGM.getContext());
 
+  if (Ty->getTypeClass() != Type::FunctionProto && Ty->getTypeClass() != Type::FunctionNoProto)
+    Ty = CGM.getContext().IntTy;
+
   if (auto *T = getTypeOrNull(Ty))
     return T;
 
@@ -2603,7 +2599,7 @@ llvm::DISubprogram *CGDebugInfo::getFunctionDeclaration(const Decl *D) {
     if (const CXXMethodDecl *MD =
             dyn_cast<CXXMethodDecl>(FD->getCanonicalDecl())) {
       return CreateCXXMemberFunction(MD, getOrCreateFile(MD->getLocation()),
-                                     cast<llvm::DICompositeType>(S));
+                                     cast<llvm::DIType>(S));
     }
   }
   if (MI != SPCache.end()) {
@@ -3328,7 +3324,7 @@ CGDebugInfo::getOrCreateStaticDataMemberDeclarationOrNull(const VarDecl *D) {
   // If the member wasn't found in the cache, lazily construct and add it to the
   // type (used when a limited form of the type is emitted).
   auto DC = D->getDeclContext();
-  auto *Ctxt = cast<llvm::DICompositeType>(getDeclContextDescriptor(D));
+  auto *Ctxt = cast<llvm::DIType>(getDeclContextDescriptor(D));
   return CreateRecordStaticField(D, Ctxt, cast<RecordDecl>(DC));
 }
 
@@ -3400,12 +3396,8 @@ void CGDebugInfo::EmitGlobalVariable(const ValueDecl *VD,
     const EnumDecl *ED = cast<EnumDecl>(ECD->getDeclContext());
     assert(isa<EnumType>(ED->getTypeForDecl()) && "Enum without EnumType?");
     Ty = getOrCreateType(QualType(ED->getTypeForDecl(), 0), Unit);
-  }
-  // Do not use global variables for enums.
-  //
-  // FIXME: why not?
-  if (Ty->getTag() == llvm::dwarf::DW_TAG_enumeration_type)
     return;
+  }
   // Do not emit separate definitions for function local const/statics.
   if (isa<FunctionDecl>(VD->getDeclContext()))
     return;

David Blaikie via llvm-dev

2016-Apr-01 02:28 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Mar 31, 2016 7:11 PM, "David Blaikie" <dblaikie at gmail.com> wrote:
>
> On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>>
>> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <
>> Paul_Robinson at playstation.sony.com> wrote:
>>>
>>> Skipping a serialization and doing something clever about LTO uniquing
>>> sounds awesome.  I'm guessing you achieve this by extracting types out of
>>> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
>>> paste together?  Reading between the lines a bit here.
>>
>> Pretty much, yes.
>>
>>> Can you share data about how much "pure" types dominate the size of
>>> debug info?  Or at least the current metadata scheme?  (Channeling Sean
>>> Silva here: show me the data!)  Does this hold for C as well as C++?
>>
>> They're huge. It's ridiculous. Take a look at the size of the metadata
>> and then the size of the stuff we put in there versus dwarf.
>
> Because numbers are nice to have, I modified Clang to generate every type
> as 'int' (patch attached - I may've screwed some things up) & then compiled
> llvm-tblgen's object files with -flto (I would've used all of clang, but I
> don't have the lto plugin setup, so I couldn't get past tblgen)
>
> Without debug info: 24 MB of bitcode files
> With debug info: 77 MB
> With debug info, but no types: 46 MB
>
> so... 59% is pure type descriptions

To clarify, I mean 59% of the debug info, not of the total file size:
(46-24)/(77-24) is about 41%, i.e. debug info without type info is 41% the
size of debug info with type info. If that makes sense.

> (these are the pure ones, the same things we put in type units - I didn't
> even remove the injected declarations (so if you compile example programs
> with this - you'll find that the DW_TAG_base_type for "int" has a child for
> every member function declaration that's defined (even used inline
> functions) in this translation unit)) for this particular test, at least.
> Clang would be a larger/more representative sample.
>
> I confirmed that both with and without types, there were the same number
> (48542) of subprogram definitions and without types there were no instances
> of DICompositeType (both of these were confirmed with xargs/llvm-dis/grep,
> nothing fancy)
>
>> And yes, it also trivially holds for C.
>>
>>> Not much discussion of data objects and code objects (other than
>>> concrete subprograms), is that because they basically aren't changing?
>>> Still defined in the metadata and still managed/emitted by the back-end?
>>
>> Yep. A way of looking at it is more that it is related to things in the
>> IR and so needs IR to represent it.
>>
>>> Please say something about types (which you're thinking of as a
>>> front-end thing) defined within scopes (which it looks like you're
>>> thinking of as a back-end thing).  Not seeing how to get the scoping
>>> right.
>>
>> Basic idea is non-defining declarations holding types and be the
>> abstract origin for the concrete function? Honestly, I wish they were type
>> unitable at the moment, but that might be something to look into. The
>> current plan at least. This will make some debug info a little bit larger,
>> but only for things like nested types where we need to throw an extra
>> declaration (i.e. the same sorts of places that type units make things
>> larger).
>>
>> At any rate, the first thing is to get the APIs split anyhow.
>>
>> -eric
>>
>>> Thanks!
>>>
>>> --paulr
>>>
>>> From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Eric Christopher via cfe-dev
>>> Sent: Tuesday, March 29, 2016 6:01 PM
>>> To: Clang Dev; llvm-dev
>>> Subject: [cfe-dev] RFC: Up front type information generation in clang and llvm
>>>
>>> Hi All,
>>>
>>> This is something that's been talked about for some time and it's
>>> probably time to propose it.
>>>
>>> The "We" in this document is everyone on the cc line plus me.
>>>
>>> Please go ahead and take a look.
>>>
>>> Thanks!
>>>
>>> -eric
>>>
>>> Objective (and TL;DR)
>>> ================
>>>
>>> Migrate debug type information generation from the backends to the
>>> front end.
>>>
>>> This will enable:
>>>
>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>>> know about C preprocessor macros, Obj-C properties, or extensive details
>>> about debug information binary formats.
>>>
>>> 2. Performance: Skipping a serialization should speed up normal
>>> compilations.
>>>
>>> 3. Memory usage: The DI metadata structures are smaller than they were,
>>> but are still fairly large and pointer heavy.
>>>
>>> Motivation
>>> =======
>>>
>>> Currently, types in LLVM debug info are described by the DIType class
>>> hierarchy. This hierarchy evolved organically from a more flexible
>>> sea-of-nodes representation into what it is today - a large, only somewhat
>>> format neutral representation of debug types. Making this more format
>>> neutral will only increase the memory use - and for no reason as type
>>> information is static (or nearly so). Debug formats already have a memory
>>> efficient serialization, their own binary format so we should support a
>>> front end emitting type information with sufficient representation to allow
>>> the backend to emit debug information based on the more normal IR features:
>>> functions, scopes, variables, etc.
>>>
>>> Scope/Impact
>>> ==========
>>>
>>> This is going to involve large scale changes across both LLVM and
>>> clang. This will also affect any out-of-tree front ends, however, we expect
>>> the impact to be on the order of a large API change rather than needing
>>> massive infrastructure changes.
>>>
>>> Related work
>>> =========
>>>
>>> This is related to the efforts to support CodeView in LLVM and clang as
>>> well as efforts to reduce overall memory consumption when compiling with
>>> debug information enabled;  in particular efforts to prune LTO memory
>>> usage.
>>>
>>> Concerns
>>> =======
>>>
>>> We need a good story for transitioning all the debug info testcases in
>>> the backend without giving up coverage and/or readability. David believes
>>> he has a plan here.
>>>
>>> Proposal
>>> ======
>>>
>>> Short version
>>> -----------------
>>>
>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>>> Table.
>>>
>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>>>
>>> 3. Add a LLVM DWARF emission library similar to the existing CodeView
>>> one.
>>>
>>> 4. Migrate the Types API into a clang internal API taking clang AST
>>> structures and use the LLVM binary emission libraries to produce type
>>> information.
>>>
>>> 5. Remove the old binary emission out of LLVM.
>>>
>>> Questions/Thoughts/Elaboration
>>> -------------------------------------------
>>>
>>> Splitting the DIBuilder API
>>> ~~~~~~~~~~~~~~~~~~~~
>>>
>>> Will DISubprogram be part of both?
>>>
>>>    * We should split it in two: Full declarations with type and a
>>>      slimmed down version with an abstract origin.
>>>
>>> How will we reference types in the DWARF blob?
>>>
>>>    * ODR types can be referenced by name
>>>
>>>    * Non-odr types by full DWARF hash
>>>
>>>    * Each type can be a pair(tuple) of identifier (DITypeRef today) and
>>>      blob.
>>>
>>>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
>>>      Unit and use references and module relocations for the offsets. (See
>>>      below)
>>>
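
One way to picture the identifier-plus-blob pairs described in the list above is the sketch below. It is only an illustration under stated assumptions: the function names are invented, the blobs are placeholder bytes rather than DWARF, and MD5 over the raw bytes merely stands in for the "full DWARF hash".

```python
import hashlib

def type_identifier(blob, odr_name=None):
    """Pick the key a type blob is referenced by.

    ODR types use their name; other types fall back to a hash of the blob.
    MD5 over the raw bytes is a stand-in here, not the real DWARF hash.
    """
    if odr_name is not None:
        return odr_name
    return hashlib.md5(blob).hexdigest()

def unique_types(modules):
    """Merge (identifier, blob) pairs from several modules, keeping the first
    blob seen for each identifier, without parsing the blob contents."""
    merged = {}
    for module in modules:
        for identifier, blob in module:
            merged.setdefault(identifier, blob)
    return merged

# Two hypothetical modules that both describe the ODR type "_ZTS3Foo".
foo = b"placeholder bytes for Foo"
local = b"placeholder bytes for a non-ODR type"
module_a = [(type_identifier(foo, "_ZTS3Foo"), foo)]
module_b = [(type_identifier(foo, "_ZTS3Foo"), foo),
            (type_identifier(local), local)]

merged = unique_types([module_a, module_b])
print(len(merged))  # 2
```

The point of the shape is that merging only compares identifiers, so linking two modules never has to look inside the debug-format bytes.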
>>> How will we handle references in DWARF2 or global relocations for
>>> non-type template parameters?
>>>
>>>    * We can use a “relocation” metadata as part of the format.
>>>
>>>    * Representable as a tuple that has the DIType and the offset within
>>>      the DIBlob as where to write the final relocation/offset for the
>>>      reference at emission time.
>>>
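
A small sketch of how such a relocation tuple could be applied at emission time. Everything concrete in it is an assumption made for illustration: the `emit` helper, the 4-byte little-endian encoding of the reference, and the placeholder blob bytes are not from the RFC.

```python
import struct

def emit(blobs, relocations):
    """Concatenate type blobs and patch cross-type references.

    blobs:       dict of identifier -> bytes (insertion order is layout order)
    relocations: list of (source_id, offset_in_blob, target_id) tuples
    Each reference is written as a 4-byte little-endian offset of the target
    blob within the output section (an assumed encoding).
    """
    # First pass: decide where each blob lands in the output.
    layout, cursor = {}, 0
    for identifier, blob in blobs.items():
        layout[identifier] = cursor
        cursor += len(blob)

    section = bytearray(b"".join(blobs.values()))

    # Second pass: final offsets are known, so fill in the holes.
    for source_id, offset_in_blob, target_id in relocations:
        where = layout[source_id] + offset_in_blob
        struct.pack_into("<I", section, where, layout[target_id])
    return bytes(section)

blobs = {
    "int*": b"\x0f\x00\x00\x00\x00",  # one tag byte + a 4-byte hole
    "int": b"\x24int\x00",            # 5 placeholder bytes
}
section = emit(blobs, [("int*", 1, "int")])
print(section[1:5] == struct.pack("<I", 5))  # True
```

The blob itself stays opaque; only the recorded offset is touched once the final layout is known.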
>>> Why break up the types at all?
>>>
>>>    * To enable non-debug format aware linking and type uniquing for LTO
>>>      that won’t be huge in size. We break up the types so we don’t need to
>>>      parse debug information to link two modules together efficiently.
>>>
>>> Any other concerns there?
>>>
>>>    * Debug information without type units might be slightly larger in
>>>      this scheme due to parents being duplicated (declarations and abstract
>>>      origin, not full parents). It may be possible to extend dsymutil/etc to
>>>      merge all siblings into a common parent. Open question for better ways
>>>      to solve this.
>>>
>>> How should we handle DWARF5/Apple Accelerator Tables?
>>>
>>>    * Thoughts:
>>>
>>>    * We can parse the dwarf in the back end and generate them.
>>>
>>>    * We can emit in the front end for the base case of non-LTO (with
>>>      help from the backend for relocation aspects).
>>>
>>>    * We can use dsymutil on LTO debug information to generate them.
>>>
>>> Why isn’t this a more detailed spec?
>>>
>>>    * Mostly because we’ve thought about the issues, but we can’t plan
>>>      for everything during implementation.
>>>
>>> Future work
>>> ----------------
>>>
>>> Not contained as part of this, but an obvious future direction is that
>>> the Module linker could grow support for debug aware linking. Then we can
>>> have all of the type information for a single translation unit in a single
>>> blob and use the debug aware linking to handle merging types.

Mehdi Amini via llvm-dev

2016-Apr-01 03:50 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

> On Mar 31, 2016, at 7:11 PM, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>>>
>>> Skipping a serialization and doing something clever about LTO uniquing
>>> sounds awesome.  I'm guessing you achieve this by extracting types out of
>>> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
>>> paste together?  Reading between the lines a bit here.
>>
>> Pretty much, yes.
>>
>>> Can you share data about how much "pure" types dominate the size of
>>> debug info?  Or at least the current metadata scheme?  (Channeling Sean
>>> Silva here: show me the data!)  Does this hold for C as well as C++?
>>
>> They're huge. It's ridiculous. Take a look at the size of the metadata
>> and then the size of the stuff we put in there versus dwarf.
>
> Because numbers are nice to have, I modified Clang to generate every type
> as 'int' (patch attached - I may've screwed some things up) & then compiled
> llvm-tblgen's object files with -flto (I would've used all of clang, but I
> don't have the lto plugin setup, so I couldn't get past tblgen)

I guess you have a non-LTO build somewhere, so you should be able to build other
tools by bypassing the llvm-tblgen build using:

cmake -DLLVM_TABLEGEN=path/to/llvm-tblgen ..

-- 
Mehdi



David Blaikie via llvm-dev

2016-Apr-01 04:51 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Mar 31, 2016 7:11 PM, "David Blaikie" <dblaikie at gmail.com> wrote:
>
> On Tue, Mar 29, 2016 at 11:50 PM, Eric Christopher via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>> On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>>>
>>> Skipping a serialization and doing something clever about LTO uniquing
>>> sounds awesome.  I'm guessing you achieve this by extracting types out of
>>> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
>>> paste together?  Reading between the lines a bit here.
>>
>> Pretty much, yes.
>>
>>> Can you share data about how much "pure" types dominate the size of
>>> debug info?  Or at least the current metadata scheme?  (Channeling Sean
>>> Silva here: show me the data!)  Does this hold for C as well as C++?
>>
>> They're huge. It's ridiculous. Take a look at the size of the metadata
>> and then the size of the stuff we put in there versus dwarf.
>
> Because numbers are nice to have, I modified Clang to generate every type
> as 'int' (patch attached - I may've screwed some things up) & then compiled
> llvm-tblgen's object files with -flto (I would've used all of clang, but I
> don't have the lto plugin setup, so I couldn't get past tblgen)
>
> Without debug info: 77 MB of bitcode files
> With debug info: 24 MB
Oh, and I got these ^ numbers jumbled up. 77 with, 24 without.
> With debug info, but no types: 46 MB
>
> so... 59% is pure type descriptions (these are the pure ones, the same
> things we put in type units - I didn't even remove the injected
> declarations (so if you compile example programs with this - you'll find
> that the DW_TAG_base_type for "int" has a child for every member function
> declaration that's defined (even used inline functions) in this translation
> unit) for this particular test, at least. Clang would be a larger/more
> representative sample.
>
> I confirmed that both with and without types, there were the same number
> (48542) of subprogram definitions and without types there were no instances
> of DICompositeType (both of these were confirmed with xargs/llvm-dis/grep,
> nothing fancy)
