thr3ads.net - llvm dev - [llvm-dev] RFC: Up front type information generation in clang and llvm [Mar 2016]

Eric Christopher via llvm-dev

2016-Mar-30 01:00 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

Hi All,

This is something that's been talked about for some time and it's probably
time to propose it.

The "We" in this document is everyone on the cc line plus me.

Please go ahead and take a look.

Thanks!

-eric


Objective (and TL;DR)
================
Migrate debug type information generation from the backends to the front
end.

This will enable:
1. Separation of concerns and maintainability: LLVM shouldn’t have to know
about C preprocessor macros, Obj-C properties, or extensive details about
debug information binary formats.
2. Performance: Skipping a serialization should speed up normal
compilations.
3. Memory usage: The DI metadata structures are smaller than they were, but
are still fairly large and pointer heavy.

Motivation
=======
Currently, types in LLVM debug info are described by the DIType class
hierarchy. This hierarchy evolved organically from a more flexible
sea-of-nodes representation into what it is today - a large, only somewhat
format neutral representation of debug types. Making this more format
neutral will only increase the memory use - and for no reason as type
information is static (or nearly so). Debug formats already have a memory
efficient serialization (their own binary format), so we should support a
front end emitting type information with sufficient representation to allow
the backend to emit debug information based on the more normal IR features:
functions, scopes, variables, etc.

Scope/Impact
==========
This is going to involve large scale changes across both LLVM and clang.
This will also affect any out-of-tree front ends; however, we expect the
impact to be on the order of a large API change rather than needing massive
infrastructure changes.

Related work
=========
This is related to the efforts to support CodeView in LLVM and clang as
well as efforts to reduce overall memory consumption when compiling with
debug information enabled; in particular efforts to prune LTO memory usage.


Concerns
=======

We need a good story for transitioning all the debug info testcases in the
backend without giving up coverage and/or readability. David believes he
has a plan here.

Proposal
======
Short version
-----------------

1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
2. Split the clang CGDebugInfo API into Types and Line Table to match.
3. Add an LLVM DWARF emission library similar to the existing CodeView one.
4. Migrate the Types API into a clang internal API taking clang AST
structures and use the LLVM binary emission libraries to produce type
information.
5. Remove the old binary emission out of LLVM.


Questions/Thoughts/Elaboration
-------------------------------------------

Splitting the DIBuilder API
~~~~~~~~~~~~~~~~~~~~
Will DISubprogram be part of both?
   * We should split it in two: Full declarations with type and a slimmed
down version with an abstract origin.

How will we reference types in the DWARF blob?
   * ODR types can be referenced by name
   * Non-ODR types by full DWARF hash
   * Each type can be a pair (tuple) of identifier (DITypeRef today) and
blob.
   * For versions before DWARF4 we can emit each type as a unit (but not a
DWARF Type Unit) and use references and module relocations for the offsets.
(See below)
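
A minimal sketch of the identifier/blob pairing described above. All names
here (TypeEntry, make_type_entry, the "odr:"/"hash:" prefixes) are
hypothetical and not part of any LLVM API, and SHA-1 is only a stand-in for
whatever "full DWARF hash" ends up meaning in practice:

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeEntry:
    """One debug type: an identifier plus opaque, already-encoded bytes."""
    identifier: str
    blob: bytes


def make_type_entry(blob: bytes, odr_name: Optional[str] = None) -> TypeEntry:
    # ODR types: the name is assumed to identify the type uniquely.
    if odr_name is not None:
        return TypeEntry("odr:" + odr_name, blob)
    # Non-ODR types: fall back to a hash of the encoded contents.
    # SHA-1 is a stand-in here, not the DWARF type-signature algorithm.
    return TypeEntry("hash:" + hashlib.sha1(blob).hexdigest(), blob)


foo = make_type_entry(b"\x01foo-fields", odr_name="_ZTS3Foo")
anon_a = make_type_entry(b"\x02anon-struct")
anon_b = make_type_entry(b"\x02anon-struct")
```

Two non-ODR types with identical encoded contents end up with the same
identifier, so they compare equal without anyone decoding the blob.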

How will we handle references in DWARF2 or global relocations for non-type
template parameters?
   * We can use a “relocation” metadata as part of the format.
   * Representable as a tuple that has the DIType and the offset within the
DIBlob at which to write the final relocation/offset for the reference at
emission time.
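
One way to picture the "relocation" tuple, as a rough sketch only. TypeBlob,
emit, and the 4-byte little-endian reference width are assumptions made for
illustration; the real encoding would depend on the DWARF version and
reference form in use:

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TypeBlob:
    identifier: str
    data: bytes
    # (referenced type identifier, byte offset in `data` to patch)
    relocs: List[Tuple[str, int]] = field(default_factory=list)


def emit(blobs: List[TypeBlob]) -> bytes:
    # First pass: lay the blobs out back to back and record each start offset.
    starts: Dict[str, int] = {}
    cursor = 0
    for b in blobs:
        starts[b.identifier] = cursor
        cursor += len(b.data)
    # Second pass: patch every recorded reference with the final offset.
    out = bytearray()
    for b in blobs:
        patched = bytearray(b.data)
        for target, offset in b.relocs:
            struct.pack_into("<I", patched, offset, starts[target])
        out += patched
    return bytes(out)


int_ty = TypeBlob("int", b"\xAA\xBB\xCC")
ptr_ty = TypeBlob("int*", b"\x0F\x00\x00\x00\x00", relocs=[("int", 1)])
section = emit([ptr_ty, int_ty])
```

The blob for "int*" carries a placeholder at byte 1, and only at emission
time, once the layout is known, does it receive the offset of "int".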

Why break up the types at all?
   * To enable non-debug format aware linking and type uniquing for LTO
that won’t be huge in size. We break up the types so we don’t need to parse
debug information to link two modules together efficiently.
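
To illustrate why per-type blobs make this cheap, here is a small sketch of
linking two modules' type tables purely by identifier. The table layout and
the link_type_tables helper are hypothetical; the point is only that the
blobs are treated as opaque bytes and never parsed:

```python
from typing import Dict, Iterable

TypeTable = Dict[str, bytes]  # identifier -> opaque encoded type


def link_type_tables(modules: Iterable[TypeTable]) -> TypeTable:
    merged: TypeTable = {}
    for table in modules:
        for identifier, blob in table.items():
            # Keep the first copy seen; the blob itself is never decoded.
            merged.setdefault(identifier, blob)
    return merged


tu1 = {"odr:_ZTS3Foo": b"foo-v1", "odr:_ZTS3Bar": b"bar"}
tu2 = {"odr:_ZTS3Foo": b"foo-v1", "odr:_ZTS3Baz": b"baz"}
merged = link_type_tables([tu1, tu2])
```

Foo appears in both translation units but survives only once in the merged
table, which is the uniquing behaviour LTO wants.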

Any other concerns there?
   * Debug information without type units might be slightly larger in this
scheme due to parents being duplicated (declarations and abstract origin,
not full parents). It may be possible to extend dsymutil/etc to merge all
siblings into a common parent. Open question for better ways to solve this.

How should we handle DWARF5/Apple Accelerator Tables?
   * Thoughts:
   * We can parse the DWARF in the back end and generate them.
   * We can emit in the front end for the base case of non-LTO (with help
from the backend for relocation aspects).
   * We can use dsymutil on LTO debug information to generate them.

Why isn’t this a more detailed spec?
   * Mostly because we’ve thought about the issues, but we can’t plan for
everything during implementation.


Future work
----------------

Not contained as part of this, but an obvious future direction is that the
Module linker could grow support for debug aware linking. Then we can have
all of the type information for a single translation unit in a single blob
and use the debug aware linking to handle merging types.

Eric Christopher via llvm-dev

2016-Mar-30 01:03 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

(To be clear: Reid, Adrian, Duncan, Dave, and myself.)


Peter Collingbourne via llvm-dev

2016-Mar-30 02:31 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Thanks for sharing this. Mostly seems like a reasonable plan to me. A few
comments below.

On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> How will we reference types in the DWARF blob?
>    * ODR types can be referenced by name
>    * Non-odr types by full DWARF hash
>    * Each type can be a pair(tuple) of identifier (DITypeRef today) and
> blob.
>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
> Unit and use references and module relocations for the offsets. (See below)
>
> How will we handle references in DWARF2 or global relocations for non-type
> template parameters?
>    * We can use a “relocation” metadata as part of the format.
>    * Representable as a tuple that has the DIType and the offset within
> the DIBlob as where to write the final relocation/offset for the reference
> at emission time.
>
> Why break up the types at all?
>    * To enable non-debug format aware linking and type uniquing for LTO
> that won’t be huge in size. We break up the types so we don’t need to parse
> debug information to link two modules together efficiently.
>
How do you plan to handle abbreviations? You wouldn't necessarily be able
to embed them directly in the blob, as when doing LTO each compilation unit
would have its own set of abbreviations. I suppose you could do something
like treat them as a special sort of reference to an abbreviation table
entry, or maybe pre-allocate in the frontend (but would complicate
cross-frontend LTO) but curious what you have in mind.

> Any other concerns there?
>    * Debug information without type units might be slightly larger in this
> scheme due to parents being duplicated (declarations and abstract origin,
> not full parents). It may be possible to extend dsymutil/etc to merge all
> siblings into a common parent. Open question for better ways to solve this.
>
When we were thinking about teaching the backend to produce blobs from IR
metadata we were thinking about cases where the debug info emitter would
discover special member functions during IR traversal. I guess since we're
moving all of that to the frontend we can just ask the frontend directly
which special members are needed on the class. That solves the problem for
a single translation unit. But what do you plan to do in the multiple
translation unit case where two TUs declare different special members on a
class? Would it be fine to just emit the two definitions and let the
debugger sort it out? I guess this is the type of thing that debuggers
normally deal with in the non-LTO case, so I suppose so?


-- 
Peter

Eric Christopher via llvm-dev

2016-Mar-30 02:43 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Tue, Mar 29, 2016 at 7:31 PM Peter Collingbourne <peter at pcc.me.uk>
wrote:
> Thanks for sharing this. Mostly seems like a reasonable plan to me. A few
> comments below.
>
Thanks Peter!

> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Why break up the types at all?
>>    * To enable non-debug format aware linking and type uniquing for LTO
>> that won’t be huge in size. We break up the types so we don’t need to parse
>> debug information to link two modules together efficiently.
>>
>
> How do you plan to handle abbreviations? You wouldn't necessarily be able
> to embed them directly in the blob, as when doing LTO each compilation unit
> would have its own set of abbreviations. I suppose you could do something
> like treat them as a special sort of reference to an abbreviation table
> entry, or maybe pre-allocate in the frontend (but would complicate
> cross-frontend LTO) but curious what you have in mind.
>
Thanks for reminding me, I knew I was forgetting something I'd talked about
when writing all of this down. :)

Basically, you can handle abbreviations similarly to types by creating a
blob with an index/hash/etc. and then referencing that as part of the type
tuple, e.g.:

$1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
$2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }

and keep them uniqued during emission and remember to merge these as well
during module merge time.
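
A rough sketch of that scheme, under stated assumptions: Module, abbrev_key,
and merge are hypothetical names, SHA-1 stands in for "index/hash/etc", and
final abbreviation codes are simply handed out in first-use order starting
at 1 (code 0 is reserved in DWARF abbreviation tables):

```python
import hashlib
from typing import Dict, List, Tuple

# Each type entry: (type identifier, abbreviation key, opaque type blob)
TypeEntry = Tuple[str, str, bytes]


def abbrev_key(abbrev_blob: bytes) -> str:
    # Content hash used as the abbreviation's identity (SHA-1 is a stand-in).
    return hashlib.sha1(abbrev_blob).hexdigest()


class Module:
    def __init__(self) -> None:
        self.abbrevs: Dict[str, bytes] = {}
        self.types: List[TypeEntry] = []

    def add_type(self, identifier: str, abbrev_blob: bytes, blob: bytes) -> None:
        key = abbrev_key(abbrev_blob)
        self.abbrevs.setdefault(key, abbrev_blob)  # uniqued per module
        self.types.append((identifier, key, blob))


def merge(modules: List[Module]) -> Tuple[Dict[str, int], List[Tuple[str, int, bytes]]]:
    codes: Dict[str, int] = {}  # abbreviation key -> final abbreviation code
    seen = set()
    out = []
    for m in modules:
        for identifier, key, blob in m.types:
            if identifier in seen:
                continue  # type already emitted by an earlier module
            seen.add(identifier)
            code = codes.setdefault(key, len(codes) + 1)  # codes start at 1
            out.append((identifier, code, blob))
    return codes, out


a = Module()
a.add_type("Foo", b"struct-abbrev", b"foo")
a.add_type("Bar", b"struct-abbrev", b"bar")
b = Module()
b.add_type("Foo", b"struct-abbrev", b"foo")
b.add_type("Baz", b"enum-abbrev", b"baz")
codes, types = merge([a, b])
```

Both modules share the struct abbreviation, so after the merge there are two
abbreviation codes in total and the duplicate Foo is dropped.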

>
>> Any other concerns there?
>>    * Debug information without type units might be slightly larger in
>> this scheme due to parents being duplicated (declarations and abstract
>> origin, not full parents). It may be possible to extend dsymutil/etc to
>> merge all siblings into a common parent. Open question for better ways to
>> solve this.
>>
>
> When we were thinking about teaching the backend to produce blobs from IR
> metadata we were thinking about cases where the debug info emitter would
> discover special member functions during IR traversal. I guess since we're
> moving all of that to the frontend we can just ask the frontend directly
> which special members are needed on the class. That solves the problem for
> a single translation unit. But what do you plan to do in the multiple
> translation unit case where two TUs declare different special members on a
> class? Would it be fine to just emit the two definitions and let the
> debugger sort it out? I guess this is the type of thing that debuggers
> normally deal with in the non-LTO case, so I suppose so?
>
Pretty much. This is one area where I have... disagreements with the DWARF
committee and I don't think there's anything else we can do here. TBH right
now I think we'd have issues with type units and special member functions
since we're using ODR-ness to unique.

-eric


Robinson, Paul via llvm-dev

2016-Mar-30 06:20 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Skipping a serialization and doing something clever about LTO uniquing sounds
awesome.  I'm guessing you achieve this by extracting types out of DI
metadata and packaging them as lumps-o-DWARF that the back-end can then paste
together?  Reading between the lines a bit here.

Can you share data about how much "pure" types dominate the size of
debug info?  Or at least the current metadata scheme?  (Channeling Sean Silva
here: show me the data!)  Does this hold for C as well as C++?

Not much discussion of data objects and code objects (other than concrete
subprograms), is that because they basically aren't changing?  Still defined
in the metadata and still managed/emitted by the back-end?

Please say something about types (which you're thinking of as a front-end
thing) defined within scopes (which it looks like you're thinking of as a
back-end thing).  Not seeing how to get the scoping right.

Thanks!
--paulr

From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Eric
Christopher via cfe-dev
Sent: Tuesday, March 29, 2016 6:01 PM
To: Clang Dev; llvm-dev
Subject: [cfe-dev] RFC: Up front type information generation in clang and llvm

Hi All,

This is something that's been talked about for some time and it's
probably time to propose it.

The "We" in this document is everyone on the cc line plus me.

Please go ahead and take a look.

Thanks!

-eric


Objective (and TL;DR)
================
Migrate debug type information generation from the backends to the front end.

This will enable:
1. Separation of concerns and maintainability: LLVM shouldn’t have to know about
C preprocessor macros, Obj-C properties, or extensive details about debug
information binary formats.
2. Performance: Skipping a serialization should speed up normal compilations.
3. Memory usage: The DI metadata structures are smaller than they were, but are
still fairly large and pointer heavy.

Motivation
=======
Currently, types in LLVM debug info are described by the DIType class hierarchy.
This hierarchy evolved organically from a more flexible sea-of-nodes
representation into what it is today - a large, only somewhat format neutral
representation of debug types. Making this more format neutral will only
increase the memory use - and for no reason as type information is static (or
nearly so). Debug formats already have a memory efficient serialization, their
own binary format so we should support a front end emitting type information
with sufficient representation to allow the backend to emit debug information
based on the more normal IR features: functions, scopes, variables, etc.

Scope/Impact
==========
This is going to involve large scale changes across both LLVM and clang. This
will also affect any out-of-tree front ends, however, we expect the impact to be
on the order of a large API change rather than needing massive infrastructure
changes.

Related work
=========
This is related to the efforts to support CodeView in LLVM and clang as well as
efforts to reduce overall memory consumption when compiling with debug
information enabled;  in particular efforts to prune LTO memory usage.


Concerns
=======

We need a good story for transitioning all the debug info testcases in the
backend without giving up coverage and/or readability. David believes he has a
plan here.

Proposal
======
Short version
-----------------

1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
2. Split the clang CGDebugInfo API into Types and Line Table to match.
3. Add a LLVM DWARF emission library similar to the existing CodeView one.
4. Migrate the Types API into a clang internal API taking clang AST structures
and use the LLVM binary emission libraries to produce type information.
5. Remove the old binary emission out of LLVM.


Questions/Thoughts/Elaboration
-------------------------------------------

Splitting the DIBuilder API
~~~~~~~~~~~~~~~~~~~~
Will DISubprogram be part of both?
   * We should split it in two: Full declarations with type and a slimmed down
version with an abstract origin.

How will we reference types in the DWARF blob?
   * ODR types can be referenced by name
   * Non-odr types by full DWARF hash
   * Each type can be a pair(tuple) of identifier (DITypeRef today) and blob.
   * For < DWARF4 we can emit each type as a unit, but not a DWARF Type Unit
and use references and module relocations for the offsets. (See below)

How will we handle references in DWARF2 or global relocations for non-type
template parameters?
   * We can use a “relocation” metadata as part of the format.
   * Representable as a tuple that has the DIType and the offset within the
DIBlob as where to write the final relocation/offset for the reference at
emission time.

Why break up the types at all?
   * To enable non-debug format aware linking and type uniquing for LTO that
won’t be huge in size. We break up the types so we don’t need to parse debug
information to link two modules together efficiently.

Any other concerns there?
   * Debug information without type units might be slightly larger in this
scheme due to parents being duplicated (declarations and abstract origin, not
full parents). It may be possible to extend dsymutil/etc to merge all siblings
into a common parent. Open question for better ways to solve this.

How should we handle DWARF5/Apple Accelerator Tables?
   * Thoughts:
   * We can parse the dwarf in the back end and generate them.
   * We can emit in the front end for the base case of non-LTO (with help from
the backend for relocation aspects).
   * We can use dsymutil on LTO debug information to generate them.

Why isn’t this a more detailed spec?
   * Mostly because we’ve thought about the issues, but we can’t plan for
everything during implementation.


Future work
----------------

Not contained as part of this, but an obvious future direction is that the
Module linker could grow support for debug aware linking. Then we can have all
of the type information for a single translation unit in a single blob and use
the debug aware linking to handle merging types.

Eric Christopher via llvm-dev

2016-Mar-30 06:50 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Tue, Mar 29, 2016 at 11:20 PM Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:
> Skipping a serialization and doing something clever about LTO uniquing
> sounds awesome.  I'm guessing you achieve this by extracting types out of
> DI metadata and packaging them as lumps-o-DWARF that the back-end can then
> paste together?  Reading between the lines a bit here.
>
Pretty much, yes.

> Can you share data about how much "pure" types dominate the size of debug
> info?  Or at least the current metadata scheme?  (Channeling Sean Silva
> here: show me the data!)  Does this hold for C as well as C++?
>
They're huge. It's ridiculous. Take a look at the size of the metadata and
then the size of the stuff we put in there versus dwarf.

And yes, it also trivially holds for C.

> Not much discussion of data objects and code objects (other than concrete
> subprograms), is that because they basically aren't changing?  Still
> defined in the metadata and still managed/emitted by the back-end?
>
Yep. A way of looking at it is more that it is related to things in the IR
and so needs IR to represent it.

> Please say something about types (which you're thinking of as a front-end
> thing) defined within scopes (which it looks like you're thinking of as a
> back-end thing).  Not seeing how to get the scoping right.
>
The basic idea is that non-defining declarations hold the types and serve as
the abstract origin for the concrete function. Honestly, I wish they were type
unitable at the moment, but that might be something to look into. That's the
current plan, at least. This will make some debug info a little bit larger, but
only for things like nested types where we need to throw in an extra
declaration (i.e. the same sorts of places that type units make things larger).

At any rate, the first thing is to get the APIs split anyhow.

-eric

> Thanks!
>
> --paulr

Adrian Prantl via llvm-dev

2016-Mar-31 17:00 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

> On Mar 29, 2016, at 6:00 PM, Eric Christopher <echristo at gmail.com> wrote:
> 
> Hi All,
> 
> This is something that's been talked about for some time and it's probably
> time to propose it.
> 
> The "We" in this document is everyone on the cc line plus me.
> 
> Please go ahead and take a look.
> 
> Thanks!
> 
> -eric
>
> Objective (and TL;DR)
> ================
> Migrate debug type information generation from the backends to the front
> end.
>
> This will enable:
> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know
> about C preprocessor macros, Obj-C properties, or extensive details about
> debug information binary formats.

This is a bit of an overstatement: This proposal is only about the debug *type*
information. The back end still needs to know about line tables, subprograms,
etc., and in order to produce the Apple/DWARF5 accelerator tables it will even
need to have a basic understanding of the type info.

> 2. Performance: Skipping a serialization should speed up normal
> compilations.
> 3. Memory usage: The DI metadata structures are smaller than they were, but
> are still fairly large and pointer heavy.

We should back up this claim with some numbers, but the idea is that the
expected savings come from the “type units” being variable-length records with
abbreviations not unlike LLVM bitcode. In contrast to LLVM metadata, however,
there is also some additional overhead due to each “type unit” containing a
redundant declcontext, the support for relocations, and potentially for
supporting the accelerator tables.

> 
> Motivation
> =======
> Currently, types in LLVM debug info are described by the DIType class
> hierarchy. This hierarchy evolved organically from a more flexible
> sea-of-nodes representation into what it is today - a large, only somewhat
> format neutral representation of debug types. Making this more format
> neutral will only increase the memory use - and for no reason as type
> information is static (or nearly so). Debug formats already have a memory
> efficient serialization, their own binary format so we should support a
> front end emitting type information with sufficient representation to allow
> the backend to emit debug information based on the more normal IR features:
> functions, scopes, variables, etc.
>
> Scope/Impact
> ==========
> This is going to involve large scale changes across both LLVM and clang.
> This will also affect any out-of-tree front ends, however, we expect the
> impact to be on the order of a large API change rather than needing massive
> infrastructure changes.
>
> Related work
> =========
> This is related to the efforts to support CodeView in LLVM and clang as
> well as efforts to reduce overall memory consumption when compiling with
> debug information enabled;  in particular efforts to prune LTO memory usage.
>
> Concerns
> =======
>
> We need a good story for transitioning all the debug info testcases in the
> backend without giving up coverage and/or readability. David believes he
> has a plan here.

David, can you elaborate on this?

> 
> Proposal
> ======
> Short version
> -----------------
>
> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
> 3. Add a LLVM DWARF emission library similar to the existing CodeView one.
> 4. Migrate the Types API into a clang internal API taking clang AST
> structures and use the LLVM binary emission libraries to produce type
> information.
> 5. Remove the old binary emission out of LLVM.
> 
> 
> Questions/Thoughts/Elaboration
> -------------------------------------------
>
> Splitting the DIBuilder API
> ~~~~~~~~~~~~~~~~~~~~
> Will DISubprogram be part of both?
>    * We should split it in two: Full declarations with type and a slimmed
> down version with an abstract origin.
>
> How will we reference types in the DWARF blob?
>    * ODR types can be referenced by name
>    * Non-odr types by full DWARF hash
>    * Each type can be a pair(tuple) of identifier (DITypeRef today) and
> blob.
>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
> Unit and use references and module relocations for the offsets. (See below)
>
> How will we handle references in DWARF2 or global relocations for non-type
> template parameters?
>    * We can use a “relocation” metadata as part of the format.
>    * Representable as a tuple that has the DIType and the offset within the
> DIBlob as where to write the final relocation/offset for the reference at
> emission time.
>
> Why break up the types at all?
>    * To enable non-debug format aware linking and type uniquing for LTO
> that won’t be huge in size. We break up the types so we don’t need to parse
> debug information to link two modules together efficiently.
>
> Any other concerns there?
>    * Debug information without type units might be slightly larger in this
> scheme due to parents being duplicated (declarations and abstract origin,
> not full parents). It may be possible to extend dsymutil/etc to merge all
> siblings into a common parent. Open question for better ways to solve this.
>
> How should we handle DWARF5/Apple Accelerator Tables?
>    * Thoughts:
>    * We can parse the dwarf in the back end and generate them.
>    * We can emit in the front end for the base case of non-LTO (with help
> from the backend for relocation aspects).
>    * We can use dsymutil on LTO debug information to generate them.

I realized that the last two bullet points would not work well with ThinLTO. One
of its selling points is (fast) incremental rebuilds, and requiring dsymutil to
make a second pass over all the object files would negate this benefit.

-- adrian
>
> Why isn’t this a more detailed spec?
>    * Mostly because we’ve thought about the issues, but we can’t plan for
> everything during implementation.
>
>
> Future work
> ----------------
>
> Not contained as part of this, but an obvious future direction is that the
> Module linker could grow support for debug aware linking. Then we can have
> all of the type information for a single translation unit in a single blob
> and use the debug aware linking to handle merging types.

Aboud, Amjad via llvm-dev

2016-Mar-31 19:13 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

Forgot to add llvm-dev mailing list.

From: Aboud, Amjad
Sent: Thursday, March 31, 2016 21:07
To: 'Eric Christopher' <echristo at gmail.com>; Clang Dev
<cfe-dev at lists.llvm.org>
Subject: RE: [llvm-dev] RFC: Up front type information generation in clang and
llvm

Hi Eric,
I can understand the need for improving the current design of debug info
representation and emission in LLVM.
However, let’s not forget that the motivation was, and still is, to support
CodeView debug info emission.

I am wondering if it is right to spend the huge effort needed to implement
this proposal while knowing these facts:

1.      It would be clearer how to improve the design once we have working
CodeView support.

You said it yourself: we still do not know what challenges we will face
while implementing this proposal.

2.      I understand that CodeView will need some extensions to the current
DWARF debug info, like the ‘this’ adjustment.

However, it is doable to introduce CodeView wrapper data structures that can
be created from the current DWARF debug info IR.

And this can be done in CodeGen (e.g. CodeViewDebug.cpp) while emitting the
code/debug info.

Again, I understand that your proposal is trying to improve a lot of things, but
it seems that we should first try to support CodeView debug info with the
current debug info IR.
The advantages:

1.      It works. Even though you still have doubts about a few issues, I
believe we can resolve them with minimal modifications to the LLVM IR/Clang FE.

2.      It requires much less effort.

3.      It is much cleaner.

4.      We will better understand the requirements of CodeView, which can be
used to improve the proposal (before diving into implementing it).

I suggest that we start with:

1.      Define the CodeView wrapper data structure. (CodeViewDebugIR)

2.      Build the CodeView wrapper data structure based on the DWARF debug
info IR. (CodeViewDebugBuilder)

3.      Emit the CodeView wrapper data structure into the COFF object file.
(CodeViewDebugEmitter)

4.      Figure out what modifications/extensions need to be made to the DWARF
debug info IR/Clang FE.

What do you think?

Thanks,
Amjad


Carlo Kok via llvm-dev

2016-Apr-04 12:06 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

On 2016-03-30 at 03:00, Eric Christopher via llvm-dev wrote:
> Hi All,
>
> This is something that's been talked about for some time and it's
> probably time to propose it.
>
> The "We" in this document is everyone on the cc line plus me.
>
> Please go ahead and take a look.
>
> Thanks!
>
> -eric
>
>
> Motivation
> =======
> Currently, types in LLVM debug info are described by the DIType class
> hierarchy. This hierarchy evolved organically from a more flexible
> sea-of-nodes representation into what it is today - a large, only
> somewhat format neutral representation of debug types. Making this more
> format neutral will only increase the memory use - and for no reason as
> type information is static (or nearly so). Debug formats already have a
> memory efficient serialization, their own binary format so we should
> support a front end emitting type information with sufficient
> representation to allow the backend to emit debug information based on
> the more normal IR features: functions, scopes, variables, etc.
>
> Scope/Impact
> ==========
> This is going to involve large scale changes across both LLVM and clang.
> This will also affect any out-of-tree front ends, however, we expect the
> impact to be on the order of a large API change rather than needing
> massive infrastructure changes.
>
How will you make it on the order of a large API change? At the moment
we build bitcode ourselves and generate DIBuilder-equivalent structures.
Wouldn't frontends need to do their own, well, DWARF and CodeView
writing? Especially the ones that are tied to the C-only APIs.

> Proposal
> ======
> Short version
> -----------------
>
> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
> 3. Add a LLVM DWARF emission library similar to the existing CodeView one.
> 4. Migrate the Types API into a clang internal API taking clang AST
> structures and use the LLVM binary emission libraries to produce type
> information.
> 5. Remove the old binary emission out of LLVM.
>
What about allowing multiple debug info formats at once? The current format
could potentially allow such an option in the future (I know it doesn't
actually do it now); will the new option hardcode it to a single format?

-- 
Carlo Kok
RemObjects Software

Eric Christopher via llvm-dev

2016-Apr-04 21:23 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

On Mon, Apr 4, 2016 at 5:06 AM Carlo Kok via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On 2016-03-30 at 03:00, Eric Christopher via llvm-dev wrote:
> > Hi All,
> >
> > This is something that's been talked about for some time and it's
> > probably time to propose it.
> >
> > The "We" in this document is everyone on the cc line plus me.
> >
> > Please go ahead and take a look.
> >
> > Thanks!
> >
> > -eric
> >
> >
> > Motivation
> > =======
> > Currently, types in LLVM debug info are described by the DIType class
> > hierarchy. This hierarchy evolved organically from a more flexible
> > sea-of-nodes representation into what it is today - a large, only
> > somewhat format neutral representation of debug types. Making this more
> > format neutral will only increase the memory use - and for no reason as
> > type information is static (or nearly so). Debug formats already have a
> > memory efficient serialization, their own binary format so we should
> > support a front end emitting type information with sufficient
> > representation to allow the backend to emit debug information based on
> > the more normal IR features: functions, scopes, variables, etc.
> >
> > Scope/Impact
> > ==========
> > This is going to involve large scale changes across both LLVM and clang.
> > This will also affect any out-of-tree front ends, however, we expect the
> > impact to be on the order of a large API change rather than needing
> > massive infrastructure changes.
> >
>
> How will you make it on the order of a large API change? At the moment
> we build bitcode ourselves and generate DIBuilder-equivalent structures.
> Wouldn't frontends need to do their own, well, DWARF and CodeView
> writing? Especially the ones that are tied to the C-only APIs.
>
There will be some backend support. The hope is that it'll be a fairly
direct translation from the existing APIs.

I make no claims about C API as we don't handle types at the C API level
currently.

>
> > Proposal
> > ======
> > Short version
> > -----------------
> >
> > 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
> > Table.
> > 2. Split the clang CGDebugInfo API into Types and Line Table to match.
> > 3. Add a LLVM DWARF emission library similar to the existing CodeView
> > one.
> > 4. Migrate the Types API into a clang internal API taking clang AST
> > structures and use the LLVM binary emission libraries to produce type
> > information.
> > 5. Remove the old binary emission out of LLVM.
> >
>
> What about allowing multiple debug info formats at once? The current format
> could potentially allow such an option in the future (I know it doesn't
> actually do it now); will the new option hardcode it to a single format?
>
What option?

I don't think this is much of a worry - at the very least it's probably not
much more difficult than doing it right now.

-eric


> --
> Carlo Kok
> RemObjects Software

Peter S. Housel via llvm-dev

2016-Apr-04 21:26 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

On 04/04/2016 05:06 AM, Carlo Kok via llvm-dev wrote:
> On 2016-03-30 at 03:00, Eric Christopher via llvm-dev wrote:
>> Hi All,
>>
>> This is something that's been talked about for some time and it's
>> probably time to propose it.
>>
>> The "We" in this document is everyone on the cc line plus me.
>>
>> Please go ahead and take a look.
>>
>> Thanks!
>>
>> -eric
>>
>>
>> Motivation
>> =======
>> Currently, types in LLVM debug info are described by the DIType class
>> hierarchy. This hierarchy evolved organically from a more flexible
>> sea-of-nodes representation into what it is today - a large, only
>> somewhat format neutral representation of debug types. Making this more
>> format neutral will only increase the memory use - and for no reason as
>> type information is static (or nearly so). Debug formats already have a
>> memory efficient serialization, their own binary format so we should
>> support a front end emitting type information with sufficient
>> representation to allow the backend to emit debug information based on
>> the more normal IR features: functions, scopes, variables, etc.
>>
>> Scope/Impact
>> ==========
>> This is going to involve large scale changes across both LLVM and clang.
>> This will also affect any out-of-tree front ends, however, we expect the
>> impact to be on the order of a large API change rather than needing
>> massive infrastructure changes.
>>
>
> How will you make it on the order of a large API change? At the moment
> we build bitcode ourselves and generate DIBuilder-equivalent
> structures. Wouldn't frontends need to do their own, well, DWARF and
> CodeView writing? Especially the ones that are tied to the C-only APIs.

The Open Dylan compiler doesn't link with any LLVM libraries; its only
interface with LLVM is through bitcode, using a bitcode writer that I
wrote myself in Dylan. Frontends that write textual LLVM assembly are in
the same situation.

The type information that the Open Dylan LLVM support generates within 
debug information is very simple, mostly amounting to void* (and 
function signatures containing varying numbers of void* arguments). It 
sometimes goes beyond this when foreign (C) function support is used 
within Dylan programs.

We would prefer if some level of support were maintained for generating 
least-common-denominator debug info (both DWARF and CodeView) from 
structured metadata. The potential performance improvements from 
implementing this proposal don't really apply to our compiler's use 
case, since the debug types for foreign C structs that are generated 
generally only appear in a single translation unit across the entire 
program. I'd prefer to avoid having to maintain code that deals with 
DWARF and CodeView directly.

-Peter S. Housel-

Reid Kleckner via llvm-dev

2016-Apr-27 23:41 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

My general feeling is that this design represents a mid-point between our
current metadata design, and a future design where frontends just emit type
information and LTO links it in a format-aware way.

I don't think it's an imminent priority for anyone to do this for DWARF, so
I worry that if we start building infrastructure for it, it will end up
overengineered.

Also, people seem to agree that in the long term, we really need a
format-aware linker, and maybe LTO should just use one. Supposedly Frédéric
has patches to llvm-dsymutil to make one for DWARF, but he hasn't found the
time to upstream them.

Together, these reasons make me feel that we should limit the short-term
scope to just CodeView, and add utilities to lib/Linker for performing
basic tasks like type stream merging or type extraction, possibly with
forward declaration of composite types.

In the future, when we do this work for DWARF, we can add a new DIType*
stand-in similar to what you are describing.

The working patch that I have for just CodeView, all types as a single
blob, is up here: http://reviews.llvm.org/D19236. While it doesn't deal with
type blobs or LTO type merging yet, I think it shows that there is
surprisingly little need to bifurcate other parts of LLVM.

Thoughts?


Reid Kleckner via llvm-dev

2016-Apr-27 23:51 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

Somehow I managed to respond without being explicit about the difference
between your design and mine: I'm saying we should just have one type blob
per TU. This will avoid the need for cross-blob references, but it will
necessitate format-aware type handling during LTO and LTO-like use cases
(ThinLTO, llvm-extract, etc).
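
To make "format-aware type handling" concrete, here is a toy sketch of merging one-blob-per-TU type streams. It is not CodeView and not the code in the patch: the record layout (kind, name, list of indices into the same stream) and the function name are assumptions chosen only to show why the merger has to understand where records keep their references.

```python
def merge_type_streams(streams):
    """Merge per-TU type streams into one, deduplicating identical records.

    Each stream is a list of (kind, name, refs) tuples, where refs holds
    indices of earlier records in the same stream.
    """
    merged = []      # the combined stream
    index_of = {}    # remapped record -> its index in merged
    for stream in streams:
        local_to_global = []
        for kind, name, refs in stream:
            # The format-aware step: references are stream-local indices,
            # so they must be rewritten before records can be compared.
            record = (kind, name, tuple(local_to_global[r] for r in refs))
            if record not in index_of:
                index_of[record] = len(merged)
                merged.append(record)
            local_to_global.append(index_of[record])
    return merged


tu_a = [("base", "int", ()), ("pointer", "", (0,)), ("struct", "Node", (0, 1))]
tu_b = [("base", "char", ()), ("base", "int", ()), ("pointer", "", (1,))]
merged = merge_type_streams([tu_a, tu_b])
print(len(merged))
```

In this model the `int` and pointer-to-`int` records from the second TU collapse onto the first TU's copies only after their indices are remapped, which is the work a format-unaware linker cannot do.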


Eric Christopher via llvm-dev

2016-Apr-27 23:53 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

I don't agree in general here because of:

a) maintainability - there isn't one true path through things, and this now
scatters more Windows knowledge through debug info and LTO

b) higher bar for implementing similar DWARF functionality - there's
nothing here that at any point makes our general debug info support better.
Incrementally updating to an intermediate step is much easier and
a lower bar than needing to implement everything up to and including a
format-aware linker and support that through ThinLTO, the JIT, and full LTO.

c) if there's no reason to do this for DWARF there's no reason to do it for
Windows. The existing proposal was a way to get you type emission in the
front end so that you'd have to do less work. Ultimately though I don't see
a reason to do this if all of the platforms don't look the same.

d) ThinLTO/ORC won't support the debug info you have in your proposal right
now without patches

e) You're regressing LTO linking performance hugely for Windows with debug
info until you write the patches that enable format-aware linking of CodeView
information

I'm open to arguments on any of these points from anyone.

-eric
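
The scheme in the original RFC (each type as a pair of identifier and blob, with ODR types keyed by name and other types by a hash) can be sketched as follows to show why it needs no format knowledge at link time. This is a toy model under stated assumptions: the function names are hypothetical, SHA-256 stands in for the "full DWARF hash", and the blobs are opaque placeholder bytes.

```python
import hashlib


def type_key(name, blob, is_odr):
    """ODR types are keyed by name; other types by a hash of their contents."""
    if is_odr:
        return name
    # Stand-in for the RFC's "full DWARF hash" (assumption: any stable
    # content hash serves for the purposes of this sketch).
    return hashlib.sha256(blob).hexdigest()


def make_table(types):
    """Build a per-module {identifier: blob} table from (name, blob, is_odr)."""
    return {type_key(name, blob, odr): blob for name, blob, odr in types}


def link_type_tables(*tables):
    """Union per-module tables without ever parsing a blob."""
    linked = {}
    for table in tables:
        for key, blob in table.items():
            linked.setdefault(key, blob)  # first definition wins
    return linked


mod_a = make_table([("Foo", b"foo-bytes", True), ("", b"anon-bytes", False)])
mod_b = make_table([("Foo", b"foo-bytes", True), ("Bar", b"bar-bytes", True)])
linked = link_type_tables(mod_a, mod_b)
print(len(linked))
```

Uniquing here is a dictionary union over opaque bytes, so the module linker stays format-neutral; the cost is that cross-type references inside a blob must go through identifiers rather than stream-local offsets.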


Frédéric Riss via llvm-dev

2016-Apr-28 01:26 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

> On Apr 27, 2016, at 4:41 PM, Reid Kleckner <rnk at google.com> wrote:
> 
> My general feeling is that this design represents a mid-point between our
> current metadata design, and a future design where frontends just emit type
> information and LTO links it in a format-aware way.
>
> I don't think it's an imminent priority for anyone to do this for DWARF,
> so I worry that if we start building infrastructure for it, it will end up
> overengineered.
>
> Also, people seem to agree that in the long term, we really need a
> format-aware linker, and maybe LTO should just use one. Supposedly Frédéric
> has patches to llvm-dsymutil to make one for DWARF, but he hasn't found the
> time to upstream them.

There are pieces missing upstream (mostly the accelerator tables) and I’m
really struggling to find time to upstream these. However, the DIE tree linking
part of upstream llvm-dsymutil is complete. That’s not to say that it would be
easy to use it as a generic DWARF linker. I tried to make it as agnostic to the
platform as I could, but it was designed to be bit-for-bit compatible with the
original dsymutil and that surely made it a lot less generic.

Would you envision the format-aware link taking place during the LTO link? This
would seem pretty expensive to me (DWARF linking is not really cheap, as it’s
not a format designed for this). I think it would make more sense to leave the
type info in the object files and to somehow have the LTO link emit external
references to it (à la module debugging). Then have the debug info link happen as
an explicit step; this matches the Darwin model, but not the usual *nix model.

Fred
