thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm [Mar 2016]


Aboud, Amjad via llvm-dev

2016-Mar-31 22:44 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Hi Mehdi,
I understand the reasoning for supporting this proposal independently from
CodeView support.
However, I do not think that it is needed for supporting CodeView.

When I said that my suggestion is cleaner, I was referring to CodeView support,
assuming the changes in LLVM IR/Clang FE indicated in this proposal.
Also, it is not clear from the proposal what will be shared (generic)
between DWARF and CodeView and what will be format-specific.

Regards,
Amjad

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Thursday, March 31, 2016 22:27
To: Aboud, Amjad <amjad.aboud at intel.com>
Cc: Eric Christopher <echristo at gmail.com>; Clang Dev <cfe-dev at
lists.llvm.org>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] RFC: Up front type information generation in
clang and llvm

Hi Aboud,

On Mar 31, 2016, at 11:06 AM, Aboud, Amjad via cfe-dev <cfe-dev at
lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:

Hi Eric,
I can understand the need for improving the current design of debug info
representation and emission in LLVM.
However, let’s not forget that the motivation was, and still is, to support
CodeView debug info emission.

Well, that is *one* motivation.

I am wondering if it is right to spend the huge effort needed to implement the
proposal below while knowing these facts:
1.      It would be clearer how to improve the design once we have working
CodeView support.
You said it yourself: we still do not know what challenges we will face
while implementing this proposal.
2.      I understand that CodeView will need some extensions to the current
DWARF debug info, like ‘this’ adjustment.
However, it is doable to introduce CodeView wrapper data structures that can
be created from the current DWARF debug info IR.
And this can be done in CodeGen (e.g. CodeViewDebug.cpp) while emitting the
code/debug info.

Again, I understand that your proposal is trying to improve a lot of things

Yes, and to give some different perspective: some of these "things"
are a lot higher priority than CodeView (for other people/use cases of course),
because DebugInfo cost is prohibitive for some use cases.


, but it seems that we should first try to support CodeView debug info with the
current debug info IR.
The advantages:
1.      It works. Even though you still have doubts about a few issues, I believe
we can resolve them with minimal modification to the LLVM IR/Clang FE.
2.      It requires much less effort.
3.      It is much cleaner.

If it is "much cleaner" in the IR, I understand that you have insights
about Eric's proposal being "less clean", independently of adding
CodeView before or after this change. If so it's worth elaborating on this.



4.      We will better understand the CodeView requirements, which can then be
used to improve the proposal below (before diving into implementing it).

Don't forget the "Cons":

1) It is easier to perform large refactoring/changes to the debug info flow
*before* complicating the problem.
2) This is adding more stuff that will need to go through all these changes,
wasting effort in the process.
3) It will limit forward progress for people who don't care about CodeView
but want to move forward with restructuring DI deeply, as Eric's proposal
does.

That is not to say that your points are not valid, but that it's not that
clear cut either.

--
Mehdi




I suggest that we start with:
1.      Define the CodeView wrapper data structure. (CodeViewDebugIR)
2.      Build the CodeView wrapper data structure from the DWARF debug info IR.
(CodeViewDebugBuilder)
3.      Emit the CodeView wrapper data structure into the COFF object file.
(CodeViewDebugEmitter)
4.      Figure out what modifications/extensions need to be made to the DWARF debug
info IR/Clang FE.
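
[Editor's illustration] Steps 1 to 3 above can be pictured as a small three-stage pipeline. The following is only a rough sketch under stated assumptions: `DwarfLikeType`, `CodeViewRecord`, `RecordKind`, `buildRecord` and `emitRecord` are hypothetical names invented here, not LLVM APIs, and the byte layout is a toy encoding, not the real CodeView record format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a type node in the existing DWARF-oriented IR.
struct DwarfLikeType {
  std::string Name;
  std::uint64_t SizeInBits;
};

// Step 1 ("CodeViewDebugIR"): a hypothetical wrapper record.
// The kind value is a toy number, not a real CodeView leaf ID.
enum class RecordKind : std::uint16_t { Structure = 1 };

struct CodeViewRecord {
  RecordKind Kind;
  std::string Name;
  std::uint32_t SizeInBytes;
};

// Step 2 ("CodeViewDebugBuilder"): derive the wrapper from the existing IR.
inline CodeViewRecord buildRecord(const DwarfLikeType &T) {
  // Round the bit size up to whole bytes.
  return CodeViewRecord{RecordKind::Structure, T.Name,
                        static_cast<std::uint32_t>((T.SizeInBits + 7) / 8)};
}

// Step 3 ("CodeViewDebugEmitter"): serialize the wrapper.
// Toy layout: 2-byte kind, 4-byte size (both little-endian), NUL-terminated name.
inline std::vector<std::uint8_t> emitRecord(const CodeViewRecord &R) {
  assert(R.Name.find('\0') == std::string::npos && "name must not contain NUL");
  std::vector<std::uint8_t> Out;
  const auto Kind = static_cast<std::uint16_t>(R.Kind);
  for (int I = 0; I < 2; ++I)
    Out.push_back(static_cast<std::uint8_t>(Kind >> (8 * I)));
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<std::uint8_t>(R.SizeInBytes >> (8 * I)));
  Out.insert(Out.end(), R.Name.begin(), R.Name.end());
  Out.push_back(0);
  return Out;
}
```

In this sketch the front end and the IR stay untouched; only the builder needs to know which fields the existing IR lacks, which is where step 4 would surface.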

What do you think?

Thanks,
Amjad

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Eric
Christopher via llvm-dev
Sent: Wednesday, March 30, 2016 04:01
To: Clang Dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at
lists.llvm.org>>; llvm-dev <llvm-dev at
lists.llvm.org<mailto:llvm-dev at lists.llvm.org>>
Subject: [llvm-dev] RFC: Up front type information generation in clang and llvm

Hi All,

This is something that's been talked about for some time and it's
probably time to propose it.

The "We" in this document is everyone on the cc line plus me.

Please go ahead and take a look.

Thanks!

-eric


Objective (and TL;DR)
================
Migrate debug type information generation from the backends to the front end.

This will enable:
1. Separation of concerns and maintainability: LLVM shouldn’t have to know about
C preprocessor macros, Obj-C properties, or extensive details about debug
information binary formats.
2. Performance: Skipping a serialization should speed up normal compilations.
3. Memory usage: The DI metadata structures are smaller than they were, but are
still fairly large and pointer heavy.

Motivation
=======
Currently, types in LLVM debug info are described by the DIType class hierarchy.
This hierarchy evolved organically from a more flexible sea-of-nodes
representation into what it is today - a large, only somewhat format-neutral
representation of debug types. Making this more format-neutral will only
increase the memory use - and for no reason, as type information is static (or
nearly so). Debug formats already have a memory-efficient serialization (their
own binary format), so we should support a front end emitting type information
with sufficient representation to allow the backend to emit debug information
based on the more normal IR features: functions, scopes, variables, etc.

Scope/Impact
==========
This is going to involve large-scale changes across both LLVM and clang. This
will also affect any out-of-tree front ends; however, we expect the impact to be
on the order of a large API change rather than needing massive infrastructure
changes.

Related work
=========
This is related to the efforts to support CodeView in LLVM and clang as well as
efforts to reduce overall memory consumption when compiling with debug
information enabled;  in particular efforts to prune LTO memory usage.


Concerns
=======

We need a good story for transitioning all the debug info testcases in the
backend without giving up coverage and/or readability. David believes he has a
plan here.

Proposal
======
Short version
-----------------

1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
2. Split the clang CGDebugInfo API into Types and Line Table to match.
3. Add a LLVM DWARF emission library similar to the existing CodeView one.
4. Migrate the Types API into a clang internal API taking clang AST structures
and use the LLVM binary emission libraries to produce type information.
5. Remove the old binary emission out of LLVM.


Questions/Thoughts/Elaboration
-------------------------------------------

Splitting the DIBuilder API
~~~~~~~~~~~~~~~~~~~~
Will DISubprogram be part of both?
   * We should split it in two: Full declarations with type and a slimmed down
version with an abstract origin.

How will we reference types in the DWARF blob?
   * ODR types can be referenced by name
   * Non-odr types by full DWARF hash
   * Each type can be a pair (tuple) of identifier (DITypeRef today) and blob.
   * For versions before DWARF4 we can emit each type as a unit (but not a DWARF
Type Unit) and use references and module relocations for the offsets. (See below)
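
[Editor's illustration] The identifier-plus-blob pairing above can be sketched as follows. This is a minimal sketch under assumptions: `TypeBlob`, `TypeTable` and `mergeTypeTables` are hypothetical names, and the identifier strings used with them (a mangled name for ODR types, a content hash otherwise) are placeholders. The point it tries to show is that merging two modules only compares identifiers and never parses the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical: one pre-serialized type description.
struct TypeBlob {
  std::string Id;                   // name (ODR type) or hash (non-ODR type)
  std::vector<std::uint8_t> Bytes;  // opaque, format-specific payload
};

// Hypothetical per-module table, keyed by identifier.
using TypeTable = std::map<std::string, TypeBlob>;

// Merge Src into Dst, keeping the first blob seen for each identifier.
// Returns how many duplicate types were dropped. No blob is ever decoded.
inline std::size_t mergeTypeTables(TypeTable &Dst, const TypeTable &Src) {
  std::size_t Dropped = 0;
  for (const auto &Entry : Src) {
    assert(Entry.first == Entry.second.Id && "table key must match blob id");
    if (!Dst.insert(Entry).second)
      ++Dropped;
  }
  return Dropped;
}
```

Keeping the first blob for a given identifier relies on the identifier really being unique per type, which is the ODR or hash guarantee the bullets above assume.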

How will we handle references in DWARF2 or global relocations for non-type
template parameters?
   * We can use a “relocation” metadata as part of the format.
   * Representable as a tuple holding the DIType and the offset within the
DIBlob at which to write the final relocation/offset for the reference at
emission time.
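
[Editor's illustration] The relocation tuple described above could look roughly like this. It is a sketch under assumptions: `BlobReloc` and `applyRelocs` are hypothetical names, the target is identified by a string rather than a DIType, and a 4-byte little-endian offset is assumed purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical: a pending reference from inside one blob to another type.
struct BlobReloc {
  std::string TargetId;  // identifier of the referenced type
  std::size_t Offset;    // position in the referring blob to patch
};

// Patch Blob in place once the final offsets of all types are known,
// writing each target offset as a 4-byte little-endian value.
// Throws std::out_of_range (from map::at) if a target has no final offset.
inline void applyRelocs(std::vector<std::uint8_t> &Blob,
                        const std::vector<BlobReloc> &Relocs,
                        const std::map<std::string, std::uint32_t> &FinalOffsets) {
  for (const BlobReloc &R : Relocs) {
    assert(R.Offset + 4 <= Blob.size() && "relocation outside blob");
    const std::uint32_t Value = FinalOffsets.at(R.TargetId);
    for (int I = 0; I < 4; ++I)
      Blob[R.Offset + I] = static_cast<std::uint8_t>(Value >> (8 * I));
  }
}
```

The blob itself stays opaque until emission; only the listed offsets are touched, so the linker or backend does not need to understand the surrounding debug format.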

Why break up the types at all?
   * To enable non-debug format aware linking and type uniquing for LTO that
won’t be huge in size. We break up the types so we don’t need to parse debug
information to link two modules together efficiently.

Any other concerns there?
   * Debug information without type units might be slightly larger in this
scheme due to parents being duplicated (declarations and abstract origin, not
full parents). It may be possible to extend dsymutil/etc to merge all siblings
into a common parent. Open question for better ways to solve this.

How should we handle DWARF5/Apple Accelerator Tables?
   * Thoughts:
   * We can parse the dwarf in the back end and generate them.
   * We can emit in the front end for the base case of non-LTO (with help from
the backend for relocation aspects).
   * We can use dsymutil on LTO debug information to generate them.

Why isn’t this a more detailed spec?
   * Mostly because we’ve thought about the issues, but we can’t plan for
everything during implementation.


Future work
----------------

Not contained as part of this, but an obvious future direction is that the
Module linker could grow support for debug aware linking. Then we can have all
of the type information for a single translation unit in a single blob and use
the debug aware linking to handle merging types.
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev


Mehdi Amini via llvm-dev

2016-Mar-31 22:45 UTC


[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Hey,

> On Mar 31, 2016, at 3:44 PM, Aboud, Amjad <amjad.aboud at intel.com> wrote:
>
> Hi Mehdi,
> I understand the reasoning for supporting this proposal independently from
> CodeView support.
> However, I do not think that it is needed for supporting CodeView.
>
> When I say that my suggestion is more clean, I was pointing to CodeView
> support, assuming the changes in LLVM IR/Clang FE indicated in this proposal.
> Also, it is not that clear from the proposal what will be shared (generic)
> between Dwarf and CodeView and what will be specific.

It all makes sense.


Thanks,

Mehdi


Reid Kleckner via llvm-dev

2016-Mar-31 22:47 UTC


[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

The split between CodeView and DWARF will happen at the level of type
information. So, DIVariable, DISubprogram, DILocation, DILocalScope, etc
will all be shared, but records and composite types etc will not.

On Thu, Mar 31, 2016 at 3:44 PM, Aboud, Amjad via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Mehdi,
>
> I understand the reasoning for supporting this proposal independently from
> CodeView support.
> However, I do not think that it is needed for supporting CodeView.
>
> When I say that my suggestion is more clean, I was pointing to CodeView
> support, assuming the changes in LLVM IR/Clang FE indicated in this
> proposal.
> Also, it is not that clear from the proposal what will be shared (generic)
> between Dwarf and CodeView and what will be specific.
>
> Regards,
> Amjad
>
>
>
> *From:* mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> *Sent:* Thursday, March 31, 2016 22:27
> *To:* Aboud, Amjad <amjad.aboud at intel.com>
> *Cc:* Eric Christopher <echristo at gmail.com>; Clang Dev <
> cfe-dev at lists.llvm.org>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [cfe-dev] [llvm-dev] RFC: Up front type information
> generation in clang and llvm
>
>
>
> Hi Aboud,
>
>
>
> On Mar 31, 2016, at 11:06 AM, Aboud, Amjad via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>
>
> Hi Eric,
>
> I can understand the need for improving the current design of debug info
> representation and emission in LLVM.
>
> However, let’s not forget that the motivation was and still to support
> CodeView debug info emission.
>
>
>
> Well, that is *one* motivation.
>
>
>
> I am wondering if it is right to spend the huge effort needed to implement
> the below proposal while knowing these facts:
>
> 1.      It would be more clear how to improve the design when we have a
> working CodeView support.
>
> You said it yourself, that we still do not know what challenges we will
> face while implementing this proposal.
>
> 2.      I understand that CodeView will need some extra extensions to
> current dwarf debug info, like ‘this’ adjustment.
>
> However, it is doable to introduce a CodeView wrapper data structures that
> can be created from current dwarf debug info IR.
>
> And this can be done in CodeGen (e.g. CodeViewDebug.cpp) while emitting
> the code/debug info.
>
>
>
> Again, I understand that your proposal is trying to improve a lot of things
>
>
>
> Yes, and to give some different perspective: some of these
"things" are a
> lot higher priority than CodeView (for other people/use cases of course),
> because DebugInfo cost is prohibitive for some use cases.
>
>
>
> , but it seems that we should first try support CodeView debug info with
> the current debug info IR.
>
> The advantages:
>
> 1.      It works, even though you still have doubts about few issues, I
> believe we can resolve them with minimum modification to the LLVM IR/Clang
> FE.
>
> 2.      It requires much smaller effort.
>
> 3.      It is much clean.
>
>
>
> If it is "much cleaner" in the IR, I understand that you have
insights
> about Eric's proposal being "less clean", independently of
adding CodeView
> before or after this change. If so it's worth elaborating on this.
>
>
>
>
>
> 4.      We will understand more the requirements needed by CodeView that
> can be used to improve the below proposal (before diving into implementing
> it).
>
>
>
> Don't you forget the "Cons":
>
>
>
> 1) It is easier to perform large refactoring/changes to the debug info
> flow *before* complexifying the problem.
>
> 2) This is adding more stuff that will need to go through all these
> changes, wasting effort in the process.
>
> 3) It will limit forward progress for people who don't care about
CodeView
> but want to move forward with restructuring DI deeply, like Eric's
proposal
> is doing.
>
>
>
> That is not to say that your points are not valid, but that it's not
that
> clear cut either.
>
>
>
> --
>
> Mehdi
>
>
>
>
>
>
>
> I suggest that we start with:
>
> 1.      Define the CodeView wrapper data structure. (CodeViewDebugIR)
>
> 2.      Build the CodeView wrapper data structure based on dwarf debug
> info IR. (CodeViewDebugBuilder)
>
> 3.      Emit the CodeView wrapper data structure into COFF object file.
> (CodeViewDebugEmitter)
>
> 4.      Figure out what modification/extension need to be done to dwarf
> debug info IR/Clang FE.
>
>
>
> What do you think?
>
>
>
> Thanks,
>
> Amjad
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org
> <llvm-dev-bounces at lists.llvm.org>] *On Behalf Of *Eric Christopher
via
> llvm-dev
> *Sent:* Wednesday, March 30, 2016 04:01
> *To:* Clang Dev <cfe-dev at lists.llvm.org>; llvm-dev <
> llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] RFC: Up front type information generation in clang
> and llvm
>
>
>
> Hi All,
>
>
>
> This is something that's been talked about for some time and it's
probably
> time to propose it.
>
>
>
> The "We" in this document is everyone on the cc line plus me.
>
>
>
> Please go ahead and take a look.
>
>
>
> Thanks!
>
>
>
> -eric
>
>
>
>
>
> Objective (and TL;DR)
>
> ================>
>
>
> Migrate debug type information generation from the backends to the front
> end.
>
>
>
> This will enable:
>
> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know
> about C preprocessor macros, Obj-C properties, or extensive details about
> debug information binary formats.
>
> 2. Performance: Skipping a serialization should speed up normal
> compilations.
>
> 3. Memory usage: The DI metadata structures are smaller than they were,
> but are still fairly large and pointer heavy.
>
>
>
> Motivation
>
> =======>
>
>
> Currently, types in LLVM debug info are described by the DIType class
> hierarchy. This hierarchy evolved organically from a more flexible
> sea-of-nodes representation into what it is today - a large, only somewhat
> format neutral representation of debug types. Making this more format
> neutral will only increase the memory use - and for no reason as type
> information is static (or nearly so). Debug formats already have a memory
> efficient serialization, their own binary format so we should support a
> front end emitting type information with sufficient representation to allow
> the backend to emit debug information based on the more normal IR features:
> functions, scopes, variables, etc.
>
>
>
> Scope/Impact
>
> ==========>
>
>
> This is going to involve large scale changes across both LLVM and clang.
> This will also affect any out-of-tree front ends, however, we expect the
> impact to be on the order of a large API change rather than needing massive
> infrastructure changes.
>
>
>
> Related work
>
> =========>
>
>
> This is related to the efforts to support CodeView in LLVM and clang as
> well as efforts to reduce overall memory consumption when compiling with
> debug information enabled;  in particular efforts to prune LTO memory
usage.
>
>
>
>
>
> Concerns
>
> =======>
>
>
>
>
> We need a good story for transitioning all the debug info testcases in the
> backend without giving up coverage and/or readability. David believes he
> has a plan here.
>
>
>
> Proposal
>
> ======>
>
>
> Short version
>
> -----------------
>
>
>
> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
>
> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>
> 3. Add a LLVM DWARF emission library similar to the existing CodeView one.
>
> 4. Migrate the Types API into a clang internal API taking clang AST
> structures and use the LLVM binary emission libraries to produce type
> information.
>
> 5. Remove the old binary emission out of LLVM.
>
>
>
>
>
> Questions/Thoughts/Elaboration
>
> -------------------------------------------
>
>
>
> Splitting the DIBuilder API
>
> ~~~~~~~~~~~~~~~~~~~~
>
> Will DISubprogram be part of both?
>
>    * We should split it in two: Full declarations with type and a slimmed
> down version with an abstract origin.
>
>
>
> How will we reference types in the DWARF blob?
>
>    * ODR types can be referenced by name
>
>    * Non-odr types by full DWARF hash
>
>    * Each type can be a pair(tuple) of identifier (DITypeRef today) and
> blob.
>
>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
> Unit and use references and module relocations for the offsets. (See below)
>
>
>
> How will we handle references in DWARF2 or global relocations for non-type
> template parameters?
>
>    * We can use a “relocation” metadata as part of the format.
>
>    * Representable as a tuple that has the DIType and the offset within
> the DIBlob as where to write the final relocation/offset for the reference
> at emission time.
>
>
>
> Why break up the types at all?
>
>    * To enable non-debug format aware linking and type uniquing for LTO
> that won’t be huge in size. We break up the types so we don’t need to parse
> debug information to link two modules together efficiently.
>
>
>
> Any other concerns there?
>
>    * Debug information without type units might be slightly larger in this
> scheme due to parents being duplicated (declarations and abstract origin,
> not full parents). It may be possible to extend dsymutil/etc to merge all
> siblings into a common parent. Open question for better ways to solve this.
>
>
>
> How should we handle DWARF5/Apple Accelerator Tables?
>
>    * Thoughts:
>
>    * We can parse the dwarf in the back end and generate them.
>
>    * We can emit in the front end for the base case of non-LTO (with help
> from the backend for relocation aspects).
>
>    * We can use dsymutil on LTO debug information to generate them.
>
>
>
> Why isn’t this a more detailed spec?
>
>    * Mostly because we’ve thought about the issues, but we can’t plan for
> everything during implementation.
>
>
>
>
>
> Future work
>
> ----------------
>
>
>
> Not contained as part of this, but an obvious future direction is that the
> Module linker could grow support for debug aware linking. Then we can have
> all of the type information for a single translation unit in a single blob
> and use the debug aware linking to handle merging types.
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Aboud, Amjad via llvm-dev

2016-Apr-01 09:19 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

I will say it one more time: changing the IR according to the target debug
info format does not sound like a good design.
Why?

   * It breaks the modularity of the LLVM compiler. Suppose we want to support
another debug info format in the future: do you expect us to add another
Record/Type specification to the IR?

   * It means that we need to modify the Clang FE and the LLVM MED
optimizations, and maybe duplicate or rewrite all related LIT tests, in addition
to the expected work in codegen.

As I see it, we can easily separate the FE/MED parts from the backend (codegen).

   * In the FE/MED we should have one representation for all debug info that
captures the debug information from the sources (we should not lose information
just because some formats, like DWARF, do not need it!)

   * In the backend, we should have separate emitters that convert the debug
information captured in the IR into data structures suited to the target debug
info format.

In addition, if we think that some information will be much easier to calculate
in the FE than in the BE, we can create it either always or according to the
target, but in that case it should be defined as an optional field in the debug
info IR rather than as a totally new, separate debug info IR.
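
As a rough illustration of the separation argued for above, the sketch below keeps one format-neutral record with an optional field and lets two independent emitters read it. It is not code from this thread or from LLVM: `MethodInfo`, `DebugEmitter`, `DwarfLikeEmitter`, `CodeViewLikeEmitter` and the output strings are all hypothetical, and real emitters would of course produce binary records rather than text.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical format-neutral description of a member function, as the
// front end might record it. None of these names exist in LLVM.
struct MethodInfo {
  std::string Name;
  // Optional field: cheap for the front end to compute, but only some
  // formats (CodeView's 'this' adjustment in this thread) consume it.
  std::optional<int32_t> ThisAdjustment;
};

// One emitter per target debug info format, all reading the same record.
struct DebugEmitter {
  virtual ~DebugEmitter() = default;
  virtual std::string emit(const MethodInfo &M) const = 0;
};

struct DwarfLikeEmitter : DebugEmitter {
  std::string emit(const MethodInfo &M) const override {
    // This emitter ignores the optional field entirely.
    return "dwarf method " + M.Name;
  }
};

struct CodeViewLikeEmitter : DebugEmitter {
  std::string emit(const MethodInfo &M) const override {
    // Falls back to 0 when the front end did not provide the field.
    return "codeview method " + M.Name +
           " thisadjust=" + std::to_string(M.ThisAdjustment.value_or(0));
  }
};
```

Under these assumptions, adding a third format would mean adding a third emitter, and any information it needs that the others do not would become another optional field on the shared record.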

Regards,
Amjad

From: Reid Kleckner [mailto:rnk at google.com]
Sent: Friday, April 01, 2016 01:47
To: Aboud, Amjad <amjad.aboud at intel.com>
Cc: mehdi.amini at apple.com; llvm-dev <llvm-dev at lists.llvm.org>; Clang
Dev <cfe-dev at lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] RFC: Up front type information generation in
clang and llvm

The split between CodeView and DWARF will happen at the level of type
information. So, DIVariable, DISubprogram, DILocation, DILocalScope, etc. will
all be shared, but records, composite types, etc. will not.

On Thu, Mar 31, 2016 at 3:44 PM, Aboud, Amjad via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
Hi Mehdi,
I understand the reasoning for supporting this proposal independently from
CodeView support.
However, I do not think that it is needed for supporting CodeView.

When I say that my suggestion is more clean, I was pointing to CodeView support,
assuming the changes in LLVM IR/Clang FE indicated in this proposal.
Also, it is not that clear from the proposal what will be shared (generic)
between Dwarf and CodeView and what will be specific.

Regards,
Amjad

From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
Sent: Thursday, March 31, 2016 22:27
To: Aboud, Amjad <amjad.aboud at intel.com>
Cc: Eric Christopher <echristo at gmail.com>; Clang Dev <cfe-dev at
lists.llvm.org>; llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] RFC: Up front type information generation in
clang and llvm

Hi Aboud,

On Mar 31, 2016, at 11:06 AM, Aboud, Amjad via cfe-dev <cfe-dev at
lists.llvm.org> wrote:

Hi Eric,
I can understand the need for improving the current design of debug info
representation and emission in LLVM.
However, let’s not forget that the motivation was, and still is, to support
CodeView debug info emission.

Well, that is *one* motivation.

I am wondering if it is right to spend the huge effort needed to implement the
below proposal while knowing these facts:
1.      It would be clearer how to improve the design once we have working
CodeView support.
You said it yourself, that we still do not know what challenges we will face
while implementing this proposal.
2.      I understand that CodeView will need some extra extensions to the current
DWARF debug info, like ‘this’ adjustment.
However, it is doable to introduce CodeView wrapper data structures that can
be created from the current DWARF debug info IR.
And this can be done in CodeGen (e.g. CodeViewDebug.cpp) while emitting the
code/debug info.

Again, I understand that your proposal is trying to improve a lot of things

Yes, and to give some different perspective: some of these "things"
are a lot higher priority than CodeView (for other people/use cases of course),
because DebugInfo cost is prohibitive for some use cases.

, but it seems that we should first try to support CodeView debug info with the
current debug info IR.
The advantages:
1.      It works. Even though you still have doubts about a few issues, I believe
we can resolve them with minimal modifications to the LLVM IR/Clang FE.
2.      It requires much less effort.
3.      It is much cleaner.

If it is "much cleaner" in the IR, I understand that you have insights
about Eric's proposal being "less clean", independently of adding
CodeView before or after this change. If so it's worth elaborating on this.


4.      We will better understand the requirements of CodeView, which can be
used to improve the below proposal (before diving into implementing it).

Aren't you forgetting the "cons"?

1) It is easier to perform large refactoring/changes to the debug info flow
*before* complexifying the problem.
2) This is adding more stuff that will need to go through all these changes,
wasting effort in the process.
3) It will limit forward progress for people who don't care about CodeView
but want to move forward with restructuring DI deeply, like Eric's proposal
is doing.

That is not to say that your points are not valid, but that it's not that
clear cut either.

--
Mehdi



I suggest that we start with:
1.      Define the CodeView wrapper data structure. (CodeViewDebugIR)
2.      Build the CodeView wrapper data structure based on dwarf debug info IR.
(CodeViewDebugBuilder)
3.      Emit the CodeView wrapper data structure into COFF object file.
(CodeViewDebugEmitter)
4.      Figure out what modifications/extensions need to be made to the DWARF
debug info IR/Clang FE.

What do you think?

Thanks,
Amjad

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Eric
Christopher via llvm-dev
Sent: Wednesday, March 30, 2016 04:01
To: Clang Dev <cfe-dev at lists.llvm.org>; llvm-dev <llvm-dev at
lists.llvm.org>
Subject: [llvm-dev] RFC: Up front type information generation in clang and llvm

Hi All,

This is something that's been talked about for some time and it's
probably time to propose it.

The "We" in this document is everyone on the cc line plus me.

Please go ahead and take a look.

Thanks!

-eric


Objective (and TL;DR)
================
Migrate debug type information generation from the backends to the front end.

This will enable:
1. Separation of concerns and maintainability: LLVM shouldn’t have to know about
C preprocessor macros, Obj-C properties, or extensive details about debug
information binary formats.
2. Performance: Skipping a serialization should speed up normal compilations.
3. Memory usage: The DI metadata structures are smaller than they were, but are
still fairly large and pointer heavy.

Motivation
=======
Currently, types in LLVM debug info are described by the DIType class hierarchy.
This hierarchy evolved organically from a more flexible sea-of-nodes
representation into what it is today - a large, only somewhat format neutral
representation of debug types. Making this more format neutral will only
increase the memory use - and for no reason as type information is static (or
nearly so). Debug formats already have a memory-efficient serialization (their
own binary format), so we should support a front end emitting type information
with sufficient representation to allow the backend to emit debug information
based on the more normal IR features: functions, scopes, variables, etc.

Scope/Impact
==========
This is going to involve large scale changes across both LLVM and clang. This
will also affect any out-of-tree front ends, however, we expect the impact to be
on the order of a large API change rather than needing massive infrastructure
changes.

Related work
=========
This is related to the efforts to support CodeView in LLVM and clang as well as
efforts to reduce overall memory consumption when compiling with debug
information enabled;  in particular efforts to prune LTO memory usage.


Concerns
=======

We need a good story for transitioning all the debug info testcases in the
backend without giving up coverage and/or readability. David believes he has a
plan here.

Proposal
======
Short version
-----------------

1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
2. Split the clang CGDebugInfo API into Types and Line Table to match.
3. Add a LLVM DWARF emission library similar to the existing CodeView one.
4. Migrate the Types API into a clang internal API taking clang AST structures
and use the LLVM binary emission libraries to produce type information.
5. Remove the old binary emission out of LLVM.


Questions/Thoughts/Elaboration
-------------------------------------------

Splitting the DIBuilder API
~~~~~~~~~~~~~~~~~~~~
Will DISubprogram be part of both?
   * We should split it in two: Full declarations with type and a slimmed down
version with an abstract origin.

How will we reference types in the DWARF blob?
   * ODR types can be referenced by name
   * Non-ODR types by full DWARF hash
   * Each type can be a pair (tuple) of identifier (DITypeRef today) and blob.
   * For < DWARF4 we can emit each type as a unit, but not a DWARF Type Unit
and use references and module relocations for the offsets. (See below)
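
To make the "pair of identifier and blob" idea concrete, here is a minimal sketch of how such pairs could be uniqued across modules by comparing identifiers only. It is an illustration under stated assumptions, not part of the proposal: `TypeEntry`, `Module` and `linkTypes` are hypothetical names, the identifier is assumed to be a mangled name for ODR types and a content hash otherwise, and the blob is treated as opaque bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one pre-serialized type: an identifier plus
// opaque bytes. The linker below never looks inside Blob.
struct TypeEntry {
  std::string Id;            // name for ODR types, content hash otherwise
  std::vector<uint8_t> Blob; // already-encoded debug info for this type
};

using Module = std::vector<TypeEntry>;

// Merge the type lists of several modules, keeping the first blob seen
// for each identifier. Only identifiers are compared, so no knowledge of
// the debug info format is needed to unique the types.
inline std::map<std::string, std::vector<uint8_t>>
linkTypes(const std::vector<Module> &Modules) {
  std::map<std::string, std::vector<uint8_t>> Unique;
  for (const Module &M : Modules)
    for (const TypeEntry &T : M)
      Unique.emplace(T.Id, T.Blob); // no-op if Id is already present
  return Unique;
}
```

Linking two modules that both describe the same ODR type would then keep a single blob for it, which is the property the LTO use case relies on.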

How will we handle references in DWARF2 or global relocations for non-type
template parameters?
   * We can use a “relocation” metadata as part of the format.
   * Representable as a tuple that has the DIType and the offset within the
DIBlob as where to write the final relocation/offset for the reference at
emission time.
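
The relocation tuple described above could look roughly like the following sketch, which patches the final offset of a referenced type into a blob at emission time. Everything here is an assumption made for illustration: `TypeReloc`, `applyRelocs` and the `Layout` map are hypothetical, references are taken to be 4 bytes wide, and they are written in host byte order for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Hypothetical "relocation" record: at byte Offset inside a blob, write the
// final position of the type named Target once that position is known.
struct TypeReloc {
  std::string Target;
  std::size_t Offset;
};

// Patch a copy of Blob using the final offsets in Layout.
// Assumes 4-byte references in host byte order, a simplification.
// Throws std::out_of_range (from map::at) if a target has no final offset.
inline std::vector<uint8_t>
applyRelocs(std::vector<uint8_t> Blob, const std::vector<TypeReloc> &Relocs,
            const std::map<std::string, uint32_t> &Layout) {
  for (const TypeReloc &R : Relocs) {
    uint32_t Final = Layout.at(R.Target);
    assert(R.Offset + sizeof(Final) <= Blob.size() && "reloc out of bounds");
    std::memcpy(Blob.data() + R.Offset, &Final, sizeof(Final));
  }
  return Blob;
}
```

Because the blob is taken by value, the caller's bytes stay untouched until the final layout is known, so the same pre-serialized type could in principle be placed at different offsets in different outputs.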

Why break up the types at all?
   * To enable non-debug format aware linking and type uniquing for LTO that
won’t be huge in size. We break up the types so we don’t need to parse debug
information to link two modules together efficiently.

Any other concerns there?
   * Debug information without type units might be slightly larger in this
scheme due to parents being duplicated (declarations and abstract origin, not
full parents). It may be possible to extend dsymutil/etc to merge all siblings
into a common parent. Open question for better ways to solve this.

How should we handle DWARF5/Apple Accelerator Tables?
   * Thoughts:
   * We can parse the dwarf in the back end and generate them.
   * We can emit in the front end for the base case of non-LTO (with help from
the backend for relocation aspects).
   * We can use dsymutil on LTO debug information to generate them.

Why isn’t this a more detailed spec?
   * Mostly because we’ve thought about the issues, but we can’t plan for
everything during implementation.


Future work
----------------

Not contained as part of this, but an obvious future direction is that the
Module linker could grow support for debug aware linking. Then we can have all
of the type information for a single translation unit in a single blob and use
the debug aware linking to handle merging types.