thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm [Mar 2016]


Eric Christopher via llvm-dev

2016-Mar-30 03:15 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

On Tue, Mar 29, 2016 at 8:11 PM Peter Collingbourne <peter at pcc.me.uk> wrote:
> On Tue, Mar 29, 2016 at 7:43 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Tue, Mar 29, 2016 at 7:31 PM Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>>> Thanks for sharing this. Mostly seems like a reasonable plan to me. A
>>> few comments below.
>>>
>>>
>> Thanks Peter!
>>
>>
>>> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hi All,
>>>>
>>>> This is something that's been talked about for some time and it's
>>>> probably time to propose it.
>>>>
>>>> The "We" in this document is everyone on the cc line plus me.
>>>>
>>>> Please go ahead and take a look.
>>>>
>>>> Thanks!
>>>>
>>>> -eric
>>>>
>>>>
>>>> Objective (and TL;DR)
>>>> =====================
>>>>
>>>> Migrate debug type information generation from the backends to the
>>>> front end.
>>>>
>>>> This will enable:
>>>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to
>>>> know about C preprocessor macros, Obj-C properties, or extensive details
>>>> about debug information binary formats.
>>>> 2. Performance: Skipping a serialization should speed up normal
>>>> compilations.
>>>> 3. Memory usage: The DI metadata structures are smaller than they were,
>>>> but are still fairly large and pointer heavy.
>>>>
>>>> Motivation
>>>> ==========
>>>>
>>>> Currently, types in LLVM debug info are described by the DIType class
>>>> hierarchy. This hierarchy evolved organically from a more flexible
>>>> sea-of-nodes representation into what it is today: a large, only
>>>> somewhat format-neutral representation of debug types. Making this more
>>>> format neutral will only increase the memory use, and for no reason, as
>>>> type information is static (or nearly so). Debug formats already have a
>>>> memory-efficient serialization: their own binary format. So we should
>>>> support a front end emitting type information with sufficient
>>>> representation to allow the backend to emit debug information based on
>>>> the more normal IR features: functions, scopes, variables, etc.
>>>>
>>>> Scope/Impact
>>>> ============
>>>>
>>>> This is going to involve large scale changes across both LLVM and
>>>> clang. This will also affect any out-of-tree front ends; however, we
>>>> expect the impact to be on the order of a large API change rather than
>>>> needing massive infrastructure changes.
>>>>
>>>> Related work
>>>> ============
>>>>
>>>> This is related to the efforts to support CodeView in LLVM and clang as
>>>> well as efforts to reduce overall memory consumption when compiling with
>>>> debug information enabled; in particular efforts to prune LTO memory
>>>> usage.
>>>>
>>>>
>>>> Concerns
>>>> ========
>>>>
>>>> We need a good story for transitioning all the debug info testcases in
>>>> the backend without giving up coverage and/or readability. David
>>>> believes he has a plan here.
>>>>
>>>> Proposal
>>>> ========
>>>>
>>>> Short version
>>>> -----------------
>>>>
>>>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line
>>>> Table.
>>>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>>>> 3. Add an LLVM DWARF emission library similar to the existing CodeView
>>>> one.
>>>> 4. Migrate the Types API into a clang internal API taking clang AST
>>>> structures and use the LLVM binary emission libraries to produce type
>>>> information.
>>>> 5. Remove the old binary emission from LLVM.
>>>>
>>>>
>>>> Questions/Thoughts/Elaboration
>>>> -------------------------------------------
>>>>
>>>> Splitting the DIBuilder API
>>>> ~~~~~~~~~~~~~~~~~~~~
>>>> Will DISubprogram be part of both?
>>>>    * We should split it in two: Full declarations with type and a
>>>> slimmed down version with an abstract origin.
>>>>
>>>> How will we reference types in the DWARF blob?
>>>>    * ODR types can be referenced by name
>>>>    * Non-ODR types by full DWARF hash
>>>>    * Each type can be a pair (tuple) of identifier (DITypeRef today) and
>>>> blob.
>>>>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
>>>> Unit, and use references and module relocations for the offsets. (See
>>>> below)
>>>>
>>>> How will we handle references in DWARF2 or global relocations for
>>>> non-type template parameters?
>>>>    * We can use a “relocation” metadata as part of the format.
>>>>    * Representable as a tuple that has the DIType and the offset within
>>>> the DIBlob at which to write the final relocation/offset for the
>>>> reference at emission time.
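A minimal sketch of the relocation tuple described above, written in Python rather than as LLVM code. The names `TypeEntry` and `emit`, and the fixed 4-byte little-endian placeholder slot, are assumptions made for illustration; the RFC does not specify an encoding. Each entry carries its blob plus a list of (target type, offset within the blob) pairs, and the final offsets are written only once the layout is known.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class TypeEntry:
    type_id: str   # stands in for the DIType identifier
    blob: bytes    # pre-encoded type bytes with zeroed placeholder slots
    relocs: list = field(default_factory=list)  # (target type_id, offset in blob)


def emit(entries):
    # First pass: lay the blobs out back to back and record where each starts.
    starts, pos = {}, 0
    for entry in entries:
        starts[entry.type_id] = pos
        pos += len(entry.blob)

    # Second pass: patch every placeholder with the start of its target type.
    out = bytearray()
    for entry in entries:
        buf = bytearray(entry.blob)
        for target, offset in entry.relocs:
            struct.pack_into("<I", buf, offset, starts[target])
        out += buf
    return bytes(out)
```

The front end never needs to know final offsets under this scheme; it only records where in the blob a reference lives and which type it points to.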
>>>>
>>>> Why break up the types at all?
>>>>    * To enable non-debug format aware linking and type uniquing for LTO
>>>> that won’t be huge in size. We break up the types so we don’t need to
>>>> parse debug information to link two modules together efficiently.
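One way to picture this, as a hedged sketch: if every type is an (identifier, blob) pair, two modules can be merged with a plain dictionary and no knowledge of what the blobs contain. The function names are hypothetical, SHA-256 is only a stand-in for whatever the "full DWARF hash" turns out to be, and `_ZTS1S` is just an example of a name-based identifier.

```python
import hashlib


def type_identifier(blob: bytes, odr_name=None) -> str:
    # ODR types are identified by name; everything else by a hash of its bytes.
    if odr_name is not None:
        return odr_name
    return "hash:" + hashlib.sha256(blob).hexdigest()


def link_types(*modules):
    # Each module is a list of (identifier, blob) pairs. The linker treats
    # blobs as opaque bytes and keeps the first blob seen for each identifier.
    table = {}
    for module in modules:
        for identifier, blob in module:
            table.setdefault(identifier, blob)
    return table
```

Uniquing here costs one dictionary lookup per type, which is the property the answer above is after: no debug-format parsing during module linking.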
>>>>
>>>
>>> How do you plan to handle abbreviations? You wouldn't necessarily be
>>> able to embed them directly in the blob, as when doing LTO each
>>> compilation unit would have its own set of abbreviations. I suppose you
>>> could do something like treat them as a special sort of reference to an
>>> abbreviation table entry, or maybe pre-allocate in the frontend (but
>>> would complicate cross-frontend LTO) but curious what you have in mind.
>>>
>>
>> Thanks for reminding me, I knew I was forgetting something I'd talked
>> about when writing all of this down. :)
>>
>> Basically, you can handle abbreviations similarly to types: create a
>> blob with an index/hash/etc and then reference that as part of the type
>> tuple, e.g.:
>>
>> $1 = { DIAbbrev: 0x1234, DIBlob: <blah> }
>> $2 = { DIType: <ID>, DIAbbrev: $1, DIBlob: <blah> }
>>
>> and keep them uniqued during emission and remember to merge these as
>> well during module merge time.
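A rough sketch of the uniquing and merging step, under stated assumptions: the dictionary layout, the function names, and the content-derived key are all hypothetical, and the declaration bytes are placeholders rather than real DWARF abbreviation encodings. The idea shown is that modules refer to abbreviations by a stable key, and numeric abbreviation codes are assigned only at emission, after the merge.

```python
import hashlib


def abbrev_key(declaration: bytes) -> str:
    # Content-derived key, so equal declarations from different modules match.
    return hashlib.sha256(declaration).hexdigest()[:16]


def make_module(types):
    """types maps a type id to (abbreviation declaration bytes, attribute blob)."""
    module = {"abbrevs": {}, "types": {}}
    for type_id, (declaration, blob) in types.items():
        key = abbrev_key(declaration)
        module["abbrevs"][key] = declaration
        module["types"][type_id] = (key, blob)
    return module


def merge(first, second):
    # Abbreviations collapse by key; for types, the first definition is kept.
    merged = {"abbrevs": dict(first["abbrevs"]), "types": dict(first["types"])}
    merged["abbrevs"].update(second["abbrevs"])
    for type_id, entry in second["types"].items():
        merged["types"].setdefault(type_id, entry)
    return merged


def assign_codes(module):
    # Codes are handed out at emission time, starting at 1 (0 is reserved in DWARF).
    return {key: code for code, key in enumerate(sorted(module["abbrevs"]), start=1)}
```

Because codes are assigned last, two front ends never have to agree on numbering in advance, which sidesteps the cross-frontend LTO concern raised in the question.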
>>
>
> Makes sense, but wouldn't you need multiple abbreviations for each DIType,
> in order to represent DITypes formed of multiple DIEs (e.g. enums,
> records)?
>
> Maybe something like this would work:
>
> $1 = { DIAbbrev: 0x1234, DIBlob: DW_TAG_enumeration_type<blah> }
> $2 = { DIAbbrev: 0x5678, DIBlob: DW_TAG_enumerator<blah> }
> $3 = { DIType: <ID>, DIAbbrev: [(0, $1), (8, $2), (16, $2)], DIBlob: <8
> bytes of DW_TAG_enumeration_type attrs><8 bytes of DW_TAG_enumerator
> attrs><8 bytes of DW_TAG_enumerator attrs><0> }
>
> ?
>
*nod* That (or something similar) will work.

-eric
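
To make the (offset, abbreviation) list concrete, here is a small sketch of how an emitter might stitch such a tuple back into a DIE stream. It assumes the blob holds only attribute bytes plus the trailing null entry, and that abbreviation codes have already been assigned; `emit_type` and the `codes` mapping are hypothetical names. The ULEB128 helper follows the standard DWARF variable-length encoding.

```python
def uleb128(value: int) -> bytes:
    # Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def emit_type(abbrev_refs, blob, codes):
    """abbrev_refs is a list of (offset into blob, abbreviation name), sorted by offset."""
    out = bytearray()
    pos = 0
    for offset, name in abbrev_refs:
        out += blob[pos:offset]          # attribute bytes of the previous DIE
        out += uleb128(codes[name])      # abbreviation code that starts this DIE
        pos = offset
    out += blob[pos:]                    # last DIE's attributes and the terminator
    return bytes(out)
```

With the enum example above, the three offsets 0, 8 and 16 each receive an abbreviation code in front of their 8 attribute bytes, and the final `<0>` byte is copied through unchanged.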


>>>> Any other concerns there?
>>>>    * Debug information without type units might be slightly larger in
>>>> this scheme due to parents being duplicated (declarations and abstract
>>>> origin, not full parents). It may be possible to extend dsymutil/etc to
>>>> merge all siblings into a common parent. Open question for better ways
>>>> to solve this.
>>>>
>>>
>>> When we were thinking about teaching the backend to produce blobs from
>>> IR metadata we were thinking about cases where the debug info emitter
>>> would discover special member functions during IR traversal. I guess
>>> since we're moving all of that to the frontend we can just ask the
>>> frontend directly which special members are needed on the class. That
>>> solves the problem for a single translation unit. But what do you plan
>>> to do in the multiple translation unit case where two TUs declare
>>> different special members on a class? Would it be fine to just emit the
>>> two definitions and let the debugger sort it out? I guess this is the
>>> type of thing that debuggers normally deal with in the non-LTO case, so
>>> I suppose so?
>>>
>>
>> Pretty much. This is one area where I have... disagreements with the
>> DWARF committee and I don't think there's anything else we can do here.
>> TBH right now I think we'd have issues with type units and special
>> member functions since we're using ODR-ness to unique.
>>
>> -eric
>>
>>
>>>
>>>
>>>> How should we handle DWARF5/Apple Accelerator Tables?
>>>>    * Thoughts:
>>>>    * We can parse the DWARF in the back end and generate them.
>>>>    * We can emit in the front end for the base case of non-LTO (with
>>>> help from the backend for relocation aspects).
>>>>    * We can use dsymutil on LTO debug information to generate them.
>>>>
>>>> Why isn’t this a more detailed spec?
>>>>    * Mostly because we’ve thought about the issues, but we can’t plan
>>>> for everything during implementation.
>>>>
>>>>
>>>> Future work
>>>> ----------------
>>>>
>>>> Not contained as part of this, but an obvious future direction is that
>>>> the Module linker could grow support for debug aware linking. Then we
>>>> can have all of the type information for a single translation unit in a
>>>> single blob and use the debug aware linking to handle merging types.
>>>>

mats petersson via llvm-dev

2016-Mar-30 06:35 UTC


How will this affect other languages that generate debug info - not that
you should care about those, I'm just curious - my Pascal compiler does not
generate clang-style AST, and does not use clang at all. I currently have
code that uses DIBuilder directly...

--
Mats


Adrian Prantl via llvm-dev

2016-Mar-30 15:39 UTC


> On Mar 29, 2016, at 11:35 PM, mats petersson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> How will this affect other languages that generate debug info - not that
> you should care about those, I'm just curious - my Pascal compiler does not
> generate clang-style AST, and does not use clang at all. I currently have
> code that uses DIBuilder directly...

I don’t think that the code for generating DWARF types should move into Clang,
but rather into a separate library that can be shared by multiple frontends. It
can even keep most of the existing DIBuilder interface (but we may need to split
DIBuilder into a types part and an everything-else part).

-- adrian
