thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm [May 2016]


Mehdi Amini via llvm-dev

2016-May-11 06:32 UTC

[llvm-dev] RFC: Up front type information generation in clang and llvm

Hi Eric,

I'm coming back to this topic after discussing it briefly offline with
Reid, and at length with Adrian, Duncan, and Fred.
I may have to take back some of what I said in my previous email, especially
since it is not clear how or why what Reid is proposing would hurt a future
path for DWARF.

In particular, if my understanding is correct, the key point that differentiates
what Reid is trying to do from what you envision is that he would emit a single
type blob per Module. Following up on what Fred mentioned, i.e. "it would
make more sense to leave the type info in the object files and to somehow have
the LTO link emit external references to it (ala module debugging)", it
seems to be quite LTO friendly and very efficient. I like the fact that you
don't pay the price of building a type hierarchy graph when you don't
need it, and I'm not sure why we should clobber the IR with the whole graph
when it is not relevant (i.e. outside of debug-info linking).

On the other hand, it seems that what you're proposing is basically
"optimized" for "type units" (which are not supported on
Darwin anyway) and the only advantage we could see is to have an easy way of
type-uniquing directly in the IR.

Our conclusion was that, for us, a single type blob with some kind of "smart
reference" able to point inside the blob from the outside is the most
efficient thing we can build upon. However, the cost/benefit of getting there is
too high for us to prioritize working on this at this point.
(If I misrepresented anything, Adrian/Duncan/Fred, please correct me.)

-- 
Mehdi
 

> On Apr 27, 2016, at 6:41 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
> I tend to agree with Eric, but since I'm too busy to gather data or to
> sign up for doing the work at this time, I won't weigh in strongly on this.
>
> Eric's point b) is especially worrying to me: unless the work to
> "correctly" design this is unbearable, having work performed on CodeView
> that would make things harder to do for DWARF later is a red flag IMO.
>
> I'd especially like to answer:
>  "I don't think it's an imminent priority for anyone to do this for DWARF,
>  so I worry that if we start building infrastructure for it, it will end
>  up over engineered."
>
> LTO is impacted a lot by debug info memory size and CPU time. ThinLTO is
> impacted on an even larger scale. So it should be an "imminent priority"
> to address that, and we (well, Adrian and Duncan) worked a lot on the
> "non-type" part of it recently, improving the current state significantly.
> However, debug info is still a bottleneck and we aim to do better. The plan
> to move type emission into the front-end is definitely appealing to me
> because of that.
>
> Another aspect is that when building LLVM, we are linking 56 separate
> binaries (I'm not talking about archives) from largely overlapping sets of
> object files.
> It means that any work that is performed during LTO/Codegen on these files
> but could be moved to the compile phase is already almost a x56 speedup win
> (and lowers peak memory during LTO).
> Knowing that the peak memory is reached during CodeGen and that the DWARF
> emission is a large part of it, this is a major candidate to be moved into
> the compile phase.
> 
> -- 
> Mehdi
> 
> 
> 
>> On Apr 27, 2016, at 4:53 PM, Eric Christopher via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>> I don't agree in general here because of:
>> 
>> a) maintainability - there isn't a one true path through things and now is
>> scattering more windows knowledge through debug info and lto
>>
>> b) higher bar for implementing similar dwarf functionality - there's
>> nothing here that makes it at any point better for our general debug info
>> support. Incrementally updating to an intermediate step is much easier and
>> a lower bar than needing to implement everything up to and including a
>> format aware linker and support that through ThinLTO, the JIT, and full
>> LTO.
>>
>> c) if there's no reason to do this for dwarf there's no reason to do it for
>> windows. The existing proposal was a way to get you type emission in the
>> front end so that you'd have to do less work. Ultimately though I don't see
>> a reason to do this if all of the platforms don't look the same.
>>
>> d) ThinLTO/ORC won't support the debug info you have in your proposal right
>> now without patches
>>
>> e) You're regressing LTO linking performance hugely for windows with debug
>> until you write the patches that enable format aware linking of code view
>> information
>> 
>> I'm open to arguments on any of these points from anyone.
>> 
>> -eric
>> 
>> On Wed, Apr 27, 2016 at 4:41 PM Reid Kleckner <rnk at google.com> wrote:
>> My general feeling is that this design represents a mid-point between our
>> current metadata design, and a future design where frontends just emit type
>> information and LTO links it in a format-aware way.
>>
>> I don't think it's an imminent priority for anyone to do this for DWARF, so
>> I worry that if we start building infrastructure for it, it will end up
>> overengineered.
>>
>> Also, people seem to agree that in the long term, we really need a
>> format-aware linker, and maybe LTO should just use one. Supposedly Frédéric
>> has patches to llvm-dsymutil to make one for DWARF, but he hasn't found the
>> time to upstream them.
>>
>> Together, these reasons make me feel that we should limit the short-term
>> scope to just CodeView, and add utilities to lib/Linker for performing basic
>> tasks like type stream merging or type extraction, possibly with forward
>> declaration of composite types.
>>
>> In the future, when we do this work for DWARF, we can add a new DIType*
>> stand-in similar to what you are describing.
>>
>> The working patch that I have for just CodeView, all types as a single
>> blob, is up here: http://reviews.llvm.org/D19236
>> While it doesn't deal with type blobs or LTO type merging yet, I think it
>> shows that there is surprisingly little need to bifurcate other parts of
>> LLVM.
>> 
>> Thoughts?
>> 
>> On Tue, Mar 29, 2016 at 6:00 PM, Eric Christopher <echristo at gmail.com> wrote:
>> Hi All,
>>
>> This is something that's been talked about for some time and it's probably
>> time to propose it.
>> 
>> The "We" in this document is everyone on the cc line plus me.
>> 
>> Please go ahead and take a look.
>> 
>> Thanks!
>> 
>> -eric
>> 
>> 
>> Objective (and TL;DR)
>> =====================
>>
>> Migrate debug type information generation from the backends to the front
>> end.
>>
>> This will enable:
>> 1. Separation of concerns and maintainability: LLVM shouldn’t have to know
>>    about C preprocessor macros, Obj-C properties, or extensive details about
>>    debug information binary formats.
>> 2. Performance: Skipping a serialization should speed up normal
>>    compilations.
>> 3. Memory usage: The DI metadata structures are smaller than they were, but
>>    are still fairly large and pointer heavy.
>> 
>> Motivation
>> ==========
>>
>> Currently, types in LLVM debug info are described by the DIType class
>> hierarchy. This hierarchy evolved organically from a more flexible
>> sea-of-nodes representation into what it is today - a large, only somewhat
>> format neutral representation of debug types. Making this more format
>> neutral will only increase the memory use - and for no reason as type
>> information is static (or nearly so). Debug formats already have a memory
>> efficient serialization, their own binary format so we should support a
>> front end emitting type information with sufficient representation to allow
>> the backend to emit debug information based on the more normal IR features:
>> functions, scopes, variables, etc.
>> 
>> Scope/Impact
>> ============
>>
>> This is going to involve large scale changes across both LLVM and clang.
>> This will also affect any out-of-tree front ends, however, we expect the
>> impact to be on the order of a large API change rather than needing massive
>> infrastructure changes.
>> 
>> Related work
>> ============
>>
>> This is related to the efforts to support CodeView in LLVM and clang as
>> well as efforts to reduce overall memory consumption when compiling with
>> debug information enabled; in particular efforts to prune LTO memory usage.
>> 
>> 
>> Concerns
>> ========
>>
>> We need a good story for transitioning all the debug info testcases in the
>> backend without giving up coverage and/or readability. David believes he has
>> a plan here.
>> 
>> Proposal
>> ========
>>
>> Short version
>> -------------
>>
>> 1. Split the DIBuilder API into Types (+Macros, Imports, …) and Line Table.
>> 2. Split the clang CGDebugInfo API into Types and Line Table to match.
>> 3. Add a LLVM DWARF emission library similar to the existing CodeView one.
>> 4. Migrate the Types API into a clang internal API taking clang AST
>>    structures and use the LLVM binary emission libraries to produce type
>>    information.
>> 5. Remove the old binary emission out of LLVM.
>> 
>> 
>> Questions/Thoughts/Elaboration
>> -------------------------------------------
>> 
>> Splitting the DIBuilder API
>> ~~~~~~~~~~~~~~~~~~~~
>> Will DISubprogram be part of both?
>>    * We should split it in two: Full declarations with type and a slimmed
>>      down version with an abstract origin.
>>
>> How will we reference types in the DWARF blob?
>>    * ODR types can be referenced by name
>>    * Non-odr types by full DWARF hash
>>    * Each type can be a pair(tuple) of identifier (DITypeRef today) and
>>      blob.
>>    * For < DWARF4 we can emit each type as a unit, but not a DWARF Type
>>      Unit and use references and module relocations for the offsets. (See
>>      below)
>>
>> How will we handle references in DWARF2 or global relocations for non-type
>> template parameters?
>>    * We can use a “relocation” metadata as part of the format.
>>    * Representable as a tuple that has the DIType and the offset within the
>>      DIBlob as where to write the final relocation/offset for the reference
>>      at emission time.
>>
>> Why break up the types at all?
>>    * To enable non-debug format aware linking and type uniquing for LTO
>>      that won’t be huge in size. We break up the types so we don’t need to
>>      parse debug information to link two modules together efficiently.
>>
>> Any other concerns there?
>>    * Debug information without type units might be slightly larger in this
>>      scheme due to parents being duplicated (declarations and abstract
>>      origin, not full parents). It may be possible to extend dsymutil/etc to
>>      merge all siblings into a common parent. Open question for better ways
>>      to solve this.
>>
>> How should we handle DWARF5/Apple Accelerator Tables?
>>    * Thoughts:
>>    * We can parse the dwarf in the back end and generate them.
>>    * We can emit in the front end for the base case of non-LTO (with help
>>      from the backend for relocation aspects).
>>    * We can use dsymutil on LTO debug information to generate them.
>>
>> Why isn’t this a more detailed spec?
>>    * Mostly because we’ve thought about the issues, but we can’t plan for
>>      everything during implementation.
>> 
>> 
>> Future work
>> -----------
>>
>> Not contained as part of this, but an obvious future direction is that the
>> Module linker could grow support for debug aware linking. Then we can have
>> all of the type information for a single translation unit in a single blob
>> and use the debug aware linking to handle merging types.
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
<http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

Reid Kleckner via llvm-dev

2016-May-11 17:39 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

Responses to Mehdi and Eric below.

On Wed, Apr 27, 2016 at 4:53 PM, Eric Christopher <echristo at gmail.com> wrote:
> I don't agree in general here because of:
>
> a) maintainability - there isn't a one true path through things and now is
> scattering more windows knowledge through debug info and lto

There was never going to be one true way to generate LLVM debug info
for both formats. We need some help from the frontend.

> b) higher bar for implementing similar dwarf functionality - there's nothing
> here that makes it at any point better for our general debug info support.
> Incrementally updating to an intermediate step is much easier and a lower
> bar than needing to implement everything up to and including a format aware
> linker and support that through ThinLTO, the JIT, and full LTO.

I claim that everything does not have to be format aware. All it has
to do is call out to a library which is format aware. We can come up
with reasonable high-level abstractions for operations that we'll want
to do on types, such as "extract this type and everything it
references".
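
As an illustration of the kind of high-level operation described above, here is a minimal sketch of "extract this type and everything it references". It is not taken from any patch in this thread: the `TypeRecord` class, its `refs` field, and `extract_closure` are hypothetical names, and the type stream is modelled as a plain list whose records refer to one another by index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRecord:
    """Hypothetical stand-in for one record in a type stream."""
    name: str
    refs: tuple = ()  # indices of the records this one refers to


def extract_closure(stream, root):
    """Return the sorted indices of `root` and every record reachable from it."""
    seen = set()
    work = [root]
    while work:
        idx = work.pop()
        if idx in seen:
            continue  # already visited; this also keeps cycles from looping
        seen.add(idx)
        work.extend(stream[idx].refs)
    return sorted(seen)


# Example stream: Point* -> Point -> int, plus an unrelated float.
stream = [
    TypeRecord("int"),
    TypeRecord("Point", refs=(0,)),
    TypeRecord("float"),
    TypeRecord("Point*", refs=(1,)),
]
print(extract_closure(stream, 3))
```

The caller never inspects a record's contents; only the helper that follows `refs` needs to know how references are encoded, which is the division of labour the paragraph above argues for.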

> c) if there's no reason to do this for dwarf there's no reason to do it for
> windows. The existing proposal was a way to get you type emission in the
> front end so that you'd have to do less work. Ultimately though I don't see
> a reason to do this if all of the platforms don't look the same.

There are reasons to do this for DWARF, but they are not compelling
enough to do a total rewrite of our type information support.

> d) ThinLTO/ORC won't support the debug info you have in your proposal right
> now without patches
>
> e) You're regressing LTO linking performance hugely for windows with debug
> until you write the patches that enable format aware linking of code view
> information

The way I see it, there is no existing CodeView debug info
functionality to regress for any of ORC, LTO, or ThinLTO. Apparently
we don't see this the same way.

And I've already written the patch to do type merging:
http://reviews.llvm.org/D20122 Regular LTO can call this code, and
rewrite the DITypeIndex numbers with the map produced. While this may
not be directly applicable to ORC and ThinLTO, I don't expect that
supporting them will be much more work.
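
To make the "merge, then rewrite indices with the map produced" step concrete, here is a small sketch. It does not reproduce the code in D20122; `merge_type_streams` is a hypothetical helper, records are modelled as `(name, refs)` tuples, and the sketch assumes every record only refers to records that appear earlier in its own stream.

```python
def merge_type_streams(dest, src):
    """Append the records of `src` to `dest`, dropping duplicates.

    Returns a dict mapping each index in `src` to its index in `dest`.
    Assumes (for this sketch only) that a record's refs point at earlier
    records in the same stream, so they are already remapped when needed.
    """
    index_of = {rec: i for i, rec in enumerate(dest)}
    remap = {}
    for old, (name, refs) in enumerate(src):
        # Rewrite the references into the destination's index space first,
        # so structurally identical records compare equal.
        rec = (name, tuple(remap[r] for r in refs))
        if rec not in index_of:
            index_of[rec] = len(dest)
            dest.append(rec)
        remap[old] = index_of[rec]
    return remap


dest = [("int", ()), ("Point", (0,))]
src = [("float", ()), ("int", ()), ("Point", (1,)), ("Vec", (0, 2))]
remap = merge_type_streams(dest, src)

# References held outside the type stream (the role DITypeIndex plays in the
# message above) are then rewritten through the map.
ir_refs = {"x": 2, "v": 3}
ir_refs = {var: remap[idx] for var, idx in ir_refs.items()}
```

Only the merge helper looks inside records; the caller just applies the returned map to whatever indices it holds.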



On Tue, May 10, 2016 at 11:32 PM, Mehdi Amini via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> On the other hand, it seems that what you're proposing is basically
> "optimized" for "type units" (which are not supported on Darwin anyway) and
> the only advantage we could see is to have an easy way of type-uniquing
> directly in the IR.

Splitting up the type information into opaque units lets you do
format-agnostic type uniquing, but it doesn't let you extract forward
declarations like ThinLTO wants to do.

> Our conclusion was that, for us, a single type blob with some kind of "smart
> reference" able to point inside the blob from the outside is the most
> efficient thing we can build upon. However, the cost/benefit of getting
> there is too high for us to prioritize working on this at this point.
> (If I misrepresented anything, Adrian/Duncan/Fred, please correct me.)

Yeah, this is kind of where I am. Having one blob per module is
probably the most efficient thing possible that I could do for
CodeView, but I estimate that the cost of also doing it for DWARF is
very high. We have a lot of dependencies on the existing
representation. We can attempt to try and generalize up-front emission
to DWARF, but I think if we don't pay the full cost, we will end up
with something half-baked for DWARF. I don't think I have the time to
do it justice.

Speaking of the idea of smart references that point out of the IR into
separate type info, my current approach (DITypeIndex) is very
CV-specific. However, I think if we allow one kind of smart reference,
we can add support for more, and they can be format-specific. As long
as we're OK making DITypeRefs opaque, adding new kinds of type refs is
cheap.
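
One way to picture opaque, format-specific type references is a tagged handle that generic code passes around without interpreting, with one resolver per kind. The sketch below is purely illustrative: `OpaqueTypeRef`, the `RESOLVERS` registry, and the two kinds (`"codeview-index"` and `"by-name"`) are assumptions made for the example, not LLVM classes or anything proposed verbatim in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpaqueTypeRef:
    """A reference the IR carries around without looking inside it."""
    kind: str        # selects the format-specific resolver
    payload: object  # meaning depends entirely on `kind`


RESOLVERS = {}


def resolver(kind):
    """Register a format-specific resolver for one kind of reference."""
    def register(fn):
        RESOLVERS[kind] = fn
        return fn
    return register


@resolver("codeview-index")
def resolve_index(payload, blob):
    # Payload is a position in a per-module list of type records.
    return blob[payload]


@resolver("by-name")
def resolve_name(payload, blob):
    # Payload is a unique name used as a key into a table of types.
    return blob[payload]


def resolve(ref, blobs):
    """Generic code: dispatch on the kind, never interpret the payload."""
    return RESOLVERS[ref.kind](ref.payload, blobs[ref.kind])


blobs = {
    "codeview-index": ["int", "Point"],
    "by-name": {"_ZTS5Point": "Point"},
}
```

Adding another kind of reference means registering one more resolver; `resolve` and anything that merely stores an `OpaqueTypeRef` stay unchanged, which is the sense in which new kinds are cheap once the references are opaque.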

Smith, Kevin B via llvm-dev

2016-May-11 17:51 UTC

[llvm-dev] [cfe-dev] RFC: Up front type information generation in clang and llvm

>-----Original Message-----
>From: cfe-dev [mailto:cfe-dev-bounces at lists.llvm.org] On Behalf Of Reid
>Kleckner via cfe-dev
>Sent: Wednesday, May 11, 2016 10:40 AM
>To: Mehdi Amini <mehdi.amini at apple.com>
>Cc: llvm-dev <llvm-dev at lists.llvm.org>; Clang Dev <cfe-dev at lists.llvm.org>
>Subject: Re: [cfe-dev] [llvm-dev] RFC: Up front type information generation in
>clang and llvm
>
>Responses to Mehdi and Eric below.
>
>On Wed, Apr 27, 2016 at 4:53 PM, Eric Christopher <echristo at gmail.com>
>wrote:
>> I don't agree in general here because of:
>>
>> a) maintainability - there isn't a one true path through things and now is
>> scattering more windows knowledge through debug info and lto
>
>There was never going to be one true way to generate LLVM debug info
>for both formats. We need some help from the frontend.

I believe that Amjad Aboud has argued several times that there could be one
true way to generate LLVM debug info such that both Windows and DWARF debug
info could be generated from it. I know for a fact that within the Intel
Compiler the FE generates a single debug info representation, which then gets
translated into either MS PDB format or DWARF, depending on the target
platform.

Architecturally, that is very desirable. You really do not want every FE to
have to know about, and generate, different debug info depending on whether it
is targeting Windows or a DWARF-enabled target, do you?
