thr3ads.net - llvm dev - [LLVMdev] Proposal: type uniquing of debug info for LTO [Jun 2013]

Eric Christopher

2013-Jun-21 18:56 UTC

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>
> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>
>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>
>>> A summary of options for issue #3:
>>> 3> To actually access the MDNode referenced via the hash value,
>>> we need to perform a lookup from the hash value to find the
>>> corresponding MDNode.
>>> The questions are where to store this map and how to keep it
>>> up-to-date when a MDNode is replaced.
>>> ---------------------
>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify
>>> access functions in DI classes to pass in the map.
>>> May need to modify printing functions in AsmWriter.
>>
>> You may not need the map in DIBuilder if you piggy back on the
>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>> for just debug dumps of the IR. A way around this might be nice, but
>> I've not looked into it.
>>
>>> ---------------------
>>> Option b) I am going to expand David's description with more
>>> details, correct me if I am wrong.
>>
>> I'll let David continue this part of the thread with you.
>>
>>>
>>> I personally prefer option b over a since it is much cleaner.
>>>
>>
>> Option b is more general and should probably work. I suggested a
>> originally because I know it would work and the impact is limited to
>> debug info. Also it doesn't involve touching the generic metadata
>> interface or code and could be done quickly and incrementally. b may
>> also require further changes to the debug info infrastructure to deal
>> with lookup, etc so may end up being as much churn as a. b is more
>> useful if we think other users of metadata may want/need similar
>> functionality in the future.
>>
>> I'm fine with b as a choice, but it's going to involve more planning
>> and code review and involve people outside of you, me and Dave :)
>
> Got it. David, are you okay with the details I added about option b?
> Let's start a new thread with title "Proposal: extend Metadata
> (MDNode) to support MDHash" or something like that.

I'd like to hear a rationale for wanting b first. I don't know of (and
can't think of) any other users of this functionality right now.

> We can add a weight to MDHash (other names are fine with me), to support
> multiple MDNodes with the same hash, but different weight.
>

Wait, what? This sounds like a bad idea.

-eric
> Thanks,
> Manman
>
>>
>> -eric
>>
>>
>>> Thanks,
>>> Manman
>>>
>>> On Jun 20, 2013, at 5:39 PM, David Blaikie wrote:
>>>
>>>> On Thu, Jun 20, 2013 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>> On Jun 20, 2013, at 5:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>
>>>>> On Thu, Jun 20, 2013 at 5:13 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>>
>>>>> On Jun 20, 2013, at 4:52 PM, David Blaikie wrote:
>>>>>
>>>>> On Thu, Jun 20, 2013 at 4:45 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>>
>>>>> On Jun 20, 2013, at 3:55 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>>
>>>>> On Jun 20, 2013, at 2:58 PM, Eric Christopher wrote:
>>>>>
>>>>> Hi Manman,
>>>>>
>>>>> On Thu, Jun 20, 2013 at 2:51 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>>
>>>>> The intent of this proposal is to speed up compilation of "-flto -g"
>>>>> for c++ programs.
>>>>> This is based on discussions with Adrian, David and Eric.
>>>>>
>>>>>
>>>>> Thanks for bringing this back to the list. The original thread was
>>>>> getting quite long.
>>>>>
>>>>> ---------------------------
>>>>> Problem:
>>>>> A single class can be used in multiple source files and the DI
>>>>> (Debug Info) class is included in multiple bc files. The duplication of
>>>>> class definitions causes blow-up in # of MDNodes, # of DIEs, leading to
>>>>> large memory requirement.
>>>>>
>>>>> As an example, SPEC xalancbmk requires 7GB of memory when compiled with
>>>>> -flto -g.
>>>>> With a preliminary implementation of type uniquing, the memory usage
>>>>> will be down to 2GB.
>>>>>
>>>>> In order to unique types, we have to break cycles in the MDNodes.
>>>>>
>>>>> A simple struct definition
>>>>> struct Base {
>>>>>   int a;
>>>>> };
>>>>> can cause cycles in MDNodes:
>>>>> !12 = metadata !{i32 786451, metadata !13, null, metadata !"Base", i32 1,
>>>>> i64 32, i64 32, i32 0, i32 0, null, metadata !14, i32 0, null, null} ; [
>>>>> DW_TAG_structure_type ] [Base] [line 1, size 32, align 32, offset 0] [from ]
>>>>> !14 = metadata !{metadata !15, metadata !16}
>>>>> !15 = metadata !{i32 786445, metadata !13, metadata !12, metadata !"a", i32
>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [ DW_TAG_member ] [a] [line
>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>> !16 = metadata !{i32 786478, metadata !13, metadata !12, metadata !"Base",
>>>>> metadata !"Base", metadata !"", i32 1, metadata !17, i1 false, i1 false, i32
>>>>> 0, i32 0, null, i32 320, i1 false, null, null, i32 0, metadata !20, i32 1} ;
>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>
>>>>> Cycles: !12 -- !14 -- !15 -- !12
>>>>>       !12 -- !14 -- !16 -- !12
>>>>>
>>>>> These cycles make it hard to unique the same struct used in two bc files.
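
To make the two cycles concrete, the node references listed above can be modelled as a small directed graph and checked with a depth-first search. This is only a sketch in plain C++, not LLVM code: `Graph`, `hasCycle`, and the two builder functions are hypothetical names, and nodes whose contents are not shown in the thread (!5, !8, !10, !13, !17, !20) are assumed to be leaves. The second builder models the same struct with type references replaced by an integer identifier, so those edges are simply absent.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for the metadata graph: node number -> referenced nodes.
using Graph = std::map<int, std::vector<int>>;

// Depth-first search; returns true when a node already on the current
// path is reached again, which means the graph contains a cycle.
static bool reachesCycle(const Graph &G, int N, std::set<int> &OnPath,
                         std::set<int> &Done) {
  if (OnPath.count(N))
    return true;
  if (Done.count(N))
    return false;
  OnPath.insert(N);
  auto It = G.find(N);
  if (It != G.end())
    for (int Succ : It->second)
      if (reachesCycle(G, Succ, OnPath, Done))
        return true;
  OnPath.erase(N);
  Done.insert(N);
  return false;
}

bool hasCycle(const Graph &G) {
  std::set<int> OnPath, Done;
  for (const auto &Entry : G)
    if (reachesCycle(G, Entry.first, OnPath, Done))
      return true;
  return false;
}

// Edges as in the listing with direct MDNode references (!12, !14, !15, !16).
Graph withNodeReferences() {
  return {{12, {13, 14}}, {14, {15, 16}}, {15, {13, 12, 8}},
          {16, {13, 12, 17, 20}}};
}

// The same struct when members refer to the type by an identifier instead
// of by node: the back edges to the struct node no longer exist.
Graph withIdentifierReferences() {
  return {{4, {5, 6}}, {6, {7, 9}}, {7, {5, 8}}, {9, {5, 10, 13}}};
}
```

With these definitions, `hasCycle(withNodeReferences())` is true and `hasCycle(withIdentifierReferences())` is false, which is the property that makes structural uniquing across bc files practical.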
>>>>>
>>>>> ---------------------------
>>>>> How to fix:
>>>>>
>>>>> We attach a hash value to types to help type uniquing and we also replace
>>>>> references to types with their hash values.
>>>>> For the above struct "Base", we now have the following MDNodes:
>>>>> !4 = metadata !{i32 786451, metadata !5, null, metadata !"Base", i32 1, i64
>>>>> 32, i64 32, i32 0, i32 0, null, metadata !6, i32 0, i32 0, null, i32
>>>>> 915398439} ; [ DW_TAG_structure_type ] [Base] [line 1, size 32, align 32,
>>>>> offset 0] [from ]
>>>>> !6 = metadata !{metadata !7, metadata !9}
>>>>> !7 = metadata !{i32 786445, metadata !5, i32 915398439, metadata !"a", i32
>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [ DW_TAG_member ] [a] [line
>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>> !9 = metadata !{i32 786478, metadata !5, i32 915398439, metadata !"Base",
>>>>> metadata !"Base", metadata !"", i32 1, metadata !10, i1 false, i1 false, i32
>>>>> 0, i32 0, null, i32 320, i1 false, null, null, i32 0, metadata !13, i32 1} ;
>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>
>>>>> Note that the cycles are gone and !4 has a hash value of 915398439, and the
>>>>> references to !4 are replaced with 915398439.
>>>>> Thanks Eric for suggesting replacing MD reference with a hash value.
>>>>>
>>>>>
>>>>> In particular I recommended this:
>>>>>
>>>>> a) For C++ odr replace it with the "hash" that's just a string
>>>>> representing the type name.
>>>>> b) For Internal C++ types and all C types replace it with a string
>>>>> that's a concatenation of the type name and the name of the compile
>>>>> unit.
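
The naming rule in a) and b) can be sketched in a few lines of C++. This is an illustration under stated assumptions, not code from the thread or from LLVM: `TypeInfo`, `makeTypeIdentifier`, and the `@` separator between type name and compile unit name are all hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a type as a front end might see it.
struct TypeInfo {
  std::string QualifiedName;  // e.g. "Base"
  std::string CompileUnit;    // e.g. "a.cpp"
  bool HasExternalLinkage;    // true when the C++ ODR applies to the type
};

// Build the string identifier (the "hash" in the discussion) for a type.
std::string makeTypeIdentifier(const TypeInfo &T) {
  assert(!T.QualifiedName.empty() && "anonymous types are not handled here");
  // Rule a): an ODR type is identified by its name alone, so every
  // compile unit produces the same identifier for it.
  if (T.HasExternalLinkage)
    return T.QualifiedName;
  // Rule b): internal C++ types and C types get the compile unit name
  // appended, so same-named types in different units stay distinct.
  // The '@' separator is an assumption made for this sketch.
  return T.QualifiedName + "@" + T.CompileUnit;
}
```

Under this sketch an external `Base` maps to the same identifier in every compile unit, while an internal `Base` in `a.cpp` and another in `b.cpp` map to different identifiers and are therefore never merged.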
>>>>>
>>>>> Yes, that is what we agreed on over email.
>>>>>
>>>>>
>>>>>
>>>>> There are a few issues:
>>>>> 1> How to generate the hash for a given type?
>>>>> With C++'s ODR, it should be enough by using the context and the name for
>>>>> non-internal c++ types.
>>>>> For internal c++ types and types of other languages, hash will not be used.
>>>>>
>>>>>
>>>>> Explain this?
>>>>>
>>>>>
>>>>> For a while, I am going to support both hash and MD reference, once
>>>>> everything is working with hash,
>>>>> I will update all debug info testing cases, turn -gtype-uniquing on, and
>>>>> remove the other path.
>>>>>
>>>>> For internal c++ types, initially they will follow the path of using MD
>>>>> references without a hash.
>>>>>
>>>>>
>>>>> My current implementation is to add a few static member functions in MDNode
>>>>> to profile DI nodes differently.
>>>>> + /// If the array of Vals is for debug info, profile it specially and
>>>>> + /// return true.
>>>>> + /// If the DI node has a hash value, generate the profile using only the
>>>>> + /// hash value and the declaration flag.
>>>>> +  static bool profileDebugInfoNode(ArrayRef<Value*> Vals,
>>>>> +                                   FoldingSetNodeID &ID);
>>>>>
>>>>> + /// If the MDNode is for debug info, profile it specially and return true.
>>>>> + /// If the DI node has a hash value, generate the profile using only the
>>>>> + /// hash value and the declaration flag.
>>>>> +  static bool profileDebugInfoNode(const MDNode *M, FoldingSetNodeID &ID);
>>>>>
>>>>> + /// Given a hash value and a flag, generate the profile for later lookup.
>>>>> +  static bool profileDebugInfoNode(unsigned Hash, bool Declaration,
>>>>> +                                   FoldingSetNodeID &ID);
>>>>>
>>>>> These static functions are called in Metadata.cpp:
>>>>> void MDNode::Profile(FoldingSetNodeID &ID) const {
>>>>> +  if (profileDebugInfoNode(this, ID))
>>>>> +    return;
>>>>> +
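
The effect of profiling a node by hash value and declaration flag only can be shown without LLVM. The following is a minimal sketch, assuming a simplified node model: `Node`, `UniquingTable`, and the convention that a hash of 0 means "no hash attached" are hypothetical, and `std::map` stands in for the folding set that LLVM actually uses.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified metadata node.
struct Node {
  unsigned Hash;                      // 0 means "no hash attached" (assumption)
  bool IsDeclaration;
  std::vector<std::string> Operands;  // stand-in for the real operand list
};

class UniquingTable {
  // Nodes with a hash are keyed by (hash, declaration flag) only.
  std::map<std::pair<unsigned, bool>, std::unique_ptr<Node>> Hashed;
  // Nodes without a hash are keyed by their full contents.
  std::map<std::pair<bool, std::vector<std::string>>, std::unique_ptr<Node>>
      Structural;

public:
  // Return the unique node for the given profile, creating it on first use.
  const Node *get(unsigned Hash, bool IsDeclaration,
                  std::vector<std::string> Operands) {
    if (Hash != 0) {
      auto &Slot = Hashed[std::make_pair(Hash, IsDeclaration)];
      if (!Slot)
        Slot.reset(new Node{Hash, IsDeclaration, std::move(Operands)});
      return Slot.get();  // operands of later requests are ignored
    }
    auto &Slot = Structural[std::make_pair(IsDeclaration, Operands)];
    if (!Slot)
      Slot.reset(new Node{0, IsDeclaration, std::move(Operands)});
    return Slot.get();
  }
};
```

In this model two requests with the same hash and flag return the same node even when their operand lists differ, for example because the two copies came from different bc files. A declaration and a definition with the same hash remain separate nodes.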
>>>>>
>>>>> There are other examples of these in MDNode for handling of specific
>>>>> metadata.
>>>>> /// Methods for metadata merging.
>>>>> static MDNode *getMostGenericTBAA(MDNode *A, MDNode *B);
>>>>> static MDNode *getMostGenericFPMath(MDNode *A, MDNode *B);
>>>>> static MDNode *getMostGenericRange(MDNode *A, MDNode *B);
>>>>>
>>>>> Comments are welcome on whether this violates any layering rule.
>>>>>
>>>>>
>>>>> As I've said many times in email, I don't think this is a good idea
>>>>> and would prefer either a or b below. a) is a much simpler solution.
>>>>>
>>>>> Any reason why it is not a good idea?
>>>>>
>>>>>
>>>>> Other choices are:
>>>>> a> Keep a map in DwarfDebug
>>>>> Keep in mind that the map is used at many stages, and it has to be in sync
>>>>> with MDNodeSet.
>>>>> b> Generalize MDNode to be aware of hash (David can provide more details)
>>>>> c> Extend MDNode to DINode and modify streamers (bitcode reader|writer, ll
>>>>> reader|writer) to be aware of DINode
>>>>> We can provide DINode::get(…) to create a DINode. DINode can have its own
>>>>> Profile function.
>>>>> Other suggestions are welcome.
>>>>>
>>>>>
>>>>> a or b please.
>>>>>
>>>>> Option a> will require a DwarfDebug pointer in every stage of the compiler,
>>>>> and passing the map to the DI classes.
>>>>> A rough estimation is around 100 places.
>>>>> Is it reasonable to pass a DwarfDebug pointer to DIBuilder and llvm linker?
>>>>> Also the map needs to be in sync with MDNodeSet, maybe using ValueHandle can
>>>>> solve the problem.
>>>>>
>>>>>
>>>>> What about putting the map in LLVMContextImpl?
>>>>> It already has a few things specifically for debug info:
>>>>> std::vector<DebugRecVH> ScopeRecords;
>>>>> DenseMap<std::pair<MDNode*, MDNode*>, int> ScopeInlinedAt;
>>>>> …
>>>>>
>>>>> I remember David mentioned it once and I forgot about the conclusion.
>>>>>
>>>>>
>>>>> I mentioned it only as speculation as to how you were implementing it
>>>>> already (but you were doing the profile-changing stuff).
>>>>>
>>>>> I don't think it should be necessary to have the map (in option (a))
>>>>> in such a central location as LLVMContext. It should be usable just
>>>>> from DwarfDebug for generation, and DIBuilder can have its own,
>>>>> separate map to do similar things during DI building.
>>>>>
>>>>>
>>>>> We also need the map during llvm linking since linking will create new
>>>>> MDNodes.
>>>>>
>>>>>
>>>>> I don't understand what you mean by this. IR linking shouldn't need to
>>>>> do anything debug-info-specific, it should just be the normal IR
>>>>> approach.
>>>>>
>>>>> The declaration-v-definition resolution can be done at codegen-time.
>>>>> We'll have to walk all the lists of retained types anyway to build the
>>>>> map, so we can do declaration-v-definition (keep definitions over
>>>>> declarations when we see both) at that point.
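
The codegen-time walk described here, which builds the map from the retained types lists and keeps a definition over a declaration, can be sketched as follows. This is plain C++ under stated assumptions, not DwarfDebug code: `TypeEntry`, `CompileUnit`, `buildTypeMap`, and the integer `NodeId` standing in for an MDNode pointer are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry of a compile unit's retained types list.
struct TypeEntry {
  std::string Identifier;  // the string "hash" of the type
  bool IsDeclaration;      // forward declaration rather than definition
  int NodeId;              // stand-in for a pointer to the metadata node
};

// A compile unit is modelled as nothing more than its retained types list.
using CompileUnit = std::vector<TypeEntry>;

// Walk every retained types list once and build identifier -> entry.
// When both a declaration and a definition are seen for one identifier,
// the definition wins, whichever order they appear in.
std::map<std::string, TypeEntry>
buildTypeMap(const std::vector<CompileUnit> &CUs) {
  std::map<std::string, TypeEntry> Map;
  for (const CompileUnit &CU : CUs) {
    for (const TypeEntry &T : CU) {
      auto Result = Map.insert(std::make_pair(T.Identifier, T));
      TypeEntry &Existing = Result.first->second;
      bool Inserted = Result.second;
      if (!Inserted && Existing.IsDeclaration && !T.IsDeclaration)
        Existing = T;  // upgrade a declaration to the definition
    }
  }
  return Map;
}
```

Because the map is rebuilt from the module contents, IR linking itself needs no debug-info-specific logic in this model. Any client that wants to resolve identifiers pays for one walk over the compile units.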
>>>>>
>>>>>
>>>>> Yes, you are right that at codegen-time, we can generate the map from the
>>>>> lists of retained types.
>>>>>
>>>>> But dumping the linked ll file requires the map when outputting comments of
>>>>> the MDNode :]
>>>>
>>>> Depending on which things we print out, but yes, in some cases
>>>> (derived types) we do print out the type referenced. I assume the
>>>> AsmPrinter can build such a map, then. (in fact, with a few clients
>>>> like this, it might be nice to build a bit of an abstraction around it
>>>> rather than just using a raw map - something that has a ctor (or I
>>>> suppose it could be a factory function) that reads in the right
>>>> metadata, walks the compile units, etc, and builds the mapping)
>>>>
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>>
>>>>> So any other opinion on putting it in LLVMContext other than it being
>>>>> central?
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>>
>>>>> More details for option b from David
>>>>>
>>>>> < The alternative I have in mind is a more complete version of what
>>>>> < you're proposing - a full MD feature, not an MD feature that's just
>>>>> < barely enough to support the needs of debug info. What we could do is
>>>>> < allow the insertion of these MDHash things you spoke about but take it
>>>>> < a step further and have MDNode::getOperand walk through the hash &
>>>>> < give the value (in this way, DebugInfo wouldn't have to change at all
>>>>> < to handle hashes - if the Metadata APIs are going to be aware of the
>>>>> < hashes anyway, they might as well provide this convenience
>>>>> < functionality) the metadata feature would also have to have some
>>>>> < blessed top-level named metadata that would have a list of hash+MDNode
>>>>> < to keep those MDNodes alive (so you wouldn't have to stuff all the
>>>>> < types in the retained types list - metadata would provide the full
>>>>> < support, not just half of it).
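
The idea of an operand accessor that "walks through the hash" can be illustrated with a small model. This is a sketch only and does not reflect the real `MDNode::getOperand` signature: `Node`, `Operand`, and `IdentifierTable` are hypothetical, with the table playing the role of the top-level named metadata that lists identifier and node pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Node;

// An operand is either a direct node reference or a type identifier.
struct Operand {
  const Node *Direct;      // non-null for an ordinary node reference
  std::string Identifier;  // consulted only when Direct is null
};

// Stand-in for the module-level list of identifier -> node pairs.
struct IdentifierTable {
  std::map<std::string, const Node *> Entries;
};

struct Node {
  std::string Name;
  std::vector<Operand> Operands;

  // Return the referenced node, resolving identifiers through the table,
  // so callers never see the difference between the two operand kinds.
  // Yields nullptr when an identifier has no entry in the table.
  const Node *getOperand(std::size_t I, const IdentifierTable &Table) const {
    assert(I < Operands.size() && "operand index out of range");
    const Operand &Op = Operands[I];
    if (Op.Direct)
      return Op.Direct;
    auto It = Table.Entries.find(Op.Identifier);
    return It == Table.Entries.end() ? nullptr : It->second;
  }
};
```

The design point is that resolution lives in one accessor, so clients such as the debug info classes would not need to carry a map around. The cost is that the generic metadata layer has to own and maintain the table.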
>>>>>
>>>>>
>>>>>
>>>>> Transition from current DI Metadata:
>>>>> To have a smooth transition, I will add a flag "-gtype-hashing" for the type
>>>>> uniquing work and turn it on by default when we are ready.
>>>>>
>>>>>
>>>>> I'd prefer just make the change to have the front end emit the "hash"
>>>>> (it's not really a hash, it's just a string).
>>>>>
>>>>> Are you saying no transition period? A single patch to have correct handling
>>>>> of "hash" and to update all existing testing cases?
>>>>>
>>>>>
>>>>> -----------------------------
>>>>> Patches:
>>>>> Expect the following patches:
>>>>> 1> add flag -gtype-hashing
>>>>> 2> add hash field to DI types
>>>>> 3> modify DIBuilder to use hash instead of MD reference
>>>>> 4> related to issue 3
>>>>>
>>>>>
>>>>> These can all be a single patch since it shouldn't be very large if we
>>>>> go with a) above. If we go with b) then the MDNode work should be done
>>>>> in isolation first and then the debug info on top of it.
>>>>>
>>>>> What is wrong with smaller patches?
>>>>> My estimation for all the above with a) is about 30K + testing cases.
>>>>>
>>>>>
>>>>> 5> backend change (in DwarfDebug|CompileUnit) to support types shared among
>>>>> compile units
>>>>> requires gdwarf-2 gdwarf-3 gdwarf-4 support for issues related to ref_addr
>>>>>
>>>>>
>>>>> #5 can and should be done before the rest of them.
>>>>>
>>>>> I prefer to submit patches according to the flow of the compiler, starting
>>>>> from the frontend, then IR, then backend.
>>>>> The testing cases will be added for front end, llvm-link and backend.
>>>>> Any reason why #5 should be done first?
>>>>>
>>>>>
>>>>> All changes should be local to debug info classes except patch #4.
>>>>>
>>>>>
>>>>> What's patch #4?
>>>>>
>>>>> Patch #4 above: related to issue 3 (changes corresponding to how to solve
>>>>> issue #3)
>>>>>
>>>>> -Manman
>>>>>
>>>>>
>>>>> -eric
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>>
>>>
>

Manman Ren

2013-Jun-21 19:33 UTC

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>> 
>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>> 
>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>>
>>>> A summary of options for issue #3:
>>>> 3> To actually access the MDNode referenced via the hash value,
>>>> we need to perform a lookup from the hash value to find the
>>>> corresponding MDNode.
>>>> The questions are where to store this map and how to keep it
>>>> up-to-date when a MDNode is replaced.
>>>> ---------------------
>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify
>>>> access functions in DI classes to pass in the map.
>>>> May need to modify printing functions in AsmWriter.
>>> 
>>> You may not need the map in DIBuilder if you piggy back on the
>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>> for just debug dumps of the IR. A way around this might be nice, but
>>> I've not looked into it.

Calling dump() on a MDNode inside gdb will not give us the derived type
information, since we don't have a handle to the map.

>>> 
>>>> ---------------------
>>>> Option b) I am going to expand David's description with more
>>>> details, correct me if I am wrong.
>>> 
>>> I'll let David continue this part of the thread with you.
>>> 
>>>> 
>>>> I personally prefer option b over a since it is much cleaner.
>>>> 
>>> 
>>> Option b is more general and should probably work. I suggested a
>>> originally because I know it would work and the impact is limited to
>>> debug info. Also it doesn't involve touching the generic metadata
>>> interface or code and could be done quickly and incrementally. b may
>>> also require further changes to the debug info infrastructure to deal
>>> with lookup, etc so may end up being as much churn as a. b is more
>>> useful if we think other users of metadata may want/need similar
>>> functionality in the future.
>>>
>>> I'm fine with b as a choice, but it's going to involve more planning
>>> and code review and involve people outside of you, me and Dave :)
>> 
>> Got it. David, are you okay with the details I added about option b?
>> Let's start a new thread with title "Proposal: extend Metadata
>> (MDNode) to support MDHash" or something like that.
>
> I'd like to hear a rationale for wanting b first. I don't know of (and
> can't think of) any other users of this functionality right now.

I will try my best :]
MDNode has a list of operands, so one MDNode's full content can be thought of
as a graph: whenever there is a reference to another MDNode, we have
an edge. The MDHash can be used to give a unique id (or string) to the full
content of the graph.

The hash value applies to any kind of MDNode with complicated MDNode references.

If you think this is not general enough to be in MDNode, my earlier proposal of
extending MDNode to DINode goes in the other direction.
Basically, MDNode will have an opcode like the opcode for Instruction, to
support TBAA, DI, and other types.
Inside the streamers (bc and ll readers), we can create the specialized class
according to the opcode.
That requires a lot of changes too :(

> 
>> We can add a weight to MDHash (other names are fine with me), to
>> support multiple MDNodes with the same hash, but different weight.
>>
>
> Wait, what? This sounds like a bad idea.

Why?
The weight is needed to differentiate a forward declaration vs. a definition,
they both name the same class, but they are different.

Thanks,
Manman
> 
> -eric
> 
>> Thanks,
>> Manman
>> 
>>> 
>>> -eric
>>> 
>>> 
>>>> Thanks,
>>>> Manman
>>>> 

David Blaikie

2013-Jun-21 20:12 UTC

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com> wrote:
>
> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>
>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>>>
>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>
>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>> A summary of options for issue #3:
>>>>> 3> To actually access the MDNode referenced via the hash value, we need to perform a lookup from the hash value to find the corresponding MDNode.
>>>>> The questions are where to store this map and how to keep it up-to-date when a MDNode is replaced.
>>>>> ---------------------
>>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access functions in DI classes to pass in the map.
>>>>> May need to modify printing functions in AsmWriter.
>>>>
>>>> You may not need the map in DIBuilder if you piggy back on the
>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>>> for just debug dumps of the IR. A way around this might be nice, but
>>>> I've not looked into it.
>
> Calling dump() on a MDNode inside gdb will not give us the derived type information, since we don't have a handle to the map.

The only case I know of where dump currently navigates a type link
is printing derived types, where we print "[from foo]". Just
printing the string identifier for the type (without having to
navigate the link/use the map) should be about as useful here.
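
A minimal sketch of the point above, assuming a derived-type node stores the string identifier of its base type directly. DerivedTypeNode and describeDerivedType are hypothetical names used only for illustration, not LLVM classes or APIs; the sketch only shows that the "[from ...]" comment can be produced without a lookup map:

```cpp
#include <string>

// Hypothetical stand-in for a derived-type debug info node (not an LLVM class).
struct DerivedTypeNode {
  std::string Name;       // name of the member or derived type, e.g. "a"
  std::string BaseTypeId; // string identifier of the referenced type
};

// Builds the trailing dump comment from the identifier stored in the node
// itself, so no identifier-to-MDNode map is consulted.
std::string describeDerivedType(const DerivedTypeNode &N) {
  return "[" + N.Name + "] [from " + N.BaseTypeId + "]";
}
```

The output is less detailed than following the link to the referenced node, but it needs nothing beyond the node being dumped.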
>
>>>>
>>>>> ---------------------
>>>>> Option b) I am going to expand David's description with more details, correct me if I am wrong.
>>>>
>>>> I'll let David continue this part of the thread with you.
>>>>
>>>>>
>>>>> I personally prefer option b over a since it is much cleaner.
>>>>>
>>>>
>>>> Option b is more general and should probably work. I suggested a
>>>> originally because I know it would work and the impact is limited to
>>>> debug info. Also it doesn't involve touching the generic metadata
>>>> interface or code and could be done quickly and incrementally. b may
>>>> also require further changes to the debug info infrastructure to deal
>>>> with lookup, etc so may end up being as much churn as a. b is more
>>>> useful if we think other users of metadata may want/need similar
>>>> functionality in the future.
>>>>
>>>> I'm fine with b as a choice, but it's going to involve more planning
>>>> and code review and involve people outside of you, me and Dave :)
>>>
>>> Got it. David, are you okay with the details I added about option b?
>>> Let's start a new thread with title "Proposal: extend Metadata (MDNode) to support MDHash" or something like that.
>>
>> I'd like to hear a rationale for wanting b first. I don't know (and
>> can't think of) any other users of this functionality right now.
>
> I will try my best :]
> MDNode has a list of operands, one MDNode's full content can be thought of as a graph, when there is a reference to another MDNode, we have an edge. The MDHash can be used to give a unique id (or string) to the full content of the graph.
>
> The hash value applies to any kind of MDNode with complicated MDNode references.
>
> If you think this is not general enough to be in MDNode,

Eric's concern/question is whether it's a feature that would be useful
to any other code, I believe. (I'm less concerned about this so long
as the feature is appropriately general, even if it doesn't have any
other consumers at the moment - I don't think it'd drastically
complicate the Metadata handling code (which I think is currently
fairly simple, but I admit to not knowing it in any great detail))

> my earlier proposal of extending MDNode to DINode is going the other direction.

As discussed, this isn't far enough in the other direction. If we're
going to touch core LLVM Metadata handling code at all, it needs to be
a complete/cohesive feature, not just tendrils of a debug info
feature. Hence (a) or (b) not (c).

> Basically, MDNode will have an opcode like the opcode for Instruction, to support TBAA, DI, and other types.
> Inside streamers (bc ll reader), we can create the specialized class according to the opcode.
> That requires a lot of changes too :(
>
>>
>>> We can add a weight to MDHash (other names are fine with me), to support multiple MDNodes with the same hash, but different weight.
>>>
>>
>> Wait, what? This sounds like a bad idea.
> Why?
> The weight is needed to differentiate a forward declaration vs. a definition, they both name the same class, but they are different.

This would probably tip me over into Eric's perspective that it would
be too esoteric a feature to add to a generic/core piece of
infrastructure.

Yes, I'm not sure we should resolve duplicates in this case, then
(short of assigning a separate identifier to declarations vs.
definitions so they don't collide/get deduped - then doing a pass in
DwarfDebug/during debug-info-generation time to resolve in favor of
definitions (hmm, yeah, even that doesn't quite make sense - you'd
have to be able to visit the things referring to the declarations &
update them to refer to definitions, and we'd have no real way to
discover those easily except walking all the debug info... could
work))
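
To illustrate the pass described above, here is a rough sketch, under the assumption that declarations get their own identifier formed by appending a suffix to the type name. The ".decl" suffix, Ref, and preferDefinitions are all hypothetical and not part of LLVM; the point is only that every reference has to be visited in order to retarget it:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical naming scheme: a declaration's identifier is the type name
// followed by this suffix, so it never collides with the definition's.
const std::string DeclSuffix = ".decl";

// Hypothetical stand-in for one type reference somewhere in the debug info.
struct Ref {
  std::string TargetId;
};

// Walks every reference; when one names a declaration and a definition with
// the same base name exists, retargets it to the definition.
// Returns the number of references rewritten.
int preferDefinitions(std::vector<Ref> &Refs,
                      const std::unordered_set<std::string> &DefinedIds) {
  int Rewritten = 0;
  for (Ref &R : Refs) {
    const std::string &Id = R.TargetId;
    if (Id.size() <= DeclSuffix.size())
      continue;
    // Skip identifiers that do not end with the declaration suffix.
    if (Id.compare(Id.size() - DeclSuffix.size(), DeclSuffix.size(),
                   DeclSuffix) != 0)
      continue;
    std::string Base = Id.substr(0, Id.size() - DeclSuffix.size());
    if (DefinedIds.count(Base)) {
      R.TargetId = Base;
      ++Rewritten;
    }
  }
  return Rewritten;
}
```

The cost is the full walk over Refs, which corresponds to the "walking all the debug info" concern raised above.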
>
> Thanks,
> Manman
>
>>
>> -eric
>>
>>> Thanks,
>>> Manman
>>>
>>>>
>>>> -eric
>>>>
>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>

Eric Christopher

2013-Jun-21 20:14 UTC

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com> wrote:
>
> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>
>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com> wrote:
>>>
>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>
>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com> wrote:
>>>>>
>>>>> A summary of options for issue #3:
>>>>> 3> To actually access the MDNode referenced via the hash value, we need to perform a lookup from the hash value to find the corresponding MDNode.
>>>>> The questions are where to store this map and how to keep it up-to-date when a MDNode is replaced.
>>>>> ---------------------
>>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access functions in DI classes to pass in the map.
>>>>> May need to modify printing functions in AsmWriter.
>>>>
>>>> You may not need the map in DIBuilder if you piggy back on the
>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>>> for just debug dumps of the IR. A way around this might be nice, but
>>>> I've not looked into it.
>
> Calling dump() on a MDNode inside gdb will not give us the derived type information, since we don't have a handle to the map.
>

It can give you the identifier value for it, which, while not pretty,
is likely just as good.

> If you think this is not general enough to be in MDNode, my earlier proposal of extending MDNode to DINode is going the other direction.
> Basically, MDNode will have an opcode like the opcode for Instruction, to support TBAA, DI, and other types.
> Inside streamers (bc ll reader), we can create the specialized class according to the opcode.
> That requires a lot of changes too :(
>

I'm still not seeing why your option is the only other one. I.e. my
first proposal.

>>
>>> We can add a weight to MDHash (other names are fine with me), to support multiple MDNodes with the same hash, but different weight.
>>>
>>
>> Wait, what? This sounds like a bad idea.
> Why?
> The weight is needed to differentiate a forward declaration vs. a definition, they both name the same class, but they are different.
>

The contents of the node have forward declaration versus definition.
They won't be merged or RAUW'd. As you build up the hash table you can
make sure that definitions win in creating DIEs, yes? I.e. just add the
full types to the map as you construct them and when you look up
DW_AT_type/DW_AT_specification you can replace with the correct DIE?

-eric
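
A small sketch of the "definitions win" map described above. TypeNode and TypeIdMap are hypothetical stand-ins, not LLVM classes, and the real DwarfDebug code would presumably map identifiers to DIEs rather than to plain structs:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a type node carrying a string identifier.
struct TypeNode {
  std::string Id;    // identifier shared by a declaration and its definition
  bool IsDefinition; // false for a forward declaration
};

// Maps an identifier to the preferred node for it. The map stores pointers,
// so the caller must keep the nodes alive while the map is in use.
class TypeIdMap {
  std::unordered_map<std::string, const TypeNode *> Map;

public:
  // Records N unless an equally good or better entry is already present:
  // a definition replaces a declaration, never the other way round.
  void add(const TypeNode &N) {
    auto It = Map.find(N.Id);
    if (It == Map.end()) {
      Map.emplace(N.Id, &N);
      return;
    }
    if (N.IsDefinition && !It->second->IsDefinition)
      It->second = &N;
  }

  // Returns the preferred node for Id, or nullptr if none was recorded.
  const TypeNode *lookup(const std::string &Id) const {
    auto It = Map.find(Id);
    return It == Map.end() ? nullptr : It->second;
  }
};
```

With this shape, the order in which declarations and definitions are encountered does not matter: a later declaration never displaces a definition that is already in the map.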
> Thanks,
> Manman
>
>>
>> -eric
>>
>>> Thanks,
>>> Manman
>>>
>>>>
>>>> -eric
>>>>
>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>> On Jun 20, 2013, at 5:39 PM, David Blaikie wrote:
>>>>>
>>>>>> On Thu, Jun 20, 2013 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>> On Jun 20, 2013, at 5:18 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 5:13 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 4:52 PM, David Blaikie wrote:
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 4:45 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 3:55 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 2:58 PM, Eric Christopher wrote:
>>>>>>>
>>>>>>> Hi Manman,
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 2:51 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> The intent of this proposal is to speed up compilation of "-flto -g" for c++
>>>>>>> programs.
>>>>>>> This is based on discussions with Adrian, David and Eric.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for bringing this back to the list. The original thread was
>>>>>>> getting quite long.
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> Problem:
>>>>>>> A single class can be used in multiple source files and the DI (Debug Info)
>>>>>>> class is included in multiple bc files. The duplication of
>>>>>>> class definitions causes blow-up in # of MDNodes, # of DIEs, leading to
>>>>>>> large memory requirement.
>>>>>>>
>>>>>>> As an example, SPEC xalancbmk requires 7GB of memory when compiled with
>>>>>>> -flto -g.
>>>>>>> With a preliminary implementation of type uniquing, the memory usage will be
>>>>>>> down to 2GB.
>>>>>>>
>>>>>>> In order to unique types, we have to break cycles in the MDNodes.
>>>>>>>
>>>>>>> A simple struct definition
>>>>>>> struct Base {
>>>>>>>   int a;
>>>>>>> };
>>>>>>> can cause cycles in MDNodes:
>>>>>>> !12 = metadata !{i32 786451, metadata !13, null, metadata !"Base", i32 1, i64 32, i64 32, i32 0, i32 0, null, metadata !14, i32 0, null, null} ; [ DW_TAG_structure_type ] [Base] [line 1, size 32, align 32, offset 0] [from ]
>>>>>>> !14 = metadata !{metadata !15, metadata !16}
>>>>>>> !15 = metadata !{i32 786445, metadata !13, metadata !12, metadata !"a", i32 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [ DW_TAG_member ] [a] [line 2, size 32, align 32, offset 0] [from int]
>>>>>>> !16 = metadata !{i32 786478, metadata !13, metadata !12, metadata !"Base", metadata !"Base", metadata !"", i32 1, metadata !17, i1 false, i1 false, i32 0, i32 0, null, i32 320, i1 false, null, null, i32 0, metadata !20, i32 1} ; [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>
>>>>>>> Cycles: !12 -- !14 -- !15 -- !12
>>>>>>>         !12 -- !14 -- !16 -- !12
>>>>>>>
>>>>>>> These cycles make it hard to unique the same struct used in two bc files.
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> How to fix:
>>>>>>>
>>>>>>> We attach a hash value to types to help type uniquing and we also replace
>>>>>>> references to types with their hash values.
>>>>>>> For the above struct "Base", we now have the following MDNodes:
>>>>>>> !4 = metadata !{i32 786451, metadata !5, null, metadata !"Base", i32 1, i64 32, i64 32, i32 0, i32 0, null, metadata !6, i32 0, i32 0, null, i32 915398439} ; [ DW_TAG_structure_type ] [Base] [line 1, size 32, align 32, offset 0] [from ]
>>>>>>> !6 = metadata !{metadata !7, metadata !9}
>>>>>>> !7 = metadata !{i32 786445, metadata !5, i32 915398439, metadata !"a", i32 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [ DW_TAG_member ] [a] [line 2, size 32, align 32, offset 0] [from int]
>>>>>>> !9 = metadata !{i32 786478, metadata !5, i32 915398439, metadata !"Base", metadata !"Base", metadata !"", i32 1, metadata !10, i1 false, i1 false, i32 0, i32 0, null, i32 320, i1 false, null, null, i32 0, metadata !13, i32 1} ; [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>
>>>>>>> Note that the cycles are gone and !4 has a hash value of 915398439, and the
>>>>>>> references to !4 are replaced with 915398439.
>>>>>>> Thanks Eric for suggesting replacing MD reference with a hash value.
>>>>>>>
>>>>>>>
>>>>>>> In particular I recommended this:
>>>>>>>
>>>>>>> a) For C++ odr replace it with the "hash"
that's just a string
>>>>>>> representing the type name.
>>>>>>> b) For Internal C++ types and all C types replace
it with a string
>>>>>>> that's a concatenation of the type name and the
name of the compile
>>>>>>> unit.
>>>>>>>
>>>>>>> Yes, that is what we agreed on over email.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There are a few issues:
>>>>>>> 1> How to generate the hash for a given type?
>>>>>>> With C++'s ODR, it should be enough by using
the context and the name for
>>>>>>> non-internal c++ types.
>>>>>>> For internal c++ types and types of other
languages, hash will not be used.
>>>>>>>
>>>>>>>
>>>>>>> Explain this?
>>>>>>>
>>>>>>>
>>>>>>> For a while, I am going to support both hash and MD
reference, once
>>>>>>> everything is working with hash,
>>>>>>> I will update all debug info testing cases, turn
-gtype-uniquing on, and
>>>>>>> remove the other path.
>>>>>>>
>>>>>>> For internal c++ types, initially they will follow
the path of using MD
>>>>>>> references without a hash.
>>>>>>>
>>>>>>>
>>>>>>> My current implementation is to add a few static member functions in MDNode
>>>>>>> to profile DI nodes differently.
>>>>>>> + /// If the array of Vals is for debug info, profile it specially and return true.
>>>>>>> + /// If the DI node has a hash value, generate the profile using only the hash value and the declaration flag.
>>>>>>> +  static bool profileDebugInfoNode(ArrayRef<Value*> Vals, FoldingSetNodeID &ID);
>>>>>>>
>>>>>>> + /// If the MDNode is for debug info, profile it specially and return true.
>>>>>>> + /// If the DI node has a hash value, generate the profile using only the hash value and the declaration flag.
>>>>>>> +  static bool profileDebugInfoNode(const MDNode *M, FoldingSetNodeID &ID);
>>>>>>>
>>>>>>> + /// Given a hash value and a flag, generate the profile for later lookup.
>>>>>>> +  static bool profileDebugInfoNode(unsigned Hash, bool Declaration, FoldingSetNodeID &ID);
>>>>>>>
>>>>>>> These static functions are called in Metadata.cpp:
>>>>>>> void MDNode::Profile(FoldingSetNodeID &ID) const {
>>>>>>> +  if (profileDebugInfoNode(this, ID))
>>>>>>> +    return;
>>>>>>> +
>>>>>>>
>>>>>>> There are other examples of these in MDNode for handling of specific
>>>>>>> metadata.
>>>>>>> /// Methods for metadata merging.
>>>>>>> static MDNode *getMostGenericTBAA(MDNode *A, MDNode *B);
>>>>>>> static MDNode *getMostGenericFPMath(MDNode *A, MDNode *B);
>>>>>>> static MDNode *getMostGenericRange(MDNode *A, MDNode *B);
>>>>>>>
>>>>>>> Comments are welcome on whether this violates any layering rule.
>>>>>>>
>>>>>>>
>>>>>>> As I've said many times in email, I don't
think this is a good idea
>>>>>>> and would prefer either a or b below. a) is a much
simpler solution.
>>>>>>>
>>>>>>> Any reason that why it is not a good idea?
>>>>>>>
>>>>>>>
>>>>>>> Other choices are:
>>>>>>> a> Keep a map in DwarfDebug
>>>>>>> Keep in mind that the map is used at many stages,
and it has to be in sync
>>>>>>> with MDNodeSet.
>>>>>>> b> Generalize MDNode to be aware of hash (David
can provide more details)
>>>>>>> c> Extend MDNode to DINode and modify streamers
(bitcode reader|writer, ll
>>>>>>> reader|writer) to be aware of DINode
>>>>>>> We can provide DINode::get(…) to create a DINode.
DINode can have its own
>>>>>>> Profile function.
>>>>>>> Other suggestions are welcome.
>>>>>>>
>>>>>>>
>>>>>>> a or b please.
>>>>>>>
>>>>>>> Option a> will require a DwarfDebug pointer in
every stage of the compiler,
>>>>>>> and passing the map to the DI classes.
>>>>>>> A rough estimation is around 100 places.
>>>>>>> Is it reasonable to pass a DwarfDebug pointer to
DIBuilder and llvm linker?
>>>>>>> Also the map needs to be in sync with MDNodeSet,
maybe using ValueHandle can
>>>>>>> solve the problem.
>>>>>>>
>>>>>>>
>>>>>>> What about putting the map in LLVMContextImpl?
>>>>>>> It already has a few things specifically for debug info:
>>>>>>> std::vector<DebugRecVH> ScopeRecords;
>>>>>>> DenseMap<std::pair<MDNode*, MDNode*>, int> ScopeInlinedAt;
>>>>>>> …
>>>>>>>
>>>>>>> I remember David mentioned it once and I forgot about the conclusion.
>>>>>>>
>>>>>>>
>>>>>>> I mentioned it only as speculation as to how you were implementing it
>>>>>>> already (but you were doing the profile-changing stuff).
>>>>>>>
>>>>>>> I don't think it should be necessary to have the map (in option (a))
>>>>>>> in such a central location as LLVMContext. It should be usable just
>>>>>>> from DwarfDebug for generation, and DIBuilder can have its own,
>>>>>>> separate map to do similar things during DI building.
>>>>>>>
>>>>>>>
>>>>>>> We also need the map during llvm linking since linking will create new
>>>>>>> MDNodes.
>>>>>>>
>>>>>>>
>>>>>>> I don't understand what you mean by this. IR linking shouldn't need to
>>>>>>> do anything debug-info-specific, it should just be the normal IR
>>>>>>> approach.
>>>>>>>
>>>>>>> The declaration-v-definition resolution can be done at codegen-time.
>>>>>>> We'll have to walk all the lists of retained types anyway to build the
>>>>>>> map, so we can do declaration-v-definition (keep definitions over
>>>>>>> declarations when we see both) at that point.
>>>>>>>
>>>>>>>
>>>>>>> Yes, you are right that at codegen-time, we can generate the map from the
>>>>>>> lists of retained types.
>>>>>>>
>>>>>>> But dumping the linked ll file requires the map when outputting comments of
>>>>>>> the MDNode :]
>>>>>>
>>>>>> Depending on which things we print out, but yes, in some cases
>>>>>> (derived types) we do print out the type referenced. I assume the
>>>>>> AsmPrinter can build such a map, then. (in fact, with a few clients
>>>>>> like this, it might be nice to build a bit of an abstraction around it
>>>>>> rather than just using a raw map - something that has a ctor (or I
>>>>>> suppose it could be a factory function) that reads in the right
>>>>>> metadata, walks the compile units, etc, and builds the mapping)
>>>>>>
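The abstraction suggested above could look roughly like the following. This is only a sketch under stated assumptions: `TypeNode`, `CompileUnit`, and `TypeIdentifierMap` are hypothetical stand-ins invented for illustration, not LLVM classes, and the identifier is assumed to be a plain string. The constructor walks every compile unit's retained-types list once and keeps a definition over a declaration when both are seen.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a debug-info type node (not LLVM's MDNode).
struct TypeNode {
  std::string Identifier; // the unique string ("hash") naming the type
  bool IsDefinition;      // false for a forward declaration
};

// Hypothetical stand-in for a compile unit and its retained-types list.
struct CompileUnit {
  std::vector<const TypeNode *> RetainedTypes;
};

// Built once from all compile units; maps identifier -> node.
class TypeIdentifierMap {
  std::unordered_map<std::string, const TypeNode *> Map;

public:
  explicit TypeIdentifierMap(const std::vector<CompileUnit> &CUs) {
    for (const CompileUnit &CU : CUs) {
      for (const TypeNode *T : CU.RetainedTypes) {
        assert(T && "retained-types list must not contain null");
        if (T->Identifier.empty())
          continue; // types without an identifier are not uniqued
        auto Result = Map.insert(std::make_pair(T->Identifier, T));
        // Already present: replace a declaration with a definition.
        if (!Result.second && !Result.first->second->IsDefinition &&
            T->IsDefinition)
          Result.first->second = T;
      }
    }
  }

  // Returns the node for Identifier, or nullptr if it is unknown.
  const TypeNode *lookup(const std::string &Identifier) const {
    auto It = Map.find(Identifier);
    return It == Map.end() ? nullptr : It->second;
  }
};
```

Each client (code generation, IR printing) could construct its own instance on demand, which avoids keeping a shared map in sync while nodes are being replaced.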
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>> So any other opinion on putting it in LLVMContext other than it being
>>>>>>> central?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>> More details for option b from David
>>>>>>>
>>>>>>> < The alternative I have in mind is a more complete version of what
>>>>>>> < you're proposing - a full MD feature, not an MD feature that's just
>>>>>>> < barely enough to support the needs of debug info. What we could do is
>>>>>>> < allow the insertion of these MDHash things you spoke about but take it
>>>>>>> < a step further and have MDNode::getOperand walk through the hash &
>>>>>>> < give the value (in this way, DebugInfo wouldn't have to change at all
>>>>>>> < to handle hashes - if the Metadata APIs are going to be aware of the
>>>>>>> < hashes anyway, they might as well provide this convenience
>>>>>>> < functionality) the metadata feature would also have to have some
>>>>>>> < blessed top-level named metadata that would have a list of hash+MDNode
>>>>>>> < to keep those MDNodes alive (so you wouldn't have to stuff all the
>>>>>>> < types in the retained types list - metadata would provide the full
>>>>>>> < support, not just half of it).
>>>>>>>
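To make option b concrete, here is a small sketch of an operand accessor that resolves identifier operands transparently. Everything in it is an assumption made for illustration: `Node`, `Operand`, and `Context` are hypothetical types, not the LLVM metadata API, and the registry stands in for the "blessed" top-level list of hash+MDNode pairs mentioned above.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Node;

// An operand is either a direct reference or an identifier string.
struct Operand {
  const Node *Direct;     // non-null for an ordinary reference
  std::string Identifier; // consulted only when Direct is null
};

// Hypothetical metadata node with a name and a list of operands.
struct Node {
  std::string Name;
  std::vector<Operand> Operands;
};

// Hypothetical owner of the top-level identifier -> node list.
class Context {
  std::unordered_map<std::string, const Node *> Registry;

public:
  // Registers (or re-registers) the node an identifier refers to.
  // Re-registering models replacing a node: later lookups see the new one.
  void registerNode(const std::string &Id, const Node *N) {
    assert(N && "cannot register a null node");
    Registry[Id] = N;
  }

  // Returns operand I of N, looking through an identifier if needed.
  // Yields nullptr when the identifier has not been registered.
  const Node *getOperand(const Node &N, unsigned I) const {
    assert(I < N.Operands.size() && "operand index out of range");
    const Operand &Op = N.Operands[I];
    if (Op.Direct)
      return Op.Direct;
    auto It = Registry.find(Op.Identifier);
    return It == Registry.end() ? nullptr : It->second;
  }
};
```

With an accessor like this, callers never see the identifier, which is the convenience the quoted proposal describes: debug-info code keeps calling the operand accessor and does not need a map passed in.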
>>>>>>>
>>>>>>>
>>>>>>> Transition from current DI Metadata:
>>>>>>> To have a smooth transition, I will add a flag "-gtype-hashing" for the
>>>>>>> type uniquing work and turn it on by default when we are ready.
>>>>>>>
>>>>>>>
>>>>>>> I'd prefer to just make the change to have the front end emit the "hash"
>>>>>>> (it's not really a hash, it's just a string).
>>>>>>>
>>>>>>> Are you saying no transition period? A single patch to have correct
>>>>>>> handling of "hash" and to update all existing testing cases?
>>>>>>>
>>>>>>>
>>>>>>> -----------------------------
>>>>>>> Patches:
>>>>>>> Expect the following patches:
>>>>>>> 1> add flag -gtype-hashing
>>>>>>> 2> add hash field to DI types
>>>>>>> 3> modify DIBuilder to use hash instead of MD reference
>>>>>>> 4> related to issue 3
>>>>>>>
>>>>>>>
>>>>>>> These can all be a single patch since it shouldn't be very large if we
>>>>>>> go with a) above. If we go with b) then the MDNode work should be done
>>>>>>> in isolation first and then the debug info on top of it.
>>>>>>>
>>>>>>> What is wrong with smaller patches?
>>>>>>> My estimation for all the above with a) is about 30K + testing cases.
>>>>>>>
>>>>>>>
>>>>>>> 5> backend change (in DwarfDebug|CompileUnit) to support types shared
>>>>>>> among compile units
>>>>>>> requires gdwarf-2 gdwarf-3 gdwarf-4 support for issues related to ref_addr
>>>>>>>
>>>>>>>
>>>>>>> #5 can and should be done before the rest of them.
>>>>>>>
>>>>>>> I prefer to submit patches according to the flow of the compiler, starting
>>>>>>> from the frontend, then IR, then backend.
>>>>>>> The testing cases will be added for front end, llvm-link and backend.
>>>>>>> Any reason why #5 should be done first?
>>>>>>>
>>>>>>>
>>>>>>> All changes should be local to debug info classes except patch #4.
>>>>>>>
>>>>>>>
>>>>>>> What's patch #4?
>>>>>>>
>>>>>>> Patch #4 above: related to issue 3 (changes corresponding to how to solve
>>>>>>> issue #3)
>>>>>>>
>>>>>>> -Manman
>>>>>>>
>>>>>>>
>>>>>>> -eric
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>>
>>>>>
>>>
>
