thr3ads.net - llvm dev - [LLVMdev] Proposal: type uniquing of debug info for LTO [Jun 2013]

If this information is useful, please help other people find it:
Share via:

David Blaikie

2013-Jun-21 20:12 UTC

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com>
wrote:>
> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>
>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com>
wrote:
>>>
>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>
>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at
apple.com> wrote:
>>>>>
>>>>> A summary of options for issue #3:
>>>>> 3> To actually access the MDNode referenced via the hash
value, we need to perform a lookup from the hash value to find the corresponding
MDNode.
>>>>> The questions are where to store this map and how to keep
it up-to-date when a MDNode is replaced.
>>>>> ---------------------
>>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder,
modify access functions in DI classes to pass in the map.
>>>>> May need to modify printing functions in AsmWriter.
>>>>
>>>> You may not need the map in DIBuilder if you piggy back on the
>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is
unfortunate
>>>> for just debug dumps of the IR. A way around this might be
nice, but
>>>> I've not looked into it.
>
> Calling dump() on a MDNode inside gdb will not give us the derived type
information, since we don't have a handle to the map.
The only case where dump currently navigates a type link that I know
is printing derived types, where we print "[from foo]". By just
printing the string identifier for the type (without having to
navigate the link/use the map) should be about as useful here.
>
>>>>
>>>>> ---------------------
>>>>> Option b) I am going to expand David's description with
more details, correct me if I am wrong.
>>>>
>>>> I'll let David continue this part of the thread with you.
>>>>
>>>>>
>>>>> I personally prefer option b over a since it is much
cleaner.
>>>>>
>>>>
>>>> Option b is more general and should probably work. I suggested
a
>>>> originally because I know it would work and the impact is
limited to
>>>> debug info. Also it doesn't involve touching the generic
metadata
>>>> interface or code and could be done quickly and incrementally.
b may
>>>> also require further changes to the debug info infrastructure
to deal
>>>> with lookup, etc so may end up being as much churn as a. b is
more
>>>> useful if we think other users of metadata may want/need
similar
>>>> functionality in the future.
>>>>
>>>> I'm fine with b as a choice, but it's going to involve
more planning
>>>> and code review and involve people outside of you, me and Dave
:)
>>>
>>> Got it. David, are you okay with the details I added about option
b?
>>> Let's start a new thread with title "Proposal: extend
Metadata (MDNode) to support MDHash" or something like that.
>>
>> I'd like to hear a rationale for wanting b first. I don't know
(and
>> can think of) any other users of this functionality right now.
>
> I will try my best :]
> MDNode has a list of operands, one MDNode's full content can be thought
of as a graph, when there is a reference to another MDNode, we have
> an edge. The MDHash can be used to give a unique id (or string) to the full
content of the graph.
>
> The hash value applies to any kind of MDNode with complicated MDNode
references.
>
> If you think this is not general enough to be in MDNode,
Eric's concern/question is whether it's a feature that would be useful
to any other code, I believe. (I'm less concerned about this so long
as the feature is appropriately general, even if it doesn't have any
other consumers at the moment - I don't think it'd drastically
complicate the Metadata handling code (which I think is currently
fairly simple, but I admit to not knowing it in any great detail))
> my earlier proposal of extending MDNode to DINode is going the other
direction.
As discussed, this isn't far enough in the other direction. If we're
going to touch core LLVM Metadata handling code at all, it needs to be
a complete/cohesive feature, not just tendrils of a debug info
feature. Hence (a) or (b) not (c).
> Basically, MDNode will have an opcode like the opcode for Instruction, to
support TBAA, DI, and other types.
> Inside streamers (bc ll reader), we can create the specialized class
according to the opcode.
> That requires a lot of changes too :(
>
>>
>>> We can add a weight to MDHash (other names are fine with me), to
support multiple MDNodes with the same hash, but different weight.
>>>
>>
>> Wait, what? This sounds like a bad idea.
> Why?
> The weight is needed to differentiate a forward declaration vs. a
definition, they both name the same class, but they are different.
This would probably tip me over into Eric's perspective that it would
be too esoteric a feature to add to a generic/core piece of
infrastructure.

Yes, I'm not sure we should resolve duplicates in this case, then
(short of assigning a separate identifier to declarations V
definitions so they don't collide/get deduped - then doing a pass in
DwarfDebug/during debug-info-generation time to resolve in favor of
definitions (hmm, yeah, even that doesn't quite make sense - you'd
have to be able to visit the things referring to the declarations &
update them to refer to definitions, and we'd have no real way to
discover those easily except walking all the debug info... could
work))
>
> Thanks,
> Manman
>
>>
>> -eric
>>
>>> Thanks,
>>> Manman
>>>
>>>>
>>>> -eric
>>>>
>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>> On Jun 20, 2013, at 5:39 PM, David Blaikie wrote:
>>>>>
>>>>>> On Thu, Jun 20, 2013 at 5:25 PM, Manman Ren <mren at
apple.com> wrote:
>>>>>>>
>>>>>>> On Jun 20, 2013, at 5:18 PM, David Blaikie
<dblaikie at gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 5:13 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 4:52 PM, David Blaikie wrote:
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 4:45 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 3:55 PM, Manman Ren <mren at
apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Jun 20, 2013, at 2:58 PM, Eric Christopher
wrote:
>>>>>>>
>>>>>>> Hi Manman,
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 2:51 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> The intent of this proposal is to speedup
compilation of "-flto -g" for c++
>>>>>>> programs.
>>>>>>> This is based on discussions with Adrian, David and
Eric.
>>>>>>>
>>>>>>>
>>>>>>> Thanks for bringing this back to the list. The
original thread was
>>>>>>> getting quite long.
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> Problem:
>>>>>>> A single class can be used in multiple source files
and the DI (Debug Info)
>>>>>>> class is included in multiple bc files. The
duplication of
>>>>>>> class definitions causes blow-up in # of MDNodes, #
of DIEs, leading to
>>>>>>> large memory requirement.
>>>>>>>
>>>>>>> As an example, SPEC xalancbmk requires 7GB of
memory when compiled with
>>>>>>> -flto -g.
>>>>>>> With a preliminary implementation of type uniquing,
the memory usage will be
>>>>>>> down to 2GB.
>>>>>>>
>>>>>>> In order to unique types, we have to break cycles
in the MDNodes.
>>>>>>>
>>>>>>> A simple struct definition
>>>>>>> struct Base {
>>>>>>> int a;
>>>>>>> };
>>>>>>> can cause cycles in MDNodes:
>>>>>>> !12 = metadata !{i32 786451, metadata !13, null,
metadata !"Base", i32 1,
>>>>>>> i64 32, i64 32, i32 0, i32 0, null, metadata !14,
i32 0, null, null} ; [
>>>>>>> DW_TAG_structure_type ] [Base] [line 1, size 32,
align 32, offset 0] [from ]
>>>>>>> !14 = metadata !{metadata !15, metadata !16}
>>>>>>> !15 = metadata !{i32 786445, metadata !13, metadata
!12, metadata !"a", i32
>>>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [
DW_TAG_member ] [a] [line
>>>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>>>> !16 = metadata !{i32 786478, metadata !13, metadata
!12, metadata !"Base",
>>>>>>> metadata !"Base", metadata !"",
i32 1, metadata !17, i1 false, i1 false, i32
>>>>>>> 0, i32 0, null, i32 320, i1 false, null, null, i32
0, metadata !20, i32 1} ;
>>>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>
>>>>>>> Cycles: !12 -- !14 -- !15 -- !12
>>>>>>>      !12 -- !14 -- !16 -- !12
>>>>>>>
>>>>>>> These cycles make it hard to unique the same struct
used in two bc files.
>>>>>>>
>>>>>>> ---------------------------
>>>>>>> How to fix:
>>>>>>>
>>>>>>> We attach a hash value to types to help type
uniquing and we also replace
>>>>>>> references to types with their hash values.
>>>>>>> For the above struct "Base", we now have
the following MDNodes:
>>>>>>> !4 = metadata !{i32 786451, metadata !5, null,
metadata !"Base", i32 1, i64
>>>>>>> 32, i64 32, i32 0, i32 0, null, metadata !6, i32 0,
i32 0, null, i32
>>>>>>> 915398439} ; [ DW_TAG_structure_type ] [Base] [line
1, size 32, align 32,
>>>>>>> offset 0] [from ]
>>>>>>> !6 = metadata !{metadata !7, metadata !9}
>>>>>>> !7 = metadata !{i32 786445, metadata !5, i32
915398439, metadata !"a", i32
>>>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ; [
DW_TAG_member ] [a] [line
>>>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>>>> !9 = metadata !{i32 786478, metadata !5, i32
915398439, metadata !"Base",
>>>>>>> metadata !"Base", metadata !"",
i32 1, metadata !10, i1 false, i1 false, i32
>>>>>>> 0, i32 0, null, i32 320, i1 false, null, null, i32
0, metadata !13, i32 1} ;
>>>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>
>>>>>>> Note that the cycles are gone and !4 has a hash
value of 915398439, and the
>>>>>>> references to !4 are replaced with 915398439.
>>>>>>> Thanks Eric for suggesting replacing MD reference
with a hash value.
>>>>>>>
>>>>>>>
>>>>>>> In particular I recommended this:
>>>>>>>
>>>>>>> a) For C++ odr replace it with the "hash"
that's just a string
>>>>>>> representing the type name.
>>>>>>> b) For Internal C++ types and all C types replace
it with a string
>>>>>>> that's a concatenation of the type name and the
name of the compile
>>>>>>> unit.
>>>>>>>
>>>>>>> Yes, that is what we agreed on over email.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There are a few issues:
>>>>>>> 1> How to generate the hash for a given type?
>>>>>>> With C++'s ODR, it should be enough by using
the context and the name for
>>>>>>> non-internal c++ types.
>>>>>>> For internal c++ types and types of other
languages, hash will not be used.
>>>>>>>
>>>>>>>
>>>>>>> Explain this?
>>>>>>>
>>>>>>>
>>>>>>> For a while, I am going to support both hash and MD
reference, once
>>>>>>> everything is working with hash,
>>>>>>> I will update all debug info testing cases, turn
-gtype-uniquing on, and
>>>>>>> remove the other path.
>>>>>>>
>>>>>>> For internal c++ types, initially they will follow
the path of using MD
>>>>>>> references without a hash.
>>>>>>>
>>>>>>>
>>>>>>> My current implementation is to add a few static
member functions in MDNode
>>>>>>> to profile DI nodes differently.
>>>>>>> + /// If the array of Vals is for debug info,
profile it specially and
>>>>>>> return true.
>>>>>>> + /// If the DI node has a hash value, generate the
profile using only the
>>>>>>> hash value and the declaration flag.
>>>>>>> +  static bool
profileDebugInfoNode(ArrayRef<Value*> Vals, FoldingSetNodeID
>>>>>>> &ID);
>>>>>>>
>>>>>>> + /// If the MDNode is for debug info, profile it
specially and return true.
>>>>>>> + /// If the DI node has a hash value, generate the
profile using only the
>>>>>>> hash value and the declaration flag.
>>>>>>> +  static bool profileDebugInfoNode(const MDNode
*M, FoldingSetNodeID &ID);
>>>>>>>
>>>>>>> + /// Given a hash value and a flag, generate the
profile for later lookup.
>>>>>>> +  static bool profileDebugInfoNode(unsigned Hash,
bool Declaration,
>>>>>>> FoldingSetNodeID &ID);
>>>>>>>
>>>>>>> These static functions are called in Metadata.cpp:
>>>>>>> void MDNode::Profile(FoldingSetNodeID &ID)
const {
>>>>>>> +  if (profileDebugInfoNode(this, ID))
>>>>>>> +    return;
>>>>>>> +
>>>>>>>
>>>>>>> There are other examples of these in MDNode for
handling of specific
>>>>>>> metadata.
>>>>>>> /// Methods for metadata merging.
>>>>>>> static MDNode *getMostGenericTBAA(MDNode *A, MDNode
*B);
>>>>>>> static MDNode *getMostGenericFPMath(MDNode *A,
MDNode *B);
>>>>>>> static MDNode *getMostGenericRange(MDNode *A,
MDNode *B);
>>>>>>>
>>>>>>> Comments are welcome on whether this violates any
layering rule.
>>>>>>>
>>>>>>>
>>>>>>> As I've said many times in email, I don't
think this is a good idea
>>>>>>> and would prefer either a or b below. a) is a much
simpler solution.
>>>>>>>
>>>>>>> Any reason that why it is not a good idea?
>>>>>>>
>>>>>>>
>>>>>>> Other choices are:
>>>>>>> a> Keep a map in DwarfDebug
>>>>>>> Keep in mind that the map is used at many stages,
and it has to be in sync
>>>>>>> with MDNodeSet.
>>>>>>> b> Generalize MDNode to be aware of hash (David
can provide more details)
>>>>>>> c> Extend MDNode to DINode and modify streamers
(bitcode reader|writer, ll
>>>>>>> reader|writer) to be aware of DINode
>>>>>>> We can provide DINode::get(…) to create a DINode.
DINode can have its own
>>>>>>> Profile function.
>>>>>>> Other suggestions are welcome.
>>>>>>>
>>>>>>>
>>>>>>> a or b please.
>>>>>>>
>>>>>>> Option a> will require a DwarfDebug pointer in
every stage of the compiler,
>>>>>>> and passing the map to the DI classes.
>>>>>>> A rough estimation is around 100 places.
>>>>>>> Is it reasonable to pass a DwarfDebug pointer to
DIBuilder and llvm linker?
>>>>>>> Also the map needs to be in sync with MDNodeSet,
maybe using ValueHandle can
>>>>>>> solve the problem.
>>>>>>>
>>>>>>>
>>>>>>> What about putting the map in LLVMContextImpl?
>>>>>>> It already has a few things specifically for debug
info:
>>>>>>> std::vector<DebugRecVH> ScopeRecords;
>>>>>>> DenseMap<std::pair<MDNode*, MDNode*>,
int> ScopeInlinedAt;
>>>>>>> …
>>>>>>>
>>>>>>> I remember David mentioned it once and I forgot
about the conclusion.
>>>>>>>
>>>>>>>
>>>>>>> I mentioned it only as speculation as to how you
were implementing it
>>>>>>> already (but you were doing the profile-changing
stuff).
>>>>>>>
>>>>>>> I don't think it should be necessary to have
the map (in option (a))
>>>>>>> in such a central location as LLVMContext. It
should be usable just
>>>>>>> from DwarfDebug for generation, and DIBuilder can
have its own,
>>>>>>> separate map to do similar things during DI
building.
>>>>>>>
>>>>>>>
>>>>>>> We also need the map during llvm linking since
linking will create new
>>>>>>> MDNodes.
>>>>>>>
>>>>>>>
>>>>>>> I don't understand what you mean by this. IR
linking shouldn't need to
>>>>>>> do anything debug-info-specific, it should just be
the normal IR
>>>>>>> approach.
>>>>>>>
>>>>>>> The declaration-v-definition resolution can be done
at codegen-time.
>>>>>>> We'll have to walk all the lists of retained
types anyway to build the
>>>>>>> map, so we can do declaration-v-definition (keep
definitions over
>>>>>>> declarations when we see both) at that point.
>>>>>>>
>>>>>>>
>>>>>>> Yes, you are right that at codegen-time, we can
generate the map from the
>>>>>>> lists
>>>>>>> of retained types.
>>>>>>>
>>>>>>> But dumping the linked ll file requires the map
when outputting comments of
>>>>>>> the MDNode :]
>>>>>>
>>>>>> Depending on which things we print out, but yes, in
some cases
>>>>>> (derived types) we do print out the type referenced. I
assume the
>>>>>> AsmPrinter can build such a map, then. (in fact, with a
few clients
>>>>>> like this, it might be nice to build a bit of an
abstraction around it
>>>>>> rather than just using a raw map - something that has a
ctor (or I
>>>>>> suppose it could be a factory function) that reads in
the right
>>>>>> metadata, walks the compile units, etc, and builds the
mapping)
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>> So any other opinion on putting it in LLVMContext
other than it being
>>>>>>> central?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>> More details for option b from David
>>>>>>>
>>>>>>> < The alternative I have in mind is a more
complete version of what
>>>>>>> < you're proposing - a full MD feature, not
an MD feature that's just
>>>>>>> < barely enough to support the needs of debug
info. What we could do is
>>>>>>> < allow the insertion of these MDHash things you
spoke about but take it
>>>>>>> < a step further and have MDNode::getOperand
walk through the hash &
>>>>>>> < give the value (in this way, DebugInfo
wouldn't have to change at all
>>>>>>> < to handle hashes - if the Metadata APIs are
going to be aware of the
>>>>>>> < hashes anyway, they might as well provide this
convenience
>>>>>>> < functionality) the metadata feature would also
have to have some
>>>>>>> < blessed top-level named metadata that would
have a list of hash+MDNode
>>>>>>> < to keep those MDNodes alive (so you
wouldn't have to stuff all the
>>>>>>> < types in the retained types list - metadata
would provide the full
>>>>>>> < support, not just half of it).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Transition from current DI Metadata:
>>>>>>> To have a smooth transition, I will add a flag
"-gtype-hashing" for the type
>>>>>>> uniquing work and turn it on by default when we are
ready.
>>>>>>>
>>>>>>>
>>>>>>> I'd prefer just make the change to have the
front end emit the "hash"
>>>>>>> (it's not really a hash, it's just a
string).
>>>>>>>
>>>>>>> Are you saying no transition period? A single patch
to have correct handling
>>>>>>> of "hash" and to update all existing
testing cases?
>>>>>>>
>>>>>>>
>>>>>>> -----------------------------
>>>>>>> Patches:
>>>>>>> Expect the following patches:
>>>>>>> 1> add flag -gtype-hashing
>>>>>>> 2> add hash field to DI types
>>>>>>> 3> modify DIBuilder to use hash instead of MD
reference
>>>>>>> 4> related to issue 3
>>>>>>>
>>>>>>>
>>>>>>> These can all be a single patch since it
shouldn't be very large if we
>>>>>>> go with a) above. If we go with b) then the MDNode
work should be done
>>>>>>> in isolation first and then the debug info on top
of it.
>>>>>>>
>>>>>>> What is wrong with smaller patches?
>>>>>>> My estimation for all the above with a) is about
30K + testing cases.
>>>>>>>
>>>>>>>
>>>>>>> 5> backend change (in DwarfDebug|CompileUnit) to
support types shared among
>>>>>>> compile units
>>>>>>> requires gdwarf-2 gdwarf-3 gdwarf-4 support for
issues related to ref_addr
>>>>>>>
>>>>>>>
>>>>>>> #5 can and should be done before the rest of them.
>>>>>>>
>>>>>>> I prefer to submit patches according to the flow of
the compiler, starting
>>>>>>> from the frontend, then IR, then backend.
>>>>>>> The testing cases will be added for front end,
llvm-link and backend.
>>>>>>> Any reason why #5 should be done first?
>>>>>>>
>>>>>>>
>>>>>>> All changes should be local to debug info classes
except patch #4.
>>>>>>>
>>>>>>>
>>>>>>> What's patch #4?
>>>>>>>
>>>>>>> Patch #4 above: related to issue 3 (changes
corresponding to how to solve
>>>>>>> issue #3)
>>>>>>>
>>>>>>> -Manman
>>>>>>>
>>>>>>>
>>>>>>> -eric
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>>
>>>>>
>>>
>

Manman Ren

2013-Jun-21 22:16 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>> 
>>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>> 
>>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at
apple.com> wrote:
>>>>>> 
>>>>>> A summary of options for issue #3:
>>>>>> 3> To actually access the MDNode referenced via the
hash value, we need to perform a lookup from the hash value to find the
corresponding MDNode.
>>>>>> The questions are where to store this map and how to
keep it up-to-date when a MDNode is replaced.
>>>>>> ---------------------
>>>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder,
modify access functions in DI classes to pass in the map.
>>>>>> May need to modify printing functions in AsmWriter.
>>>>> 
>>>>> You may not need the map in DIBuilder if you piggy back on
the
>>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is
unfortunate
>>>>> for just debug dumps of the IR. A way around this might be
nice, but
>>>>> I've not looked into it.
>> 
>> Calling dump() on a MDNode inside gdb will not give us the derived type
information, since we don't have a handle to the map.
> 
> The only case where dump currently navigates a type link that I know
> is printing derived types, where we print "[from foo]". By just
> printing the string identifier for the type (without having to
> navigate the link/use the map) should be about as useful here.
> 
>> 
>>>>> 
>>>>>> ---------------------
>>>>>> Option b) I am going to expand David's description
with more details, correct me if I am wrong.
>>>>> 
>>>>> I'll let David continue this part of the thread with
you.
>>>>> 
>>>>>> 
>>>>>> I personally prefer option b over a since it is much
cleaner.
>>>>>> 
>>>>> 
>>>>> Option b is more general and should probably work. I
suggested a
>>>>> originally because I know it would work and the impact is
limited to
>>>>> debug info. Also it doesn't involve touching the
generic metadata
>>>>> interface or code and could be done quickly and
incrementally. b may
>>>>> also require further changes to the debug info
infrastructure to deal
>>>>> with lookup, etc so may end up being as much churn as a. b
is more
>>>>> useful if we think other users of metadata may want/need
similar
>>>>> functionality in the future.
>>>>> 
>>>>> I'm fine with b as a choice, but it's going to
involve more planning
>>>>> and code review and involve people outside of you, me and
Dave :)
>>>> 
>>>> Got it. David, are you okay with the details I added about
option b?
>>>> Let's start a new thread with title "Proposal: extend
Metadata (MDNode) to support MDHash" or something like that.
>>> 
>>> I'd like to hear a rationale for wanting b first. I don't
know (and
>>> can think of) any other users of this functionality right now.
>> 
>> I will try my best :]
>> MDNode has a list of operands, one MDNode's full content can be
thought of as a graph, when there is a reference to another MDNode, we have
>> an edge. The MDHash can be used to give a unique id (or string) to the
full content of the graph.
>> 
>> The hash value applies to any kind of MDNode with complicated MDNode
references.
>> 
>> If you think this is not general enough to be in MDNode,
> 
> Eric's concern/question is whether it's a feature that would be
useful
> to any other code, I believe. (I'm less concerned about this so long
> as the feature is appropriately general, even if it doesn't have any
> other consumers at the moment - I don't think it'd drastically
> complicate the Metadata handling code (which I think is currently
> fairly simple, but I admit to not knowing it in any great detail))
Yes, I tried to explain why MDHash can help with MDNodes that form a complicated
graph.
 > 
>> my earlier proposal of extending MDNode to DINode is going the other
direction.
> 
> As discussed, this isn't far enough in the other direction. If
we're
> going to touch core LLVM Metadata handling code at all, it needs to be
> a complete/cohesive feature, not just tendrils of a debug info
> feature. Hence (a) or (b) not (c).
> 
>> Basically, MDNode will have an opcode like the opcode for Instruction,
to support TBAA, DI, and other types.
>> Inside streamers (bc ll reader), we can create the specialized class
according to the opcode.
>> That requires a lot of changes too :(
>> 
>>> 
>>>> We can add a weight to MDHash (other names are fine with me),
to support multiple MDNodes with the same hash, but different weight.
>>>> 
>>> 
>>> Wait, what? This sounds like a bad idea.
>> Why?
>> The weight is needed to differentiate a forward declaration vs. a
definition, they both name the same class, but they are different.
> 
> This would probably tip me over into Eric's perspective that it would
> be too esoteric a feature to add to a generic/core piece of
> infrastructure.
If we allow the map to map one hash value (or string) to multiple MDNodes,
without adding the weight, it should still work.
But is that general enough?

Thanks,
Manman
 > 
> Yes, I'm not sure we should resolve duplicates in this case, then
> (short of assigning a separate identifier to declarations V
> definitions so they don't collide/get deduped - then doing a pass in
> DwarfDebug/during debug-info-generation time to resolve in favor of
> definitions (hmm, yeah, even that doesn't quite make sense - you'd
> have to be able to visit the things referring to the declarations &
> update them to refer to definitions, and we'd have no real way to
> discover those easily except walking all the debug info... could
> work))
> 
>> 
>> Thanks,
>> Manman
>> 
>>> 
>>> -eric
>>> 
>>>> Thanks,
>>>> Manman
>>>> 
>>>>> 
>>>>> -eric
>>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> Manman
>>>>>> 
>>>>>> On Jun 20, 2013, at 5:39 PM, David Blaikie wrote:
>>>>>> 
>>>>>>> On Thu, Jun 20, 2013 at 5:25 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> On Jun 20, 2013, at 5:18 PM, David Blaikie
<dblaikie at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> On Thu, Jun 20, 2013 at 5:13 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jun 20, 2013, at 4:52 PM, David Blaikie
wrote:
>>>>>>>> 
>>>>>>>> On Thu, Jun 20, 2013 at 4:45 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jun 20, 2013, at 3:55 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jun 20, 2013, at 2:58 PM, Eric Christopher
wrote:
>>>>>>>> 
>>>>>>>> Hi Manman,
>>>>>>>> 
>>>>>>>> On Thu, Jun 20, 2013 at 2:51 PM, Manman Ren
<mren at apple.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> The intent of this proposal is to speedup
compilation of "-flto -g" for c++
>>>>>>>> programs.
>>>>>>>> This is based on discussions with Adrian, David
and Eric.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks for bringing this back to the list. The
original thread was
>>>>>>>> getting quite long.
>>>>>>>> 
>>>>>>>> ---------------------------
>>>>>>>> Problem:
>>>>>>>> A single class can be used in multiple source
files and the DI (Debug Info)
>>>>>>>> class is included in multiple bc files. The
duplication of
>>>>>>>> class definitions causes blow-up in # of
MDNodes, # of DIEs, leading to
>>>>>>>> large memory requirement.
>>>>>>>> 
>>>>>>>> As an example, SPEC xalancbmk requires 7GB of
memory when compiled with
>>>>>>>> -flto -g.
>>>>>>>> With a preliminary implementation of type
uniquing, the memory usage will be
>>>>>>>> down to 2GB.
>>>>>>>> 
>>>>>>>> In order to unique types, we have to break
cycles in the MDNodes.
>>>>>>>> 
>>>>>>>> A simple struct definition
>>>>>>>> struct Base {
>>>>>>>> int a;
>>>>>>>> };
>>>>>>>> can cause cycles in MDNodes:
>>>>>>>> !12 = metadata !{i32 786451, metadata !13,
null, metadata !"Base", i32 1,
>>>>>>>> i64 32, i64 32, i32 0, i32 0, null, metadata
!14, i32 0, null, null} ; [
>>>>>>>> DW_TAG_structure_type ] [Base] [line 1, size
32, align 32, offset 0] [from ]
>>>>>>>> !14 = metadata !{metadata !15, metadata !16}
>>>>>>>> !15 = metadata !{i32 786445, metadata !13,
metadata !12, metadata !"a", i32
>>>>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ;
[ DW_TAG_member ] [a] [line
>>>>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>>>>> !16 = metadata !{i32 786478, metadata !13,
metadata !12, metadata !"Base",
>>>>>>>> metadata !"Base", metadata
!"", i32 1, metadata !17, i1 false, i1 false, i32
>>>>>>>> 0, i32 0, null, i32 320, i1 false, null, null,
i32 0, metadata !20, i32 1} ;
>>>>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>> 
>>>>>>>> Cycles: !12 -- !14 -- !15 -- !12
>>>>>>>>     !12 -- !14 -- !16 -- !12
>>>>>>>> 
>>>>>>>> These cycles make it hard to unique the same
struct used in two bc files.
>>>>>>>> 
>>>>>>>> ---------------------------
>>>>>>>> How to fix:
>>>>>>>> 
>>>>>>>> We attach a hash value to types to help type
uniquing and we also replace
>>>>>>>> references to types with their hash values.
>>>>>>>> For the above struct "Base", we now
have the following MDNodes:
>>>>>>>> !4 = metadata !{i32 786451, metadata !5, null,
metadata !"Base", i32 1, i64
>>>>>>>> 32, i64 32, i32 0, i32 0, null, metadata !6,
i32 0, i32 0, null, i32
>>>>>>>> 915398439} ; [ DW_TAG_structure_type ] [Base]
[line 1, size 32, align 32,
>>>>>>>> offset 0] [from ]
>>>>>>>> !6 = metadata !{metadata !7, metadata !9}
>>>>>>>> !7 = metadata !{i32 786445, metadata !5, i32
915398439, metadata !"a", i32
>>>>>>>> 2, i64 32, i64 32, i64 0, i32 0, metadata !8} ;
[ DW_TAG_member ] [a] [line
>>>>>>>> 2, size 32, align 32, offset 0] [from int]
>>>>>>>> !9 = metadata !{i32 786478, metadata !5, i32
915398439, metadata !"Base",
>>>>>>>> metadata !"Base", metadata
!"", i32 1, metadata !10, i1 false, i1 false, i32
>>>>>>>> 0, i32 0, null, i32 320, i1 false, null, null,
i32 0, metadata !13, i32 1} ;
>>>>>>>> [ DW_TAG_subprogram ] [line 1] [Base]
>>>>>>>> 
>>>>>>>> Note that the cycles are gone and !4 has a hash
value of 915398439, and the
>>>>>>>> references to !4 are replaced with 915398439.
>>>>>>>> Thanks Eric for suggesting replacing MD
reference with a hash value.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> In particular I recommended this:
>>>>>>>> 
>>>>>>>> a) For C++ odr replace it with the
"hash" that's just a string
>>>>>>>> representing the type name.
>>>>>>>> b) For Internal C++ types and all C types
replace it with a string
>>>>>>>> that's a concatenation of the type name and
the name of the compile
>>>>>>>> unit.
>>>>>>>> 
>>>>>>>> Yes, that is what we agreed on over email.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> There are a few issues:
>>>>>>>> 1> How to generate the hash for a given
type?
>>>>>>>> With C++'s ODR, it should be enough by
using the context and the name for
>>>>>>>> non-internal c++ types.
>>>>>>>> For internal c++ types and types of other
languages, hash will not be used.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Explain this?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> For a while, I am going to support both hash
and MD reference, once
>>>>>>>> everything is working with hash,
>>>>>>>> I will update all debug info testing cases,
turn -gtype-uniquing on, and
>>>>>>>> remove the other path.
>>>>>>>> 
>>>>>>>> For internal c++ types, initially they will
follow the path of using MD
>>>>>>>> references without a hash.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> My current implementation is to add a few
static member functions in MDNode
>>>>>>>> to profile DI nodes differently.
>>>>>>>> + /// If the array of Vals is for debug info,
profile it specially and
>>>>>>>> return true.
>>>>>>>> + /// If the DI node has a hash value, generate
the profile using only the
>>>>>>>> hash value and the declaration flag.
>>>>>>>> +  static bool
profileDebugInfoNode(ArrayRef<Value*> Vals, FoldingSetNodeID
>>>>>>>> &ID);
>>>>>>>> 
>>>>>>>> + /// If the MDNode is for debug info, profile
it specially and return true.
>>>>>>>> + /// If the DI node has a hash value, generate
the profile using only the
>>>>>>>> hash value and the declaration flag.
>>>>>>>> +  static bool profileDebugInfoNode(const
MDNode *M, FoldingSetNodeID &ID);
>>>>>>>> 
>>>>>>>> + /// Given a hash value and a flag, generate
the profile for later lookup.
>>>>>>>> +  static bool profileDebugInfoNode(unsigned
Hash, bool Declaration,
>>>>>>>> FoldingSetNodeID &ID);
>>>>>>>> 
>>>>>>>> These static functions are called in
Metadata.cpp:
>>>>>>>> void MDNode::Profile(FoldingSetNodeID &ID)
const {
>>>>>>>> +  if (profileDebugInfoNode(this, ID))
>>>>>>>> +    return;
>>>>>>>> +
>>>>>>>> 
>>>>>>>> There are other examples of these in MDNode for
handling of specific
>>>>>>>> metadata.
>>>>>>>> /// Methods for metadata merging.
>>>>>>>> static MDNode *getMostGenericTBAA(MDNode *A,
MDNode *B);
>>>>>>>> static MDNode *getMostGenericFPMath(MDNode *A,
MDNode *B);
>>>>>>>> static MDNode *getMostGenericRange(MDNode *A,
MDNode *B);
>>>>>>>> 
>>>>>>>> Comments are welcome on whether this violates
any layering rule.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> As I've said many times in email, I
don't think this is a good idea
>>>>>>>> and would prefer either a or b below. a) is a
much simpler solution.
>>>>>>>> 
>>>>>>>> Any reason that why it is not a good idea?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Other choices are:
>>>>>>>> a> Keep a map in DwarfDebug
>>>>>>>> Keep in mind that the map is used at many
stages, and it has to be in sync
>>>>>>>> with MDNodeSet.
>>>>>>>> b> Generalize MDNode to be aware of hash
(David can provide more details)
>>>>>>>> c> Extend MDNode to DINode and modify
streamers (bitcode reader|writer, ll
>>>>>>>> reader|writer) to be aware of DINode
>>>>>>>> We can provide DINode::get(…) to create a
DINode. DINode can have its own
>>>>>>>> Profile function.
>>>>>>>> Other suggestions are welcome.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> a or b please.
>>>>>>>> 
>>>>>>>> Option a> will require a DwarfDebug pointer
in every stage of the compiler,
>>>>>>>> and passing the map to the DI classes.
>>>>>>>> A rough estimation is around 100 places.
>>>>>>>> Is it reasonable to pass a DwarfDebug pointer
to DIBuilder and llvm linker?
>>>>>>>> Also the map needs to be in sync with
MDNodeSet, maybe using ValueHandle can
>>>>>>>> solve the problem.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> What about putting the map in LLVMContextImpl?
>>>>>>>> It already has a few things specifically for
debug info:
>>>>>>>> std::vector<DebugRecVH> ScopeRecords;
>>>>>>>> DenseMap<std::pair<MDNode*, MDNode*>,
int> ScopeInlinedAt;
>>>>>>>> …
>>>>>>>> 
>>>>>>>> I remember David mentioned it once and I forgot
about the conclusion.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I mentioned it only as speculation as to how
you were implementing it
>>>>>>>> already (but you were doing the
profile-changing stuff).
>>>>>>>> 
>>>>>>>> I don't think it should be necessary to
have the map (in option (a))
>>>>>>>> in such a central location as LLVMContext. It
should be usable just
>>>>>>>> from DwarfDebug for generation, and DIBuilder
can have its own,
>>>>>>>> separate map to do similar things during DI
building.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> We also need the map during llvm linking since
linking will create new
>>>>>>>> MDNodes.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I don't understand what you mean by this.
IR linking shouldn't need to
>>>>>>>> do anything debug-info-specific, it should just
be the normal IR
>>>>>>>> approach.
>>>>>>>> 
>>>>>>>> The declaration-v-definition resolution can be
done at codegen-time.
>>>>>>>> We'll have to walk all the lists of
retained types anyway to build the
>>>>>>>> map, so we can do declaration-v-definition
(keep definitions over
>>>>>>>> declarations when we see both) at that point.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Yes, you are right that at codegen-time, we can
generate the map from the
>>>>>>>> lists
>>>>>>>> of retained types.
>>>>>>>> 
>>>>>>>> But dumping the linked ll file requires the map
when outputting comments of
>>>>>>>> the MDNode :]
>>>>>>> 
>>>>>>> Depending on which things we print out, but yes, in
some cases
>>>>>>> (derived types) we do print out the type
referenced. I assume the
>>>>>>> AsmPrinter can build such a map, then. (in fact,
with a few clients
>>>>>>> like this, it might be nice to build a bit of an
abstraction around it
>>>>>>> rather than just using a raw map - something that
has a ctor (or I
>>>>>>> suppose it could be a factory function) that reads
in the right
>>>>>>> metadata, walks the compile units, etc, and builds
the mapping)
>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So any other opinion on putting it in
LLVMContext other than it being
>>>>>>>> central?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>> 
>>>>>>>> 
>>>>>>>> More details for option b from David
>>>>>>>> 
>>>>>>>> < The alternative I have in mind is a more
complete version of what
>>>>>>>> < you're proposing - a full MD feature,
not an MD feature that's just
>>>>>>>> < barely enough to support the needs of
debug info. What we could do is
>>>>>>>> < allow the insertion of these MDHash things
you spoke about but take it
>>>>>>>> < a step further and have MDNode::getOperand
walk through the hash &
>>>>>>>> < give the value (in this way, DebugInfo
wouldn't have to change at all
>>>>>>>> < to handle hashes - if the Metadata APIs
are going to be aware of the
>>>>>>>> < hashes anyway, they might as well provide
this convenience
>>>>>>>> < functionality) the metadata feature would
also have to have some
>>>>>>>> < blessed top-level named metadata that
would have a list of hash+MDNode
>>>>>>>> < to keep those MDNodes alive (so you
wouldn't have to stuff all the
>>>>>>>> < types in the retained types list -
metadata would provide the full
>>>>>>>> < support, not just half of it).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Transition from current DI Metadata:
>>>>>>>> To have a smooth transition, I will add a flag
"-gtype-hashing" for the type
>>>>>>>> uniquing work and turn it on by default when we
are ready.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I'd prefer just make the change to have the
front end emit the "hash"
>>>>>>>> (it's not really a hash, it's just a
string).
>>>>>>>> 
>>>>>>>> Are you saying no transition period? A single
patch to have correct handling
>>>>>>>> of "hash" and to update all existing
testing cases?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -----------------------------
>>>>>>>> Patches:
>>>>>>>> Expect the following patches:
>>>>>>>> 1> add flag -gtype-hashing
>>>>>>>> 2> add hash field to DI types
>>>>>>>> 3> modify DIBuilder to use hash instead of
MD reference
>>>>>>>> 4> related to issue 3
>>>>>>>> 
>>>>>>>> 
>>>>>>>> These can all be a single patch since it
shouldn't be very large if we
>>>>>>>> go with a) above. If we go with b) then the
MDNode work should be done
>>>>>>>> in isolation first and then the debug info on
top of it.
>>>>>>>> 
>>>>>>>> What is wrong with smaller patches?
>>>>>>>> My estimation for all the above with a) is
about 30K + testing cases.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 5> backend change (in
DwarfDebug|CompileUnit) to support types shared among
>>>>>>>> compile units
>>>>>>>> requires gdwarf-2 gdwarf-3 gdwarf-4 support for
issues related to ref_addr
>>>>>>>> 
>>>>>>>> 
>>>>>>>> #5 can and should be done before the rest of
them.
>>>>>>>> 
>>>>>>>> I prefer to submit patches according to the
flow of the compiler, starting
>>>>>>>> from the frontend, then IR, then backend.
>>>>>>>> The testing cases will be added for front end,
llvm-link and backend.
>>>>>>>> Any reason why #5 should be done first?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> All changes should be local to debug info
classes except patch #4.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> What's patch #4?
>>>>>>>> 
>>>>>>>> Patch #4 above: related to issue 3 (changes
corresponding to how to solve
>>>>>>>> issue #3)
>>>>>>>> 
>>>>>>>> -Manman
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -eric
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130621/31726c65/attachment.html>

David Blaikie

2013-Jun-21 22:25 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com>
wrote:>
> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com>
wrote:
>
>
> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>
> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com>
wrote:
>
>
> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>
> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com>
wrote:
>
>
> A summary of options for issue #3:
> 3> To actually access the MDNode referenced via the hash value, we need
to
> perform a lookup from the hash value to find the corresponding MDNode.
> The questions are where to store this map and how to keep it up-to-date
when
> a MDNode is replaced.
> ---------------------
> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access
> functions in DI classes to pass in the map.
> May need to modify printing functions in AsmWriter.
>
>
> You may not need the map in DIBuilder if you piggy back on the
> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
> for just debug dumps of the IR. A way around this might be nice, but
> I've not looked into it.
>
>
> Calling dump() on a MDNode inside gdb will not give us the derived type
> information, since we don't have a handle to the map.
>
>
> The only case where dump currently navigates a type link that I know
> is printing derived types, where we print "[from foo]". By just
> printing the string identifier for the type (without having to
> navigate the link/use the map) should be about as useful here.
>
>
>
> ---------------------
> Option b) I am going to expand David's description with more details,
> correct me if I am wrong.
>
>
> I'll let David continue this part of the thread with you.
>
>
> I personally prefer option b over a since it is much cleaner.
>
>
> Option b is more general and should probably work. I suggested a
> originally because I know it would work and the impact is limited to
> debug info. Also it doesn't involve touching the generic metadata
> interface or code and could be done quickly and incrementally. b may
> also require further changes to the debug info infrastructure to deal
> with lookup, etc so may end up being as much churn as a. b is more
> useful if we think other users of metadata may want/need similar
> functionality in the future.
>
> I'm fine with b as a choice, but it's going to involve more
planning
> and code review and involve people outside of you, me and Dave :)
>
>
> Got it. David, are you okay with the details I added about option b?
> Let's start a new thread with title "Proposal: extend Metadata
(MDNode) to
> support MDHash" or something like that.
>
>
> I'd like to hear a rationale for wanting b first. I don't know (and
> can think of) any other users of this functionality right now.
>
>
> I will try my best :]
> MDNode has a list of operands, one MDNode's full content can be thought
of
> as a graph, when there is a reference to another MDNode, we have
> an edge. The MDHash can be used to give a unique id (or string) to the full
> content of the graph.
>
> The hash value applies to any kind of MDNode with complicated MDNode
> references.
>
> If you think this is not general enough to be in MDNode,
>
>
> Eric's concern/question is whether it's a feature that would be
useful
> to any other code, I believe. (I'm less concerned about this so long
> as the feature is appropriately general, even if it doesn't have any
> other consumers at the moment - I don't think it'd drastically
> complicate the Metadata handling code (which I think is currently
> fairly simple, but I admit to not knowing it in any great detail))
>
>
> Yes, I tried to explain why MDHash can help with MDNodes that form a
> complicated graph.
>
>
>
> my earlier proposal of extending MDNode to DINode is going the other
> direction.
>
>
> As discussed, this isn't far enough in the other direction. If
we're
> going to touch core LLVM Metadata handling code at all, it needs to be
> a complete/cohesive feature, not just tendrils of a debug info
> feature. Hence (a) or (b) not (c).
>
> Basically, MDNode will have an opcode like the opcode for Instruction, to
> support TBAA, DI, and other types.
> Inside streamers (bc ll reader), we can create the specialized class
> according to the opcode.
> That requires a lot of changes too :(
>
>
> We can add a weight to MDHash (other names are fine with me), to support
> multiple MDNodes with the same hash, but different weight.
>
>
> Wait, what? This sounds like a bad idea.
>
> Why?
> The weight is needed to differentiate a forward declaration vs. a
> definition, they both name the same class, but they are different.
>
>
> This would probably tip me over into Eric's perspective that it would
> be too esoteric a feature to add to a generic/core piece of
> infrastructure.
>
>
> If we allow the map to map one hash value (or string) to multiple MDNodes,
> without adding the weight, it should still work.
> But is that general enough?
Then there would be no transparent access through the nodes - so I
don't know what the access API would look like anymore & it would seem
rather awkward as a general API.

The more you point out the reasonable complications in using such a
generic API for your task, the less it seems like it would be
sufficient/a good fit - the more I'd be inclined to go with Eric's
choice of just keeping the mapping locally (in DwarfDebug, etc) & not
impacting metadata APIs at all.

Manman Ren

2013-Jun-21 22:58 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 3:25 PM, David Blaikie wrote:
> On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> 
>> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> 
>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>> 
>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at apple.com>
wrote:
>> 
>> 
>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>> 
>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> 
>> A summary of options for issue #3:
>> 3> To actually access the MDNode referenced via the hash value, we
need to
>> perform a lookup from the hash value to find the corresponding MDNode.
>> The questions are where to store this map and how to keep it up-to-date
when
>> a MDNode is replaced.
>> ---------------------
>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify access
>> functions in DI classes to pass in the map.
>> May need to modify printing functions in AsmWriter.
>> 
>> 
>> You may not need the map in DIBuilder if you piggy back on the
>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>> for just debug dumps of the IR. A way around this might be nice, but
>> I've not looked into it.
>> 
>> 
>> Calling dump() on a MDNode inside gdb will not give us the derived type
>> information, since we don't have a handle to the map.
>> 
>> 
>> The only case where dump currently navigates a type link that I know
>> is printing derived types, where we print "[from foo]". By
just
>> printing the string identifier for the type (without having to
>> navigate the link/use the map) should be about as useful here.
>> 
>> 
>> 
>> ---------------------
>> Option b) I am going to expand David's description with more
details,
>> correct me if I am wrong.
>> 
>> 
>> I'll let David continue this part of the thread with you.
>> 
>> 
>> I personally prefer option b over a since it is much cleaner.
>> 
>> 
>> Option b is more general and should probably work. I suggested a
>> originally because I know it would work and the impact is limited to
>> debug info. Also it doesn't involve touching the generic metadata
>> interface or code and could be done quickly and incrementally. b may
>> also require further changes to the debug info infrastructure to deal
>> with lookup, etc so may end up being as much churn as a. b is more
>> useful if we think other users of metadata may want/need similar
>> functionality in the future.
>> 
>> I'm fine with b as a choice, but it's going to involve more
planning
>> and code review and involve people outside of you, me and Dave :)
>> 
>> 
>> Got it. David, are you okay with the details I added about option b?
>> Let's start a new thread with title "Proposal: extend Metadata
(MDNode) to
>> support MDHash" or something like that.
>> 
>> 
>> I'd like to hear a rationale for wanting b first. I don't know
(and
>> can think of) any other users of this functionality right now.
>> 
>> 
>> I will try my best :]
>> MDNode has a list of operands, one MDNode's full content can be
thought of
>> as a graph, when there is a reference to another MDNode, we have
>> an edge. The MDHash can be used to give a unique id (or string) to the
full
>> content of the graph.
>> 
>> The hash value applies to any kind of MDNode with complicated MDNode
>> references.
>> 
>> If you think this is not general enough to be in MDNode,
>> 
>> 
>> Eric's concern/question is whether it's a feature that would be
useful
>> to any other code, I believe. (I'm less concerned about this so
long
>> as the feature is appropriately general, even if it doesn't have
any
>> other consumers at the moment - I don't think it'd drastically
>> complicate the Metadata handling code (which I think is currently
>> fairly simple, but I admit to not knowing it in any great detail))
>> 
>> 
>> Yes, I tried to explain why MDHash can help with MDNodes that form a
>> complicated graph.
>> 
>> 
>> 
>> my earlier proposal of extending MDNode to DINode is going the other
>> direction.
>> 
>> 
>> As discussed, this isn't far enough in the other direction. If
we're
>> going to touch core LLVM Metadata handling code at all, it needs to be
>> a complete/cohesive feature, not just tendrils of a debug info
>> feature. Hence (a) or (b) not (c).
>> 
>> Basically, MDNode will have an opcode like the opcode for Instruction,
to
>> support TBAA, DI, and other types.
>> Inside streamers (bc ll reader), we can create the specialized class
>> according to the opcode.
>> That requires a lot of changes too :(
>> 
>> 
>> We can add a weight to MDHash (other names are fine with me), to
support
>> multiple MDNodes with the same hash, but different weight.
>> 
>> 
>> Wait, what? This sounds like a bad idea.
>> 
>> Why?
>> The weight is needed to differentiate a forward declaration vs. a
>> definition, they both name the same class, but they are different.
>> 
>> 
>> This would probably tip me over into Eric's perspective that it
would
>> be too esoteric a feature to add to a generic/core piece of
>> infrastructure.
>> 
>> 
>> If we allow the map to map one hash value (or string) to multiple
MDNodes,
>> without adding the weight, it should still work.
>> But is that general enough?
> 
> Then there would be no transparent access through the nodes - so I
> don't know what the access API would look like anymore & it would
seem
> rather awkward as a general API.
The access function will have two modes:
1> it does not matter which one is returned
(This works for verification purposes and printing purposes.)
2> return all the MDNodes having the hash, DwarfDebug will figure out which
one has priority

It is not clean. I actually prefer adding a weight, it kind of makes sense when
we are dealing with multiple MDNodes with the same hash.
With weight, we can always pick one with higher weight (or priority).

If we veto option b, we will only have option a, which has its own issues:
1> it is not really local to debug info, we have to modify AsmWriter at
least.
2> we have to keep (and construct) one map at a few places.
3> we have to modify the following access function in DI classes
getContext, getTypeDerivedFrom, getClassType, getContainingType, getType
to take an extra map
and of course the call sites of all these functions.

Compare against putting the map in LLVMContext:
1> quoting from David
"General principle of keeping things in the least scope necessary.
Reducing scope helps code
maintenance by making the influence of certain data is only accessed
within a certain scope."

Another issue is on how to submit the patches:
I prefer to submit small patches and have a transition period where we use
-gtype-uniquing to test the work in progress.
The advantages are
1> People can see our progress.
2> We can incrementally add more testing cases while testing the feature
under the flag.
3> If the patch causes problem, we don't have to revert the big patch and
submit another big patch that has the fix.

Thanks,
Manman
> 
> The more you point out the reasonable complications in using such a
> generic API for your task, the less it seems like it would be
> sufficient/a good fit - the more I'd be inclined to go with Eric's
> choice of just keeping the mapping locally (in DwarfDebug, etc) & not
> impacting metadata APIs at all.

David Blaikie

2013-Jun-21 23:01 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 3:58 PM, Manman Ren <mren at apple.com>
wrote:>
> On Jun 21, 2013, at 3:25 PM, David Blaikie wrote:
>
>> On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at apple.com>
wrote:
>>>
>>> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>
>>> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at
apple.com> wrote:
>>>
>>>
>>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>>>
>>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at
apple.com> wrote:
>>>
>>>
>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>
>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at
apple.com> wrote:
>>>
>>>
>>> A summary of options for issue #3:
>>> 3> To actually access the MDNode referenced via the hash value,
we need to
>>> perform a lookup from the hash value to find the corresponding
MDNode.
>>> The questions are where to store this map and how to keep it
up-to-date when
>>> a MDNode is replaced.
>>> ---------------------
>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify
access
>>> functions in DI classes to pass in the map.
>>> May need to modify printing functions in AsmWriter.
>>>
>>>
>>> You may not need the map in DIBuilder if you piggy back on the
>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is unfortunate
>>> for just debug dumps of the IR. A way around this might be nice,
but
>>> I've not looked into it.
>>>
>>>
>>> Calling dump() on a MDNode inside gdb will not give us the derived
type
>>> information, since we don't have a handle to the map.
>>>
>>>
>>> The only case where dump currently navigates a type link that I
know
>>> is printing derived types, where we print "[from foo]".
By just
>>> printing the string identifier for the type (without having to
>>> navigate the link/use the map) should be about as useful here.
>>>
>>>
>>>
>>> ---------------------
>>> Option b) I am going to expand David's description with more
details,
>>> correct me if I am wrong.
>>>
>>>
>>> I'll let David continue this part of the thread with you.
>>>
>>>
>>> I personally prefer option b over a since it is much cleaner.
>>>
>>>
>>> Option b is more general and should probably work. I suggested a
>>> originally because I know it would work and the impact is limited
to
>>> debug info. Also it doesn't involve touching the generic
metadata
>>> interface or code and could be done quickly and incrementally. b
may
>>> also require further changes to the debug info infrastructure to
deal
>>> with lookup, etc so may end up being as much churn as a. b is more
>>> useful if we think other users of metadata may want/need similar
>>> functionality in the future.
>>>
>>> I'm fine with b as a choice, but it's going to involve more
planning
>>> and code review and involve people outside of you, me and Dave :)
>>>
>>>
>>> Got it. David, are you okay with the details I added about option
b?
>>> Let's start a new thread with title "Proposal: extend
Metadata (MDNode) to
>>> support MDHash" or something like that.
>>>
>>>
>>> I'd like to hear a rationale for wanting b first. I don't
know (and
>>> can think of) any other users of this functionality right now.
>>>
>>>
>>> I will try my best :]
>>> MDNode has a list of operands, one MDNode's full content can be
thought of
>>> as a graph, when there is a reference to another MDNode, we have
>>> an edge. The MDHash can be used to give a unique id (or string) to
the full
>>> content of the graph.
>>>
>>> The hash value applies to any kind of MDNode with complicated
MDNode
>>> references.
>>>
>>> If you think this is not general enough to be in MDNode,
>>>
>>>
>>> Eric's concern/question is whether it's a feature that
would be useful
>>> to any other code, I believe. (I'm less concerned about this so
long
>>> as the feature is appropriately general, even if it doesn't
have any
>>> other consumers at the moment - I don't think it'd
drastically
>>> complicate the Metadata handling code (which I think is currently
>>> fairly simple, but I admit to not knowing it in any great detail))
>>>
>>>
>>> Yes, I tried to explain why MDHash can help with MDNodes that form
a
>>> complicated graph.
>>>
>>>
>>>
>>> my earlier proposal of extending MDNode to DINode is going the
other
>>> direction.
>>>
>>>
>>> As discussed, this isn't far enough in the other direction. If
we're
>>> going to touch core LLVM Metadata handling code at all, it needs to
be
>>> a complete/cohesive feature, not just tendrils of a debug info
>>> feature. Hence (a) or (b) not (c).
>>>
>>> Basically, MDNode will have an opcode like the opcode for
Instruction, to
>>> support TBAA, DI, and other types.
>>> Inside streamers (bc ll reader), we can create the specialized
class
>>> according to the opcode.
>>> That requires a lot of changes too :(
>>>
>>>
>>> We can add a weight to MDHash (other names are fine with me), to
support
>>> multiple MDNodes with the same hash, but different weight.
>>>
>>>
>>> Wait, what? This sounds like a bad idea.
>>>
>>> Why?
>>> The weight is needed to differentiate a forward declaration vs. a
>>> definition, they both name the same class, but they are different.
>>>
>>>
>>> This would probably tip me over into Eric's perspective that it
would
>>> be too esoteric a feature to add to a generic/core piece of
>>> infrastructure.
>>>
>>>
>>> If we allow the map to map one hash value (or string) to multiple
MDNodes,
>>> without adding the weight, it should still work.
>>> But is that general enough?
>>
>> Then there would be no transparent access through the nodes - so I
>> don't know what the access API would look like anymore & it
would seem
>> rather awkward as a general API.
>
> The access function will have two modes:
> 1> it does not matter which one is returned
> (This works for verification purposes and printing purposes.)
> 2> return all the MDNodes having the hash, DwarfDebug will figure out
which one has priority
>
> It is not clean. I actually prefer adding a weight, it kind of makes sense
when we are dealing with multiple MDNodes with the same hash.
> With weight, we can always pick one with higher weight (or priority).
>
> If we veto option b, we will only have option a, which has its own issues:
> 1> it is not really local to debug info, we have to modify AsmWriter at
least.
I'm pretty sure I've kept saying we don't need to do this. Printing
asm can just print the identifier for the type node - it doesn't need
to navigate across that link.
> 2> we have to keep (and construct) one map at a few places.
> 3> we have to modify the following access function in DI classes
> getContext, getTypeDerivedFrom, getClassType, getContainingType, getType
> to take an extra map
> and of course the call sites of all these functions.
>
> Compare against putting the map in LLVMContext:
> 1> quoting from David
> "General principle of keeping things in the least scope necessary.
> Reducing scope helps code
> maintenance by making the influence of certain data is only accessed
> within a certain scope."
>
> Another issue is on how to submit the patches:
> I prefer to submit small patches and have a transition period where we use
-gtype-uniquing to test the work in progress.
Incremental work doesn't require a flag. You can simply migrate one
field at a time. Is there a reason that wouldn't work?
> The advantages are
> 1> People can see our progress.
> 2> We can incrementally add more testing cases while testing the feature
under the flag.
> 3> If the patch causes problem, we don't have to revert the big
patch and submit another big patch that has the fix.
>
> Thanks,
> Manman
>
>>
>> The more you point out the reasonable complications in using such a
>> generic API for your task, the less it seems like it would be
>> sufficient/a good fit - the more I'd be inclined to go with
Eric's
>> choice of just keeping the mapping locally (in DwarfDebug, etc) &
not
>> impacting metadata APIs at all.
>

Manman Ren

2013-Jun-21 23:30 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 4:01 PM, David Blaikie <dblaikie at gmail.com> wrote:
> On Fri, Jun 21, 2013 at 3:58 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Jun 21, 2013, at 3:25 PM, David Blaikie wrote:
>> 
>>> On Fri, Jun 21, 2013 at 3:16 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> On Jun 21, 2013, at 1:12 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 12:33 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> 
>>>> On Jun 21, 2013, at 11:56 AM, Eric Christopher wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 11:50 AM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> 
>>>> On Jun 21, 2013, at 11:35 AM, Eric Christopher wrote:
>>>> 
>>>> On Thu, Jun 20, 2013 at 10:52 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> 
>>>> A summary of options for issue #3:
>>>> 3> To actually access the MDNode referenced via the hash
value, we need to
>>>> perform a lookup from the hash value to find the corresponding
MDNode.
>>>> The questions are where to store this map and how to keep it
up-to-date when
>>>> a MDNode is replaced.
>>>> ---------------------
>>>> Option a) a map in DwarfDebug, AsmWriter and DIBuilder, modify
access
>>>> functions in DI classes to pass in the map.
>>>> May need to modify printing functions in AsmWriter.
>>>> 
>>>> 
>>>> You may not need the map in DIBuilder if you piggy back on the
>>>> existing map in CGDebugInfo.cpp. The AsmWriter stuff is
unfortunate
>>>> for just debug dumps of the IR. A way around this might be
nice, but
>>>> I've not looked into it.
>>>> 
>>>> 
>>>> Calling dump() on a MDNode inside gdb will not give us the
derived type
>>>> information, since we don't have a handle to the map.
>>>> 
>>>> 
>>>> The only case where dump currently navigates a type link that I
know
>>>> is printing derived types, where we print "[from
foo]". By just
>>>> printing the string identifier for the type (without having to
>>>> navigate the link/use the map) should be about as useful here.
>>>> 
>>>> 
>>>> 
>>>> ---------------------
>>>> Option b) I am going to expand David's description with
more details,
>>>> correct me if I am wrong.
>>>> 
>>>> 
>>>> I'll let David continue this part of the thread with you.
>>>> 
>>>> 
>>>> I personally prefer option b over a since it is much cleaner.
>>>> 
>>>> 
>>>> Option b is more general and should probably work. I suggested
a
>>>> originally because I know it would work and the impact is
limited to
>>>> debug info. Also it doesn't involve touching the generic
metadata
>>>> interface or code and could be done quickly and incrementally.
b may
>>>> also require further changes to the debug info infrastructure
to deal
>>>> with lookup, etc so may end up being as much churn as a. b is
more
>>>> useful if we think other users of metadata may want/need
similar
>>>> functionality in the future.
>>>> 
>>>> I'm fine with b as a choice, but it's going to involve
more planning
>>>> and code review and involve people outside of you, me and Dave
:)
>>>> 
>>>> 
>>>> Got it. David, are you okay with the details I added about
option b?
>>>> Let's start a new thread with title "Proposal: extend
Metadata (MDNode) to
>>>> support MDHash" or something like that.
>>>> 
>>>> 
>>>> I'd like to hear a rationale for wanting b first. I
don't know (and
>>>> can think of) any other users of this functionality right now.
>>>> 
>>>> 
>>>> I will try my best :]
>>>> MDNode has a list of operands, one MDNode's full content
can be thought of
>>>> as a graph, when there is a reference to another MDNode, we
have
>>>> an edge. The MDHash can be used to give a unique id (or string)
to the full
>>>> content of the graph.
>>>> 
>>>> The hash value applies to any kind of MDNode with complicated
MDNode
>>>> references.
>>>> 
>>>> If you think this is not general enough to be in MDNode,
>>>> 
>>>> 
>>>> Eric's concern/question is whether it's a feature that
would be useful
>>>> to any other code, I believe. (I'm less concerned about
this so long
>>>> as the feature is appropriately general, even if it doesn't
have any
>>>> other consumers at the moment - I don't think it'd
drastically
>>>> complicate the Metadata handling code (which I think is
currently
>>>> fairly simple, but I admit to not knowing it in any great
detail))
>>>> 
>>>> 
>>>> Yes, I tried to explain why MDHash can help with MDNodes that
form a
>>>> complicated graph.
>>>> 
>>>> 
>>>> 
>>>> my earlier proposal of extending MDNode to DINode is going the
other
>>>> direction.
>>>> 
>>>> 
>>>> As discussed, this isn't far enough in the other direction.
If we're
>>>> going to touch core LLVM Metadata handling code at all, it
needs to be
>>>> a complete/cohesive feature, not just tendrils of a debug info
>>>> feature. Hence (a) or (b) not (c).
>>>> 
>>>> Basically, MDNode will have an opcode like the opcode for
Instruction, to
>>>> support TBAA, DI, and other types.
>>>> Inside streamers (bc ll reader), we can create the specialized
class
>>>> according to the opcode.
>>>> That requires a lot of changes too :(
>>>> 
>>>> 
>>>> We can add a weight to MDHash (other names are fine with me),
to support
>>>> multiple MDNodes with the same hash, but different weight.
>>>> 
>>>> 
>>>> Wait, what? This sounds like a bad idea.
>>>> 
>>>> Why?
>>>> The weight is needed to differentiate a forward declaration vs.
a
>>>> definition, they both name the same class, but they are
different.
>>>> 
>>>> 
>>>> This would probably tip me over into Eric's perspective
that it would
>>>> be too esoteric a feature to add to a generic/core piece of
>>>> infrastructure.
>>>> 
>>>> 
>>>> If we allow the map to map one hash value (or string) to
multiple MDNodes,
>>>> without adding the weight, it should still work.
>>>> But is that general enough?
>>> 
>>> Then there would be no transparent access through the nodes - so I
>>> don't know what the access API would look like anymore & it
would seem
>>> rather awkward as a general API.
>> 
>> The access function will have two modes:
>> 1> it does not matter which one is returned
>> (This works for verification purposes and printing purposes.)
>> 2> return all the MDNodes having the hash, DwarfDebug will figure
out which one has priority
>> 
>> It is not clean. I actually prefer adding a weight, it kind of makes
sense when we are dealing with multiple MDNodes with the same hash.
>> With weight, we can always pick one with higher weight (or priority).
>> 
>> If we veto option b, we will only have option a, which has its own
issues:
>> 1> it is not really local to debug info, we have to modify AsmWriter
at least.
> 
> I'm pretty sure I've kept saying we don't need to do this.
Printing
> asm can just print the identifier for the type node - it doesn't need
> to navigate across that link.
Okay, if we are fine with no access to the link in a gdb | lldb session.
> 
>> 2> we have to keep (and construct) one map at a few places.
>> 3> we have to modify the following access function in DI classes
>> getContext, getTypeDerivedFrom, getClassType, getContainingType,
getType
>> to take an extra map
>> and of course the call sites of all these functions.
>> 
>> Compare against putting the map in LLVMContext:
>> 1> quoting from David
>> "General principle of keeping things in the least scope necessary.
>> Reducing scope helps code
>> maintenance by making the influence of certain data is only accessed
>> within a certain scope."
>> 
>> Another issue is on how to submit the patches:
>> I prefer to submit small patches and have a transition period where we
use -gtype-uniquing to test the work in progress.
> 
> Incremental work doesn't require a flag. You can simply migrate one
> field at a time. Is there a reason that wouldn't work?
The reason I want a flag is to avoid the need to update the existing testing
cases while this is a work in progress.
I believe migrating one field means updating the existing testing cases?

Thanks,
Manman
> 
>> The advantages are
>> 1> People can see our progress.
>> 2> We can incrementally add more testing cases while testing the
feature under the flag.
>> 3> If the patch causes problem, we don't have to revert the big
patch and submit another big patch that has the fix.
>> 
>> Thanks,
>> Manman
>> 
>>> 
>>> The more you point out the reasonable complications in using such a
>>> generic API for your task, the less it seems like it would be
>>> sufficient/a good fit - the more I'd be inclined to go with
Eric's
>>> choice of just keeping the mapping locally (in DwarfDebug, etc)
& not
>>> impacting metadata APIs at all.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130621/5effac04/attachment.html>

David Blaikie

2013-Jun-21 23:41 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

> The reason I want a flag is to avoid the need to update the existing
testing
> cases while this is a work in progress.
> I believe migrating one field means updating the existing testing cases?
It would, yes - and that's how you'd get coverage to know your change
was stable. You'll have to update all the tests eventually anyway &
doing so incrementally isn't substantially more expensive in my
experience. Perhaps there's something unique to this migration that
would make it so?

Manman Ren

2013-Jun-21 23:50 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> The reason I want a flag is to avoid the need to update the existing
testing
>> cases while this is a work in progress.
>> I believe migrating one field means updating the existing testing
cases?
> 
> It would, yes - and that's how you'd get coverage to know your
change
> was stable. You'll have to update all the tests eventually anyway &
> doing so incrementally isn't substantially more expensive in my
> experience. Perhaps there's something unique to this migration that
> would make it so?
I am going to update the existing testing cases locally to have the test
coverage, but I don't want to check in the changes often.
Once we turn the flag on by default and remove the other path, I will submit the
changes on the testing cases.

Another point to have a flag is to check in the patches in steps: DIBuilder
changes, changes related to the map, and DwarfDebug changes.
Without the flag, when I migrate the first field, I have to make sure it works
from frontend all the way to the backend.

Hopefully that makes sense.

Thanks for all your help,
Manman
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130621/9658504b/attachment.html>

David Blaikie

2013-Jun-21 23:59 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:>
> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
> The reason I want a flag is to avoid the need to update the existing
testing
> cases while this is a work in progress.
> I believe migrating one field means updating the existing testing cases?
>
>
> It would, yes - and that's how you'd get coverage to know your
change
> was stable. You'll have to update all the tests eventually anyway &
> doing so incrementally isn't substantially more expensive in my
> experience. Perhaps there's something unique to this migration that
> would make it so?
>
>
> I am going to update the existing testing cases locally to have the test
> coverage, but I don't want to check in the changes often.
> Once we turn the flag on by default and remove the other path, I will
submit
> the changes on the testing cases.
I'd rather skip the flag & have the test coverage in the open along
with the changes. Having changes without testing is bad. I don't see a
benefit to you keeping the test coverage locally.
> Another point to have a flag is to check in the patches in steps: DIBuilder
> changes, changes related to the map, and DwarfDebug changes.
> Without the flag, when I migrate the first field, I have to make sure it
> works from frontend all the way to the backend.
True enough, but I don't think that's a great drawback.

Though technically, if you really wanted, you could start at the
backend & make the implementation conditional (check the type of the
field, if it's an MDNode - use that, if it's a string, do the lookup)
- implement that, then port DIBuilder, then remove the conditionality.
Though I don't like it a lot & I think the changes will be small
enough that frontend to backend will be still be understandable
patches.
>
> Hopefully that makes sense.
>
> Thanks for all your help,
> Manman

Manman Ren

2013-Jun-22 03:00 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 21, 2013, at 5:01 PM, Eric Christopher <echristo at gmail.com>
wrote:
> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:
>>> 
>>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>> 
>>> The reason I want a flag is to avoid the need to update the
existing testing
>>> cases while this is a work in progress.
>>> I believe migrating one field means updating the existing testing
cases?
>>> 
>>> 
>>> It would, yes - and that's how you'd get coverage to know
your change
>>> was stable. You'll have to update all the tests eventually
anyway &
>>> doing so incrementally isn't substantially more expensive in my
>>> experience. Perhaps there's something unique to this migration
that
>>> would make it so?
>>> 
>>> 
>>> I am going to update the existing testing cases locally to have the
test
>>> coverage, but I don't want to check in the changes often.
>>> Once we turn the flag on by default and remove the other path, I
will submit
>>> the changes on the testing cases.
>> 
>> I'd rather skip the flag & have the test coverage in the open
along
>> with the changes. Having changes without testing is bad. I don't
see a
>> benefit to you keeping the test coverage locally.
>> 
>>> Another point to have a flag is to check in the patches in steps:
DIBuilder
>>> changes, changes related to the map, and DwarfDebug changes.
>>> Without the flag, when I migrate the first field, I have to make
sure it
>>> works from frontend all the way to the backend.
>> 
>> True enough, but I don't think that's a great drawback.
>> 
>> Though technically, if you really wanted, you could start at the
>> backend & make the implementation conditional (check the type of
the
>> field, if it's an MDNode - use that, if it's a string, do the
lookup)
>> - implement that, then port DIBuilder, then remove the conditionality.
>> Though I don't like it a lot & I think the changes will be
small
>> enough that frontend to backend will be still be understandable
>> patches.
> 
> All of this.
+2
BTW, What is considered as small enough?

Manman> 
> -eric

Manman Ren

2013-Jun-22 05:01 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

Some scoping for option a:

There are a few Verify functions
DISubprogram::Verify
DICompositeType::Verify
DIType::Verify
that try to access the context link:
  if (getContext() && !getContext().Verify())
    return false;

And the Verify functions are called from 10+ files.

If we are oaky with not calling getContext().Verify(), option a) appears much
better to me.
For printing | debugging purpose, we agree that not being able to trace the
pointer is okay.

Let me know your thoughts,
Manman

On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:
>>> 
>>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>> 
>>> The reason I want a flag is to avoid the need to update the
existing testing
>>> cases while this is a work in progress.
>>> I believe migrating one field means updating the existing testing
cases?
>>> 
>>> 
>>> It would, yes - and that's how you'd get coverage to know
your change
>>> was stable. You'll have to update all the tests eventually
anyway &
>>> doing so incrementally isn't substantially more expensive in my
>>> experience. Perhaps there's something unique to this migration
that
>>> would make it so?
>>> 
>>> 
>>> I am going to update the existing testing cases locally to have the
test
>>> coverage, but I don't want to check in the changes often.
>>> Once we turn the flag on by default and remove the other path, I
will submit
>>> the changes on the testing cases.
>> 
>> I'd rather skip the flag & have the test coverage in the open
along
>> with the changes. Having changes without testing is bad. I don't
see a
>> benefit to you keeping the test coverage locally.
>> 
>>> Another point to have a flag is to check in the patches in steps:
DIBuilder
>>> changes, changes related to the map, and DwarfDebug changes.
>>> Without the flag, when I migrate the first field, I have to make
sure it
>>> works from frontend all the way to the backend.
>> 
>> True enough, but I don't think that's a great drawback.
>> 
>> Though technically, if you really wanted, you could start at the
>> backend & make the implementation conditional (check the type of
the
>> field, if it's an MDNode - use that, if it's a string, do the
lookup)
>> - implement that, then port DIBuilder, then remove the conditionality.
>> Though I don't like it a lot & I think the changes will be
small
>> enough that frontend to backend will be still be understandable
>> patches.
>> 
> 
> All of this.
> 
> -eric

Manman Ren

2013-Jun-25 17:13 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 25, 2013, at 9:15 AM, David Blaikie <dblaikie at gmail.com> wrote:
> On Tue, Jun 25, 2013 at 8:59 AM, Manman Ren <mren at apple.com>
wrote:
>> 
> 
>> Any suggestion on how to move this forward?
>> My feeling is that we should not call Verify from so many files.
>> If you agree, I can try to clean up the Verify functions first.
> 
> Yes, we probably shouldn't be using Verify much. I haven't looked
at
> all the use cases though.
Verify called from the following 12 files other than DebugInfo DIBuilder Dwarf
Index: tools/opt/opt.cpp
Index: lib/Target/NVPTX/NVPTXAsmPrinter.cpp
Index: lib/Transforms/Utils/Local.cpp
Index: lib/Transforms/Instrumentation/GCOVProfiling.cpp
Index: lib/Transforms/IPO/StripSymbols.cpp
Index: lib/Transforms/IPO/DeadArgumentElimination.cpp
Index: lib/CodeGen/SelectionDAG/FastISel.cpp
Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Index: lib/CodeGen/MachineInstr.cpp
Index: lib/IR/AsmWriter.cpp

A sample patch which catches all uses of Verify outside Dwarfxxx.cpp
DebugInfo.cpp DIBuilder.cpp:


We are calling Verify to make sure a MDNode is indeed a specific DI node.

> 
> If we're using it to differentiate nodes (which I think I've done
in
> one or two places) that's probably a bit dodgy & we should do
> something more explicit (based on tag values, etc).
> 
> I think the trickiest use of it (& even then it's not quite
> sufficient) is in the assembly comment printing stuff - I'm not sure
> there's a good alternative there. Treating any metadata that happens
> to have an integer first operand that happens to be a tag value is...
> unreliable at best. Not sure what to do about that.Same feeling here. But we can only parse the content of the MDNode to know
whether it is a DI node and
if yes, what type of DI node it is.
> 
>> If you prefer to communicate via lists, let me know.
> 
> Generally, yes. Discussion should happen in the open unless there's a
> need for it to be private.
> 
>> As to the MDNode interface, I think debug info is the biggest user,
other usage is quite simple, such as TBAA.
> 
> I'm not sure what you're referring to here.
I am referring to the fact nobody else had comments on our proposal of modifying
MDNode :]
That probably means we are the ones who care the most about MDNode?

Thanks,
Manman
> 
>> 
>> Thanks,
>> Manman
>> 
>> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>> 
>>> 
>>> Some scoping for option a:
>>> 
>>> There are a few Verify functions
>>> DISubprogram::Verify
>>> DICompositeType::Verify
>>> DIType::Verify
>>> that try to access the context link:
>>> if (getContext() && !getContext().Verify())
>>>   return false;
>>> 
>>> And the Verify functions are called from 10+ files.
>>> 
>>> If we are oaky with not calling getContext().Verify(), option a)
appears much better to me.
>>> For printing | debugging purpose, we agree that not being able to
trace the pointer is okay.
>>> 
>>> Let me know your thoughts,
>>> Manman
>>> 
>>> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>>> 
>>>> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at
apple.com> wrote:
>>>>>> 
>>>>>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie
at gmail.com> wrote:
>>>>>> 
>>>>>> The reason I want a flag is to avoid the need to update
the existing testing
>>>>>> cases while this is a work in progress.
>>>>>> I believe migrating one field means updating the
existing testing cases?
>>>>>> 
>>>>>> 
>>>>>> It would, yes - and that's how you'd get
coverage to know your change
>>>>>> was stable. You'll have to update all the tests
eventually anyway &
>>>>>> doing so incrementally isn't substantially more
expensive in my
>>>>>> experience. Perhaps there's something unique to
this migration that
>>>>>> would make it so?
>>>>>> 
>>>>>> 
>>>>>> I am going to update the existing testing cases locally
to have the test
>>>>>> coverage, but I don't want to check in the changes
often.
>>>>>> Once we turn the flag on by default and remove the
other path, I will submit
>>>>>> the changes on the testing cases.
>>>>> 
>>>>> I'd rather skip the flag & have the test coverage
in the open along
>>>>> with the changes. Having changes without testing is bad. I
don't see a
>>>>> benefit to you keeping the test coverage locally.
>>>>> 
>>>>>> Another point to have a flag is to check in the patches
in steps: DIBuilder
>>>>>> changes, changes related to the map, and DwarfDebug
changes.
>>>>>> Without the flag, when I migrate the first field, I
have to make sure it
>>>>>> works from frontend all the way to the backend.
>>>>> 
>>>>> True enough, but I don't think that's a great
drawback.
>>>>> 
>>>>> Though technically, if you really wanted, you could start
at the
>>>>> backend & make the implementation conditional (check
the type of the
>>>>> field, if it's an MDNode - use that, if it's a
string, do the lookup)
>>>>> - implement that, then port DIBuilder, then remove the
conditionality.
>>>>> Though I don't like it a lot & I think the changes
will be small
>>>>> enough that frontend to backend will be still be
understandable
>>>>> patches.
>>>>> 
>>>> 
>>>> All of this.
>>>> 
>>>> -eric
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130625/bde0e8d6/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get_verify.patch
Type: application/octet-stream
Size: 8650 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130625/bde0e8d6/attachment.obj>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130625/bde0e8d6/attachment-0001.html>

David Blaikie

2013-Jun-25 18:35 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Tue, Jun 25, 2013 at 10:13 AM, Manman Ren <mren at apple.com>
wrote:>
> On Jun 25, 2013, at 9:15 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
> On Tue, Jun 25, 2013 at 8:59 AM, Manman Ren <mren at apple.com>
wrote:
>
>
> Any suggestion on how to move this forward?
> My feeling is that we should not call Verify from so many files.
> If you agree, I can try to clean up the Verify functions first.
>
>
> Yes, we probably shouldn't be using Verify much. I haven't looked
at
> all the use cases though.
>
>
> Verify called from the following 12 files other than DebugInfo DIBuilder
> Dwarf
> Index: tools/opt/opt.cpp
> Index: lib/Target/NVPTX/NVPTXAsmPrinter.cpp
> Index: lib/Transforms/Utils/Local.cpp
> Index: lib/Transforms/Instrumentation/GCOVProfiling.cpp
> Index: lib/Transforms/IPO/StripSymbols.cpp
> Index: lib/Transforms/IPO/DeadArgumentElimination.cpp
> Index: lib/CodeGen/SelectionDAG/FastISel.cpp
> Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
> Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> Index: lib/CodeGen/MachineInstr.cpp
> Index: lib/IR/AsmWriter.cpp
>
> A sample patch which catches all uses of Verify outside Dwarfxxx.cpp
> DebugInfo.cpp DIBuilder.cpp:
I don't really understand this patch.

1) does Verify really need the map at all? Could we look at the cases
where Verify would use the map & decide whether we could just not
bother with those cases?
2) even if we were going to have such an API, why not have a default
value for the parameter so we don't have to update existing callers?
3) point below \/ is that we should probably remove these Verify calls
(except for assembly printing) - debug values & debug info metadata
should be reasonably assumed to be valid if it's accessed through the
relevant entry points (navigating from debug info for a
function/variable, from a dbg_value/declare intrinsic, etc... we
shouldn't be conditionalizing based on the verification of these
values) - at some point we'd want to add an up-front verifier that
checks these invariants in one go (& doesn't bother with it when the
IR was generated from a trusted front end, in the same way don't use
the IR verifier in those cases).
>
>
> We are calling Verify to make sure a MDNode is indeed a specific DI node.
>
>
>
> If we're using it to differentiate nodes (which I think I've done
in
> one or two places) that's probably a bit dodgy & we should do
> something more explicit (based on tag values, etc).
>
> I think the trickiest use of it (& even then it's not quite
> sufficient) is in the assembly comment printing stuff - I'm not sure
> there's a good alternative there. Treating any metadata that happens
> to have an integer first operand that happens to be a tag value is...
> unreliable at best. Not sure what to do about that.
>
> Same feeling here. But we can only parse the content of the MDNode to know
> whether it is a DI node and
> if yes, what type of DI node it is.
Sure - but the question is how much parsing do we need to do. When I
removed the debug info versioning we started getting a lot more
conflicts with accidentally annotating non-debug metadata as debug
metadata (you can see this firing on some of the line info debug
nodes, for example). One possibility is we could restore something
like that - a 'unique' number to make it sufficiently sane to just
examine the number to decide if we're looking at a debug info node.

I'm not sure if that's the right answer.

In any case this problem should only apply to assembly printing - it
doesn't explain why other parts of the codebase need to use Verify at
all. Those parts of the codebase actually trying to handle debug info
should be handling valid debug info to begin with - we shouldn't need
to call "Verify" all over the place & behave differently (at most
it
should be an assert - but it should probably just move entirely to a
verifier like the IR verifier we already have).
>
>
> If you prefer to communicate via lists, let me know.
>
>
> Generally, yes. Discussion should happen in the open unless there's a
> need for it to be private.
>
> As to the MDNode interface, I think debug info is the biggest user, other
> usage is quite simple, such as TBAA.
>
>
> I'm not sure what you're referring to here.
>
>
> I am referring to the fact nobody else had comments on our proposal of
> modifying MDNode :]
> That probably means we are the ones who care the most about MDNode?
Probably - but it helps to explain to the community what we're doing &
why we're doing it so it's not a surprise when they start seeing
commits.

- David
>
> Thanks,
> Manman
>
>
>
> Thanks,
> Manman
>
> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>
>
> Some scoping for option a:
>
> There are a few Verify functions
> DISubprogram::Verify
> DICompositeType::Verify
> DIType::Verify
> that try to access the context link:
> if (getContext() && !getContext().Verify())
>   return false;
>
> And the Verify functions are called from 10+ files.
>
> If we are oaky with not calling getContext().Verify(), option a) appears
> much better to me.
> For printing | debugging purpose, we agree that not being able to trace the
> pointer is okay.
>
> Let me know your thoughts,
> Manman
>
> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>
> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>
> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:
>
>
> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
> The reason I want a flag is to avoid the need to update the existing
testing
> cases while this is a work in progress.
> I believe migrating one field means updating the existing testing cases?
>
>
> It would, yes - and that's how you'd get coverage to know your
change
> was stable. You'll have to update all the tests eventually anyway &
> doing so incrementally isn't substantially more expensive in my
> experience. Perhaps there's something unique to this migration that
> would make it so?
>
>
> I am going to update the existing testing cases locally to have the test
> coverage, but I don't want to check in the changes often.
> Once we turn the flag on by default and remove the other path, I will
submit
> the changes on the testing cases.
>
>
> I'd rather skip the flag & have the test coverage in the open along
> with the changes. Having changes without testing is bad. I don't see a
> benefit to you keeping the test coverage locally.
>
> Another point to have a flag is to check in the patches in steps: DIBuilder
> changes, changes related to the map, and DwarfDebug changes.
> Without the flag, when I migrate the first field, I have to make sure it
> works from frontend all the way to the backend.
>
>
> True enough, but I don't think that's a great drawback.
>
> Though technically, if you really wanted, you could start at the
> backend & make the implementation conditional (check the type of the
> field, if it's an MDNode - use that, if it's a string, do the
lookup)
> - implement that, then port DIBuilder, then remove the conditionality.
> Though I don't like it a lot & I think the changes will be small
> enough that frontend to backend will be still be understandable
> patches.
>
>
> All of this.
>
> -eric
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>

Manman Ren

2013-Jun-25 19:05 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

On Jun 25, 2013, at 11:35 AM, David Blaikie wrote:
> On Tue, Jun 25, 2013 at 10:13 AM, Manman Ren <mren at apple.com>
wrote:
>> 
>> On Jun 25, 2013, at 9:15 AM, David Blaikie <dblaikie at
gmail.com> wrote:
>> 
>> On Tue, Jun 25, 2013 at 8:59 AM, Manman Ren <mren at apple.com>
wrote:
>> 
>> 
>> Any suggestion on how to move this forward?
>> My feeling is that we should not call Verify from so many files.
>> If you agree, I can try to clean up the Verify functions first.
>> 
>> 
>> Yes, we probably shouldn't be using Verify much. I haven't
looked at
>> all the use cases though.
>> 
>> 
>> Verify called from the following 12 files other than DebugInfo
DIBuilder
>> Dwarf
>> Index: tools/opt/opt.cpp
>> Index: lib/Target/NVPTX/NVPTXAsmPrinter.cpp
>> Index: lib/Transforms/Utils/Local.cpp
>> Index: lib/Transforms/Instrumentation/GCOVProfiling.cpp
>> Index: lib/Transforms/IPO/StripSymbols.cpp
>> Index: lib/Transforms/IPO/DeadArgumentElimination.cpp
>> Index: lib/CodeGen/SelectionDAG/FastISel.cpp
>> Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
>> Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
>> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>> Index: lib/CodeGen/MachineInstr.cpp
>> Index: lib/IR/AsmWriter.cpp
>> 
>> A sample patch which catches all uses of Verify outside Dwarfxxx.cpp
>> DebugInfo.cpp DIBuilder.cpp:
> 
> I don't really understand this patch.
> 
> 1) does Verify really need the map at all? Could we look at the cases
> where Verify would use the map & decide whether we could just not
> bother with those cases?
> 2) even if we were going to have such an API, why not have a default
> value for the parameter so we don't have to update existing callers?
> 3) point below \/ is that we should probably remove these Verify calls
> (except for assembly printing) - debug values & debug info metadata
> should be reasonably assumed to be valid if it's accessed through the
> relevant entry points (navigating from debug info for a
> function/variable, from a dbg_value/declare intrinsic, etc... we
> shouldn't be conditionalizing based on the verification of these
> values) - at some point we'd want to add an up-front verifier that
> checks these invariants in one go (& doesn't bother with it when
the
> IR was generated from a trusted front end, in the same way don't use
> the IR verifier in those cases).
Sorry for the confusion, I was just trying to list all source locations that use
Verify.
It is not a valid patch and I am not proposing to change the API for Verify.
> 
>> 
>> 
>> We are calling Verify to make sure a MDNode is indeed a specific DI
node.
>> 
>> 
>> 
>> If we're using it to differentiate nodes (which I think I've
done in
>> one or two places) that's probably a bit dodgy & we should do
>> something more explicit (based on tag values, etc).
>> 
>> I think the trickiest use of it (& even then it's not quite
>> sufficient) is in the assembly comment printing stuff - I'm not
sure
>> there's a good alternative there. Treating any metadata that
happens
>> to have an integer first operand that happens to be a tag value is...
>> unreliable at best. Not sure what to do about that.
>> 
>> Same feeling here. But we can only parse the content of the MDNode to
know
>> whether it is a DI node and
>> if yes, what type of DI node it is.
> 
> Sure - but the question is how much parsing do we need to do. When I
> removed the debug info versioning we started getting a lot more
> conflicts with accidentally annotating non-debug metadata as debug
> metadata (you can see this firing on some of the line info debug
> nodes, for example). One possibility is we could restore something
> like that - a 'unique' number to make it sufficiently sane to just
> examine the number to decide if we're looking at a debug info node.Sounds like a good idea to me. One number to enumerate all possible MD Nodes.
> 
> I'm not sure if that's the right answer.
> 
> In any case this problem should only apply to assembly printing - it
> doesn't explain why other parts of the codebase need to use Verify at
> all.
Totally agree. We should just be able to use isSubprogram in the following
example.

--- tools/opt/opt.cpp   (revision 183537)
+++ tools/opt/opt.cpp   (working copy)
@@ -389,7 +389,7 @@
       for (unsigned i = 0, e = NMD->getNumOperands(); i != e; ++i) {
         std::string Name;
         DISubprogram SP(NMD->getOperand(i));
-        if (SP.Verify())
+        if (SP.Verify(0/*MAP*/))
           getContextName(SP.getContext(), Name);
         Name = Name + SP.getDisplayName().str();
         if (!Name.empty() && Processed.insert(Name)) {

It should be enough to just check isSubprogram(), instead of calling Verify, I
guess the purpose
of calling Verify is to perform error checking, such as number of operands.

/// Verify - Verify that a subprogram descriptor is well formed.
bool DISubprogram::Verify(int MAP) const {
  if (!isSubprogram())
    return false;
  if (getContext() && !getContext().Verify())
    return false;
  DICompositeType Ty = getType();
  if (!Ty.Verify(MAP))
    return false;
  return DbgNode->getNumOperands() == 20;
}

> Those parts of the codebase actually trying to handle debug info
> should be handling valid debug info to begin with - we shouldn't need
> to call "Verify" all over the place & behave differently (at
most it
> should be an assert - but it should probably just move entirely to a
> verifier like the IR verifier we already have).
Agree.
The current code base uses Verify to decide what type of DI node we are
handling, as given in the above example.

Thanks,
Manman
> 
>> 
>> 
>> If you prefer to communicate via lists, let me know.
>> 
>> 
>> Generally, yes. Discussion should happen in the open unless there's
a
>> need for it to be private.
>> 
>> As to the MDNode interface, I think debug info is the biggest user,
other
>> usage is quite simple, such as TBAA.
>> 
>> 
>> I'm not sure what you're referring to here.
>> 
>> 
>> I am referring to the fact nobody else had comments on our proposal of
>> modifying MDNode :]
>> That probably means we are the ones who care the most about MDNode?
> 
> Probably - but it helps to explain to the community what we're doing
&
> why we're doing it so it's not a surprise when they start seeing
> commits.
> 
> - David
> 
>> 
>> Thanks,
>> Manman
>> 
>> 
>> 
>> Thanks,
>> Manman
>> 
>> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>> 
>> 
>> Some scoping for option a:
>> 
>> There are a few Verify functions
>> DISubprogram::Verify
>> DICompositeType::Verify
>> DIType::Verify
>> that try to access the context link:
>> if (getContext() && !getContext().Verify())
>>  return false;
>> 
>> And the Verify functions are called from 10+ files.
>> 
>> If we are oaky with not calling getContext().Verify(), option a)
appears
>> much better to me.
>> For printing | debugging purpose, we agree that not being able to trace
the
>> pointer is okay.
>> 
>> Let me know your thoughts,
>> Manman
>> 
>> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>> 
>> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> 
>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:
>> 
>> 
>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> 
>> The reason I want a flag is to avoid the need to update the existing
testing
>> cases while this is a work in progress.
>> I believe migrating one field means updating the existing testing
cases?
>> 
>> 
>> It would, yes - and that's how you'd get coverage to know your
change
>> was stable. You'll have to update all the tests eventually anyway
&
>> doing so incrementally isn't substantially more expensive in my
>> experience. Perhaps there's something unique to this migration that
>> would make it so?
>> 
>> 
>> I am going to update the existing testing cases locally to have the
test
>> coverage, but I don't want to check in the changes often.
>> Once we turn the flag on by default and remove the other path, I will
submit
>> the changes on the testing cases.
>> 
>> 
>> I'd rather skip the flag & have the test coverage in the open
along
>> with the changes. Having changes without testing is bad. I don't
see a
>> benefit to you keeping the test coverage locally.
>> 
>> Another point to have a flag is to check in the patches in steps:
DIBuilder
>> changes, changes related to the map, and DwarfDebug changes.
>> Without the flag, when I migrate the first field, I have to make sure
it
>> works from frontend all the way to the backend.
>> 
>> 
>> True enough, but I don't think that's a great drawback.
>> 
>> Though technically, if you really wanted, you could start at the
>> backend & make the implementation conditional (check the type of
the
>> field, if it's an MDNode - use that, if it's a string, do the
lookup)
>> - implement that, then port DIBuilder, then remove the conditionality.
>> Though I don't like it a lot & I think the changes will be
small
>> enough that frontend to backend will be still be understandable
>> patches.
>> 
>> 
>> All of this.
>> 
>> -eric
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>>

Eric Christopher

2013-Jun-25 22:53 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

> I am referring to the fact nobody else had comments on our proposal of
> modifying MDNode :]
> That probably means we are the ones who care the most about MDNode?
>
Undoubtedly :)

-eric
> Thanks,
> Manman
>
>
>
> Thanks,
> Manman
>
> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>
>
> Some scoping for option a:
>
> There are a few Verify functions
> DISubprogram::Verify
> DICompositeType::Verify
> DIType::Verify
> that try to access the context link:
> if (getContext() && !getContext().Verify())
>   return false;
>
> And the Verify functions are called from 10+ files.
>
> If we are oaky with not calling getContext().Verify(), option a) appears
> much better to me.
> For printing | debugging purpose, we agree that not being able to trace the
> pointer is okay.
>
> Let me know your thoughts,
> Manman
>
> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>
> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>
> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at apple.com>
wrote:
>
>
> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
> The reason I want a flag is to avoid the need to update the existing
testing
> cases while this is a work in progress.
> I believe migrating one field means updating the existing testing cases?
>
>
> It would, yes - and that's how you'd get coverage to know your
change
> was stable. You'll have to update all the tests eventually anyway &
> doing so incrementally isn't substantially more expensive in my
> experience. Perhaps there's something unique to this migration that
> would make it so?
>
>
> I am going to update the existing testing cases locally to have the test
> coverage, but I don't want to check in the changes often.
> Once we turn the flag on by default and remove the other path, I will
submit
> the changes on the testing cases.
>
>
> I'd rather skip the flag & have the test coverage in the open along
> with the changes. Having changes without testing is bad. I don't see a
> benefit to you keeping the test coverage locally.
>
> Another point to have a flag is to check in the patches in steps: DIBuilder
> changes, changes related to the map, and DwarfDebug changes.
> Without the flag, when I migrate the first field, I have to make sure it
> works from frontend all the way to the backend.
>
>
> True enough, but I don't think that's a great drawback.
>
> Though technically, if you really wanted, you could start at the
> backend & make the implementation conditional (check the type of the
> field, if it's an MDNode - use that, if it's a string, do the
lookup)
> - implement that, then port DIBuilder, then remove the conditionality.
> Though I don't like it a lot & I think the changes will be small
> enough that frontend to backend will be still be understandable
> patches.
>
>
> All of this.
>
> -eric
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>

Eric Christopher

2013-Jun-25 22:56 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

>> Those parts of the codebase actually trying to handle debug info
>> should be handling valid debug info to begin with - we shouldn't
need
>> to call "Verify" all over the place & behave differently
(at most it
>> should be an assert - but it should probably just move entirely to a
>> verifier like the IR verifier we already have).
>
> Agree.
> The current code base uses Verify to decide what type of DI node we are
handling, as given in the above example.
>
Agreed, those cases should be just checking the type explicitly. I
don't think anything needs to be done to the various verify methods
(other than update them slightly for types if we want to verify
subtypes) with the patch, but getting our usage of Verify straightened
out would be general goodness.

Thanks!

-eric
> Thanks,
> Manman
>
>>
>>>
>>>
>>> If you prefer to communicate via lists, let me know.
>>>
>>>
>>> Generally, yes. Discussion should happen in the open unless
there's a
>>> need for it to be private.
>>>
>>> As to the MDNode interface, I think debug info is the biggest user,
other
>>> usage is quite simple, such as TBAA.
>>>
>>>
>>> I'm not sure what you're referring to here.
>>>
>>>
>>> I am referring to the fact nobody else had comments on our proposal
of
>>> modifying MDNode :]
>>> That probably means we are the ones who care the most about MDNode?
>>
>> Probably - but it helps to explain to the community what we're
doing &
>> why we're doing it so it's not a surprise when they start
seeing
>> commits.
>>
>> - David
>>
>>>
>>> Thanks,
>>> Manman
>>>
>>>
>>>
>>> Thanks,
>>> Manman
>>>
>>> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>>>
>>>
>>> Some scoping for option a:
>>>
>>> There are a few Verify functions
>>> DISubprogram::Verify
>>> DICompositeType::Verify
>>> DIType::Verify
>>> that try to access the context link:
>>> if (getContext() && !getContext().Verify())
>>>  return false;
>>>
>>> And the Verify functions are called from 10+ files.
>>>
>>> If we are oaky with not calling getContext().Verify(), option a)
appears
>>> much better to me.
>>> For printing | debugging purpose, we agree that not being able to
trace the
>>> pointer is okay.
>>>
>>> Let me know your thoughts,
>>> Manman
>>>
>>> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>>>
>>> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>
>>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at
apple.com> wrote:
>>>
>>>
>>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>
>>> The reason I want a flag is to avoid the need to update the
existing testing
>>> cases while this is a work in progress.
>>> I believe migrating one field means updating the existing testing
cases?
>>>
>>>
>>> It would, yes - and that's how you'd get coverage to know
your change
>>> was stable. You'll have to update all the tests eventually
anyway &
>>> doing so incrementally isn't substantially more expensive in my
>>> experience. Perhaps there's something unique to this migration
that
>>> would make it so?
>>>
>>>
>>> I am going to update the existing testing cases locally to have the
test
>>> coverage, but I don't want to check in the changes often.
>>> Once we turn the flag on by default and remove the other path, I
will submit
>>> the changes on the testing cases.
>>>
>>>
>>> I'd rather skip the flag & have the test coverage in the
open along
>>> with the changes. Having changes without testing is bad. I
don't see a
>>> benefit to you keeping the test coverage locally.
>>>
>>> Another point to have a flag is to check in the patches in steps:
DIBuilder
>>> changes, changes related to the map, and DwarfDebug changes.
>>> Without the flag, when I migrate the first field, I have to make
sure it
>>> works from frontend all the way to the backend.
>>>
>>>
>>> True enough, but I don't think that's a great drawback.
>>>
>>> Though technically, if you really wanted, you could start at the
>>> backend & make the implementation conditional (check the type
of the
>>> field, if it's an MDNode - use that, if it's a string, do
the lookup)
>>> - implement that, then port DIBuilder, then remove the
conditionality.
>>> Though I don't like it a lot & I think the changes will be
small
>>> enough that frontend to backend will be still be understandable
>>> patches.
>>>
>>>
>>> All of this.
>>>
>>> -eric
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>

Manman Ren

2013-Jun-26 21:28 UTC

head link

[LLVMdev] Proposal: type uniquing of debug info for LTO

First cleanup patch:
r185020

Let me know if you have questions.

Thanks,
Manman

On Jun 25, 2013, at 3:56 PM, Eric Christopher wrote:
>>> Those parts of the codebase actually trying to handle debug info
>>> should be handling valid debug info to begin with - we
shouldn't need
>>> to call "Verify" all over the place & behave
differently (at most it
>>> should be an assert - but it should probably just move entirely to
a
>>> verifier like the IR verifier we already have).
>> 
>> Agree.
>> The current code base uses Verify to decide what type of DI node we are
handling, as given in the above example.
>> 
> 
> Agreed, those cases should be just checking the type explicitly. I
> don't think anything needs to be done to the various verify methods
> (other than update them slightly for types if we want to verify
> subtypes) with the patch, but getting our usage of Verify straightened
> out would be general goodness.
> 
> Thanks!
> 
> -eric
> 
>> Thanks,
>> Manman
>> 
>>> 
>>>> 
>>>> 
>>>> If you prefer to communicate via lists, let me know.
>>>> 
>>>> 
>>>> Generally, yes. Discussion should happen in the open unless
there's a
>>>> need for it to be private.
>>>> 
>>>> As to the MDNode interface, I think debug info is the biggest
user, other
>>>> usage is quite simple, such as TBAA.
>>>> 
>>>> 
>>>> I'm not sure what you're referring to here.
>>>> 
>>>> 
>>>> I am referring to the fact nobody else had comments on our
proposal of
>>>> modifying MDNode :]
>>>> That probably means we are the ones who care the most about
MDNode?
>>> 
>>> Probably - but it helps to explain to the community what we're
doing &
>>> why we're doing it so it's not a surprise when they start
seeing
>>> commits.
>>> 
>>> - David
>>> 
>>>> 
>>>> Thanks,
>>>> Manman
>>>> 
>>>> 
>>>> 
>>>> Thanks,
>>>> Manman
>>>> 
>>>> On Jun 21, 2013, at 10:01 PM, Manman Ren wrote:
>>>> 
>>>> 
>>>> Some scoping for option a:
>>>> 
>>>> There are a few Verify functions
>>>> DISubprogram::Verify
>>>> DICompositeType::Verify
>>>> DIType::Verify
>>>> that try to access the context link:
>>>> if (getContext() && !getContext().Verify())
>>>> return false;
>>>> 
>>>> And the Verify functions are called from 10+ files.
>>>> 
>>>> If we are oaky with not calling getContext().Verify(), option
a) appears
>>>> much better to me.
>>>> For printing | debugging purpose, we agree that not being able
to trace the
>>>> pointer is okay.
>>>> 
>>>> Let me know your thoughts,
>>>> Manman
>>>> 
>>>> On Jun 21, 2013, at 5:01 PM, Eric Christopher wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 4:59 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>> 
>>>> On Fri, Jun 21, 2013 at 4:50 PM, Manman Ren <mren at
apple.com> wrote:
>>>> 
>>>> 
>>>> On Jun 21, 2013, at 4:41 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>> 
>>>> The reason I want a flag is to avoid the need to update the
existing testing
>>>> cases while this is a work in progress.
>>>> I believe migrating one field means updating the existing
testing cases?
>>>> 
>>>> 
>>>> It would, yes - and that's how you'd get coverage to
know your change
>>>> was stable. You'll have to update all the tests eventually
anyway &
>>>> doing so incrementally isn't substantially more expensive
in my
>>>> experience. Perhaps there's something unique to this
migration that
>>>> would make it so?
>>>> 
>>>> 
>>>> I am going to update the existing testing cases locally to have
the test
>>>> coverage, but I don't want to check in the changes often.
>>>> Once we turn the flag on by default and remove the other path,
I will submit
>>>> the changes on the testing cases.
>>>> 
>>>> 
>>>> I'd rather skip the flag & have the test coverage in
the open along
>>>> with the changes. Having changes without testing is bad. I
don't see a
>>>> benefit to you keeping the test coverage locally.
>>>> 
>>>> Another point to have a flag is to check in the patches in
steps: DIBuilder
>>>> changes, changes related to the map, and DwarfDebug changes.
>>>> Without the flag, when I migrate the first field, I have to
make sure it
>>>> works from frontend all the way to the backend.
>>>> 
>>>> 
>>>> True enough, but I don't think that's a great drawback.
>>>> 
>>>> Though technically, if you really wanted, you could start at
the
>>>> backend & make the implementation conditional (check the
type of the
>>>> field, if it's an MDNode - use that, if it's a string,
do the lookup)
>>>> - implement that, then port DIBuilder, then remove the
conditionality.
>>>> Though I don't like it a lot & I think the changes will
be small
>>>> enough that frontend to backend will be still be understandable
>>>> patches.
>>>> 
>>>> 
>>>> All of this.
>>>> 
>>>> -eric
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> 
>>>> 
>>>> 
>>

Apparently Analagous Threads

Search for more apparently analagous threads

llvm dev - Jun 2013 - [LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

[LLVMdev] Proposal: type uniquing of debug info for LTO

Apparently Analagous Threads