thr3ads.net - llvm dev - [LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication [Oct 2013]

If this information is useful, please help other people find it:
Share via:

David Blaikie

2013-Oct-15 18:34 UTC

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at gmail.com>
wrote:
>
>
>
> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at
gmail.com>wrote:
>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at
gmail.com>wrote:
>>
>>>
>>>
>>>
>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren at
gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>>
>>>>>
>>>>> David,
>>>>>
>>>>> Thanks for reviewing!
>>>>>
>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie <dblaikie
at gmail.com>wrote:
>>>>>
>>>>>> Might be easier if these were on Phabricator, but here
are some
>>>>>> thoughts:
>>>>>>
>>>>>> 0001:
>>>>>>   This patch generally, while separated for legibility,
is untested &
>>>>>> difficult to discuss in isolation.
>>>>>>
>>>>> I agree, this patch adds the functionality but does not use
it, the
>>>>> 2nd patch uses ref_addr.
>>>>> If you think I should merge the two and commit as a single
patch, let
>>>>> me know.
>>>>>
>>>>>
>>>>>>  I may need to refer to the second patch in reviewing
this first one.
>>>>>>   DwarfDebug.cpp:
>>>>>>     computeSizeAndOffsets:
>>>>>>       I believe this produces the wrong offset for the
3rd CU and
>>>>>> onwards. computeSizeAndOffset returns the EndOffset
which is absolute, not
>>>>>> relative to the Offset passed in, so it should be
assigned to SecOffset,
>>>>>> not added to it. (eg: if you have CUs at 0, 42, and 57
- the first pass
>>>>>> through SecOffset will be zero, then it'll be 0 +
42, then on the 3rd it'll
>>>>>> be (0 + 42) + 57 which isn't right, it should just
be 57). Please add more
>>>>>> test coverage while fixing this issue.
>>>>>>
>>>>>
>>>>> computeSizeAndOffset takes an offset that is relative to
the start of
>>>>> the CU and returns the offset relative to the CU after
laying out the DIE.
>>>>> The initial offset before laying out the CU DIE is the
header size,
>>>>> EndOffset will be the offset relative to the CU after
laying out the whole
>>>>> CU DIE.
>>>>> We can think of EndOffset as the size of the whole CU DIE.
SecOffset
>>>>> is the offset from the Debug Info section, so we can update
it by adding
>>>>> the CU size.
>>>>>
>>>>>   // Offset from the beginning of debug info section.
>>>>>   unsigned SecOffset = 0;
>>>>>   for (SmallVectorImpl<CompileUnit *>::iterator I =
CUs.begin(),
>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>      unsigned Offset >>>>>      
sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>       sizeof(int32_t) + // Offset Into Abbrev. Section
>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>
>>>>>     unsigned EndOffset =
computeSizeAndOffset((*I)->getCUDie(),
>>>>> Offset);
>>>>>     SecOffset += EndOffset;
>>>>>   }
>>>>>
>>>>
>>>> Added more comments in attached patch.
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Eric/Manman: rough design question: compute the
absolute offset of
>>>>>> each CU within the debug_info section and describe them
all relative to a
>>>>>> single symbol at the start of the debug_info section,
or should we put a
>>>>>> label at the start of each CU?
>>>>>>
>>>>>
>>>>> Either way should work. But since we already have the
section offset
>>>>> for each CU, describing relative to the start of debug_info
section saves
>>>>> us a few symbols :)
>>>>>
>>>>>
>>>>>>
>>>>>> 0002:
>>>>>>   ref_addr_relocation.ll:
>>>>>>     seems a bit vague in terms of how you test for the
relocation. I
>>>>>> think it'd make more sense to test the assembly,
than the reafobj output,
>>>>>> that way you can test that the correct bytes have the
relocation rather
>>>>>> than just that there's "some" .debug_info
relocation in the file. For an
>>>>>> example, see test/DebugInfo/X86/tls.ll I wrote - it has
some "unannotated"
>>>>>> bytes because we still don't have a nice way to
annotate location bytes
>>>>>> that are DWARF expressions, but it's close - I
guess those should be
>>>>>> CHECK-NEXTs, though. In any case, you should be able to
check a few lines
>>>>>> of assembly with the # DW_AT/DW_TAG annotation
comments.
>>>>>>
>>>>>
>>>>> I can check for ".quad .Lsection_info+38
#DW_AT_type".
>>>>>
>>>> Done in attached patch.
>>>>
>>>>>
>>>>>
>>>>>>      You'd need to add the tu3.cpp from my example
if you want to
>>>>>> demonstrate that the relocation is actually working as
intended and
>>>>>> avoiding the bogus result I showed.
>>>>>>   type-unique-simple-a.ll
>>>>>>     While I agree that having common test cases helps
reduce the
>>>>>> number of separate invocations we have, this test case
seems like it might
>>>>>> be becoming a little hard to follow what's under
test. Just going from the
>>>>>> diff I'm not sure what's what. Could you give a
brief description of the
>>>>>> state of type-unique-simple2 files? What's
involved? What's it meant to
>>>>>> test? What's it actually testing?
>>>>>>
>>>>> I can add more comments. The source files are included in
the testing
>>>>> case. I am checking that llvm-link only generates a single
copy of the
>>>>> struct and the backend generates a single DIE and uses
ref_addr.
>>>>> The changes are to check "the backend generates a
single DIE and uses
>>>>> ref_addr".
>>>>>
>>>> Done in attached patch.
>>>>
>>>>>
>>>>>    DIE.h:
>>>>>>     checkCompileUnit could probably be called
"getCompileUnitOrNull",
>>>>>> the name "check*" seems to imply that it does
some checking, which isn't
>>>>>> true.
>>>>>>
>>>>>  Will do.
>>>>>
>>>> Done in attached patch.
>>>>
>>>>>
>>>>>
>>>>>>   DwarfCompileUnit.cpp:
>>>>>>     the "istype || issubprogram" check should
probably be pulled out
>>>>>> into a separate function, something like
"isShareableAcrossCUs" or
>>>>>> something like that (please, that's not the best
name, let's discuss it
>>>>>> further before we settle on a name) so that getDIE and
insertDIE are sure
>>>>>> to use the same test at all times.
>>>>>>
>>>>> Yes, the condition is shared between getDIE and insertDIE.
I like
>>>>> isSharableAcrossCUs, because that is why the map is in
DwarfDebug instead
>>>>> of CompileUnit.
>>>>>
>>>> Done in attached patch.
>>>>
>>>>>
>>>>>
>>>>>>    Why does addDIEEntry take a form? When does the
caller get to
>>>>>> choose the form rather than the callee deciding between
ref4 and ref_addr
>>>>>> based on context?
>>>>>>
>>>>>
>>>>> addDIEEntry took a form before my change and I didn't
check why it did.
>>>>> I will check if all callers always use ref4, if it it true,
I will
>>>>> submit a cleanup patch first.
>>>>>
>>>> Done in attached patch.
>>>>
>>>>>
>>>>>   I'm still unsure about this worklist thing - do your
current tests
>>>>>> cover the need for the worklist? ie: if we removed that
logic, would tests
>>>>>> fail? Can you describe a specific sequence where the
worklist is necessary?
>>>>>>
>>>>>
>>>>> If we are sure that DIEs are always added to an owner
before calling
>>>>> addDIEEntry, we don't need the worklist.
>>>>> But I saw cases where that was not true, I will get a
reduced testing
>>>>> case.
>>>>>
>>>>
>>>> If we try to assert both DIEs have owner in addDIEEntry, the
following
>>>> testing cases will fail:
>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>     LLVM :: DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>
>>>> For ref_addr_relocation, it failed in
>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
Entry=0x10302a9d0) at
>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
(this=0x102e13ec0,
>>>> Entity=0x102e14090, Ty={<llvm::DIScope> =
{<llvm::DIDescriptor> = {DbgNode
>>>> = 0x102e05f30}, <No data fields>}, <No data
fields>}, Attribute=73) at
>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>> #7  0x0000000100b05bff in
llvm::CompileUnit::createGlobalVariableDIE
>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>
>>>> If we look at DwarfCompileUnit.cpp:
>>>>     VariableDIE = new DIE(GV.getTag());
>>>>     // Add to map.
>>>>     insertDIE(N, VariableDIE);
>>>>
>>>>     // Add name and type.
>>>>     addString(VariableDIE, dwarf::DW_AT_name,
GV.getDisplayName());
>>>>     addType(VariableDIE, GTy);
>>>>
>>>> The VariableDIE is not added to an owner yet when calling
addType.
>>>>
>>>
>>> I believe I have addressed all review comments, the patches are
>>> re-attached here for convenience.
>>>
>>
>> So I'm still thinking about the work list work.
>>
>> If we don't know which CU a DIE is in - isn't it, necessarily,
going to
>> be in the current CU (& thus referenced by ref_data4 (using a
CU-local
>> offset), not ref_addr)?
>>
>
> That may be true. But can we prove that?
>
We really shouldn't add extra complexity to the codebase just because we
don't know how the codebase works - that's what leads to the kind of
complexity we see in the debug info handling today.

> There are two ways of handling this:
> 1> use a worklist to be conservative
> 2> do not use a worklist, but add an assertion when emitting a DIE A,
make
> sure the DIE referenced with ref4 is indeed in the same CU as DIE A.
>
Please just add this assertion. If it fires we'll have a test case and a
concrete justification for this complexity such that should anyone remove
it later because it looked unnecessary, they'll at least have a failing
test to explain why it was there in the first place.

>
> Let me know which one you prefer.
>
> Do you have any comments on the ref_addr patch (the 1st patch of the two)?
>
Nothing much - I wouldn't mind Eric taking a look (& would rather you
not
commit this until you're committing the second patch, since it's
otherwise
untested/unjustified code) on the label/offset related stuff since I'm less
familiar with that.

There's one or two cases of {} on single-statement blocks you could fix up.

"DebugInfoOffset" (both the member and the functions to set/get it)
might
be more meaningfully named "SectionOffset" - but I'm not sure.
Eric? Other
names (DebugInfoSectionOffset?)?

- David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131015/3f0da65b/attachment.html>

Manman Ren

2013-Oct-15 20:22 UTC

head link

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
>
> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at
gmail.com> wrote:
>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at
gmail.com>wrote:
>>
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren <manman.ren
at gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> David,
>>>>>>
>>>>>> Thanks for reviewing!
>>>>>>
>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie
<dblaikie at gmail.com>wrote:
>>>>>>
>>>>>>> Might be easier if these were on Phabricator, but
here are some
>>>>>>> thoughts:
>>>>>>>
>>>>>>> 0001:
>>>>>>>   This patch generally, while separated for
legibility, is untested
>>>>>>> & difficult to discuss in isolation.
>>>>>>>
>>>>>> I agree, this patch adds the functionality but does not
use it, the
>>>>>> 2nd patch uses ref_addr.
>>>>>> If you think I should merge the two and commit as a
single patch, let
>>>>>> me know.
>>>>>>
>>>>>>
>>>>>>>  I may need to refer to the second patch in
reviewing this first one.
>>>>>>>   DwarfDebug.cpp:
>>>>>>>     computeSizeAndOffsets:
>>>>>>>       I believe this produces the wrong offset for
the 3rd CU and
>>>>>>> onwards. computeSizeAndOffset returns the EndOffset
which is absolute, not
>>>>>>> relative to the Offset passed in, so it should be
assigned to SecOffset,
>>>>>>> not added to it. (eg: if you have CUs at 0, 42, and
57 - the first pass
>>>>>>> through SecOffset will be zero, then it'll be 0
+ 42, then on the 3rd it'll
>>>>>>> be (0 + 42) + 57 which isn't right, it should
just be 57). Please add more
>>>>>>> test coverage while fixing this issue.
>>>>>>>
>>>>>>
>>>>>> computeSizeAndOffset takes an offset that is relative
to the start of
>>>>>> the CU and returns the offset relative to the CU after
laying out the DIE.
>>>>>> The initial offset before laying out the CU DIE is the
header size,
>>>>>> EndOffset will be the offset relative to the CU after
laying out the whole
>>>>>> CU DIE.
>>>>>> We can think of EndOffset as the size of the whole CU
DIE. SecOffset
>>>>>> is the offset from the Debug Info section, so we can
update it by adding
>>>>>> the CU size.
>>>>>>
>>>>>>   // Offset from the beginning of debug info section.
>>>>>>   unsigned SecOffset = 0;
>>>>>>   for (SmallVectorImpl<CompileUnit *>::iterator I
= CUs.begin(),
>>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>>      unsigned Offset >>>>>>      
sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>>       sizeof(int32_t) + // Offset Into Abbrev. Section
>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>>
>>>>>>     unsigned EndOffset =
computeSizeAndOffset((*I)->getCUDie(),
>>>>>> Offset);
>>>>>>     SecOffset += EndOffset;
>>>>>>   }
>>>>>>
>>>>>
>>>>> Added more comments in attached patch.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Eric/Manman: rough design question: compute the
absolute offset of
>>>>>>> each CU within the debug_info section and describe
them all relative to a
>>>>>>> single symbol at the start of the debug_info
section, or should we put a
>>>>>>> label at the start of each CU?
>>>>>>>
>>>>>>
>>>>>> Either way should work. But since we already have the
section offset
>>>>>> for each CU, describing relative to the start of
debug_info section saves
>>>>>> us a few symbols :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 0002:
>>>>>>>   ref_addr_relocation.ll:
>>>>>>>     seems a bit vague in terms of how you test for
the relocation. I
>>>>>>> think it'd make more sense to test the
assembly, than the reafobj output,
>>>>>>> that way you can test that the correct bytes have
the relocation rather
>>>>>>> than just that there's "some"
.debug_info relocation in the file. For an
>>>>>>> example, see test/DebugInfo/X86/tls.ll I wrote - it
has some "unannotated"
>>>>>>> bytes because we still don't have a nice way to
annotate location bytes
>>>>>>> that are DWARF expressions, but it's close - I
guess those should be
>>>>>>> CHECK-NEXTs, though. In any case, you should be
able to check a few lines
>>>>>>> of assembly with the # DW_AT/DW_TAG annotation
comments.
>>>>>>>
>>>>>>
>>>>>> I can check for ".quad .Lsection_info+38
#DW_AT_type".
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>      You'd need to add the tu3.cpp from my
example if you want to
>>>>>>> demonstrate that the relocation is actually working
as intended and
>>>>>>> avoiding the bogus result I showed.
>>>>>>>   type-unique-simple-a.ll
>>>>>>>     While I agree that having common test cases
helps reduce the
>>>>>>> number of separate invocations we have, this test
case seems like it might
>>>>>>> be becoming a little hard to follow what's
under test. Just going from the
>>>>>>> diff I'm not sure what's what. Could you
give a brief description of the
>>>>>>> state of type-unique-simple2 files? What's
involved? What's it meant to
>>>>>>> test? What's it actually testing?
>>>>>>>
>>>>>> I can add more comments. The source files are included
in the testing
>>>>>> case. I am checking that llvm-link only generates a
single copy of the
>>>>>> struct and the backend generates a single DIE and uses
ref_addr.
>>>>>> The changes are to check "the backend generates a
single DIE and uses
>>>>>> ref_addr".
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>    DIE.h:
>>>>>>>     checkCompileUnit could probably be called
>>>>>>> "getCompileUnitOrNull", the name
"check*" seems to imply that it does some
>>>>>>> checking, which isn't true.
>>>>>>>
>>>>>>  Will do.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>   DwarfCompileUnit.cpp:
>>>>>>>     the "istype || issubprogram" check
should probably be pulled out
>>>>>>> into a separate function, something like
"isShareableAcrossCUs" or
>>>>>>> something like that (please, that's not the
best name, let's discuss it
>>>>>>> further before we settle on a name) so that getDIE
and insertDIE are sure
>>>>>>> to use the same test at all times.
>>>>>>>
>>>>>> Yes, the condition is shared between getDIE and
insertDIE. I like
>>>>>> isSharableAcrossCUs, because that is why the map is in
DwarfDebug instead
>>>>>> of CompileUnit.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>    Why does addDIEEntry take a form? When does the
caller get to
>>>>>>> choose the form rather than the callee deciding
between ref4 and ref_addr
>>>>>>> based on context?
>>>>>>>
>>>>>>
>>>>>> addDIEEntry took a form before my change and I
didn't check why it
>>>>>> did.
>>>>>> I will check if all callers always use ref4, if it it
true, I will
>>>>>> submit a cleanup patch first.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>   I'm still unsure about this worklist thing - do
your current tests
>>>>>>> cover the need for the worklist? ie: if we removed
that logic, would tests
>>>>>>> fail? Can you describe a specific sequence where
the worklist is necessary?
>>>>>>>
>>>>>>
>>>>>> If we are sure that DIEs are always added to an owner
before calling
>>>>>> addDIEEntry, we don't need the worklist.
>>>>>> But I saw cases where that was not true, I will get a
reduced testing
>>>>>> case.
>>>>>>
>>>>>
>>>>> If we try to assert both DIEs have owner in addDIEEntry,
the following
>>>>> testing cases will fail:
>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>>     LLVM ::
DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>>
>>>>> For ref_addr_relocation, it failed in
>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
Entry=0x10302a9d0) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
>>>>> (this=0x102e13ec0, Entity=0x102e14090,
Ty={<llvm::DIScope> >>>>> {<llvm::DIDescriptor> =
{DbgNode = 0x102e05f30}, <No data fields>}, <No
>>>>> data fields>}, Attribute=73) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>>> #7  0x0000000100b05bff in
llvm::CompileUnit::createGlobalVariableDIE
>>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>>
>>>>> If we look at DwarfCompileUnit.cpp:
>>>>>     VariableDIE = new DIE(GV.getTag());
>>>>>     // Add to map.
>>>>>     insertDIE(N, VariableDIE);
>>>>>
>>>>>     // Add name and type.
>>>>>     addString(VariableDIE, dwarf::DW_AT_name,
GV.getDisplayName());
>>>>>     addType(VariableDIE, GTy);
>>>>>
>>>>> The VariableDIE is not added to an owner yet when calling
addType.
>>>>>
>>>>
>>>> I believe I have addressed all review comments, the patches are
>>>> re-attached here for convenience.
>>>>
>>>
>>> So I'm still thinking about the work list work.
>>>
>>> If we don't know which CU a DIE is in - isn't it,
necessarily, going to
>>> be in the current CU (& thus referenced by ref_data4 (using a
CU-local
>>> offset), not ref_addr)?
>>>
>>
>> That may be true. But can we prove that?
>>
>
> We really shouldn't add extra complexity to the codebase just because
we
> don't know how the codebase works - that's what leads to the kind
of
> complexity we see in the debug info handling today.
>
>
>> There are two ways of handling this:
>> 1> use a worklist to be conservative
>> 2> do not use a worklist, but add an assertion when emitting a DIE
A,
>> make sure the DIE referenced with ref4 is indeed in the same CU as DIE
A.
>>
>
> Please just add this assertion. If it fires we'll have a test case and
a
> concrete justification for this complexity such that should anyone remove
> it later because it looked unnecessary, they'll at least have a failing
> test to explain why it was there in the first plac .
>
The assertion fails even with a simple testing case when the referenced DIE
has an owner and the DIE itself does not have an owner.
For that case, we can't assume ref4 should be used. I don't think we can
enforce that all DIEs must be added to a parent before calling addDIEEntry.

For the specific testing case, when constructing children of a scope DIE,
we call addDIEEntry on the children, after that, we add the children to the
scope DIE.
cat foo.cpp

#include "a.hpp"
void f(int a) {
  Base t;
}
cat bar.cpp

#include "a.hpp"
void f(int);
void g(int a) {
  Base t;
}
int main() {
  f(0);
  g(1);
  return 0;
}
cat a.hpp
struct Base {
  int a;
};

Let me know if I should investigate further.

Thanks,
Manman

>
>
>>
>> Let me know which one you prefer.
>>
>> Do you have any comments on the ref_addr patch (the 1st patch of the
two)?
>>
>
> Nothing much - I wouldn't mind Eric taking a look (& would rather
you not
> commit this until you're committing the second patch, since it's
otherwise
> untested/unjustified code) on the label/offset related stuff since I'm
less
> familiar with that.
>
> There's one or two cases of {} on single-statement blocks you could fix
up.
>
> "DebugInfoOffset" (both the member and the functions to set/get
it) might
> be more meaningfully named "SectionOffset" - but I'm not
sure. Eric? Other
> names (DebugInfoSectionOffset?)?
>
> - David
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131015/edfcec11/attachment.html>

Manman Ren

2013-Oct-15 20:29 UTC

head link

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
>
> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at
gmail.com> wrote:
>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at
gmail.com>wrote:
>>
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren <manman.ren
at gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> David,
>>>>>>
>>>>>> Thanks for reviewing!
>>>>>>
>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie
<dblaikie at gmail.com>wrote:
>>>>>>
>>>>>>> Might be easier if these were on Phabricator, but
here are some
>>>>>>> thoughts:
>>>>>>>
>>>>>>> 0001:
>>>>>>>   This patch generally, while separated for
legibility, is untested
>>>>>>> & difficult to discuss in isolation.
>>>>>>>
>>>>>> I agree, this patch adds the functionality but does not
use it, the
>>>>>> 2nd patch uses ref_addr.
>>>>>> If you think I should merge the two and commit as a
single patch, let
>>>>>> me know.
>>>>>>
>>>>>>
>>>>>>>  I may need to refer to the second patch in
reviewing this first one.
>>>>>>>   DwarfDebug.cpp:
>>>>>>>     computeSizeAndOffsets:
>>>>>>>       I believe this produces the wrong offset for
the 3rd CU and
>>>>>>> onwards. computeSizeAndOffset returns the EndOffset
which is absolute, not
>>>>>>> relative to the Offset passed in, so it should be
assigned to SecOffset,
>>>>>>> not added to it. (eg: if you have CUs at 0, 42, and
57 - the first pass
>>>>>>> through SecOffset will be zero, then it'll be 0
+ 42, then on the 3rd it'll
>>>>>>> be (0 + 42) + 57 which isn't right, it should
just be 57). Please add more
>>>>>>> test coverage while fixing this issue.
>>>>>>>
>>>>>>
>>>>>> computeSizeAndOffset takes an offset that is relative
to the start of
>>>>>> the CU and returns the offset relative to the CU after
laying out the DIE.
>>>>>> The initial offset before laying out the CU DIE is the
header size,
>>>>>> EndOffset will be the offset relative to the CU after
laying out the whole
>>>>>> CU DIE.
>>>>>> We can think of EndOffset as the size of the whole CU
DIE. SecOffset
>>>>>> is the offset from the Debug Info section, so we can
update it by adding
>>>>>> the CU size.
>>>>>>
>>>>>>   // Offset from the beginning of debug info section.
>>>>>>   unsigned SecOffset = 0;
>>>>>>   for (SmallVectorImpl<CompileUnit *>::iterator I
= CUs.begin(),
>>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>>      unsigned Offset >>>>>>      
sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>>       sizeof(int32_t) + // Offset Into Abbrev. Section
>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>>
>>>>>>     unsigned EndOffset =
computeSizeAndOffset((*I)->getCUDie(),
>>>>>> Offset);
>>>>>>     SecOffset += EndOffset;
>>>>>>   }
>>>>>>
>>>>>
>>>>> Added more comments in attached patch.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Eric/Manman: rough design question: compute the
absolute offset of
>>>>>>> each CU within the debug_info section and describe
them all relative to a
>>>>>>> single symbol at the start of the debug_info
section, or should we put a
>>>>>>> label at the start of each CU?
>>>>>>>
>>>>>>
>>>>>> Either way should work. But since we already have the
section offset
>>>>>> for each CU, describing relative to the start of
debug_info section saves
>>>>>> us a few symbols :)
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> 0002:
>>>>>>>   ref_addr_relocation.ll:
>>>>>>>     seems a bit vague in terms of how you test for
the relocation. I
>>>>>>> think it'd make more sense to test the
assembly, than the reafobj output,
>>>>>>> that way you can test that the correct bytes have
the relocation rather
>>>>>>> than just that there's "some"
.debug_info relocation in the file. For an
>>>>>>> example, see test/DebugInfo/X86/tls.ll I wrote - it
has some "unannotated"
>>>>>>> bytes because we still don't have a nice way to
annotate location bytes
>>>>>>> that are DWARF expressions, but it's close - I
guess those should be
>>>>>>> CHECK-NEXTs, though. In any case, you should be
able to check a few lines
>>>>>>> of assembly with the # DW_AT/DW_TAG annotation
comments.
>>>>>>>
>>>>>>
>>>>>> I can check for ".quad .Lsection_info+38
#DW_AT_type".
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>      You'd need to add the tu3.cpp from my
example if you want to
>>>>>>> demonstrate that the relocation is actually working
as intended and
>>>>>>> avoiding the bogus result I showed.
>>>>>>>   type-unique-simple-a.ll
>>>>>>>     While I agree that having common test cases
helps reduce the
>>>>>>> number of separate invocations we have, this test
case seems like it might
>>>>>>> be becoming a little hard to follow what's
under test. Just going from the
>>>>>>> diff I'm not sure what's what. Could you
give a brief description of the
>>>>>>> state of type-unique-simple2 files? What's
involved? What's it meant to
>>>>>>> test? What's it actually testing?
>>>>>>>
>>>>>> I can add more comments. The source files are included
in the testing
>>>>>> case. I am checking that llvm-link only generates a
single copy of the
>>>>>> struct and the backend generates a single DIE and uses
ref_addr.
>>>>>> The changes are to check "the backend generates a
single DIE and uses
>>>>>> ref_addr".
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>    DIE.h:
>>>>>>>     checkCompileUnit could probably be called
>>>>>>> "getCompileUnitOrNull", the name
"check*" seems to imply that it does some
>>>>>>> checking, which isn't true.
>>>>>>>
>>>>>>  Will do.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>   DwarfCompileUnit.cpp:
>>>>>>>     the "istype || issubprogram" check
should probably be pulled out
>>>>>>> into a separate function, something like
"isShareableAcrossCUs" or
>>>>>>> something like that (please, that's not the
best name, let's discuss it
>>>>>>> further before we settle on a name) so that getDIE
and insertDIE are sure
>>>>>>> to use the same test at all times.
>>>>>>>
>>>>>> Yes, the condition is shared between getDIE and
insertDIE. I like
>>>>>> isSharableAcrossCUs, because that is why the map is in
DwarfDebug instead
>>>>>> of CompileUnit.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>
>>>>>>>    Why does addDIEEntry take a form? When does the
caller get to
>>>>>>> choose the form rather than the callee deciding
between ref4 and ref_addr
>>>>>>> based on context?
>>>>>>>
>>>>>>
>>>>>> addDIEEntry took a form before my change and I
didn't check why it
>>>>>> did.
>>>>>> I will check if all callers always use ref4, if it it
true, I will
>>>>>> submit a cleanup patch first.
>>>>>>
>>>>> Done in attached patch.
>>>>>
>>>>>>
>>>>>>   I'm still unsure about this worklist thing - do
your current tests
>>>>>>> cover the need for the worklist? ie: if we removed
that logic, would tests
>>>>>>> fail? Can you describe a specific sequence where
the worklist is necessary?
>>>>>>>
>>>>>>
>>>>>> If we are sure that DIEs are always added to an owner
before calling
>>>>>> addDIEEntry, we don't need the worklist.
>>>>>> But I saw cases where that was not true, I will get a
reduced testing
>>>>>> case.
>>>>>>
>>>>>
>>>>> If we try to assert both DIEs have owner in addDIEEntry,
the following
>>>>> testing cases will fail:
>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>>     LLVM ::
DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>>
>>>>> For ref_addr_relocation, it failed in
>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
Entry=0x10302a9d0) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
>>>>> (this=0x102e13ec0, Entity=0x102e14090,
Ty={<llvm::DIScope> >>>>> {<llvm::DIDescriptor> =
{DbgNode = 0x102e05f30}, <No data fields>}, <No
>>>>> data fields>}, Attribute=73) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>>> #7  0x0000000100b05bff in
llvm::CompileUnit::createGlobalVariableDIE
>>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>>
>>>>> If we look at DwarfCompileUnit.cpp:
>>>>>     VariableDIE = new DIE(GV.getTag());
>>>>>     // Add to map.
>>>>>     insertDIE(N, VariableDIE);
>>>>>
>>>>>     // Add name and type.
>>>>>     addString(VariableDIE, dwarf::DW_AT_name,
GV.getDisplayName());
>>>>>     addType(VariableDIE, GTy);
>>>>>
>>>>> The VariableDIE is not added to an owner yet when calling
addType.
>>>>>
>>>>
>>>> I believe I have addressed all review comments, the patches are
>>>> re-attached here for convenience.
>>>>
>>>
>>> So I'm still thinking about the work list work.
>>>
>>> If we don't know which CU a DIE is in - isn't it,
necessarily, going to
>>> be in the current CU (& thus referenced by ref_data4 (using a
CU-local
>>> offset), not ref_addr)?
>>>
>>
>> That may be true. But can we prove that?
>>
>
> We really shouldn't add extra complexity to the codebase just because
we
> don't know how the codebase works - that's what leads to the kind
of
> complexity we see in the debug info handling today.
>
>
>> There are two ways of handling this:
>> 1> use a worklist to be conservative
>> 2> do not use a worklist, but add an assertion when emitting a DIE
A,
>> make sure the DIE referenced with ref4 is indeed in the same CU as DIE
A.
>>
>
> Please just add this assertion. If it fires we'll have a test case and
a
> concrete justification for this complexity such that should anyone remove
> it later because it looked unnecessary, they'll at least have a failing
> test to explain why it was there in the first place.
>
>
>>
>> Let me know which one you prefer.
>>
>> Do you have any comments on the ref_addr patch (the 1st patch of the
two)?
>>
>
> Nothing much - I wouldn't mind Eric taking a look (& would rather
you not
> commit this until you're committing the second patch, since it's
otherwise
> untested/unjustified code) on the label/offset related stuff since I'm
less
> familiar with that.
>
Eric,

Do you want to take a look at the 1st patch?

>
> There's one or two cases of {} on single-statement blocks you could fix
up.
>I couldn't find that.
Are you referring to:

+      } else {

+        Asm->EmitInt32(Addr);
+      }

It has a multi-statement if, I thought the rule is to have a parenthesis
for the else as well.

>
> "DebugInfoOffset" (both the member and the functions to set/get
it) might
> be more meaningfully named "SectionOffset" - but I'm not
sure. Eric? Other
> names (DebugInfoSectionOffset?)?
>
Eric, any opinion here?

Thanks,
Manman
>
>
> - David
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131015/9e5796ea/attachment.html>

Eric Christopher

2013-Oct-15 20:31 UTC

head link

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

On Tue, Oct 15, 2013 at 1:29 PM, Manman Ren <manman.ren at gmail.com>
wrote:>
>
>
> On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at
gmail.com> wrote:
>>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at
gmail.com> wrote:
>>>
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at
gmail.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at
gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren
at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren
<manman.ren at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> Thanks for reviewing!
>>>>>>>
>>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie
<dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Might be easier if these were on Phabricator,
but here are some
>>>>>>>> thoughts:
>>>>>>>>
>>>>>>>> 0001:
>>>>>>>>   This patch generally, while separated for
legibility, is untested
>>>>>>>> & difficult to discuss in isolation.
>>>>>>>
>>>>>>> I agree, this patch adds the functionality but does
not use it, the
>>>>>>> 2nd patch uses ref_addr.
>>>>>>> If you think I should merge the two and commit as a
single patch, let
>>>>>>> me know.
>>>>>>>
>>>>>>>>
>>>>>>>> I may need to refer to the second patch in
reviewing this first one.
>>>>>>>>   DwarfDebug.cpp:
>>>>>>>>     computeSizeAndOffsets:
>>>>>>>>       I believe this produces the wrong offset
for the 3rd CU and
>>>>>>>> onwards. computeSizeAndOffset returns the
EndOffset which is absolute, not
>>>>>>>> relative to the Offset passed in, so it should
be assigned to SecOffset, not
>>>>>>>> added to it. (eg: if you have CUs at 0, 42, and
57 - the first pass through
>>>>>>>> SecOffset will be zero, then it'll be 0 +
42, then on the 3rd it'll be (0 +
>>>>>>>> 42) + 57 which isn't right, it should just
be 57). Please add more test
>>>>>>>> coverage while fixing this issue.
>>>>>>>
>>>>>>>
>>>>>>> computeSizeAndOffset takes an offset that is
relative to the start of
>>>>>>> the CU and returns the offset relative to the CU
after laying out the DIE.
>>>>>>> The initial offset before laying out the CU DIE is
the header size,
>>>>>>> EndOffset will be the offset relative to the CU
after laying out the whole
>>>>>>> CU DIE.
>>>>>>> We can think of EndOffset as the size of the whole
CU DIE. SecOffset
>>>>>>> is the offset from the Debug Info section, so we
can update it by adding the
>>>>>>> CU size.
>>>>>>>
>>>>>>>   // Offset from the beginning of debug info
section.
>>>>>>>   unsigned SecOffset = 0;
>>>>>>>   for (SmallVectorImpl<CompileUnit
*>::iterator I = CUs.begin(),
>>>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>>>     unsigned Offset >>>>>>>   
sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>>>       sizeof(int32_t) + // Offset Into Abbrev.
Section
>>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>>>
>>>>>>>     unsigned EndOffset =
computeSizeAndOffset((*I)->getCUDie(),
>>>>>>> Offset);
>>>>>>>     SecOffset += EndOffset;
>>>>>>>   }
>>>>>>
>>>>>>
>>>>>> Added more comments in attached patch.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Eric/Manman: rough design question: compute the
absolute offset of
>>>>>>>> each CU within the debug_info section and
describe them all relative to a
>>>>>>>> single symbol at the start of the debug_info
section, or should we put a
>>>>>>>> label at the start of each CU?
>>>>>>>
>>>>>>>
>>>>>>> Either way should work. But since we already have
the section offset
>>>>>>> for each CU, describing relative to the start of
debug_info section saves us
>>>>>>> a few symbols :)
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 0002:
>>>>>>>>   ref_addr_relocation.ll:
>>>>>>>>     seems a bit vague in terms of how you test
for the relocation. I
>>>>>>>> think it'd make more sense to test the
assembly, than the reafobj output,
>>>>>>>> that way you can test that the correct bytes
have the relocation rather than
>>>>>>>> just that there's "some"
.debug_info relocation in the file. For an example,
>>>>>>>> see test/DebugInfo/X86/tls.ll I wrote - it has
some "unannotated" bytes
>>>>>>>> because we still don't have a nice way to
annotate location bytes that are
>>>>>>>> DWARF expressions, but it's close - I guess
those should be CHECK-NEXTs,
>>>>>>>> though. In any case, you should be able to
check a few lines of assembly
>>>>>>>> with the # DW_AT/DW_TAG annotation comments.
>>>>>>>
>>>>>>>
>>>>>>> I can check for ".quad .Lsection_info+38
#DW_AT_type".
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>     You'd need to add the tu3.cpp from my
example if you want to
>>>>>>>> demonstrate that the relocation is actually
working as intended and avoiding
>>>>>>>> the bogus result I showed.
>>>>>>>>   type-unique-simple-a.ll
>>>>>>>>     While I agree that having common test cases
helps reduce the
>>>>>>>> number of separate invocations we have, this
test case seems like it might
>>>>>>>> be becoming a little hard to follow what's
under test. Just going from the
>>>>>>>> diff I'm not sure what's what. Could
you give a brief description of the
>>>>>>>> state of type-unique-simple2 files? What's
involved? What's it meant to
>>>>>>>> test? What's it actually testing?
>>>>>>>
>>>>>>> I can add more comments. The source files are
included in the testing
>>>>>>> case. I am checking that llvm-link only generates a
single copy of the
>>>>>>> struct and the backend generates a single DIE and
uses ref_addr.
>>>>>>> The changes are to check "the backend
generates a single DIE and uses
>>>>>>> ref_addr".
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>   DIE.h:
>>>>>>>>     checkCompileUnit could probably be called
>>>>>>>> "getCompileUnitOrNull", the name
"check*" seems to imply that it does some
>>>>>>>> checking, which isn't true.
>>>>>>>
>>>>>>> Will do.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>   DwarfCompileUnit.cpp:
>>>>>>>>     the "istype || issubprogram"
check should probably be pulled out
>>>>>>>> into a separate function, something like
"isShareableAcrossCUs" or something
>>>>>>>> like that (please, that's not the best
name, let's discuss it further before
>>>>>>>> we settle on a name) so that getDIE and
insertDIE are sure to use the same
>>>>>>>> test at all times.
>>>>>>>
>>>>>>> Yes, the condition is shared between getDIE and
insertDIE. I like
>>>>>>> isSharableAcrossCUs, because that is why the map is
in DwarfDebug instead of
>>>>>>> CompileUnit.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>   Why does addDIEEntry take a form? When does
the caller get to
>>>>>>>> choose the form rather than the callee deciding
between ref4 and ref_addr
>>>>>>>> based on context?
>>>>>>>
>>>>>>>
>>>>>>> addDIEEntry took a form before my change and I
didn't check why it
>>>>>>> did.
>>>>>>> I will check if all callers always use ref4, if it
it true, I will
>>>>>>> submit a cleanup patch first.
>>>>>>
>>>>>> Done in attached patch.
>>>>>>>
>>>>>>>
>>>>>>>>   I'm still unsure about this worklist
thing - do your current tests
>>>>>>>> cover the need for the worklist? ie: if we
removed that logic, would tests
>>>>>>>> fail? Can you describe a specific sequence
where the worklist is necessary?
>>>>>>>
>>>>>>>
>>>>>>> If we are sure that DIEs are always added to an
owner before calling
>>>>>>> addDIEEntry, we don't need the worklist.
>>>>>>> But I saw cases where that was not true, I will get
a reduced testing
>>>>>>> case.
>>>>>>
>>>>>>
>>>>>> If we try to assert both DIEs have owner in
addDIEEntry, the following
>>>>>> testing cases will fail:
>>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>>>     LLVM ::
DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>>>
>>>>>> For ref_addr_relocation, it failed in
>>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
Entry=0x10302a9d0) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
>>>>>> (this=0x102e13ec0, Entity=0x102e14090,
Ty={<llvm::DIScope> >>>>>> {<llvm::DIDescriptor> =
{DbgNode = 0x102e05f30}, <No data fields>}, <No data
>>>>>> fields>}, Attribute=73) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>>>> #7  0x0000000100b05bff in
llvm::CompileUnit::createGlobalVariableDIE
>>>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>>>
>>>>>> If we look at DwarfCompileUnit.cpp:
>>>>>>     VariableDIE = new DIE(GV.getTag());
>>>>>>     // Add to map.
>>>>>>     insertDIE(N, VariableDIE);
>>>>>>
>>>>>>     // Add name and type.
>>>>>>     addString(VariableDIE, dwarf::DW_AT_name,
GV.getDisplayName());
>>>>>>     addType(VariableDIE, GTy);
>>>>>>
>>>>>> The VariableDIE is not added to an owner yet when
calling addType.
>>>>>
>>>>>
>>>>> I believe I have addressed all review comments, the patches
are
>>>>> re-attached here for convenience.
>>>>
>>>>
>>>> So I'm still thinking about the work list work.
>>>>
>>>> If we don't know which CU a DIE is in - isn't it,
necessarily, going to
>>>> be in the current CU (& thus referenced by ref_data4 (using
a CU-local
>>>> offset), not ref_addr)?
>>>
>>>
>>> That may be true. But can we prove that?
>>
>>
>> We really shouldn't add extra complexity to the codebase just
because we
>> don't know how the codebase works - that's what leads to the
kind of
>> complexity we see in the debug info handling today.
>>
>>>
>>> There are two ways of handling this:
>>> 1> use a worklist to be conservative
>>> 2> do not use a worklist, but add an assertion when emitting a
DIE A,
>>> make sure the DIE referenced with ref4 is indeed in the same CU as
DIE A.
>>
>>
>> Please just add this assertion. If it fires we'll have a test case
and a
>> concrete justification for this complexity such that should anyone
remove it
>> later because it looked unnecessary, they'll at least have a
failing test to
>> explain why it was there in the first place.
>>
>>>
>>>
>>> Let me know which one you prefer.
>>>
>>> Do you have any comments on the ref_addr patch (the 1st patch of
the
>>> two)?
>>
>>
>> Nothing much - I wouldn't mind Eric taking a look (& would
rather you not
>> commit this until you're committing the second patch, since
it's otherwise
>> untested/unjustified code) on the label/offset related stuff since
I'm less
>> familiar with that.
>
>
> Eric,
>
> Do you want to take a look at the 1st patch?
>
I do, I've been following the thread, but give me a day or so.

Thanks.

-eric
>>
>>
>> There's one or two cases of {} on single-statement blocks you could
fix
>> up.
>
> I couldn't find that.
> Are you referring to:
>
> +      } else {
>
> +        Asm->EmitInt32(Addr);
>
> +      }
>
> It has a multi-statement if, I thought the rule is to have a parenthesis
for
> the else as well.
>
>>
>>
>> "DebugInfoOffset" (both the member and the functions to
set/get it) might
>> be more meaningfully named "SectionOffset" - but I'm not
sure. Eric? Other
>> names (DebugInfoSectionOffset?)?
>
>
> Eric, any opinion here?
>
> Thanks,
> Manman
>>
>>
>>
>> - David
>
>

David Blaikie

2013-Oct-15 20:37 UTC

head link

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

On Tue, Oct 15, 2013 at 1:22 PM, Manman Ren <manman.ren at gmail.com>
wrote:
>
>
>
> On Tue, Oct 15, 2013 at 11:34 AM, David Blaikie <dblaikie at
gmail.com>wrote:
>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 10:51 AM, Manman Ren <manman.ren at
gmail.com>wrote:
>>
>>>
>>>
>>>
>>> On Tue, Oct 15, 2013 at 10:10 AM, David Blaikie <dblaikie at
gmail.com>wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 10:05 AM, Manman Ren <manman.ren at
gmail.com>wrote:
>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 9, 2013 at 5:22 PM, Manman Ren <manman.ren
at gmail.com>wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 9, 2013 at 1:32 PM, Manman Ren
<manman.ren at gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> Thanks for reviewing!
>>>>>>>
>>>>>>> On Wed, Oct 9, 2013 at 11:36 AM, David Blaikie
<dblaikie at gmail.com>wrote:
>>>>>>>
>>>>>>>> Might be easier if these were on Phabricator,
but here are some
>>>>>>>> thoughts:
>>>>>>>>
>>>>>>>> 0001:
>>>>>>>>   This patch generally, while separated for
legibility, is untested
>>>>>>>> & difficult to discuss in isolation.
>>>>>>>>
>>>>>>> I agree, this patch adds the functionality but does
not use it, the
>>>>>>> 2nd patch uses ref_addr.
>>>>>>> If you think I should merge the two and commit as a
single patch,
>>>>>>> let me know.
>>>>>>>
>>>>>>>
>>>>>>>>  I may need to refer to the second patch in
reviewing this first
>>>>>>>> one.
>>>>>>>>   DwarfDebug.cpp:
>>>>>>>>     computeSizeAndOffsets:
>>>>>>>>       I believe this produces the wrong offset
for the 3rd CU and
>>>>>>>> onwards. computeSizeAndOffset returns the
EndOffset which is absolute, not
>>>>>>>> relative to the Offset passed in, so it should
be assigned to SecOffset,
>>>>>>>> not added to it. (eg: if you have CUs at 0, 42,
and 57 - the first pass
>>>>>>>> through SecOffset will be zero, then it'll
be 0 + 42, then on the 3rd it'll
>>>>>>>> be (0 + 42) + 57 which isn't right, it
should just be 57). Please add more
>>>>>>>> test coverage while fixing this issue.
>>>>>>>>
>>>>>>>
>>>>>>> computeSizeAndOffset takes an offset that is
relative to the start
>>>>>>> of the CU and returns the offset relative to the CU
after laying out the
>>>>>>> DIE.
>>>>>>> The initial offset before laying out the CU DIE is
the header size,
>>>>>>> EndOffset will be the offset relative to the CU
after laying out the whole
>>>>>>> CU DIE.
>>>>>>> We can think of EndOffset as the size of the whole
CU DIE. SecOffset
>>>>>>> is the offset from the Debug Info section, so we
can update it by adding
>>>>>>> the CU size.
>>>>>>>
>>>>>>>   // Offset from the beginning of debug info
section.
>>>>>>>   unsigned SecOffset = 0;
>>>>>>>   for (SmallVectorImpl<CompileUnit
*>::iterator I = CUs.begin(),
>>>>>>>          E = CUs.end(); I != E; ++I) {
>>>>>>>     (*I)->setDebugInfoOffset(SecOffset);
>>>>>>>      unsigned Offset >>>>>>>  
sizeof(int32_t) + // Length of Compilation Unit Info
>>>>>>>       sizeof(int16_t) + // DWARF version number
>>>>>>>       sizeof(int32_t) + // Offset Into Abbrev.
Section
>>>>>>>       sizeof(int8_t);   // Pointer Size (in bytes)
>>>>>>>
>>>>>>>     unsigned EndOffset =
computeSizeAndOffset((*I)->getCUDie(),
>>>>>>> Offset);
>>>>>>>     SecOffset += EndOffset;
>>>>>>>   }
>>>>>>>
>>>>>>
>>>>>> Added more comments in attached patch.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Eric/Manman: rough design question: compute the
absolute offset of
>>>>>>>> each CU within the debug_info section and
describe them all relative to a
>>>>>>>> single symbol at the start of the debug_info
section, or should we put a
>>>>>>>> label at the start of each CU?
>>>>>>>>
>>>>>>>
>>>>>>> Either way should work. But since we already have
the section offset
>>>>>>> for each CU, describing relative to the start of
debug_info section saves
>>>>>>> us a few symbols :)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> 0002:
>>>>>>>>   ref_addr_relocation.ll:
>>>>>>>>     seems a bit vague in terms of how you test
for the relocation.
>>>>>>>> I think it'd make more sense to test the
assembly, than the reafobj output,
>>>>>>>> that way you can test that the correct bytes
have the relocation rather
>>>>>>>> than just that there's "some"
.debug_info relocation in the file. For an
>>>>>>>> example, see test/DebugInfo/X86/tls.ll I wrote
- it has some "unannotated"
>>>>>>>> bytes because we still don't have a nice
way to annotate location bytes
>>>>>>>> that are DWARF expressions, but it's close
- I guess those should be
>>>>>>>> CHECK-NEXTs, though. In any case, you should be
able to check a few lines
>>>>>>>> of assembly with the # DW_AT/DW_TAG annotation
comments.
>>>>>>>>
>>>>>>>
>>>>>>> I can check for ".quad .Lsection_info+38
#DW_AT_type".
>>>>>>>
>>>>>> Done in attached patch.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>      You'd need to add the tu3.cpp from my
example if you want to
>>>>>>>> demonstrate that the relocation is actually
working as intended and
>>>>>>>> avoiding the bogus result I showed.
>>>>>>>>   type-unique-simple-a.ll
>>>>>>>>     While I agree that having common test cases
helps reduce the
>>>>>>>> number of separate invocations we have, this
test case seems like it might
>>>>>>>> be becoming a little hard to follow what's
under test. Just going from the
>>>>>>>> diff I'm not sure what's what. Could
you give a brief description of the
>>>>>>>> state of type-unique-simple2 files? What's
involved? What's it meant to
>>>>>>>> test? What's it actually testing?
>>>>>>>>
>>>>>>> I can add more comments. The source files are
included in the
>>>>>>> testing case. I am checking that llvm-link only
generates a single copy of
>>>>>>> the struct and the backend generates a single DIE
and uses ref_addr.
>>>>>>> The changes are to check "the backend
generates a single DIE and
>>>>>>> uses ref_addr".
>>>>>>>
>>>>>> Done in attached patch.
>>>>>>
>>>>>>>
>>>>>>>    DIE.h:
>>>>>>>>     checkCompileUnit could probably be called
>>>>>>>> "getCompileUnitOrNull", the name
"check*" seems to imply that it does some
>>>>>>>> checking, which isn't true.
>>>>>>>>
>>>>>>>  Will do.
>>>>>>>
>>>>>> Done in attached patch.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>   DwarfCompileUnit.cpp:
>>>>>>>>     the "istype || issubprogram"
check should probably be pulled
>>>>>>>> out into a separate function, something like
"isShareableAcrossCUs" or
>>>>>>>> something like that (please, that's not the
best name, let's discuss it
>>>>>>>> further before we settle on a name) so that
getDIE and insertDIE are sure
>>>>>>>> to use the same test at all times.
>>>>>>>>
>>>>>>> Yes, the condition is shared between getDIE and
insertDIE. I like
>>>>>>> isSharableAcrossCUs, because that is why the map is
in DwarfDebug instead
>>>>>>> of CompileUnit.
>>>>>>>
>>>>>> Done in attached patch.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>    Why does addDIEEntry take a form? When does
the caller get to
>>>>>>>> choose the form rather than the callee deciding
between ref4 and ref_addr
>>>>>>>> based on context?
>>>>>>>>
>>>>>>>
>>>>>>> addDIEEntry took a form before my change and I
didn't check why it
>>>>>>> did.
>>>>>>> I will check if all callers always use ref4, if it
it true, I will
>>>>>>> submit a cleanup patch first.
>>>>>>>
>>>>>> Done in attached patch.
>>>>>>
>>>>>>>
>>>>>>>   I'm still unsure about this worklist thing -
do your current tests
>>>>>>>> cover the need for the worklist? ie: if we
removed that logic, would tests
>>>>>>>> fail? Can you describe a specific sequence
where the worklist is necessary?
>>>>>>>>
>>>>>>>
>>>>>>> If we are sure that DIEs are always added to an
owner before calling
>>>>>>> addDIEEntry, we don't need the worklist.
>>>>>>> But I saw cases where that was not true, I will get
a reduced
>>>>>>> testing case.
>>>>>>>
>>>>>>
>>>>>> If we try to assert both DIEs have owner in
addDIEEntry, the
>>>>>> following testing cases will fail:
>>>>>>     LLVM :: DebugInfo/X86/multiple-aranges.ll
>>>>>>     LLVM :: DebugInfo/X86/ref_addr_relocation.ll
>>>>>>     LLVM ::
DebugInfo/X86/stmt-list-multiple-compile-units.ll
>>>>>>     LLVM :: DebugInfo/two-cus-from-same-file.ll
>>>>>>     LLVM :: Linker/type-unique-simple-a.ll
>>>>>>     LLVM :: Linker/type-unique-simple2.ll
>>>>>>
>>>>>> For ref_addr_relocation, it failed in
>>>>>> #5  0x0000000100b169ba in llvm::DwarfDebug::addDIEEntry
>>>>>> (this=0x103023600, Die=0x102e14090, Attribute=73,
Entry=0x10302a9d0) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:3071
>>>>>> #6  0x0000000100b040e0 in llvm::CompileUnit::addType
>>>>>> (this=0x102e13ec0, Entity=0x102e14090,
Ty={<llvm::DIScope> >>>>>> {<llvm::DIDescriptor> =
{DbgNode = 0x102e05f30}, <No data fields>}, <No
>>>>>> data fields>}, Attribute=73) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:910
>>>>>> #7  0x0000000100b05bff in
llvm::CompileUnit::createGlobalVariableDIE
>>>>>> (this=0x102e13ec0, N=0x102e068c0) at
>>>>>>
/Users/manmanren/llvm_git/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp:1505
>>>>>>
>>>>>> If we look at DwarfCompileUnit.cpp:
>>>>>>     VariableDIE = new DIE(GV.getTag());
>>>>>>     // Add to map.
>>>>>>     insertDIE(N, VariableDIE);
>>>>>>
>>>>>>     // Add name and type.
>>>>>>     addString(VariableDIE, dwarf::DW_AT_name,
GV.getDisplayName());
>>>>>>     addType(VariableDIE, GTy);
>>>>>>
>>>>>> The VariableDIE is not added to an owner yet when
calling addType.
>>>>>>
>>>>>
>>>>> I believe I have addressed all review comments, the patches
are
>>>>> re-attached here for convenience.
>>>>>
>>>>
>>>> So I'm still thinking about the work list work.
>>>>
>>>> If we don't know which CU a DIE is in - isn't it,
necessarily, going to
>>>> be in the current CU (& thus referenced by ref_data4 (using
a CU-local
>>>> offset), not ref_addr)?
>>>>
>>>
>>> That may be true. But can we prove that?
>>>
>>
>> We really shouldn't add extra complexity to the codebase just
because we
>> don't know how the codebase works - that's what leads to the
kind of
>> complexity we see in the debug info handling today.
>>
>
>>
>>>  There are two ways of handling this:
>>> 1> use a worklist to be conservative
>>> 2> do not use a worklist, but add an assertion when emitting a
DIE A,
>>> make sure the DIE referenced with ref4 is indeed in the same CU as
DIE A.
>>>
>>
>> Please just add this assertion. If it fires we'll have a test case
and a
>> concrete justification for this complexity such that should anyone
remove
>> it later because it looked unnecessary, they'll at least have a
failing
>> test to explain why it was there in the first plac .
>>
>
> The assertion fails even with a simple testing case when the referenced
> DIE has an owner and the DIE itself does not have an owner.
>
OK, sorry - I should've read your description of the assertion more
carefully. I believe the assertion you added wasn't the right thing to test
for.

I'm not sure there is a correct assertion to add here to detect the case
you're describing - perhaps a complex verification after-the-fact could be
done, but essentially if we have a DIE that's partially constructed/has no
parent we should assume it's in the current unit. If we can demonstrate a
case where this isn't true, then we should work to address that problem -
until we demonstrate that, we should not (though we might want to search
for such cases - but without type units I can't imagine how they could
occur - we build the DIE tree for one CU at a time, at no point do we have
DIEs under construction for multiple CUs).

So if we want to build a reference to a DIE, if the DIE is not in another
CU we should use ref4. (then the only other case is that it's either in
this CU or it's in no known CU at all - in which case it's under
construction and, without evidence to the contrary, will be added to the
current CU when it's done).

About the only assertion we could add would involve keeping a side-table of
"assumptions" ("we expect this DIE will be added to this
CU") and check
that those assumptions are fulfilled at some point.

> For that case, we can't assume ref4 should be used. I don't think
we can
> enforce that all DIEs must be added to a parent before calling addDIEEntry.
>
> For the specific testing case, when constructing children of a scope DIE,
> we call addDIEEntry on the children, after that, we add the children to the
> scope DIE.
> cat foo.cpp
>
> #include "a.hpp"
> void f(int a) {
>   Base t;
> }
> cat bar.cpp
>
> #include "a.hpp"
> void f(int);
> void g(int a) {
>   Base t;
> }
> int main() {
>   f(0);
>   g(1);
>   return 0;
> }
> cat a.hpp
> struct Base {
>   int a;
> };
>
> Let me know if I should investigate further.
>
> Thanks,
> Manman
>
>
>>
>>
>>>
>>> Let me know which one you prefer.
>>>
>>> Do you have any comments on the ref_addr patch (the 1st patch of
the
>>> two)?
>>>
>>
>> Nothing much - I wouldn't mind Eric taking a look (& would
rather you not
>> commit this until you're committing the second patch, since
it's otherwise
>> untested/unjustified code) on the label/offset related stuff since
I'm less
>> familiar with that.
>>
>> There's one or two cases of {} on single-statement blocks you could
fix
>> up.
>>
>> "DebugInfoOffset" (both the member and the functions to
set/get it) might
>> be more meaningfully named "SectionOffset" - but I'm not
sure. Eric? Other
>> names (DebugInfoSectionOffset?)?
>>
>> - David
>>
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20131015/5a6c789c/attachment.html>

Possibly Parallel Threads

Search for more possibly parallel threads

llvm dev - Oct 2013 - [LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

[LLVMdev] [Debug Info PATCH] for support of ref_addr and removal of DIE duplication

Possibly Parallel Threads