thr3ads.net - llvm dev - [LLVMdev] Issues with clang-llvm debug info validity [Jul 2014]

If this information is useful, please help other people find it:
Share via:

David Blaikie

2014-Jul-09 00:15 UTC

[LLVMdev] Issues with clang-llvm debug info validity

On Fri, Jun 27, 2014 at 2:19 PM, David Blaikie <dblaikie at gmail.com>
wrote:> On Fri, Jun 27, 2014 at 2:15 PM, Adrian Prantl <aprantl at apple.com>
wrote:
>>
>>> On Jun 27, 2014, at 1:58 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>
>>> On Fri, Jun 27, 2014 at 1:49 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>> I looked into this a little.
>>>
>>> Thanks for taking a look!
>>>
>>>> My proposal is this:
>>>>
>>>> - extend DINamespace with an extra UniqueID filed (analogous to
DICompositeType)
>>>> - add an extra case to DIScope::getRef()
>>>> - Let CGDebugInfo create a mangled name for each namespace to
use as UID
>>>>  -> the UID of the anonymous namespace is null.
>>>>
>>>> For example:
>>>>
>>>> namespace a { namespace b {} } -> “_ZN1a",
“_ZN1a1b"
>>>>
>>>> !0 = [compile unit]
>>>> !1 = metadata !{i32 786489, metadata !0, metadata
!”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [
DW_TAG_namespace ] [a]
>>>> !2 = metadata !{i32 786489, metadata !0, null, metadata
!”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
>>>>
>>>> namespace a1b {} -> “_ZN3a1b"
>>>> namespace {} -> null (!= “”) // Anonymous namespaces will
remain old-fashioned MDNode references to prevent uniquing.
>>>
>>> Will this do the right thing if there's a named namespace
inside or
>>> outside the anonymous namespace?
>>
>> This is actually really tricky. Normally the ValueSymbolTable takes
care of adding a unique suffix to all conflicting symbols. i think the most
reasonable thing to do here is to add a rule:
>>   -> the UID of any namespace that has an anonymous namespace in its
parent chain is null.
>
> Yep - I assume that's the right thing to do, just not sure if it
> would/will require a little extra work & wanted to make sure it was on
> the radar.
>
>>>> With this change we will get the full type uniquing benefits.
As a side effect, definitions that are in the same namespace will be children of
the same DW_TAG_namespace in the debug info, which I’d argue is the expected
behavior for LTO anyway.
>>>
>>> Sort of - what happens to functions defined in that namespace? Will
>>> they all end up in one CU with one instance of the namespace under
>>> LTO? Could do weird things to address ranges? (would we end up with
>>> the address range in the CU the function was originally in - but
the
>>> subprogram DIE ends up in some other CU that doesn't include
that
>>> address range)
>>
>> I think you just convinced me to save this for later and clone a
namespace for every compile unit instead. The decl_file/decl_line will have to
identical for all CUs though (and come from the module linked in first).
>
> OK - so we'll put one namespace in the metadata and just use that for
> each CU? Should be easy enough.
>
> One issue: what keeps the namespace metadata nodes live? Where do we
> find them? Are we going to add another list to the CUs?
Alternatively... does anyone really care about the source location of
a namespace? (given that we have to pick one arbitrary location that a
namespace starts at?) Should we just omit it, then namespace nodes
would deduplicate simply.

- David
>
>>
>> thanks!
>> adrian
>>
>>>
>>> - David
>>>
>>>> -- adrian
>>>>
>>>>> On Jun 24, 2014, at 4:39 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>>>
>>>>>
>>>>>> On Jun 24, 2014, at 4:34 PM, Eric Christopher
<echristo at gmail.com> wrote:
>>>>>>
>>>>>>>> Without needing to change how namespaces are
handled.
>>>>>>>>
>>>>>>>> Yes, using DIRefs for namespaces would cause
the actual type nodes to
>>>>>>>> be deduplicated when they aren't even with
the patch above, and that
>>>>>>>> would help reduce debug info metadata further
if that's
>>>>>>>> useful/necessary.
>>>>>>>
>>>>>>> Exactly. While the above patch will cover up the
problem, we will still have many duplicate type entries during LTO that will
take up memory, even though only the last one to be entered into DIRefMap will
actually be used for emitting DWARF. Since DIRefs are also more expensive in
terms of lookup speed, I think I’d prefer to only use them only where necessary
(=as far down the MDNode DAG as possible).
>>>>>>>
>>>>>>
>>>>>> Right, though, to be fair, some numbers here might be
nice. The
>>>>>> tradeoff of "extra lookup for ref" versus
"extra memory allocation for
>>>>>> larger debug info”.
>>>>>
>>>>> Then again, it’s not like the extra lookups introduced by
David’s proposed patch would get rid of those extra memory allocations. Granted,
they do get rid of the assertion, which obviously provides some value ;-)
>>>>>
>>>>>>
>>>>>> I also don't think it'll matter for speed :)
>>>>>>
>>>>> You’re probably right about the lookup, but depending on
the project you’re building the memory usage could matter.
>>>>>
>>>>> -- adrian
>>>>>
>>>>>> -eric
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Question to the C++ experts: Is the name
(“test” in this case) a sufficient appropriate unique ID for a C++ namespace?
>>>>>>>>
>>>>>>>> Potentially - assuming you don't do this
for anything in an anonymous
>>>>>>>> namespace. I'd probably see about using the
standard mangling
>>>>>>> Good point! We should avoid unifying the anonymous
namespace :-)
>>>>>>>> machinery as we do for other things, though, if
possible. Might keep
>>>>>>>> it more consistent.
>>>>>>>
>>>>>>> I will do some experiments with nested namespaces,
but it seems as if name mangling does not buy us much: this will merely turn
into N4test.
>>>>>>>
>>>>>>> -- adrian
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- adrian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Jun 24, 2014, at 1:49 PM, David
Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +Adrian & Manman,
>>>>>>>>>>
>>>>>>>>>> Looks like this is a case of non-DIRef
references ending up in the IR,
>>>>>>>>>> and thus the references not being
deduplicated. This could lead to
>>>>>>>>>> some of the IR bloat that you guys
implemented the DIRef stuff to
>>>>>>>>>> reduce/avoid.
>>>>>>>>>>
>>>>>>>>>> The specific issue is that the type is
slightly different in the two
>>>>>>>>>> TUs (since the namespace scope it's
contained within is different in
>>>>>>>>>> the two TUs - note the namespace
preceeding the header include in
>>>>>>>>>> test2.cpp) and the actual DIRef-usage
failure is when building
>>>>>>>>>> subroutine types. The subroutine types
are made with direct references
>>>>>>>>>> to types, not with DIRef references to
types. This means they won't be
>>>>>>>>>> deduplicated/resolved to a single type
during linking correctly.
>>>>>>>>>>
>>>>>>>>>> It also leads to an assertion as per
the original email. Figured you
>>>>>>>>>> guys might want to look into/fix this
(I assume the right fix is to
>>>>>>>>>> use DIRef more pervasively in the
frontend - I'm doing a little
>>>>>>>>>> experimenting with clang now to see how
that looks).
>>>>>>>>>>
>>>>>>>>>> - David
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 13, 2014 at 11:08 PM, David
Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>> Yep, looks like a legit bug to me.
I'll try to reproduce it and reduce
>>>>>>>>>>> a test case, etc.
>>>>>>>>>>>
>>>>>>>>>>> (the ODR doesn't require that
the two copies of the Test class
>>>>>>>>>>> template appear on the same line in
the same file in two distinct
>>>>>>>>>>> translation units, it only requires
that the two copies have the same
>>>>>>>>>>> sequence of tokens - which they do
in your example)
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 13, 2014 at 5:33 PM,
Kevin Modzelewski <kmod at dropbox.com> wrote:
>>>>>>>>>>>> Hi all, sorry to post to both
lists, but I'm running into an issue where
>>>>>>>>>>>> clang-generated debug info is
deemed to be invalid by LLVM tools (throws an
>>>>>>>>>>>> assertion error in both llc and
mcjit), and I'm not sure what the proper
>>>>>>>>>>>> resolution is.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a test case; I last
tested it on revision r210953:
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat test1.cpp
>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>> test::Test<int> foo1() {
>>>>>>>>>>>> return test::Test<int>();
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat test2.cpp
>>>>>>>>>>>> namespace test {
>>>>>>>>>>>> }
>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>> test::Test<int> foo2() {
>>>>>>>>>>>> return test::Test<int>();
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> $ cat test.h
>>>>>>>>>>>> namespace test {
>>>>>>>>>>>> template <class T>
>>>>>>>>>>>> class Test {
>>>>>>>>>>>> };
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> $ clang++ -g -emit-llvm
test1.cpp -S -o test1.ll
>>>>>>>>>>>> $ clang++ -g -emit-llvm
test2.cpp -S -o test2.ll
>>>>>>>>>>>> $ llvm-link test1.ll test2.ll
-S -o linked.ll
>>>>>>>>>>>> $ llc linked.ll
>>>>>>>>>>>>
>>>>>>>>>>>> The last step raises an
assertion error, assuming you have assertions
>>>>>>>>>>>> enabled:
>>>>>>>>>>>> llc:
lib/CodeGen/AsmPrinter/DwarfUnit.cpp:953: llvm::DIE*
>>>>>>>>>>>>
llvm::DwarfUnit::getOrCreateTypeDIE(const llvm::MDNode*): Assertion `Ty
=>>>>>>>>>>>> resolve(Ty.getRef())
&& "type was not uniqued, possible ODR violation."'
>>>>>>>>>>>> failed
>>>>>>>>>>>>
>>>>>>>>>>>> The initial introduction of the
"test" namespace in test2.cpp seems to
>>>>>>>>>>>> confuse clang -- test1.cpp and
test2.cpp have non-compatible debug
>>>>>>>>>>>> information for the test::Test
type.  test1.cpp says that test::Test's
>>>>>>>>>>>> namespace is defined in test.h,
and test2.cpp says that test::Test's
>>>>>>>>>>>> namespace is defined in
test2.cpp (they both say that the class itself is
>>>>>>>>>>>> defined in test.h).  Regardless
of whether LLVM tools should allow this,
>>>>>>>>>>>> this seems like the wrong
output for these files?
>>>>>>>>>>>>
>>>>>>>>>>>> Another related example, is if
we simply defined the test::Test template in
>>>>>>>>>>>> both files (ie, manually
executed the #include "test.h") and got rid of the
>>>>>>>>>>>> test2.cpp "namespace test
{}" definition, we'd run into the same assertion,
>>>>>>>>>>>> without hitting the weird clang
behavior with the empty namespace
>>>>>>>>>>>> pre-declaration.
>>>>>>>>>>>>
>>>>>>>>>>>> It's probably also worth
noting that compiling directly with clang, ie
>>>>>>>>>>>> adding a "int main()
{}" function and then doing
>>>>>>>>>>>> $ clang++ -g test1.cpp
test2.cpp -o test
>>>>>>>>>>>> works fine.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not a C++ expert and I
don't know if these template issues really are
>>>>>>>>>>>> violations of the ODR, but it
feels like this should be allowed, and
>>>>>>>>>>>> somewhere in the llvm toolchain
(llvm-link? DwarfUnit.cpp?) this should be
>>>>>>>>>>>> handled.  What do people think?
>>>>>>>>>>>>
>>>>>>>>>>>> kmod
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>

Eric Christopher

2014-Jul-09 00:16 UTC

head link

[LLVMdev] Issues with clang-llvm debug info validity

On Tue, Jul 8, 2014 at 5:15 PM, David Blaikie <dblaikie at gmail.com>
wrote:> On Fri, Jun 27, 2014 at 2:19 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> On Fri, Jun 27, 2014 at 2:15 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>
>>>> On Jun 27, 2014, at 1:58 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>>
>>>> On Fri, Jun 27, 2014 at 1:49 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>>> I looked into this a little.
>>>>
>>>> Thanks for taking a look!
>>>>
>>>>> My proposal is this:
>>>>>
>>>>> - extend DINamespace with an extra UniqueID filed
(analogous to DICompositeType)
>>>>> - add an extra case to DIScope::getRef()
>>>>> - Let CGDebugInfo create a mangled name for each namespace
to use as UID
>>>>>  -> the UID of the anonymous namespace is null.
>>>>>
>>>>> For example:
>>>>>
>>>>> namespace a { namespace b {} } -> “_ZN1a",
“_ZN1a1b"
>>>>>
>>>>> !0 = [compile unit]
>>>>> !1 = metadata !{i32 786489, metadata !0, metadata
!”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [
DW_TAG_namespace ] [a]
>>>>> !2 = metadata !{i32 786489, metadata !0, null, metadata
!”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
>>>>>
>>>>> namespace a1b {} -> “_ZN3a1b"
>>>>> namespace {} -> null (!= “”) // Anonymous namespaces
will remain old-fashioned MDNode references to prevent uniquing.
>>>>
>>>> Will this do the right thing if there's a named namespace
inside or
>>>> outside the anonymous namespace?
>>>
>>> This is actually really tricky. Normally the ValueSymbolTable takes
care of adding a unique suffix to all conflicting symbols. i think the most
reasonable thing to do here is to add a rule:
>>>   -> the UID of any namespace that has an anonymous namespace in
its parent chain is null.
>>
>> Yep - I assume that's the right thing to do, just not sure if it
>> would/will require a little extra work & wanted to make sure it was
on
>> the radar.
>>
>>>>> With this change we will get the full type uniquing
benefits. As a side effect, definitions that are in the same namespace will be
children of the same DW_TAG_namespace in the debug info, which I’d argue is the
expected behavior for LTO anyway.
>>>>
>>>> Sort of - what happens to functions defined in that namespace?
Will
>>>> they all end up in one CU with one instance of the namespace
under
>>>> LTO? Could do weird things to address ranges? (would we end up
with
>>>> the address range in the CU the function was originally in -
but the
>>>> subprogram DIE ends up in some other CU that doesn't
include that
>>>> address range)
>>>
>>> I think you just convinced me to save this for later and clone a
namespace for every compile unit instead. The decl_file/decl_line will have to
identical for all CUs though (and come from the module linked in first).
>>
>> OK - so we'll put one namespace in the metadata and just use that
for
>> each CU? Should be easy enough.
>>
>> One issue: what keeps the namespace metadata nodes live? Where do we
>> find them? Are we going to add another list to the CUs?
>
> Alternatively... does anyone really care about the source location of
> a namespace? (given that we have to pick one arbitrary location that a
> namespace starts at?) Should we just omit it, then namespace nodes
> would deduplicate simply.
>
Can't come up with a use for it off the top of my head, no.

-eric
> - David
>
>>
>>>
>>> thanks!
>>> adrian
>>>
>>>>
>>>> - David
>>>>
>>>>> -- adrian
>>>>>
>>>>>> On Jun 24, 2014, at 4:39 PM, Adrian Prantl <aprantl
at apple.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Jun 24, 2014, at 4:34 PM, Eric Christopher
<echristo at gmail.com> wrote:
>>>>>>>
>>>>>>>>> Without needing to change how namespaces
are handled.
>>>>>>>>>
>>>>>>>>> Yes, using DIRefs for namespaces would
cause the actual type nodes to
>>>>>>>>> be deduplicated when they aren't even
with the patch above, and that
>>>>>>>>> would help reduce debug info metadata
further if that's
>>>>>>>>> useful/necessary.
>>>>>>>>
>>>>>>>> Exactly. While the above patch will cover up
the problem, we will still have many duplicate type entries during LTO that will
take up memory, even though only the last one to be entered into DIRefMap will
actually be used for emitting DWARF. Since DIRefs are also more expensive in
terms of lookup speed, I think I’d prefer to only use them only where necessary
(=as far down the MDNode DAG as possible).
>>>>>>>>
>>>>>>>
>>>>>>> Right, though, to be fair, some numbers here might
be nice. The
>>>>>>> tradeoff of "extra lookup for ref" versus
"extra memory allocation for
>>>>>>> larger debug info”.
>>>>>>
>>>>>> Then again, it’s not like the extra lookups introduced
by David’s proposed patch would get rid of those extra memory allocations.
Granted, they do get rid of the assertion, which obviously provides some value
;-)
>>>>>>
>>>>>>>
>>>>>>> I also don't think it'll matter for speed
:)
>>>>>>>
>>>>>> You’re probably right about the lookup, but depending
on the project you’re building the memory usage could matter.
>>>>>>
>>>>>> -- adrian
>>>>>>
>>>>>>> -eric
>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Question to the C++ experts: Is the
name (“test” in this case) a sufficient appropriate unique ID for a C++
namespace?
>>>>>>>>>
>>>>>>>>> Potentially - assuming you don't do
this for anything in an anonymous
>>>>>>>>> namespace. I'd probably see about using
the standard mangling
>>>>>>>> Good point! We should avoid unifying the
anonymous namespace :-)
>>>>>>>>> machinery as we do for other things,
though, if possible. Might keep
>>>>>>>>> it more consistent.
>>>>>>>>
>>>>>>>> I will do some experiments with nested
namespaces, but it seems as if name mangling does not buy us much: this will
merely turn into N4test.
>>>>>>>>
>>>>>>>> -- adrian
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- adrian
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Jun 24, 2014, at 1:49 PM, David
Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +Adrian & Manman,
>>>>>>>>>>>
>>>>>>>>>>> Looks like this is a case of
non-DIRef references ending up in the IR,
>>>>>>>>>>> and thus the references not being
deduplicated. This could lead to
>>>>>>>>>>> some of the IR bloat that you guys
implemented the DIRef stuff to
>>>>>>>>>>> reduce/avoid.
>>>>>>>>>>>
>>>>>>>>>>> The specific issue is that the type
is slightly different in the two
>>>>>>>>>>> TUs (since the namespace scope
it's contained within is different in
>>>>>>>>>>> the two TUs - note the namespace
preceeding the header include in
>>>>>>>>>>> test2.cpp) and the actual
DIRef-usage failure is when building
>>>>>>>>>>> subroutine types. The subroutine
types are made with direct references
>>>>>>>>>>> to types, not with DIRef references
to types. This means they won't be
>>>>>>>>>>> deduplicated/resolved to a single
type during linking correctly.
>>>>>>>>>>>
>>>>>>>>>>> It also leads to an assertion as
per the original email. Figured you
>>>>>>>>>>> guys might want to look into/fix
this (I assume the right fix is to
>>>>>>>>>>> use DIRef more pervasively in the
frontend - I'm doing a little
>>>>>>>>>>> experimenting with clang now to see
how that looks).
>>>>>>>>>>>
>>>>>>>>>>> - David
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 13, 2014 at 11:08 PM,
David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>>> Yep, looks like a legit bug to
me. I'll try to reproduce it and reduce
>>>>>>>>>>>> a test case, etc.
>>>>>>>>>>>>
>>>>>>>>>>>> (the ODR doesn't require
that the two copies of the Test class
>>>>>>>>>>>> template appear on the same
line in the same file in two distinct
>>>>>>>>>>>> translation units, it only
requires that the two copies have the same
>>>>>>>>>>>> sequence of tokens - which they
do in your example)
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 13, 2014 at 5:33
PM, Kevin Modzelewski <kmod at dropbox.com> wrote:
>>>>>>>>>>>>> Hi all, sorry to post to
both lists, but I'm running into an issue where
>>>>>>>>>>>>> clang-generated debug info
is deemed to be invalid by LLVM tools (throws an
>>>>>>>>>>>>> assertion error in both llc
and mcjit), and I'm not sure what the proper
>>>>>>>>>>>>> resolution is.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's a test case; I
last tested it on revision r210953:
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat test1.cpp
>>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>>> test::Test<int>
foo1() {
>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat test2.cpp
>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>> }
>>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>>> test::Test<int>
foo2() {
>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ cat test.h
>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>> template <class T>
>>>>>>>>>>>>> class Test {
>>>>>>>>>>>>> };
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test1.cpp -S -o test1.ll
>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test2.cpp -S -o test2.ll
>>>>>>>>>>>>> $ llvm-link test1.ll
test2.ll -S -o linked.ll
>>>>>>>>>>>>> $ llc linked.ll
>>>>>>>>>>>>>
>>>>>>>>>>>>> The last step raises an
assertion error, assuming you have assertions
>>>>>>>>>>>>> enabled:
>>>>>>>>>>>>> llc:
lib/CodeGen/AsmPrinter/DwarfUnit.cpp:953: llvm::DIE*
>>>>>>>>>>>>>
llvm::DwarfUnit::getOrCreateTypeDIE(const llvm::MDNode*): Assertion `Ty
=>>>>>>>>>>>>> resolve(Ty.getRef())
&& "type was not uniqued, possible ODR violation."'
>>>>>>>>>>>>> failed
>>>>>>>>>>>>>
>>>>>>>>>>>>> The initial introduction of
the "test" namespace in test2.cpp seems to
>>>>>>>>>>>>> confuse clang -- test1.cpp
and test2.cpp have non-compatible debug
>>>>>>>>>>>>> information for the
test::Test type.  test1.cpp says that test::Test's
>>>>>>>>>>>>> namespace is defined in
test.h, and test2.cpp says that test::Test's
>>>>>>>>>>>>> namespace is defined in
test2.cpp (they both say that the class itself is
>>>>>>>>>>>>> defined in test.h). 
Regardless of whether LLVM tools should allow this,
>>>>>>>>>>>>> this seems like the wrong
output for these files?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another related example, is
if we simply defined the test::Test template in
>>>>>>>>>>>>> both files (ie, manually
executed the #include "test.h") and got rid of the
>>>>>>>>>>>>> test2.cpp "namespace
test {}" definition, we'd run into the same assertion,
>>>>>>>>>>>>> without hitting the weird
clang behavior with the empty namespace
>>>>>>>>>>>>> pre-declaration.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's probably also
worth noting that compiling directly with clang, ie
>>>>>>>>>>>>> adding a "int main()
{}" function and then doing
>>>>>>>>>>>>> $ clang++ -g test1.cpp
test2.cpp -o test
>>>>>>>>>>>>> works fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not a C++ expert
and I don't know if these template issues really are
>>>>>>>>>>>>> violations of the ODR, but
it feels like this should be allowed, and
>>>>>>>>>>>>> somewhere in the llvm
toolchain (llvm-link? DwarfUnit.cpp?) this should be
>>>>>>>>>>>>> handled.  What do people
think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> kmod
>>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>

Adrian Prantl

2014-Jul-09 07:21 UTC

head link

[LLVMdev] Issues with clang-llvm debug info validity

On Jul 9, 2014, at 2:15 AM, David Blaikie <dblaikie at gmail.com> wrote:
> On Fri, Jun 27, 2014 at 2:19 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>> On Fri, Jun 27, 2014 at 2:15 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>> 
>>>> On Jun 27, 2014, at 1:58 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>> 
>>>> On Fri, Jun 27, 2014 at 1:49 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>>> I looked into this a little.
>>>> 
>>>> Thanks for taking a look!
>>>> 
>>>>> My proposal is this:
>>>>> 
>>>>> - extend DINamespace with an extra UniqueID filed
(analogous to DICompositeType)
>>>>> - add an extra case to DIScope::getRef()
>>>>> - Let CGDebugInfo create a mangled name for each namespace
to use as UID
>>>>> -> the UID of the anonymous namespace is null.
>>>>> 
>>>>> For example:
>>>>> 
>>>>> namespace a { namespace b {} } -> “_ZN1a",
“_ZN1a1b"
>>>>> 
>>>>> !0 = [compile unit]
>>>>> !1 = metadata !{i32 786489, metadata !0, metadata
!”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [
DW_TAG_namespace ] [a]
>>>>> !2 = metadata !{i32 786489, metadata !0, null, metadata
!”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
>>>>> 
>>>>> namespace a1b {} -> “_ZN3a1b"
>>>>> namespace {} -> null (!= “”) // Anonymous namespaces
will remain old-fashioned MDNode references to prevent uniquing.
>>>> 
>>>> Will this do the right thing if there's a named namespace
inside or
>>>> outside the anonymous namespace?
>>> 
>>> This is actually really tricky. Normally the ValueSymbolTable takes
care of adding a unique suffix to all conflicting symbols. i think the most
reasonable thing to do here is to add a rule:
>>> -> the UID of any namespace that has an anonymous namespace in
its parent chain is null.
>> 
>> Yep - I assume that's the right thing to do, just not sure if it
>> would/will require a little extra work & wanted to make sure it was
on
>> the radar.
>> 
>>>>> With this change we will get the full type uniquing
benefits. As a side effect, definitions that are in the same namespace will be
children of the same DW_TAG_namespace in the debug info, which I’d argue is the
expected behavior for LTO anyway.
>>>> 
>>>> Sort of - what happens to functions defined in that namespace?
Will
>>>> they all end up in one CU with one instance of the namespace
under
>>>> LTO? Could do weird things to address ranges? (would we end up
with
>>>> the address range in the CU the function was originally in -
but the
>>>> subprogram DIE ends up in some other CU that doesn't
include that
>>>> address range)
>>> 
>>> I think you just convinced me to save this for later and clone a
namespace for every compile unit instead. The decl_file/decl_line will have to
identical for all CUs though (and come from the module linked in first).
>> 
>> OK - so we'll put one namespace in the metadata and just use that
for
>> each CU? Should be easy enough.
>> 
>> One issue: what keeps the namespace metadata nodes live? Where do we
>> find them? Are we going to add another list to the CUs?
> 
> Alternatively... does anyone really care about the source location of
> a namespace? (given that we have to pick one arbitrary location that a
> namespace starts at?) Should we just omit it, then namespace nodes
> would deduplicate simply.
At least for C++ the source location for a namespace is arbitrary and not very
useful. We would have to do something with the context field though (the second
one): for a top-level namespace it currently points to the compile unit, which
would counter any uniquing attempts.

-- adrian> 
> - David
> 
>> 
>>> 
>>> thanks!
>>> adrian
>>> 
>>>> 
>>>> - David
>>>> 
>>>>> -- adrian
>>>>> 
>>>>>> On Jun 24, 2014, at 4:39 PM, Adrian Prantl <aprantl
at apple.com> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Jun 24, 2014, at 4:34 PM, Eric Christopher
<echristo at gmail.com> wrote:
>>>>>>> 
>>>>>>>>> Without needing to change how namespaces
are handled.
>>>>>>>>> 
>>>>>>>>> Yes, using DIRefs for namespaces would
cause the actual type nodes to
>>>>>>>>> be deduplicated when they aren't even
with the patch above, and that
>>>>>>>>> would help reduce debug info metadata
further if that's
>>>>>>>>> useful/necessary.
>>>>>>>> 
>>>>>>>> Exactly. While the above patch will cover up
the problem, we will still have many duplicate type entries during LTO that will
take up memory, even though only the last one to be entered into DIRefMap will
actually be used for emitting DWARF. Since DIRefs are also more expensive in
terms of lookup speed, I think I’d prefer to only use them only where necessary
(=as far down the MDNode DAG as possible).
>>>>>>>> 
>>>>>>> 
>>>>>>> Right, though, to be fair, some numbers here might
be nice. The
>>>>>>> tradeoff of "extra lookup for ref" versus
"extra memory allocation for
>>>>>>> larger debug info”.
>>>>>> 
>>>>>> Then again, it’s not like the extra lookups introduced
by David’s proposed patch would get rid of those extra memory allocations.
Granted, they do get rid of the assertion, which obviously provides some value
;-)
>>>>>> 
>>>>>>> 
>>>>>>> I also don't think it'll matter for speed
:)
>>>>>>> 
>>>>>> You’re probably right about the lookup, but depending
on the project you’re building the memory usage could matter.
>>>>>> 
>>>>>> -- adrian
>>>>>> 
>>>>>>> -eric
>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Question to the C++ experts: Is the
name (“test” in this case) a sufficient appropriate unique ID for a C++
namespace?
>>>>>>>>> 
>>>>>>>>> Potentially - assuming you don't do
this for anything in an anonymous
>>>>>>>>> namespace. I'd probably see about using
the standard mangling
>>>>>>>> Good point! We should avoid unifying the
anonymous namespace :-)
>>>>>>>>> machinery as we do for other things,
though, if possible. Might keep
>>>>>>>>> it more consistent.
>>>>>>>> 
>>>>>>>> I will do some experiments with nested
namespaces, but it seems as if name mangling does not buy us much: this will
merely turn into N4test.
>>>>>>>> 
>>>>>>>> -- adrian
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> -- adrian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jun 24, 2014, at 1:49 PM, David
Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> +Adrian & Manman,
>>>>>>>>>>> 
>>>>>>>>>>> Looks like this is a case of
non-DIRef references ending up in the IR,
>>>>>>>>>>> and thus the references not being
deduplicated. This could lead to
>>>>>>>>>>> some of the IR bloat that you guys
implemented the DIRef stuff to
>>>>>>>>>>> reduce/avoid.
>>>>>>>>>>> 
>>>>>>>>>>> The specific issue is that the type
is slightly different in the two
>>>>>>>>>>> TUs (since the namespace scope
it's contained within is different in
>>>>>>>>>>> the two TUs - note the namespace
preceeding the header include in
>>>>>>>>>>> test2.cpp) and the actual
DIRef-usage failure is when building
>>>>>>>>>>> subroutine types. The subroutine
types are made with direct references
>>>>>>>>>>> to types, not with DIRef references
to types. This means they won't be
>>>>>>>>>>> deduplicated/resolved to a single
type during linking correctly.
>>>>>>>>>>> 
>>>>>>>>>>> It also leads to an assertion as
per the original email. Figured you
>>>>>>>>>>> guys might want to look into/fix
this (I assume the right fix is to
>>>>>>>>>>> use DIRef more pervasively in the
frontend - I'm doing a little
>>>>>>>>>>> experimenting with clang now to see
how that looks).
>>>>>>>>>>> 
>>>>>>>>>>> - David
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jun 13, 2014 at 11:08 PM,
David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>>> Yep, looks like a legit bug to
me. I'll try to reproduce it and reduce
>>>>>>>>>>>> a test case, etc.
>>>>>>>>>>>> 
>>>>>>>>>>>> (the ODR doesn't require
that the two copies of the Test class
>>>>>>>>>>>> template appear on the same
line in the same file in two distinct
>>>>>>>>>>>> translation units, it only
requires that the two copies have the same
>>>>>>>>>>>> sequence of tokens - which they
do in your example)
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Jun 13, 2014 at 5:33
PM, Kevin Modzelewski <kmod at dropbox.com> wrote:
>>>>>>>>>>>>> Hi all, sorry to post to
both lists, but I'm running into an issue where
>>>>>>>>>>>>> clang-generated debug info
is deemed to be invalid by LLVM tools (throws an
>>>>>>>>>>>>> assertion error in both llc
and mcjit), and I'm not sure what the proper
>>>>>>>>>>>>> resolution is.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here's a test case; I
last tested it on revision r210953:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ cat test1.cpp
>>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>>> test::Test<int>
foo1() {
>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ cat test2.cpp
>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>> }
>>>>>>>>>>>>> #include "test.h"
>>>>>>>>>>>>> test::Test<int>
foo2() {
>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ cat test.h
>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>> template <class T>
>>>>>>>>>>>>> class Test {
>>>>>>>>>>>>> };
>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test1.cpp -S -o test1.ll
>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test2.cpp -S -o test2.ll
>>>>>>>>>>>>> $ llvm-link test1.ll
test2.ll -S -o linked.ll
>>>>>>>>>>>>> $ llc linked.ll
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The last step raises an
assertion error, assuming you have assertions
>>>>>>>>>>>>> enabled:
>>>>>>>>>>>>> llc:
lib/CodeGen/AsmPrinter/DwarfUnit.cpp:953: llvm::DIE*
>>>>>>>>>>>>>
llvm::DwarfUnit::getOrCreateTypeDIE(const llvm::MDNode*): Assertion `Ty
=>>>>>>>>>>>>> resolve(Ty.getRef())
&& "type was not uniqued, possible ODR violation."'
>>>>>>>>>>>>> failed
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The initial introduction of
the "test" namespace in test2.cpp seems to
>>>>>>>>>>>>> confuse clang -- test1.cpp
and test2.cpp have non-compatible debug
>>>>>>>>>>>>> information for the
test::Test type.  test1.cpp says that test::Test's
>>>>>>>>>>>>> namespace is defined in
test.h, and test2.cpp says that test::Test's
>>>>>>>>>>>>> namespace is defined in
test2.cpp (they both say that the class itself is
>>>>>>>>>>>>> defined in test.h). 
Regardless of whether LLVM tools should allow this,
>>>>>>>>>>>>> this seems like the wrong
output for these files?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Another related example, is
if we simply defined the test::Test template in
>>>>>>>>>>>>> both files (ie, manually
executed the #include "test.h") and got rid of the
>>>>>>>>>>>>> test2.cpp "namespace
test {}" definition, we'd run into the same assertion,
>>>>>>>>>>>>> without hitting the weird
clang behavior with the empty namespace
>>>>>>>>>>>>> pre-declaration.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It's probably also
worth noting that compiling directly with clang, ie
>>>>>>>>>>>>> adding a "int main()
{}" function and then doing
>>>>>>>>>>>>> $ clang++ -g test1.cpp
test2.cpp -o test
>>>>>>>>>>>>> works fine.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm not a C++ expert
and I don't know if these template issues really are
>>>>>>>>>>>>> violations of the ODR, but
it feels like this should be allowed, and
>>>>>>>>>>>>> somewhere in the llvm
toolchain (llvm-link? DwarfUnit.cpp?) this should be
>>>>>>>>>>>>> handled.  What do people
think?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> kmod
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Robinson, Paul

2014-Jul-09 14:07 UTC

head link

[LLVMdev] Issues with clang-llvm debug info validity

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu] On
> Behalf Of Adrian Prantl
> 
> On Jul 9, 2014, at 2:15 AM, David Blaikie <dblaikie at gmail.com>
wrote:
> 
> > On Fri, Jun 27, 2014 at 2:19 PM, David Blaikie <dblaikie at
gmail.com>
> wrote:
> >> On Fri, Jun 27, 2014 at 2:15 PM, Adrian Prantl <aprantl at
apple.com>
> wrote:
> >>>
> >>>> On Jun 27, 2014, at 1:58 PM, David Blaikie <dblaikie at
gmail.com>
> wrote:
> >>>>
> >>>> On Fri, Jun 27, 2014 at 1:49 PM, Adrian Prantl <aprantl
at apple.com>
> wrote:
> >>>>> I looked into this a little.
> >>>>
> >>>> Thanks for taking a look!
> >>>>
> >>>>> My proposal is this:
> >>>>>
> >>>>> - extend DINamespace with an extra UniqueID filed
(analogous to
> DICompositeType)
> >>>>> - add an extra case to DIScope::getRef()
> >>>>> - Let CGDebugInfo create a mangled name for each
namespace to use as
> UID
> >>>>> -> the UID of the anonymous namespace is null.
> >>>>>
> >>>>> For example:
> >>>>>
> >>>>> namespace a { namespace b {} } ->
"_ZN1a", "_ZN1a1b"
> >>>>>
> >>>>> !0 = [compile unit]
> >>>>> !1 = metadata !{i32 786489, metadata !0, metadata
!"_ZN1a", metadata
> !"b", i32 1, metadata !"_ZN1a1b"} ; [ DW_TAG_namespace
] [a]
> >>>>> !2 = metadata !{i32 786489, metadata !0, null,
metadata !"a", i32 1,
> metadata !"_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
> >>>>>
> >>>>> namespace a1b {} -> "_ZN3a1b"
> >>>>> namespace {} -> null (!= "") // Anonymous
namespaces will remain
> old-fashioned MDNode references to prevent uniquing.
> >>>>
> >>>> Will this do the right thing if there's a named
namespace inside or
> >>>> outside the anonymous namespace?
> >>>
> >>> This is actually really tricky. Normally the ValueSymbolTable
takes
> care of adding a unique suffix to all conflicting symbols. i think the
> most reasonable thing to do here is to add a rule:
> >>> -> the UID of any namespace that has an anonymous namespace
in its
> parent chain is null.
> >>
> >> Yep - I assume that's the right thing to do, just not sure if
it
> >> would/will require a little extra work & wanted to make sure
it was on
> >> the radar.
> >>
> >>>>> With this change we will get the full type uniquing
benefits. As a
> side effect, definitions that are in the same namespace will be children
> of the same DW_TAG_namespace in the debug info, which I'd argue is the
> expected behavior for LTO anyway.
> >>>>
> >>>> Sort of - what happens to functions defined in that
namespace? Will
> >>>> they all end up in one CU with one instance of the
namespace under
> >>>> LTO? Could do weird things to address ranges? (would we
end up with
> >>>> the address range in the CU the function was originally in
- but the
> >>>> subprogram DIE ends up in some other CU that doesn't
include that
> >>>> address range)
> >>>
> >>> I think you just convinced me to save this for later and clone
a
> namespace for every compile unit instead. The decl_file/decl_line will
> have to identical for all CUs though (and come from the module linked in
> first).
> >>
> >> OK - so we'll put one namespace in the metadata and just use
that for
> >> each CU? Should be easy enough.
> >>
> >> One issue: what keeps the namespace metadata nodes live? Where do
we
> >> find them? Are we going to add another list to the CUs?
> >
> > Alternatively... does anyone really care about the source location of
> > a namespace? (given that we have to pick one arbitrary location that a
> > namespace starts at?) Should we just omit it, then namespace nodes
> > would deduplicate simply.
> 
> At least for C++ the source location for a namespace is arbitrary and not
> very useful. We would have to do something with the context field though
> (the second one): for a top-level namespace it currently points to the
> compile unit, which would counter any uniquing attempts.
> 
> -- Adrian
Source location is useless on namespace entries, it would save a small amount
of space to nuke it. I'd say go ahead.
--paulr

David Blaikie

2014-Jul-09 15:17 UTC

head link

[LLVMdev] Issues with clang-llvm debug info validity

On Wed, Jul 9, 2014 at 12:21 AM, Adrian Prantl <aprantl at apple.com>
wrote:>
> On Jul 9, 2014, at 2:15 AM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>> On Fri, Jun 27, 2014 at 2:19 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>> On Fri, Jun 27, 2014 at 2:15 PM, Adrian Prantl <aprantl at
apple.com> wrote:
>>>>
>>>>> On Jun 27, 2014, at 1:58 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>>>>>
>>>>> On Fri, Jun 27, 2014 at 1:49 PM, Adrian Prantl <aprantl
at apple.com> wrote:
>>>>>> I looked into this a little.
>>>>>
>>>>> Thanks for taking a look!
>>>>>
>>>>>> My proposal is this:
>>>>>>
>>>>>> - extend DINamespace with an extra UniqueID filed
(analogous to DICompositeType)
>>>>>> - add an extra case to DIScope::getRef()
>>>>>> - Let CGDebugInfo create a mangled name for each
namespace to use as UID
>>>>>> -> the UID of the anonymous namespace is null.
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> namespace a { namespace b {} } -> “_ZN1a",
“_ZN1a1b"
>>>>>>
>>>>>> !0 = [compile unit]
>>>>>> !1 = metadata !{i32 786489, metadata !0, metadata
!”_ZN1a", metadata !”b", i32 1, metadata !”_ZN1a1b"} ; [
DW_TAG_namespace ] [a]
>>>>>> !2 = metadata !{i32 786489, metadata !0, null, metadata
!”a", i32 1, metadata !”_ZN1a"} ; [ DW_TAG_namespace ] [a] [line 1]
>>>>>>
>>>>>> namespace a1b {} -> “_ZN3a1b"
>>>>>> namespace {} -> null (!= “”) // Anonymous namespaces
will remain old-fashioned MDNode references to prevent uniquing.
>>>>>
>>>>> Will this do the right thing if there's a named
namespace inside or
>>>>> outside the anonymous namespace?
>>>>
>>>> This is actually really tricky. Normally the ValueSymbolTable
takes care of adding a unique suffix to all conflicting symbols. i think the
most reasonable thing to do here is to add a rule:
>>>> -> the UID of any namespace that has an anonymous namespace
in its parent chain is null.
>>>
>>> Yep - I assume that's the right thing to do, just not sure if
it
>>> would/will require a little extra work & wanted to make sure it
was on
>>> the radar.
>>>
>>>>>> With this change we will get the full type uniquing
benefits. As a side effect, definitions that are in the same namespace will be
children of the same DW_TAG_namespace in the debug info, which I’d argue is the
expected behavior for LTO anyway.
>>>>>
>>>>> Sort of - what happens to functions defined in that
namespace? Will
>>>>> they all end up in one CU with one instance of the
namespace under
>>>>> LTO? Could do weird things to address ranges? (would we end
up with
>>>>> the address range in the CU the function was originally in
- but the
>>>>> subprogram DIE ends up in some other CU that doesn't
include that
>>>>> address range)
>>>>
>>>> I think you just convinced me to save this for later and clone
a namespace for every compile unit instead. The decl_file/decl_line will have to
identical for all CUs though (and come from the module linked in first).
>>>
>>> OK - so we'll put one namespace in the metadata and just use
that for
>>> each CU? Should be easy enough.
>>>
>>> One issue: what keeps the namespace metadata nodes live? Where do
we
>>> find them? Are we going to add another list to the CUs?
>>
>> Alternatively... does anyone really care about the source location of
>> a namespace? (given that we have to pick one arbitrary location that a
>> namespace starts at?) Should we just omit it, then namespace nodes
>> would deduplicate simply.
>
> At least for C++ the source location for a namespace is arbitrary and not
very useful. We would have to do something with the context field though (the
second one): for a top-level namespace it currently points to the compile unit,
which would counter any uniquing attempts.
Actually I had the same thought & had tested it. It seems we don't
link top level namespaces to compile units... see
DIBuilder::createNameSpace and its use of getNonCompileUnitScope. If
you specify a scope with the DW_TAG_compile_unit, DIBuilder uses a
null scope instead. (not sure why we do this for namespaces and
perhaps not for types, functions, etc - we could probably just fix
that in the frontend (getContextDescriptor) and drop this
workaround/conditional in LLVM)

- David
>
> -- adrian
>>
>> - David
>>
>>>
>>>>
>>>> thanks!
>>>> adrian
>>>>
>>>>>
>>>>> - David
>>>>>
>>>>>> -- adrian
>>>>>>
>>>>>>> On Jun 24, 2014, at 4:39 PM, Adrian Prantl
<aprantl at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Jun 24, 2014, at 4:34 PM, Eric Christopher
<echristo at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>> Without needing to change how
namespaces are handled.
>>>>>>>>>>
>>>>>>>>>> Yes, using DIRefs for namespaces would
cause the actual type nodes to
>>>>>>>>>> be deduplicated when they aren't
even with the patch above, and that
>>>>>>>>>> would help reduce debug info metadata
further if that's
>>>>>>>>>> useful/necessary.
>>>>>>>>>
>>>>>>>>> Exactly. While the above patch will cover
up the problem, we will still have many duplicate type entries during LTO that
will take up memory, even though only the last one to be entered into DIRefMap
will actually be used for emitting DWARF. Since DIRefs are also more expensive
in terms of lookup speed, I think I’d prefer to only use them only where
necessary (=as far down the MDNode DAG as possible).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Right, though, to be fair, some numbers here
might be nice. The
>>>>>>>> tradeoff of "extra lookup for ref"
versus "extra memory allocation for
>>>>>>>> larger debug info”.
>>>>>>>
>>>>>>> Then again, it’s not like the extra lookups
introduced by David’s proposed patch would get rid of those extra memory
allocations. Granted, they do get rid of the assertion, which obviously provides
some value ;-)
>>>>>>>
>>>>>>>>
>>>>>>>> I also don't think it'll matter for
speed :)
>>>>>>>>
>>>>>>> You’re probably right about the lookup, but
depending on the project you’re building the memory usage could matter.
>>>>>>>
>>>>>>> -- adrian
>>>>>>>
>>>>>>>> -eric
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Question to the C++ experts: Is the
name (“test” in this case) a sufficient appropriate unique ID for a C++
namespace?
>>>>>>>>>>
>>>>>>>>>> Potentially - assuming you don't do
this for anything in an anonymous
>>>>>>>>>> namespace. I'd probably see about
using the standard mangling
>>>>>>>>> Good point! We should avoid unifying the
anonymous namespace :-)
>>>>>>>>>> machinery as we do for other things,
though, if possible. Might keep
>>>>>>>>>> it more consistent.
>>>>>>>>>
>>>>>>>>> I will do some experiments with nested
namespaces, but it seems as if name mangling does not buy us much: this will
merely turn into N4test.
>>>>>>>>>
>>>>>>>>> -- adrian
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- adrian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Jun 24, 2014, at 1:49 PM,
David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> +Adrian & Manman,
>>>>>>>>>>>>
>>>>>>>>>>>> Looks like this is a case of
non-DIRef references ending up in the IR,
>>>>>>>>>>>> and thus the references not
being deduplicated. This could lead to
>>>>>>>>>>>> some of the IR bloat that you
guys implemented the DIRef stuff to
>>>>>>>>>>>> reduce/avoid.
>>>>>>>>>>>>
>>>>>>>>>>>> The specific issue is that the
type is slightly different in the two
>>>>>>>>>>>> TUs (since the namespace scope
it's contained within is different in
>>>>>>>>>>>> the two TUs - note the
namespace preceeding the header include in
>>>>>>>>>>>> test2.cpp) and the actual
DIRef-usage failure is when building
>>>>>>>>>>>> subroutine types. The
subroutine types are made with direct references
>>>>>>>>>>>> to types, not with DIRef
references to types. This means they won't be
>>>>>>>>>>>> deduplicated/resolved to a
single type during linking correctly.
>>>>>>>>>>>>
>>>>>>>>>>>> It also leads to an assertion
as per the original email. Figured you
>>>>>>>>>>>> guys might want to look
into/fix this (I assume the right fix is to
>>>>>>>>>>>> use DIRef more pervasively in
the frontend - I'm doing a little
>>>>>>>>>>>> experimenting with clang now to
see how that looks).
>>>>>>>>>>>>
>>>>>>>>>>>> - David
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jun 13, 2014 at 11:08
PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>>>>>>>> Yep, looks like a legit bug
to me. I'll try to reproduce it and reduce
>>>>>>>>>>>>> a test case, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> (the ODR doesn't
require that the two copies of the Test class
>>>>>>>>>>>>> template appear on the same
line in the same file in two distinct
>>>>>>>>>>>>> translation units, it only
requires that the two copies have the same
>>>>>>>>>>>>> sequence of tokens - which
they do in your example)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jun 13, 2014 at
5:33 PM, Kevin Modzelewski <kmod at dropbox.com> wrote:
>>>>>>>>>>>>>> Hi all, sorry to post
to both lists, but I'm running into an issue where
>>>>>>>>>>>>>> clang-generated debug
info is deemed to be invalid by LLVM tools (throws an
>>>>>>>>>>>>>> assertion error in both
llc and mcjit), and I'm not sure what the proper
>>>>>>>>>>>>>> resolution is.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a test case;
I last tested it on revision r210953:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat test1.cpp
>>>>>>>>>>>>>> #include
"test.h"
>>>>>>>>>>>>>> test::Test<int>
foo1() {
>>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat test2.cpp
>>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> #include
"test.h"
>>>>>>>>>>>>>> test::Test<int>
foo2() {
>>>>>>>>>>>>>> return
test::Test<int>();
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ cat test.h
>>>>>>>>>>>>>> namespace test {
>>>>>>>>>>>>>> template <class
T>
>>>>>>>>>>>>>> class Test {
>>>>>>>>>>>>>> };
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test1.cpp -S -o test1.ll
>>>>>>>>>>>>>> $ clang++ -g -emit-llvm
test2.cpp -S -o test2.ll
>>>>>>>>>>>>>> $ llvm-link test1.ll
test2.ll -S -o linked.ll
>>>>>>>>>>>>>> $ llc linked.ll
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The last step raises an
assertion error, assuming you have assertions
>>>>>>>>>>>>>> enabled:
>>>>>>>>>>>>>> llc:
lib/CodeGen/AsmPrinter/DwarfUnit.cpp:953: llvm::DIE*
>>>>>>>>>>>>>>
llvm::DwarfUnit::getOrCreateTypeDIE(const llvm::MDNode*): Assertion `Ty
=>>>>>>>>>>>>>> resolve(Ty.getRef())
&& "type was not uniqued, possible ODR violation."'
>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The initial
introduction of the "test" namespace in test2.cpp seems to
>>>>>>>>>>>>>> confuse clang --
test1.cpp and test2.cpp have non-compatible debug
>>>>>>>>>>>>>> information for the
test::Test type.  test1.cpp says that test::Test's
>>>>>>>>>>>>>> namespace is defined in
test.h, and test2.cpp says that test::Test's
>>>>>>>>>>>>>> namespace is defined in
test2.cpp (they both say that the class itself is
>>>>>>>>>>>>>> defined in test.h). 
Regardless of whether LLVM tools should allow this,
>>>>>>>>>>>>>> this seems like the
wrong output for these files?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another related
example, is if we simply defined the test::Test template in
>>>>>>>>>>>>>> both files (ie,
manually executed the #include "test.h") and got rid of the
>>>>>>>>>>>>>> test2.cpp
"namespace test {}" definition, we'd run into the same assertion,
>>>>>>>>>>>>>> without hitting the
weird clang behavior with the empty namespace
>>>>>>>>>>>>>> pre-declaration.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's probably also
worth noting that compiling directly with clang, ie
>>>>>>>>>>>>>> adding a "int
main() {}" function and then doing
>>>>>>>>>>>>>> $ clang++ -g test1.cpp
test2.cpp -o test
>>>>>>>>>>>>>> works fine.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not a C++
expert and I don't know if these template issues really are
>>>>>>>>>>>>>> violations of the ODR,
but it feels like this should be allowed, and
>>>>>>>>>>>>>> somewhere in the llvm
toolchain (llvm-link? DwarfUnit.cpp?) this should be
>>>>>>>>>>>>>> handled.  What do
people think?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kmod
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
_______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>>>>
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu        
http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

llvm dev - Jul 2014 - [LLVMdev] Issues with clang-llvm debug info validity

[LLVMdev] Issues with clang-llvm debug info validity

[LLVMdev] Issues with clang-llvm debug info validity

[LLVMdev] Issues with clang-llvm debug info validity

[LLVMdev] Issues with clang-llvm debug info validity

[LLVMdev] Issues with clang-llvm debug info validity