thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD [Jun 2017]

Peter Collingbourne via llvm-dev

2017-Jun-08 06:28 UTC

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com>
wrote:
>
>
> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>>
>>> Very nice and simple implementation!
>>>
>>> Do you have any statistics on how large these odr tables are compared to
>>> other object file data? I assume that if these tables contain full mangled
>>> symbol names, they could end up being very large and may want to share the
>>> symbol name strings with the overall string table in the .o
>>>
>>
>> Looking at Chromium's object files it looks like the total size of the
>> odrtabs is about 50% of the total size of the object files, which isn't
>> great. The current implementation only looks at records, so I imagine that
>> it would be hard to share any of the strings that I'm currently creating.
>> (I guess it's possible that some types will have a mangled vtable name in
>> the string table, so we may be able to share a little that way.) Note
>> however that this was without debug info.
>>
>> One option for reducing size would be to
>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion
>> (alongside a reference to the name itself in the string table)
>> 2) compress the string table for the ODR names with a standard
>> compression algorithm like gzip.
>> This wouldn't seem to affect link time performance much because I think
>> we should only need to look at the strings if we see an ODR name hash match
>> together with an ODR hash mismatch, which would mean an ODR violation with
>> a high probability (i.e. unless there was an ODR name hash collision, we
>> have found an ODR violation). If we don't expect a lot of sharing with
>> regular string tables (see below), it seems even more reasonable.
>>
>
> Neat observation!
>
> FWIW, it is a birthday problem type situation though, so for a 32-bit
> hash, we would expect a collision after about 2^16 distinct hashes (and
> 2^16 seems pretty easy to hit in a large project). So 64-bit hashes might
> be preferable.
>
Oh right, good point, using a 64-bit hash does seem like a good idea here.
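
As a rough check of the birthday-bound reasoning above, here is a small
sketch using the standard approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)) for
the chance of at least one collision among n uniformly distributed b-bit
hashes. The function name and the sample sizes are illustrative assumptions,
not figures from the prototype.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit hashes collide."""
    # Birthday approximation: p ~= 1 - exp(-n*(n-1) / 2^(bits+1)).
    # expm1 keeps precision when the probability is tiny (the 64-bit case).
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

# Compare 32-bit and 64-bit name hashes for two hypothetical project sizes.
for n in (2 ** 16, 10 ** 6):
    print(f"n={n}: 32-bit {collision_probability(n, 32):.3g}, "
          f"64-bit {collision_probability(n, 64):.3g}")
```

Under this model, 2^16 distinct names already give roughly a 39% chance of
some 32-bit collision, while a 64-bit hash keeps the chance negligible even
at a million names.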

>
>>> Also, do you have any numbers on the performance of your initial
>>> implementation?
>>>
>>
>> I measured the link time for chromium's unit_tests (the largest single
>> binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks.
>> So about 30% overhead, but in absolute terms it doesn't seem too bad. So I
>> think this may be acceptable for an initial implementation, but it
>> certainly seems worth trying to do better.
>>
>
> I know that things aren't currently apples-to-apples, but how does that
> compare to gold?
>
I will measure it.

>>> W.r.t. LLD and having it always on by default (and hence making it as
>>> fast as possible), it seems like right now you are implementing the
>>> checking process with a hash table. That's simple and fine for a first
>>> implementation, but it's probably worth mentioning in a comment that the
>>> problem of checking the tables, at least from the linker's perspective,
>>> does fit into a map-reduce pattern and could be easily parallelized if
>>> needed. E.g. a parallel sort to coalesce all entries for symbols of the
>>> same name followed by a parallel forEach to check each bucket with the
>>> same symbol name (roughly speaking).
>>>
>>
>> Right, that's one approach. I was thinking of a simpler approach where at
>> compile time we sort ODR names by hash and partition them using (say) the
>> upper bits of the hash, so that at link time we can have N threads each
>> building a hash table for a specific partition.
>>
>> And of course this work can be started right after symbol resolution
>> finishes and parallelised with the rest of the work done by the linker.
>>
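
A minimal sketch of the partitioning idea quoted above, assuming 64-bit name
hashes and four partitions. PARTITION_BITS, partition_of, check_partition
and check_all are hypothetical names. For brevity the sketch buckets entries
at link time, whereas the proposal would have the compiler emit them already
sorted and partitioned.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

PARTITION_BITS = 2  # assumption: 4 partitions taken from the top bits of a 64-bit hash

def partition_of(name_hash: int) -> int:
    """Select a partition from the upper bits of the ODR name hash."""
    return name_hash >> (64 - PARTITION_BITS)

def check_partition(entries):
    """Build one hash table for a partition; return name hashes with conflicting ODR hashes."""
    seen = {}
    conflicts = set()
    for name_hash, odr_hash in entries:
        if seen.setdefault(name_hash, odr_hash) != odr_hash:
            conflicts.add(name_hash)
    return conflicts

def check_all(object_tables):
    """object_tables: one list of (name_hash, odr_hash) pairs per object file."""
    partitions = defaultdict(list)
    for table in object_tables:
        for entry in table:
            partitions[partition_of(entry[0])].append(entry)
    # Each partition has its own table, so the workers share no mutable state.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(check_partition, partitions.values()))
    return set().union(*results)
```

Because a given name hash always lands in the same partition, no locking is
needed between workers. In Python the threads only illustrate the structure;
a real speedup would need a native implementation.
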
>>> Even better than doing it faster is just doing less work. There's a lot
>>> of work that the linker is already doing that may be reusable for the ODR
>>> checking.
>>> E.g.
>>> - maybe we could get the coalescing step as a byproduct of our existing
>>> string deduping, which we are generally doing anyway.
>>> - we are already coalescing symbol names for the symbol table. If the
>>> ODR table is keyed off of symbols in the binary that we are inserting into
>>> the symbol table, then I think we could do the entire ODR check with no
>>> extra "string" work on LLD's part.
>>>
>>> I see Rui already mentioned some of this in
>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>> You mentioned that not everything is necessarily directly keyed on a
>>> symbol (such as types), but I think that it would really simplify things if
>>> the check was done as such. Do you have any idea exactly how much of the
>>> things that we want to check are not keyed on symbols? If most things are
>>> keyed on symbols, for the things that are not we can just emit extra symbols
>>> prefixed by __clang_odr_check_ or whatever.
>>>
>>
>> Since the current implementation only works with records there is
>> basically zero overlap right now between ODR names and symbols. I suppose
>> that I could estimate the amount of function overlap in a hypothetical
>> implementation that computes ODR hashes of functions by comparing the
>> number of *_odr functions after clang has finished IRgen with the number
>> after optimization finishes. This of course would be strictly more than
>> functions + types.
>>
>
> Wouldn't any function or symbol using the record type have the type name
> somewhere in it? If we used an offset+length encoding (instead of offset +
> NUL termination) we might be able to reuse it then (at some cost in finding
> the reference).
>
That may be possible with some work in the string table builder. But at
that point of course we're not dealing with regular symbols any more. I
guess we could have two ODR tables per object file: an array of (ODR hash,
location) tuples for ODR names that correspond to symbol table symbols
(i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR
hash, location) tuples for all other ODR names. I guess if we wanted a "low
overhead" mode we could just omit the second table or put fewer symbols in
it.
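
To make the two-table layout concrete, here is a rough sketch. The
ObjectFile class, the symbol_index field used to tie a row to a symbol, and
the sample mangled names are all assumptions made for illustration; the
thread does not specify how rows in the first table refer to symbols.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectFile:
    symbols: List[str]  # symbol table names
    # Table 1: (symbol_index, odr_hash, location); the name comes from the symbol table.
    symbol_odr: List[Tuple[int, int, str]] = field(default_factory=list)
    # Table 2: (odr_name, odr_hash, location) for names with no symbol, e.g. record types.
    named_odr: List[Tuple[str, int, str]] = field(default_factory=list)

def odr_entries(obj: ObjectFile, low_overhead: bool = False):
    """Yield (name, odr_hash, location); low-overhead mode skips the second table."""
    for index, odr_hash, location in obj.symbol_odr:
        yield obj.symbols[index], odr_hash, location
    if not low_overhead:
        yield from obj.named_odr

def find_violations(objects, low_overhead: bool = False):
    """Report (name, first_location, conflicting_location) for mismatched ODR hashes."""
    first_seen = {}
    violations = []
    for obj in objects:
        for name, odr_hash, location in odr_entries(obj, low_overhead):
            prev = first_seen.setdefault(name, (odr_hash, location))
            if prev[0] != odr_hash:
                violations.append((name, prev[1], location))
    return violations
```

With this shape, the first table carries no strings of its own, and dropping
the second table trades coverage of non-symbol names (such as records) for
smaller object files.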

> With debug info surely there is some sort of string representing the
> record name or something like that, no?
>
Not the record name on its own (they do appear but a bit awkwardly -- each
namespace component is stored in a separate string), but if the record has
at least one member function the mangled type name will appear somewhere in
.debug_str, so we could in principle reuse that with the offset/length
trick.
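
A small sketch of the offset/length trick, assuming the ODR name of a record
appears as a substring of some mangled member-function name already in the
string table. The helper names and the sample strings are hypothetical.

```python
def build_nul_terminated(strings):
    """Classic string table: each string is stored whole and ended by NUL."""
    table = b""
    offsets = {}
    for s in strings:
        offsets[s] = len(table)
        table += s.encode() + b"\0"
    return table, offsets

def read_nul_terminated(table, offset):
    """Read from offset up to the next NUL byte."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode()

def find_span(table, name):
    """Return (offset, length) of name inside an existing table, or None."""
    offset = table.find(name.encode())
    return None if offset < 0 else (offset, len(name))

def read_span(table, span):
    """Read exactly `length` bytes starting at `offset`."""
    offset, length = span
    return table[offset:offset + length].decode()
```

An offset alone can only name a suffix of a stored string, since reading
continues to the NUL; adding a length lets a reference pick out any
substring, at the cost of searching for it when the table is built.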

> I guess we may have to have our "low-overhead" user-facing behavior be a
> bit more nuanced. E.g.:
> 1. does this feature bloat object files significantly
> 2. does this feature slow down link times significantly
>
> Intuitively, it seems like we should be able to get 1. when debug info
> happens to be enabled (not sure about split dwarf?) and possibly in all
> cases at the cost of complexity. We may be able to get 2. in all cases with
> proper design.
>
I think that would be my rough assessment as well. I think we have a good
shot at 1 for all cases with some of the ideas that have been mentioned
already. If we can avoid creating dependencies on DWARF I think that would
be ideal -- I'd ideally like this to work for COFF as well, where you'd
typically expect to find CodeView in object files. If I were to try this I
think the first thing that I would try is hash/compression combined with
the two ODR tables (no reuse for non-symbol ODR names to start with, as
compression may be enough on its own).

Peter

>
> -- Sean Silva
>
>
>>
>>
>>> The issue of retaining the ODR check for functions even if they get
>>> inlined may inherently pose an extra cost that can't be folded into
>>> existing work the linker is doing, so there might be a reason for clang to
>>> have a default mode that has practically no linking overhead and one that
>>> does more thorough checking but imposes extra linking overhead. Think
>>> something like a crazy boost library with thousands of functions that get
>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>> that are going to impose extra cost on the linker. Simply due to the extra
>>> volume of strings that the linker would need to look at, I don't think
>>> there's a way to include checking of all inlined functions "for free" at
>>> the linker level using the symbol approach.
>>>
>>
>>>
>>> I guess those inlined functions would still have those symbol names in
>>> debug info (I think?), so piggybacking on the string deduplication we're
>>> already doing might make it possible to fold away the work in that case
>>> (but then again, would still impose extra cost with split dwarf...).
>>>
>>> Anyway, let's wait to see what the actual performance numbers are.
>>>
>>> -- Sean Silva
>>>
>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to propose an ODR checker feature for Clang and LLD. The
>>>> feature would be similar to gold's --detect-odr-violations feature, but
>>>> better: we can rely on integration with clang to avoid relying on debug
>>>> info and to perform more precise matching.
>>>>
>>>> The basic idea is that we use clang's ability to create ODR hashes for
>>>> declarations. ODR hashes are computed using all information about a
>>>> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
>>>> passed, Clang will store the ODR hashes in a so-called ODR table in each
>>>> object file. Each ODR table will contain a mapping from mangled declaration
>>>> names to ODR hashes. At link time, the linker will read the ODR table and
>>>> report any mismatches.
>>>>
>>>> To make this work:
>>>> - LLVM will be extended with the ability to represent ODR tables in the
>>>> IR and emit them to object files
>>>> - Clang will be extended with the ability to emit ODR tables using ODR
>>>> hashes
>>>> - LLD will be extended to read ODR tables from object files
>>>>
>>>> I have implemented a prototype of this feature. It is available here:
>>>> https://github.com/pcc/llvm-project/tree/odr-checker and some results
>>>> from applying it to chromium are here: crbug.com/726071
>>>> As you can see it did indeed find a number of real ODR violations in
>>>> Chromium, including some that wouldn't be detectable using debug info.
>>>>
>>>> If you're interested in what the format of the ODR table would look
>>>> like, that prototype shows pretty much what I had in mind, but I expect
>>>> many other aspects of the implementation to change as it is upstreamed.
>>>>
>>>> Thanks,
>>>> --
>>>> --
>>>> Peter
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>
>>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>
>

-- 
-- 
Peter

Peter Collingbourne via llvm-dev

2017-Jun-14 01:34 UTC

On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> I know that things aren't currently apples-to-apples, but how does that
>> compare to gold?
>>
>
> I will measure it.
>
For that unit_tests binary I measured the overhead with gold at about 5
seconds (average of 10 runs). That is with debug info, of course.

> I think that would be my rough assessment as well. I think we have a good
> shot at 1 for all cases with some of the ideas that have been mentioned
> already. If we can avoid creating dependencies on DWARF I think that would
> be ideal -- I'd ideally like this to work for COFF as well, where you'd
> typically expect to find CodeView in object files. If I were to try this I
> think the first thing that I would try is hash/compression combined with
> the two ODR tables (no reuse for non-symbol ODR names to start with, as
> compression may be enough on its own).
>
I developed a second prototype which uses hash/compression with no attempt
to reuse. It is available here:
https://github.com/pcc/llvm-project/tree/odr-checker2

For Chromium the object file size overhead was 536566007 bytes, or in
relative terms about 25%, or about 4% with debug info. I measured perf
overhead for unit_tests at about 6%, but after I moved the checker onto
another thread, the overhead disappeared into the noise.
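
For readers who want a picture of the hash/compression scheme, here is a
rough sketch; it is not the code in the odr-checker2 branch. The hash
function (blake2b truncated to 64 bits), the use of zlib, the row layout and
all function names are assumptions chosen for illustration.

```python
import hashlib
import threading
import zlib

def name_hash(name: str) -> int:
    # Stand-in 64-bit hash; the thread does not say which hash the prototype uses.
    return int.from_bytes(hashlib.blake2b(name.encode(), digest_size=8).digest(), "little")

def make_odr_table(decls):
    """decls: list of (odr_name, odr_hash). Returns (rows, compressed_names)."""
    rows = [(name_hash(name), odr_hash, i) for i, (name, odr_hash) in enumerate(decls)]
    blob = zlib.compress("\0".join(name for name, _ in decls).encode())
    return rows, blob

def check(tables, report):
    """Compare hashes only; decompress names lazily when a mismatch is found."""
    seen = {}
    for rows, blob in tables:
        names = None
        for nhash, odr_hash, index in rows:
            if seen.setdefault(nhash, odr_hash) != odr_hash:
                if names is None:
                    names = zlib.decompress(blob).decode().split("\0")
                report.append(names[index])

def check_in_background(tables):
    """Run the checker on its own thread so the rest of the link can proceed."""
    report = []
    worker = threading.Thread(target=check, args=(tables, report))
    worker.start()
    return worker, report
```

The common case touches only fixed-size hash rows, so the compressed names
are never inflated unless something looks wrong. This sketch does not
confirm a suspected violation by comparing the two names, which a real
checker would need to do to rule out a name hash collision.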

I'm reasonably happy with these figures, at least for a first
implementation. We may be able to do even better for file size with reuse,
but I'd leave that for version 2.

Peter

-- 
-- 
Peter

David Blaikie via llvm-dev

2017-Jun-14 02:54 UTC

On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> I developed a second prototype which uses hash/compression with no attempt
> to reuse. It is available here:
> https://github.com/pcc/llvm-project/tree/odr-checker2
>
> For Chromium the object file size overhead was 536566007 bytes, or in
> relative terms about 25%, or about 4% with debug info. I measured perf
> overhead for unit_tests at about 6%, but after I moved the checker onto
> another thread, the overhead disappeared into the noise.
>
Still seems like quite a big increase.

Any chance of compression? Could this go into .dwo files with Fission and
be checked by dwp instead, perhaps?

> I'm reasonably happy with these figures, at least for a first
> implementation. We may be able to do even better for file size with reuse,
> but I'd leave that for version 2.
>
What's the story with compatibility between versions, then? Is there a
version header? Will old formats be supported by lld indefinitely? Not at
all?

- Dave

>
> Peter
>
>>
>> Peter
>>
>>
>>>
>>> -- Sean Silva
>>>
>>>
>>>>
>>>>
>>>>> The issue of retaining the ODR check for functions even if they get
>>>>> inlined may inherently pose an extra cost that can't be folded into
>>>>> existing work the linker is doing, so there might be a reason for clang to
>>>>> have a default mode that has practically no linking overhead and one that
>>>>> does more thorough checking but imposes extra linking overhead. Think
>>>>> something like a crazy boost library with thousands of functions that get
>>>>> inlined away, but have gigantic mangled names and so precisely are the ones
>>>>> that are going to impose extra cost on the linker. Simply due to the extra
>>>>> volume of strings that the linker would need to look at, I don't think
>>>>> there's a way to include checking of all inlined function "for free" at the
>>>>> linker level using the symbol approach.
>>>>>
>>>>> I guess those inlined functions would still have those symbol names in
>>>>> debug info (I think?), so piggybacking on the string deduplication we're
>>>>> already doing might make it possible to fold away the work in that case
>>>>> (but then again, would still impose extra cost with split dwarf...).
>>>>>
>>>>> Anyway, let's wait to see what the actual performance numbers are.
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <
>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'd like to propose an ODR checker feature for Clang and LLD. The
>>>>>> feature would be similar to gold's --detect-odr-violations feature, but
>>>>>> better: we can rely on integration with clang to avoid relying on debug
>>>>>> info and to perform more precise matching.
>>>>>>
>>>>>> The basic idea is that we use clang's ability to create ODR hashes
>>>>>> for declarations. ODR hashes are computed using all information about a
>>>>>> declaration that is ODR-relevant. If the flag -fdetect-odr-violations is
>>>>>> passed, Clang will store the ODR hashes in a so-called ODR table in each
>>>>>> object file. Each ODR table will contain a mapping from mangled declaration
>>>>>> names to ODR hashes. At link time, the linker will read the ODR table and
>>>>>> report any mismatches.
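
The link-time check described above can be sketched as follows. This is an illustrative model only, not the prototype's code: each object file's ODR table is represented as a plain dict from mangled name to ODR hash, and find_odr_violations along with the file and symbol names in the usage note are made up for the example.

```python
from typing import Dict, List, Tuple


def find_odr_violations(
    object_tables: Dict[str, Dict[str, int]]
) -> List[Tuple[str, str, str]]:
    """Compare ODR hashes for the same mangled name across object files.

    `object_tables` maps an object file name to its ODR table
    (mangled declaration name -> ODR hash). Returns a list of
    (name, first_object, conflicting_object) tuples.
    """
    first_seen: Dict[str, Tuple[int, str]] = {}
    violations: List[Tuple[str, str, str]] = []
    for obj, table in object_tables.items():
        for name, odr_hash in table.items():
            if name not in first_seen:
                # First definition seen for this name; remember its hash.
                first_seen[name] = (odr_hash, obj)
            elif first_seen[name][0] != odr_hash:
                # Same name, different ODR hash: report a mismatch.
                violations.append((name, first_seen[name][1], obj))
    return violations
```

For instance, with tables {"a.o": {"_ZTS3Foo": 1}, "b.o": {"_ZTS3Foo": 2}} the function reports a single mismatch for _ZTS3Foo between a.o and b.o, while names whose hashes agree produce no report.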
>>>>>>
>>>>>> To make this work:
>>>>>> - LLVM will be extended with the ability to represent ODR tables in
>>>>>> the IR and emit them to object files
>>>>>> - Clang will be extended with the ability to emit ODR tables using
>>>>>> ODR hashes
>>>>>> - LLD will be extended to read ODR tables from object files
>>>>>>
>>>>>> I have implemented a prototype of this feature. It is available here:
>>>>>> https://github.com/pcc/llvm-project/tree/odr-checker and some
>>>>>> results from applying it to chromium are here: crbug.com/726071
>>>>>> As you can see it did indeed find a number of real ODR violations in
>>>>>> Chromium, including some that wouldn't be detectable using debug info.
>>>>>>
>>>>>> If you're interested in what the format of the ODR table would look
>>>>>> like, that prototype shows pretty much what I had in mind, but I expect
>>>>>> many other aspects of the implementation to change as it is upstreamed.
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>
>
>
> --
> --
> Peter