thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD [Jun 2017]

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2017-Jun-14 06:06 UTC

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at
gmail.com> wrote:
>
>>
>>
>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at
pcc.me.uk>
>> wrote:
>>
>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev
<
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne
<peter at pcc.me.uk>
>>>>> wrote:
>>>>>
>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva
<chisophugis at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne
<peter at pcc.me.uk
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva
<chisophugis at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Very nice and simple implementation!
>>>>>>>>>
>>>>>>>>> Do you have any statistics on how large
these odr tables are
>>>>>>>>> compared to other object file data? I
assume that if these tables contain
>>>>>>>>> full mangled symbol names, they could end
up being very large and may want
>>>>>>>>> to share the symbol name strings with the
overall string table in the .o
>>>>>>>>>
>>>>>>>>
>>>>>>>> Looking at Chromium's object files it looks
like the total size of
>>>>>>>> the odrtabs is about 50% of the total size of
the object files, which isn't
>>>>>>>> great. The current implementation only looks at
records, so I imagine that
>>>>>>>> it would be hard to share any of the strings
that I'm currently creating.
>>>>>>>> (I guess it's possible that some types will
have a mangled vtable name in
>>>>>>>> the string table, so we may be able to share a
little that way.) Note
>>>>>>>> however that this was without debug info.
>>>>>>>>
>>>>>>>> One option for reducing size would be to
>>>>>>>> 1) store hashes of ODR names in ODR tables, per
Rui's suggestion
>>>>>>>> (alongside a reference to the name itself in
the string table)
>>>>>>>> 2) compress the string table for the ODR names
with a standard
>>>>>>>> compression algorithm like gzip.
>>>>>>>> This wouldn't seem to affect link time
performance much because I
>>>>>>>> think we should only need to look at the
strings if we see a ODR name hash
>>>>>>>> match together with an ODR hash mismatch, which
would mean an ODR violation
>>>>>>>> with a high probability (i.e. unless there was
an ODR name hash collision,
>>>>>>>> we have found an ODR violation). If we
don't expect a lot of sharing with
>>>>>>>> regular string tables (see below), it seems
even more reasonable.
>>>>>>>>
>>>>>>>
>>>>>>> Neat observation!
>>>>>>>
>>>>>>> FWIW, it is a birthday problem type situation
though, so for a
>>>>>>> 32-bit hash, we would expect a collision in about 1
in 2^16 distinct hashes
>>>>>>> (and 2^16 seems pretty easy to hit in a large
project). So 64-bit hashes
>>>>>>> might be preferable.
>>>>>>>
>>>>>>
>>>>>> Oh right, good point, using a 64-bit hash does seem
like a good idea
>>>>>> here.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Also, do you have any numbers on the
performance of your initial
>>>>>>>>> implementation?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I measured the link time for chromium's
unit_tests (the largest
>>>>>>>> single binary in chromium) at 5.05s without ODR
checks and 6.61s with ODR
>>>>>>>> checks. So about 30% overhead, but in absolute
terms it doesn't seem too
>>>>>>>> bad. So I think this may be acceptable for an
initial implementation, but
>>>>>>>> it certainly seems worth trying to do better.
>>>>>>>>
>>>>>>>
>>>>>>> I know that things aren't currently
apples-to-apples, but how does
>>>>>>> that compare to gold?
>>>>>>>
>>>>>>
>>>>>> I will measure it.
>>>>>>
>>>>>
>>>>> For that unit_tests binary I measured the overhead at about
5 seconds
>>>>> (average of 10 runs). That is with debug info, of course.
>>>>>
>>>>> W.r.t. LLD and having it always on by default (and hence
making it as
>>>>>>>>> fast as possible), it seems like right now
you are implementing the
>>>>>>>>> checking process with a hash table.
That's simple and fine for a first
>>>>>>>>> implementation, but it's probably worth
mentioning in a comment the problem
>>>>>>>>> of checking the tables, at least from the
linker's perspective, does fit
>>>>>>>>> into a map-reduce pattern and could be
easily parallelized if needed. E.g.
>>>>>>>>> a parallel sort to coalesce all entries for
symbols of the same name
>>>>>>>>> followed by a parallel forEach to check
each bucket with the same symbol
>>>>>>>>> name (roughly speaking).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Right, that's one approach. I was thinking
of a simpler approach
>>>>>>>> where at compile time we sort ODR names by hash
and partition them using
>>>>>>>> (say) the upper bits of the hash, so that at
link time we can have N
>>>>>>>> threads each building a hash table for a
specific partition.
>>>>>>>>
>>>>>>>> And of course this work can be started right
after symbol
>>>>>>>> resolution finishes and parallelised with the
rest of the work done by the
>>>>>>>> linker.
>>>>>>>>
>>>>>>>> Even better than doing it faster is just doing
less work. There's a
>>>>>>>>> lot of work that the linker is already
doing that may be reusable for the
>>>>>>>>> ODR checking.
>>>>>>>>> E.g.
>>>>>>>>> - maybe we could get the coalescing step as
a byproduct of our
>>>>>>>>> existing string deduping, which we are
generally doing anyway.
>>>>>>>>> - we are already coalescing symbol names
for the symbol table. If
>>>>>>>>> the ODR table is keyed off of symbols in
the binary that we are inserting
>>>>>>>>> into the symbol table, then I think we
could do the entire ODR check with
>>>>>>>>> no extra "string" work on
LLD's part.
>>>>>>>>>
>>>>>>>>> I see Rui already mentioned some of this in
>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>>>>>> You mentioned that not everything is
necessarily directly keyed on
>>>>>>>>> a symbol (such as types), but I think that
it would really simplify things
>>>>>>>>> if the check was done as such. Do you have
any idea exactly how much of the
>>>>>>>>> things that we want to check are not keyed
on symbols? If most things are
>>>>>>>>> keyed on symbols, for the things we are not
we can just emit extra symbols
>>>>>>>>> prefixed by __clang_odr_check_ or whatever.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Since the current implementation only works
with records there is
>>>>>>>> basically zero overlap right now between ODR
names and symbols. I suppose
>>>>>>>> that I could estimate the amount of function
overlap in a hypothetical
>>>>>>>> implementation that computes ODR hashes of
functions by comparing the
>>>>>>>> number of *_odr functions after clang has
finished IRgen with the number
>>>>>>>> after optimization finishes. This of course
would be strictly more than
>>>>>>>> functions + types.
>>>>>>>>
>>>>>>>
>>>>>>> Wouldn't any function or symbol using the
record type have the type
>>>>>>> name somewhere in it? If we used an offset+length
encoding (instead of
>>>>>>> offset + NUL termination) we might be able to reuse
it then (at some cost
>>>>>>> in finding the reference).
>>>>>>>
>>>>>>
>>>>>> That may be possible with some work in the string table
builder. But
>>>>>> at that point of course we're not dealing with
regular symbols any more. I
>>>>>> guess we could have two ODR tables per object file: an
array of (ODR hash,
>>>>>> location) tuples for ODR names that correspond to
symbol table symbols
>>>>>> (i.e. Rui's proposal on the chromium bug), and an
array of (ODR name, ODR
>>>>>> hash, location) tuples for all other ODR names. I guess
if we wanted a "low
>>>>>> overhead" mode we could just omit the second table
or put fewer symbols in
>>>>>> it.
>>>>>>
>>>>>> With debug info surely there is some sort of string
representing the
>>>>>>> record name or something like that, no?
>>>>>>>
>>>>>>
>>>>>> Not the record name on its own (they do appear but a
bit awkwardly --
>>>>>> each namespace component is stored in a separate
string), but if the record
>>>>>> has at least one member function the mangled type name
will appear
>>>>>> somewhere in .debug_str, so we could in principle reuse
that with the
>>>>>> offset/length trick.
>>>>>>
>>>>>> I guess we may have to have our
"low-overhead" user-facing behavior
>>>>>>> be a bit more nuanced. E.g.:
>>>>>>> 1. does this feature bloat object files
significantly
>>>>>>> 2. does this feature slow down link times
significantly
>>>>>>>
>>>>>>> Intuitively, it seems like we should be able to get
1. when debug
>>>>>>> info happens to be enabled (not sure about split
dwarf?) and possibly in
>>>>>>> all cases at the cost of complexity. We may be able
to get 2. in all cases
>>>>>>> with proper design.
>>>>>>>
>>>>>>
>>>>>> I think that would be my rough assessment as well. I
think we have a
>>>>>> good shot at 1 for all cases with some of the ideas
that have been
>>>>>> mentioned already. If we can avoid creating
dependencies on DWARF I think
>>>>>> that would be ideal -- I'd ideally like this to
work for COFF as well,
>>>>>> where you'd typically expect to find CodeView in
object files. If I were to
>>>>>> try this I think the first thing that I would try is
hash/compression
>>>>>> combined with the two ODR tables (no reuse for
non-symbol ODR names to
>>>>>> start with, as compression may be enough on its own).
>>>>>>
>>>>>
>>>>> I developed a second prototype which uses hash/compression
with no
>>>>> attempt to reuse. It is available here:
>>>>> https://github.com/pcc/llvm-project/tree/odr-checker2
>>>>>
>>>>> For Chromium the object file size overhead was 536566007
bytes, or in
>>>>> relative terms about 25%, or about 4% with debug info. I
measured perf
>>>>> overhead for unit_tests at about 6%, but after I moved the
checker onto
>>>>> another thread, the overhead disappeared into the noise.
>>>>>
>>>>
>>>> Still seems like quite a big increase.
>>>>
>>>> Any chance of compression?
>>>>
>>>
>>> That was with compression -- the implementation compresses the
parts of
>>> the ODR table that aren't hashes (aside from the header and the
Clang
>>> version, which is a small fixed cost), as well as the string table.
The
>>> hashes were left uncompressed because they are in the critical path
of the
>>> linker and because I imagine that they wouldn't really be that
compressible.
>>>
>>
>> I'd be a bit surprised if they weren't especially compressible
-
>>
>
> Maybe I'm wrong, but my intuition about compression is that it works
best
> when the data contains repeated patterns. If we use a hash function with
> good dispersion then I'd expect each hash to have little in common with
> other hashes.
>
>
>> and how much of the size increase is the compressed data V the
>> uncompressed data?
>>
>
> The ratio was roughly 60% compressed data to 40% uncompressed data.
>
>
>> Is it still in the hot path when parallelized?
>>
>
> Not right now according to my benchmarking, but decompression could push
> it into the critical path if it ends up taking longer than the rest of the
> work done by the linker after symbol resolution. On the same machine that I
> used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is
roughly
> what I'd expect the hashes to look like) takes around 1.1s, i.e. a not
> insignificant fraction of lld's runtime.
>
>
>> So I think the remaining gains would either be through limiting the
>>> number of ODR table entries, or through reuse of data.
>>>
>>> Limiting might be something to explore -- one possibility is that
we
>>> could limit the ODR table entries to the declarations that are
"used" by a
>>> particular translation unit (it appears that Clang tracks something
like
>>> that in Decl::Used/Decl::Referenced, but I'm not sure if that
is exactly
>>> what we need -- I think we would basically need to test for
reference
>>> reachability from the functions/globals that are IRgen'd).
>>>
>>
>> Currently it has every type and function that is in the AST? Yeah,
that's
>> a lot - perhaps it should be more like the things that go in the DWARF?
>> (though would need to add some cases there - since the DWARF logic
already
>> relies on the ODR to not emit duplicates in some cases)
>>
>
> Just every record declaration -- Clang only supports ODR hashes for record
> declarations right now. I understand that function declarations (including
> function bodies) are still works in progress.
>
> I think it should indeed just be roughly the things that go in the DWARF.
> I think that at one point I observed that every record declaration, even
> unused ones, were going into the DWARF, but I might have been mistaken
> because I can no longer reproduce that. I'll take a closer look to see
if I
> can reuse what logic presumably already exists for DWARF.
>
> In terms of reuse, it seems that of the 536566007 bytes of
>>> overhead, 319309579 were the compressed part of the ODR tables. So
even if
>>> we achieved 100% sharing,
>>>
>>
>> 100% sharing? You mean if all the data were compressed, and assuming
the
>> hashes were compressible at the same ratio as the other data?
>>
>
> Sorry, I mean if 100% of the data in the compressed part of the ODR table
> could be eliminated by reusing data stored elsewhere (e.g. in the object
> file string table or in the DWARF).
>
> with the current scheme I think that our minimum achievable overhead would
>>> be ~15% (no debug info) or ~2% (with debug info).
>>>
>>>
>>>> Could this go into .dwo files with Fission and be checked by
dwp
>>>> instead, perhaps?
>>>>
>>>
>>> I think it could also work that way, yes.
>>>
>>> I'm reasonably happy with these figures, at least for a first
>>>>> implementation. We may be able to do even better for file
size with reuse,
>>>>> but I'd leave that for version 2.
>>>>>
>>>>
>>>> What's the story with compatibility between versions, then?
Is there a
>>>> version header?
>>>>
>>>
>>> Yes, the header contains a version number.
>>>
>>>
>>>> Will old formats be supported by lld indefinitely? Not at all?
>>>>
>>>
>>> I think we should drop support for old formats when we introduce a
new
>>> format. My understanding is that the ODR hash can change whenever
Clang
>>> changes (the implementation only performs ODR checking if all ODR
tables
>>> were produced by the same revision of Clang), so there wouldn't
seem to be
>>> a huge benefit in keeping support for old formats around.
>>>
>>
>> I imagine it's possible people aren't necessarily going to rev
lld in
>> exact lock-step with clang, but I could be wrong. (certainly binutils
ld or
>> gold aren't released/kept in lock-step with GCC, for example)
>>
>
> That's certainly possible, but I'd say that the bar for dropping
backwards
> compatibility is lower because ODR tables are not required for correctness.
> We could keep compatibility with the last version or so if it isn't too
> burdensome, or otherwise print a warning.
>
They aren't required for correctness, but upgrading your compiler or linker
then silently losing ODR checking would be bad (or even not silently losing
it, but having no choice but to rev both to keep the functionality & hold
the ODR-cleanliness bar) - it's the sort of thing where if you lost the
checking, then gained it back again later, the regression cleanup would be
annoying/an impediment to using the feature.

Any idea what Daniel Jasper & co have been working on WRT ODR checking &
how this feature integrates or doesn't with their work? I imagine they
might be working on something more like a Clang Tooling style approach, but
I'm not sure.

- Dave

>
> Peter
>
>>
>>
>>>
>>> Peter
>>>
>>>>
>>>> - Dave
>>>>
>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> The issue of retaining the ODR check for
functions even if they
>>>>>>>>> get inlined may inherently pose an extra
cost that can't be folded into
>>>>>>>>> existing work the linker is doing, so there
might be a reason for clang to
>>>>>>>>> have a default mode that has practically no
linking overhead and one that
>>>>>>>>> does more thorough checking but imposes
extra linking overhead. Think
>>>>>>>>> something like a crazy boost library with
thousands of functions that get
>>>>>>>>> inlined away, but have gigantic mangled
names and so precisely are the ones
>>>>>>>>> that are going to impose extra cost on the
linker. Simply due to the extra
>>>>>>>>> volume of strings that the linker would
need to look at, I don't think
>>>>>>>>> there's a way to include checking of
all inlined function "for free" at the
>>>>>>>>> linker level using the symbol approach.
>>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>> I guess those inlined functions would still
have those symbol names
>>>>>>>>> in debug info (I think?), so piggybacking
on the string deduplication we're
>>>>>>>>> already doing might make it possible to
fold away the work in that case
>>>>>>>>> (but then again, would still impose extra
cost with split dwarf...).
>>>>>>>>>
>>>>>>>>> Anyway, let's wait to see what the
actual performance numbers are.
>>>>>>>>>
>>>>>>>>> -- Sean Silva
>>>>>>>>>
>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter
Collingbourne via cfe-dev <
>>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I'd like to propose an ODR checker
feature for Clang and LLD. The
>>>>>>>>>> feature would be similar to gold's
--detect-odr-violations feature, but
>>>>>>>>>> better: we can rely on integration with
clang to avoid relying on debug
>>>>>>>>>> info and to perform more precise
matching.
>>>>>>>>>>
>>>>>>>>>> The basic idea is that we use
clang's ability to create ODR
>>>>>>>>>> hashes for declarations. ODR hashes are
computed using all information
>>>>>>>>>> about a declaration that is
ODR-relevant. If the flag
>>>>>>>>>> -fdetect-odr-violations is passed,
Clang will store the ODR hashes in a
>>>>>>>>>> so-called ODR table in each object
file. Each ODR table will contain a
>>>>>>>>>> mapping from mangled declaration names
to ODR hashes. At link time, the
>>>>>>>>>> linker will read the ODR table and
report any mismatches.
>>>>>>>>>>
>>>>>>>>>> To make this work:
>>>>>>>>>> - LLVM will be extended with the
ability to represent ODR tables
>>>>>>>>>> in the IR and emit them to object files
>>>>>>>>>> - Clang will be extended with the
ability to emit ODR tables
>>>>>>>>>> using ODR hashes
>>>>>>>>>> - LLD will be extended to read ODR
tables from object files
>>>>>>>>>>
>>>>>>>>>> I have implemented a prototype of this
feature. It is available
>>>>>>>>>> here:
https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>>>> some results from applying it to
chromium are here:
>>>>>>>>>> crbug.com/726071
>>>>>>>>>> As you can see it did indeed find a
number of real ODR violations
>>>>>>>>>> in Chromium, including some that
wouldn't be detectable using debug info.
>>>>>>>>>>
>>>>>>>>>> If you're interested in what the
format of the ODR table would
>>>>>>>>>> look like, that prototype shows pretty
much what I had in mind, but I
>>>>>>>>>> expect many other aspects of the
implementation to change as it is
>>>>>>>>>> upstreamed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>
_______________________________________________
>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Peter
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>> _______________________________________________
>>>>> cfe-dev mailing list
>>>>> cfe-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>
>
> --
> --
> Peter
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170614/91451507/attachment-0001.html>

Peter Collingbourne via llvm-dev

2017-Jun-14 06:30 UTC

head link

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at gmail.com>
wrote:
>
>
> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
>
>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at
pcc.me.uk>
>>> wrote:
>>>
>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at
gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via
cfe-dev <
>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne
<peter at pcc.me.uk
>>>>>> > wrote:
>>>>>>
>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva
<chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter
Collingbourne <
>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva
<chisophugis at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Very nice and simple implementation!
>>>>>>>>>>
>>>>>>>>>> Do you have any statistics on how large
these odr tables are
>>>>>>>>>> compared to other object file data? I
assume that if these tables contain
>>>>>>>>>> full mangled symbol names, they could
end up being very large and may want
>>>>>>>>>> to share the symbol name strings with
the overall string table in the .o
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looking at Chromium's object files it
looks like the total size of
>>>>>>>>> the odrtabs is about 50% of the total size
of the object files, which isn't
>>>>>>>>> great. The current implementation only
looks at records, so I imagine that
>>>>>>>>> it would be hard to share any of the
strings that I'm currently creating.
>>>>>>>>> (I guess it's possible that some types
will have a mangled vtable name in
>>>>>>>>> the string table, so we may be able to
share a little that way.) Note
>>>>>>>>> however that this was without debug info.
>>>>>>>>>
>>>>>>>>> One option for reducing size would be to
>>>>>>>>> 1) store hashes of ODR names in ODR tables,
per Rui's suggestion
>>>>>>>>> (alongside a reference to the name itself
in the string table)
>>>>>>>>> 2) compress the string table for the ODR
names with a standard
>>>>>>>>> compression algorithm like gzip.
>>>>>>>>> This wouldn't seem to affect link time
performance much because I
>>>>>>>>> think we should only need to look at the
strings if we see a ODR name hash
>>>>>>>>> match together with an ODR hash mismatch,
which would mean an ODR violation
>>>>>>>>> with a high probability (i.e. unless there
was an ODR name hash collision,
>>>>>>>>> we have found an ODR violation). If we
don't expect a lot of sharing with
>>>>>>>>> regular string tables (see below), it seems
even more reasonable.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Neat observation!
>>>>>>>>
>>>>>>>> FWIW, it is a birthday problem type situation
though, so for a
>>>>>>>> 32-bit hash, we would expect a collision in
about 1 in 2^16 distinct hashes
>>>>>>>> (and 2^16 seems pretty easy to hit in a large
project). So 64-bit hashes
>>>>>>>> might be preferable.
>>>>>>>>
>>>>>>>
>>>>>>> Oh right, good point, using a 64-bit hash does seem
like a good idea
>>>>>>> here.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>> Also, do you have any numbers on the
performance of your initial
>>>>>>>>>> implementation?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I measured the link time for chromium's
unit_tests (the largest
>>>>>>>>> single binary in chromium) at 5.05s without
ODR checks and 6.61s with ODR
>>>>>>>>> checks. So about 30% overhead, but in
absolute terms it doesn't seem too
>>>>>>>>> bad. So I think this may be acceptable for
an initial implementation, but
>>>>>>>>> it certainly seems worth trying to do
better.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I know that things aren't currently
apples-to-apples, but how does
>>>>>>>> that compare to gold?
>>>>>>>>
>>>>>>>
>>>>>>> I will measure it.
>>>>>>>
>>>>>>
>>>>>> For that unit_tests binary I measured the overhead at
about 5 seconds
>>>>>> (average of 10 runs). That is with debug info, of
course.
>>>>>>
>>>>>> W.r.t. LLD and having it always on by default (and
hence making it as
>>>>>>>>>> fast as possible), it seems like right
now you are implementing the
>>>>>>>>>> checking process with a hash table.
That's simple and fine for a first
>>>>>>>>>> implementation, but it's probably
worth mentioning in a comment the problem
>>>>>>>>>> of checking the tables, at least from
the linker's perspective, does fit
>>>>>>>>>> into a map-reduce pattern and could be
easily parallelized if needed. E.g.
>>>>>>>>>> a parallel sort to coalesce all entries
for symbols of the same name
>>>>>>>>>> followed by a parallel forEach to check
each bucket with the same symbol
>>>>>>>>>> name (roughly speaking).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Right, that's one approach. I was
thinking of a simpler approach
>>>>>>>>> where at compile time we sort ODR names by
hash and partition them using
>>>>>>>>> (say) the upper bits of the hash, so that
at link time we can have N
>>>>>>>>> threads each building a hash table for a
specific partition.
>>>>>>>>>
>>>>>>>>> And of course this work can be started
right after symbol
>>>>>>>>> resolution finishes and parallelised with
the rest of the work done by the
>>>>>>>>> linker.
>>>>>>>>>
>>>>>>>>> Even better than doing it faster is just
doing less work. There's
>>>>>>>>>> a lot of work that the linker is
already doing that may be reusable for the
>>>>>>>>>> ODR checking.
>>>>>>>>>> E.g.
>>>>>>>>>> - maybe we could get the coalescing
step as a byproduct of our
>>>>>>>>>> existing string deduping, which we are
generally doing anyway.
>>>>>>>>>> - we are already coalescing symbol
names for the symbol table. If
>>>>>>>>>> the ODR table is keyed off of symbols
in the binary that we are inserting
>>>>>>>>>> into the symbol table, then I think we
could do the entire ODR check with
>>>>>>>>>> no extra "string" work on
LLD's part.
>>>>>>>>>>
>>>>>>>>>> I see Rui already mentioned some of
this in
>>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>>>>>>> You mentioned that not everything is
necessarily directly keyed
>>>>>>>>>> on a symbol (such as types), but I
think that it would really simplify
>>>>>>>>>> things if the check was done as such.
Do you have any idea exactly how much
>>>>>>>>>> of the things that we want to check are
not keyed on symbols? If most
>>>>>>>>>> things are keyed on symbols, for the
things we are not we can just emit
>>>>>>>>>> extra symbols prefixed by
__clang_odr_check_ or whatever.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Since the current implementation only works
with records there is
>>>>>>>>> basically zero overlap right now between
ODR names and symbols. I suppose
>>>>>>>>> that I could estimate the amount of
function overlap in a hypothetical
>>>>>>>>> implementation that computes ODR hashes of
functions by comparing the
>>>>>>>>> number of *_odr functions after clang has
finished IRgen with the number
>>>>>>>>> after optimization finishes. This of course
would be strictly more than
>>>>>>>>> functions + types.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Wouldn't any function or symbol using the
record type have the type
>>>>>>>> name somewhere in it? If we used an
offset+length encoding (instead of
>>>>>>>> offset + NUL termination) we might be able to
reuse it then (at some cost
>>>>>>>> in finding the reference).
>>>>>>>>
>>>>>>>
>>>>>>> That may be possible with some work in the string
table builder. But
>>>>>>> at that point of course we're not dealing with
regular symbols any more. I
>>>>>>> guess we could have two ODR tables per object file:
an array of (ODR hash,
>>>>>>> location) tuples for ODR names that correspond to
symbol table symbols
>>>>>>> (i.e. Rui's proposal on the chromium bug), and
an array of (ODR name, ODR
>>>>>>> hash, location) tuples for all other ODR names. I
guess if we wanted a "low
>>>>>>> overhead" mode we could just omit the second
table or put fewer symbols in
>>>>>>> it.
>>>>>>>
>>>>>>> With debug info surely there is some sort of string
representing the
>>>>>>>> record name or something like that, no?
>>>>>>>>
>>>>>>>
>>>>>>> Not the record name on its own (they do appear but
a bit awkwardly
>>>>>>> -- each namespace component is stored in a separate
string), but if the
>>>>>>> record has at least one member function the mangled
type name will appear
>>>>>>> somewhere in .debug_str, so we could in principle
reuse that with the
>>>>>>> offset/length trick.
>>>>>>>
>>>>>>> I guess we may have to have our
"low-overhead" user-facing behavior
>>>>>>>> be a bit more nuanced. E.g.:
>>>>>>>> 1. does this feature bloat object files
significantly
>>>>>>>> 2. does this feature slow down link times
significantly
>>>>>>>>
>>>>>>>> Intuitively, it seems like we should be able to
get 1. when debug
>>>>>>>> info happens to be enabled (not sure about
split dwarf?) and possibly in
>>>>>>>> all cases at the cost of complexity. We may be
able to get 2. in all cases
>>>>>>>> with proper design.
>>>>>>>>
>>>>>>>
>>>>>>> I think that would be my rough assessment as well.
I think we have a
>>>>>>> good shot at 1 for all cases with some of the ideas
that have been
>>>>>>> mentioned already. If we can avoid creating
dependencies on DWARF I think
>>>>>>> that would be ideal -- I'd ideally like this to
work for COFF as well,
>>>>>>> where you'd typically expect to find CodeView
in object files. If I were to
>>>>>>> try this I think the first thing that I would try
is hash/compression
>>>>>>> combined with the two ODR tables (no reuse for
non-symbol ODR names to
>>>>>>> start with, as compression may be enough on its
own).
>>>>>>>
>>>>>>
>>>>>> I developed a second prototype which uses
hash/compression with no
>>>>>> attempt to reuse. It is available here:
https://github.com/pcc/
>>>>>> llvm-project/tree/odr-checker2
>>>>>>
>>>>>> For Chromium the object file size overhead was
536566007 bytes, or in
>>>>>> relative terms about 25%, or about 4% with debug info.
I measured perf
>>>>>> overhead for unit_tests at about 6%, but after I moved
the checker onto
>>>>>> another thread, the overhead disappeared into the
noise.
>>>>>>
>>>>>
>>>>> Still seems like quite a big increase.
>>>>>
>>>>> Any chance of compression?
>>>>>
>>>>
>>>> That was with compression -- the implementation compresses the
parts of
>>>> the ODR table that aren't hashes (aside from the header and
the Clang
>>>> version, which is a small fixed cost), as well as the string
table. The
>>>> hashes were left uncompressed because they are in the critical
path of the
>>>> linker and because I imagine that they wouldn't really be
that compressible.
>>>>
>>>
>>> I'd be a bit surprised if they weren't especially
compressible -
>>>
>>
>> Maybe I'm wrong, but my intuition about compression is that it
works best
>> when the data contains repeated patterns. If we use a hash function
with
>> good dispersion then I'd expect each hash to have little in common
with
>> other hashes.
>>
>>
>>> and how much of the size increase is the compressed data V the
>>> uncompressed data?
>>>
>>
>> The ratio was roughly 60% compressed data to 40% uncompressed data.
>>
>>
>>> Is it still in the hot path when parallelized?
>>>
>>
>> Not right now according to my benchmarking, but decompression could
push
>> it into the critical path if it ends up taking longer than the rest of
the
>> work done by the linker after symbol resolution. On the same machine
that I
>> used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is
roughly
>> what I'd expect the hashes to look like) takes around 1.1s, i.e. a
not
>> insignificant fraction of lld's runtime.
>>
>>
>>> So I think the remaining gains would either be through limiting the
>>>> number of ODR table entries, or through reuse of data.
>>>>
>>>> Limiting might be something to explore -- one possibility is
that we
>>>> could limit the ODR table entries to the declarations that are
"used" by a
>>>> particular translation unit (it appears that Clang tracks
something like
>>>> that in Decl::Used/Decl::Referenced, but I'm not sure if
that is exactly
>>>> what we need -- I think we would basically need to test for
reference
>>>> reachability from the functions/globals that are IRgen'd).
>>>>
>>>
>>> Currently it has every type and function that is in the AST? Yeah,
>>> that's a lot - perhaps it should be more like the things that
go in the
>>> DWARF? (though would need to add some cases there - since the DWARF
logic
>>> already relies on the ODR to not emit duplicates in some cases)
>>>
>>
>> Just every record declaration -- Clang only supports ODR hashes for
>> record declarations right now. I understand that function declarations
>> (including function bodies) are still works in progress.
>>
>> I think it should indeed just be roughly the things that go in the
DWARF.
>> I think that at one point I observed that every record declaration,
even
>> unused ones, were going into the DWARF, but I might have been mistaken
>> because I can no longer reproduce that. I'll take a closer look to
see if I
>> can reuse what logic presumably already exists for DWARF.
>>
>> In terms of reuse, it seems that of the 536566007 bytes of
>>>> overhead, 319309579 were the compressed part of the ODR tables.
So even if
>>>> we achieved 100% sharing,
>>>>
>>>
>>> 100% sharing? You mean if all the data were compressed, and
assuming the
>>> hashes were compressible at the same ratio as the other data?
>>>
>>
>> Sorry, I mean if 100% of the data in the compressed part of the ODR
table
>> could be eliminated by reusing data stored elsewhere (e.g. in the
object
>> file string table or in the DWARF).
>>
>> with the current scheme I think that our minimum achievable overhead
>>>> would be ~15% (no debug info) or ~2% (with debug info).
>>>>
>>>>
>>>>> Could this go into .dwo files with Fission and be checked
by dwp
>>>>> instead, perhaps?
>>>>>
>>>>
>>>> I think it could also work that way, yes.
>>>>
>>>> I'm reasonably happy with these figures, at least for a
first
>>>>>> implementation. We may be able to do even better for
file size with reuse,
>>>>>> but I'd leave that for version 2.
>>>>>>
>>>>>
>>>>> What's the story with compatibility between versions,
then? Is there a
>>>>> version header?
>>>>>
>>>>
>>>> Yes, the header contains a version number.
>>>>
>>>>
>>>>> Will old formats be supported by lld indefinitely? Not at
all?
>>>>>
>>>>
>>>> I think we should drop support for old formats when we
introduce a new
>>>> format. My understanding is that the ODR hash can change
whenever Clang
>>>> changes (the implementation only performs ODR checking if all
ODR tables
>>>> were produced by the same revision of Clang), so there
wouldn't seem to be
>>>> a huge benefit in keeping support for old formats around.
>>>>
>>>
>>> I imagine it's possible people aren't necessarily going to
rev lld in
>>> exact lock-step with clang, but I could be wrong. (certainly
binutils ld or
>>> gold aren't released/kept in lock-step with GCC, for example)
>>>
>>
>> That's certainly possible, but I'd say that the bar for
dropping
>> backwards compatibility is lower because ODR tables are not required
for
>> correctness. We could keep compatibility with the last version or so if
it
>> isn't too burdensome, or otherwise print a warning.
>>
>
> They aren't required for correctness, but upgrading your compiler or
> linker then silently losing ODR checking would be bad (or even not silently
> losing it, but having no choice but to rev both to keep the functionality
&
> hold the ODR-cleanliness bar) - it's the sort of thing where if you
lost
> the checking, then gained it back again later, the regression cleanup would
> be annoying/an impediment to using the feature.
>
Makes sense I guess. I'd be fine with a policy where the Nth open source
release should be able to read ODR tables produced by the N-1th and
possibly the N-2th release.

Any idea what Daniel Jasper & co have been working on WRT ODR checking
&> how this feature integrates or doesn't with their work? I imagine they
> might be working on something more like a Clang Tooling style approach, but
> I'm not sure.
>
I'm not aware of any work like that, only of Richard Trieu's efforts for
modules that I'm piggybacking on.

Peter
>
> - Dave
>
>
>>
>> Peter
>>
>>>
>>>
>>>>
>>>> Peter
>>>>
>>>>>
>>>>> - Dave
>>>>>
>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> The issue of retaining the ODR check
for functions even if they
>>>>>>>>>> get inlined may inherently pose an
extra cost that can't be folded into
>>>>>>>>>> existing work the linker is doing, so
there might be a reason for clang to
>>>>>>>>>> have a default mode that has
practically no linking overhead and one that
>>>>>>>>>> does more thorough checking but imposes
extra linking overhead. Think
>>>>>>>>>> something like a crazy boost library
with thousands of functions that get
>>>>>>>>>> inlined away, but have gigantic mangled
names and so precisely are the ones
>>>>>>>>>> that are going to impose extra cost on
the linker. Simply due to the extra
>>>>>>>>>> volume of strings that the linker would
need to look at, I don't think
>>>>>>>>>> there's a way to include checking
of all inlined function "for free" at the
>>>>>>>>>> linker level using the symbol approach.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I guess those inlined functions would still
have those symbol
>>>>>>>>>> names in debug info (I think?), so
piggybacking on the string deduplication
>>>>>>>>>> we're already doing might make it
possible to fold away the work in that
>>>>>>>>>> case (but then again, would still
impose extra cost with split dwarf...).
>>>>>>>>>>
>>>>>>>>>> Anyway, let's wait to see what the
actual performance numbers are.
>>>>>>>>>>
>>>>>>>>>> -- Sean Silva
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter
Collingbourne via cfe-dev
>>>>>>>>>> <cfe-dev at lists.llvm.org>
wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I'd like to propose an ODR
checker feature for Clang and LLD.
>>>>>>>>>>> The feature would be similar to
gold's --detect-odr-violations feature, but
>>>>>>>>>>> better: we can rely on integration
with clang to avoid relying on debug
>>>>>>>>>>> info and to perform more precise
matching.
>>>>>>>>>>>
>>>>>>>>>>> The basic idea is that we use
clang's ability to create ODR
>>>>>>>>>>> hashes for declarations. ODR hashes
are computed using all information
>>>>>>>>>>> about a declaration that is
ODR-relevant. If the flag
>>>>>>>>>>> -fdetect-odr-violations is passed,
Clang will store the ODR hashes in a
>>>>>>>>>>> so-called ODR table in each object
file. Each ODR table will contain a
>>>>>>>>>>> mapping from mangled declaration
names to ODR hashes. At link time, the
>>>>>>>>>>> linker will read the ODR table and
report any mismatches.
>>>>>>>>>>>
>>>>>>>>>>> To make this work:
>>>>>>>>>>> - LLVM will be extended with the
ability to represent ODR tables
>>>>>>>>>>> in the IR and emit them to object
files
>>>>>>>>>>> - Clang will be extended with the
ability to emit ODR tables
>>>>>>>>>>> using ODR hashes
>>>>>>>>>>> - LLD will be extended to read ODR
tables from object files
>>>>>>>>>>>
>>>>>>>>>>> I have implemented a prototype of
this feature. It is available
>>>>>>>>>>> here:
https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>>>>> some results from applying it to
chromium are here:
>>>>>>>>>>> crbug.com/726071
>>>>>>>>>>> As you can see it did indeed find a
number of real ODR
>>>>>>>>>>> violations in Chromium, including
some that wouldn't be detectable using
>>>>>>>>>>> debug info.
>>>>>>>>>>>
>>>>>>>>>>> If you're interested in what
the format of the ODR table would
>>>>>>>>>>> look like, that prototype shows
pretty much what I had in mind, but I
>>>>>>>>>>> expect many other aspects of the
implementation to change as it is
>>>>>>>>>>> upstreamed.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Peter
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170613/b57fa5b5/attachment.html>

David Blaikie via llvm-dev

2017-Jun-14 19:47 UTC

head link

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at
gmail.com>
> wrote:
>
>>
>>
>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at
pcc.me.uk>
>> wrote:
>>
>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter
at pcc.me.uk>
>>>> wrote:
>>>>
>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie
at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via
cfe-dev <
>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter
Collingbourne <
>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>
>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva
<chisophugis at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter
Collingbourne <
>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean
Silva <
>>>>>>>>>> chisophugis at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Very nice and simple
implementation!
>>>>>>>>>>>
>>>>>>>>>>> Do you have any statistics on how
large these odr tables are
>>>>>>>>>>> compared to other object file data?
I assume that if these tables contain
>>>>>>>>>>> full mangled symbol names, they
could end up being very large and may want
>>>>>>>>>>> to share the symbol name strings
with the overall string table in the .o
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Looking at Chromium's object files
it looks like the total size
>>>>>>>>>> of the odrtabs is about 50% of the
total size of the object files, which
>>>>>>>>>> isn't great. The current
implementation only looks at records, so I imagine
>>>>>>>>>> that it would be hard to share any of
the strings that I'm currently
>>>>>>>>>> creating. (I guess it's possible
that some types will have a mangled vtable
>>>>>>>>>> name in the string table, so we may be
able to share a little that way.)
>>>>>>>>>> Note however that this was without
debug info.
>>>>>>>>>>
>>>>>>>>>> One option for reducing size would be
to
>>>>>>>>>> 1) store hashes of ODR names in ODR
tables, per Rui's suggestion
>>>>>>>>>> (alongside a reference to the name
itself in the string table)
>>>>>>>>>> 2) compress the string table for the
ODR names with a standard
>>>>>>>>>> compression algorithm like gzip.
>>>>>>>>>> This wouldn't seem to affect link
time performance much because I
>>>>>>>>>> think we should only need to look at
the strings if we see a ODR name hash
>>>>>>>>>> match together with an ODR hash
mismatch, which would mean an ODR violation
>>>>>>>>>> with a high probability (i.e. unless
there was an ODR name hash collision,
>>>>>>>>>> we have found an ODR violation). If we
don't expect a lot of sharing with
>>>>>>>>>> regular string tables (see below), it
seems even more reasonable.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Neat observation!
>>>>>>>>>
>>>>>>>>> FWIW, it is a birthday problem type
situation though, so for a
>>>>>>>>> 32-bit hash, we would expect a collision in
about 1 in 2^16 distinct hashes
>>>>>>>>> (and 2^16 seems pretty easy to hit in a
large project). So 64-bit hashes
>>>>>>>>> might be preferable.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Oh right, good point, using a 64-bit hash does
seem like a good
>>>>>>>> idea here.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Also, do you have any numbers on the
performance of your initial
>>>>>>>>>>> implementation?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I measured the link time for
chromium's unit_tests (the largest
>>>>>>>>>> single binary in chromium) at 5.05s
without ODR checks and 6.61s with ODR
>>>>>>>>>> checks. So about 30% overhead, but in
absolute terms it doesn't seem too
>>>>>>>>>> bad. So I think this may be acceptable
for an initial implementation, but
>>>>>>>>>> it certainly seems worth trying to do
better.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I know that things aren't currently
apples-to-apples, but how does
>>>>>>>>> that compare to gold?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I will measure it.
>>>>>>>>
>>>>>>>
>>>>>>> For that unit_tests binary I measured the overhead
at about 5
>>>>>>> seconds (average of 10 runs). That is with debug
info, of course.
>>>>>>>
>>>>>>> W.r.t. LLD and having it always on by default (and
hence making it
>>>>>>>>>>> as fast as possible), it seems like
right now you are implementing the
>>>>>>>>>>> checking process with a hash table.
That's simple and fine for a first
>>>>>>>>>>> implementation, but it's
probably worth mentioning in a comment the problem
>>>>>>>>>>> of checking the tables, at least
from the linker's perspective, does fit
>>>>>>>>>>> into a map-reduce pattern and could
be easily parallelized if needed. E.g.
>>>>>>>>>>> a parallel sort to coalesce all
entries for symbols of the same name
>>>>>>>>>>> followed by a parallel forEach to
check each bucket with the same symbol
>>>>>>>>>>> name (roughly speaking).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Right, that's one approach. I was
thinking of a simpler approach
>>>>>>>>>> where at compile time we sort ODR names
by hash and partition them using
>>>>>>>>>> (say) the upper bits of the hash, so
that at link time we can have N
>>>>>>>>>> threads each building a hash table for
a specific partition.
>>>>>>>>>>
>>>>>>>>>> And of course this work can be started
right after symbol
>>>>>>>>>> resolution finishes and parallelised
with the rest of the work done by the
>>>>>>>>>> linker.
>>>>>>>>>>
>>>>>>>>>> Even better than doing it faster is
just doing less work. There's
>>>>>>>>>>> a lot of work that the linker is
already doing that may be reusable for the
>>>>>>>>>>> ODR checking.
>>>>>>>>>>> E.g.
>>>>>>>>>>> - maybe we could get the coalescing
step as a byproduct of our
>>>>>>>>>>> existing string deduping, which we
are generally doing anyway.
>>>>>>>>>>> - we are already coalescing symbol
names for the symbol table.
>>>>>>>>>>> If the ODR table is keyed off of
symbols in the binary that we are
>>>>>>>>>>> inserting into the symbol table,
then I think we could do the entire ODR
>>>>>>>>>>> check with no extra
"string" work on LLD's part.
>>>>>>>>>>>
>>>>>>>>>>> I see Rui already mentioned some of
this in
>>>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.
>>>>>>>>>>> You mentioned that not everything
is necessarily directly keyed
>>>>>>>>>>> on a symbol (such as types), but I
think that it would really simplify
>>>>>>>>>>> things if the check was done as
such. Do you have any idea exactly how much
>>>>>>>>>>> of the things that we want to check
are not keyed on symbols? If most
>>>>>>>>>>> things are keyed on symbols, for
the things we are not we can just emit
>>>>>>>>>>> extra symbols prefixed by
__clang_odr_check_ or whatever.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Since the current implementation only
works with records there is
>>>>>>>>>> basically zero overlap right now
between ODR names and symbols. I suppose
>>>>>>>>>> that I could estimate the amount of
function overlap in a hypothetical
>>>>>>>>>> implementation that computes ODR hashes
of functions by comparing the
>>>>>>>>>> number of *_odr functions after clang
has finished IRgen with the number
>>>>>>>>>> after optimization finishes. This of
course would be strictly more than
>>>>>>>>>> functions + types.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Wouldn't any function or symbol using
the record type have the
>>>>>>>>> type name somewhere in it? If we used an
offset+length encoding (instead of
>>>>>>>>> offset + NUL termination) we might be able
to reuse it then (at some cost
>>>>>>>>> in finding the reference).
>>>>>>>>>
>>>>>>>>
>>>>>>>> That may be possible with some work in the
string table builder.
>>>>>>>> But at that point of course we're not
dealing with regular symbols any
>>>>>>>> more. I guess we could have two ODR tables per
object file: an array of
>>>>>>>> (ODR hash, location) tuples for ODR names that
correspond to symbol table
>>>>>>>> symbols (i.e. Rui's proposal on the
chromium bug), and an array of (ODR
>>>>>>>> name, ODR hash, location) tuples for all other
ODR names. I guess if we
>>>>>>>> wanted a "low overhead" mode we could
just omit the second table or put
>>>>>>>> fewer symbols in it.
>>>>>>>>
>>>>>>>> With debug info surely there is some sort of
string representing
>>>>>>>>> the record name or something like that, no?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Not the record name on its own (they do appear
but a bit awkwardly
>>>>>>>> -- each namespace component is stored in a
separate string), but if the
>>>>>>>> record has at least one member function the
mangled type name will appear
>>>>>>>> somewhere in .debug_str, so we could in
principle reuse that with the
>>>>>>>> offset/length trick.
>>>>>>>>
>>>>>>>> I guess we may have to have our
"low-overhead" user-facing behavior
>>>>>>>>> be a bit more nuanced. E.g.:
>>>>>>>>> 1. does this feature bloat object files
significantly
>>>>>>>>> 2. does this feature slow down link times
significantly
>>>>>>>>>
>>>>>>>>> Intuitively, it seems like we should be
able to get 1. when debug
>>>>>>>>> info happens to be enabled (not sure about
split dwarf?) and possibly in
>>>>>>>>> all cases at the cost of complexity. We may
be able to get 2. in all cases
>>>>>>>>> with proper design.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think that would be my rough assessment as
well. I think we have
>>>>>>>> a good shot at 1 for all cases with some of the
ideas that have been
>>>>>>>> mentioned already. If we can avoid creating
dependencies on DWARF I think
>>>>>>>> that would be ideal -- I'd ideally like
this to work for COFF as well,
>>>>>>>> where you'd typically expect to find
CodeView in object files. If I were to
>>>>>>>> try this I think the first thing that I would
try is hash/compression
>>>>>>>> combined with the two ODR tables (no reuse for
non-symbol ODR names to
>>>>>>>> start with, as compression may be enough on its
own).
>>>>>>>>
>>>>>>>
>>>>>>> I developed a second prototype which uses
hash/compression with no
>>>>>>> attempt to reuse. It is available here:
>>>>>>>
https://github.com/pcc/llvm-project/tree/odr-checker2
>>>>>>>
>>>>>>> For Chromium the object file size overhead was
536566007 bytes, or
>>>>>>> in relative terms about 25%, or about 4% with debug
info. I measured perf
>>>>>>> overhead for unit_tests at about 6%, but after I
moved the checker onto
>>>>>>> another thread, the overhead disappeared into the
noise.
>>>>>>>
>>>>>>
>>>>>> Still seems like quite a big increase.
>>>>>>
>>>>>> Any chance of compression?
>>>>>>
>>>>>
>>>>> That was with compression -- the implementation compresses
the parts
>>>>> of the ODR table that aren't hashes (aside from the
header and the Clang
>>>>> version, which is a small fixed cost), as well as the
string table. The
>>>>> hashes were left uncompressed because they are in the
critical path of the
>>>>> linker and because I imagine that they wouldn't really
be that compressible.
>>>>>
>>>>
>>>> I'd be a bit surprised if they weren't especially
compressible -
>>>>
>>>
>>> Maybe I'm wrong, but my intuition about compression is that it
works
>>> best when the data contains repeated patterns. If we use a hash
function
>>> with good dispersion then I'd expect each hash to have little
in common
>>> with other hashes.
>>>
>>>
>>>> and how much of the size increase is the compressed data V the
>>>> uncompressed data?
>>>>
>>>
>>> The ratio was roughly 60% compressed data to 40% uncompressed data.
>>>
>>>
>>>> Is it still in the hot path when parallelized?
>>>>
>>>
>>> Not right now according to my benchmarking, but decompression could
push
>>> it into the critical path if it ends up taking longer than the rest
of the
>>> work done by the linker after symbol resolution. On the same
machine that I
>>> used for benchmarking, gunzip'ing 200MB of /dev/urandom (which
is roughly
>>> what I'd expect the hashes to look like) takes around 1.1s,
i.e. a not
>>> insignificant fraction of lld's runtime.
>>>
>>>
>>>> So I think the remaining gains would either be through limiting
the
>>>>> number of ODR table entries, or through reuse of data.
>>>>>
>>>>> Limiting might be something to explore -- one possibility
is that we
>>>>> could limit the ODR table entries to the declarations that
are "used" by a
>>>>> particular translation unit (it appears that Clang tracks
something like
>>>>> that in Decl::Used/Decl::Referenced, but I'm not sure
if that is exactly
>>>>> what we need -- I think we would basically need to test for
reference
>>>>> reachability from the functions/globals that are
IRgen'd).
>>>>>
>>>>
>>>> Currently it has every type and function that is in the AST?
Yeah,
>>>> that's a lot - perhaps it should be more like the things
that go in the
>>>> DWARF? (though would need to add some cases there - since the
DWARF logic
>>>> already relies on the ODR to not emit duplicates in some cases)
>>>>
>>>
>>> Just every record declaration -- Clang only supports ODR hashes for
>>> record declarations right now. I understand that function
declarations
>>> (including function bodies) are still works in progress.
>>>
>>> I think it should indeed just be roughly the things that go in the
>>> DWARF. I think that at one point I observed that every record
declaration,
>>> even unused ones, were going into the DWARF, but I might have been
mistaken
>>> because I can no longer reproduce that. I'll take a closer look
to see if I
>>> can reuse what logic presumably already exists for DWARF.
>>>
>>> In terms of reuse, it seems that of the 536566007 bytes of
>>>>> overhead, 319309579 were the compressed part of the ODR
tables. So even if
>>>>> we achieved 100% sharing,
>>>>>
>>>>
>>>> 100% sharing? You mean if all the data were compressed, and
assuming
>>>> the hashes were compressible at the same ratio as the other
data?
>>>>
>>>
>>> Sorry, I mean if 100% of the data in the compressed part of the ODR
>>> table could be eliminated by reusing data stored elsewhere (e.g. in
the
>>> object file string table or in the DWARF).
>>>
>>> with the current scheme I think that our minimum achievable
overhead
>>>>> would be ~15% (no debug info) or ~2% (with debug info).
>>>>>
>>>>>
>>>>>> Could this go into .dwo files with Fission and be
checked by dwp
>>>>>> instead, perhaps?
>>>>>>
>>>>>
>>>>> I think it could also work that way, yes.
>>>>>
>>>>> I'm reasonably happy with these figures, at least for a
first
>>>>>>> implementation. We may be able to do even better
for file size with reuse,
>>>>>>> but I'd leave that for version 2.
>>>>>>>
>>>>>>
>>>>>> What's the story with compatibility between
versions, then? Is there
>>>>>> a version header?
>>>>>>
>>>>>
>>>>> Yes, the header contains a version number.
>>>>>
>>>>>
>>>>>> Will old formats be supported by lld indefinitely? Not
at all?
>>>>>>
>>>>>
>>>>> I think we should drop support for old formats when we
introduce a new
>>>>> format. My understanding is that the ODR hash can change
whenever Clang
>>>>> changes (the implementation only performs ODR checking if
all ODR tables
>>>>> were produced by the same revision of Clang), so there
wouldn't seem to be
>>>>> a huge benefit in keeping support for old formats around.
>>>>>
>>>>
>>>> I imagine it's possible people aren't necessarily going
to rev lld in
>>>> exact lock-step with clang, but I could be wrong. (certainly
binutils ld or
>>>> gold aren't released/kept in lock-step with GCC, for
example)
>>>>
>>>
>>> That's certainly possible, but I'd say that the bar for
dropping
>>> backwards compatibility is lower because ODR tables are not
required for
>>> correctness. We could keep compatibility with the last version or
so if it
>>> isn't too burdensome, or otherwise print a warning.
>>>
>>
>> They aren't required for correctness, but upgrading your compiler
or
>> linker then silently losing ODR checking would be bad (or even not
silently
>> losing it, but having no choice but to rev both to keep the
functionality &
>> hold the ODR-cleanliness bar) - it's the sort of thing where if you
lost
>> the checking, then gained it back again later, the regression cleanup
would
>> be annoying/an impediment to using the feature.
>>
>
> Makes sense I guess. I'd be fine with a policy where the Nth open
source
> release should be able to read ODR tables produced by the N-1th and
> possibly the N-2th release.
>
Still strikes me as a bit awkward - wonder how that compared to other
(similar or different) linker features.

> Any idea what Daniel Jasper & co have been working on WRT ODR checking
&
>> how this feature integrates or doesn't with their work? I imagine
they
>> might be working on something more like a Clang Tooling style approach,
but
>> I'm not sure.
>>
>
> I'm not aware of any work like that, only of Richard Trieu's
efforts for
> modules that I'm piggybacking on.
>
+Djasper - perhaps you could provide some context on other odr detection
efforts?

> Peter
>
>>
>> - Dave
>>
>>
>>>
>>> Peter
>>>
>>>>
>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>>
>>>>>> - Dave
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Sean Silva
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The issue of retaining the ODR
check for functions even if they
>>>>>>>>>>> get inlined may inherently pose an
extra cost that can't be folded into
>>>>>>>>>>> existing work the linker is doing,
so there might be a reason for clang to
>>>>>>>>>>> have a default mode that has
practically no linking overhead and one that
>>>>>>>>>>> does more thorough checking but
imposes extra linking overhead. Think
>>>>>>>>>>> something like a crazy boost
library with thousands of functions that get
>>>>>>>>>>> inlined away, but have gigantic
mangled names and so precisely are the ones
>>>>>>>>>>> that are going to impose extra cost
on the linker. Simply due to the extra
>>>>>>>>>>> volume of strings that the linker
would need to look at, I don't think
>>>>>>>>>>> there's a way to include
checking of all inlined function "for free" at the
>>>>>>>>>>> linker level using the symbol
approach.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> I guess those inlined functions would
still have those symbol
>>>>>>>>>>> names in debug info (I think?), so
piggybacking on the string deduplication
>>>>>>>>>>> we're already doing might make
it possible to fold away the work in that
>>>>>>>>>>> case (but then again, would still
impose extra cost with split dwarf...).
>>>>>>>>>>>
>>>>>>>>>>> Anyway, let's wait to see what
the actual performance numbers
>>>>>>>>>>> are.
>>>>>>>>>>>
>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM,
Peter Collingbourne via cfe-dev
>>>>>>>>>>> <cfe-dev at lists.llvm.org>
wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to propose an ODR
checker feature for Clang and LLD.
>>>>>>>>>>>> The feature would be similar to
gold's --detect-odr-violations feature, but
>>>>>>>>>>>> better: we can rely on
integration with clang to avoid relying on debug
>>>>>>>>>>>> info and to perform more
precise matching.
>>>>>>>>>>>>
>>>>>>>>>>>> The basic idea is that we use
clang's ability to create ODR
>>>>>>>>>>>> hashes for declarations. ODR
hashes are computed using all information
>>>>>>>>>>>> about a declaration that is
ODR-relevant. If the flag
>>>>>>>>>>>> -fdetect-odr-violations is
passed, Clang will store the ODR hashes in a
>>>>>>>>>>>> so-called ODR table in each
object file. Each ODR table will contain a
>>>>>>>>>>>> mapping from mangled
declaration names to ODR hashes. At link time, the
>>>>>>>>>>>> linker will read the ODR table
and report any mismatches.
>>>>>>>>>>>>
>>>>>>>>>>>> To make this work:
>>>>>>>>>>>> - LLVM will be extended with
the ability to represent ODR
>>>>>>>>>>>> tables in the IR and emit them
to object files
>>>>>>>>>>>> - Clang will be extended with
the ability to emit ODR tables
>>>>>>>>>>>> using ODR hashes
>>>>>>>>>>>> - LLD will be extended to read
ODR tables from object files
>>>>>>>>>>>>
>>>>>>>>>>>> I have implemented a prototype
of this feature. It is available
>>>>>>>>>>>> here:
https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>>>>>> some results from applying it
to chromium are here:
>>>>>>>>>>>> crbug.com/726071
>>>>>>>>>>>> As you can see it did indeed
find a number of real ODR
>>>>>>>>>>>> violations in Chromium,
including some that wouldn't be detectable using
>>>>>>>>>>>> debug info.
>>>>>>>>>>>>
>>>>>>>>>>>> If you're interested in
what the format of the ODR table would
>>>>>>>>>>>> look like, that prototype shows
pretty much what I had in mind, but I
>>>>>>>>>>>> expect many other aspects of
the implementation to change as it is
>>>>>>>>>>>> upstreamed.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Peter
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Peter
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>
>
> --
> --
> Peter
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170614/db1c8799/attachment.html>

llvm dev - Jun 2017 - [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD