David Blaikie via llvm-dev
2017-Jun-14 19:47 UTC
[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD
Sean Silva:
Very nice and simple implementation!

Do you have any statistics on how large these odr tables are compared to other object file data? I assume that if these tables contain full mangled symbol names, they could end up being very large and may want to share the symbol name strings with the overall string table in the .o.

Peter Collingbourne:
Looking at Chromium's object files, it looks like the total size of the odrtabs is about 50% of the total size of the object files, which isn't great. The current implementation only looks at records, so I imagine that it would be hard to share any of the strings that I'm currently creating.
(I guess it's possible that some types will have a mangled vtable name in the string table, so we may be able to share a little that way.) Note, however, that this was without debug info.

One option for reducing size would be to:
1) store hashes of ODR names in ODR tables, per Rui's suggestion (alongside a reference to the name itself in the string table);
2) compress the string table for the ODR names with a standard compression algorithm like gzip.

This wouldn't seem to affect link time performance much, because I think we should only need to look at the strings if we see an ODR name hash match together with an ODR hash mismatch, which would mean an ODR violation with high probability (i.e. unless there was an ODR name hash collision, we have found an ODR violation). If we don't expect a lot of sharing with regular string tables (see below), it seems even more reasonable.

Sean Silva:
Neat observation!

FWIW, it is a birthday-problem type situation though, so with a 32-bit hash we would expect a collision once we have on the order of 2^16 distinct hashes (and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes might be preferable.

Peter Collingbourne:
Oh right, good point, using a 64-bit hash does seem like a good idea here.

Sean Silva:
Also, do you have any numbers on the performance of your initial implementation?

Peter Collingbourne:
I measured the link time for Chromium's unit_tests (the largest single binary in Chromium) at 5.05s without ODR checks and 6.61s with ODR checks. So about 30% overhead, but in absolute terms it doesn't seem too bad.
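Sean's estimate follows from the birthday bound: collisions among k values hashed into a space of size N become likely around k ≈ √N, i.e. roughly 2^16 distinct values for a 32-bit hash. A quick check of the arithmetic (my own sketch, not part of the proposal):

```python
import math

def collision_probability(num_items: int, hash_bits: int) -> float:
    """Approximate birthday-bound probability of at least one collision
    among num_items values drawn uniformly from a hash_bits-bit space."""
    space = 2.0 ** hash_bits
    # P(collision) ~= 1 - exp(-k(k-1) / 2N)
    return 1.0 - math.exp(-num_items * (num_items - 1) / (2.0 * space))

# ~2^16 distinct 32-bit hashes already gives roughly a 40% chance of a collision...
print(collision_probability(2**16, 32))   # ~0.39
# ...while a 64-bit hash keeps even a million names effectively collision-free.
print(collision_probability(10**6, 64))   # ~2.7e-8
```

This is why a false positive (name-hash collision masquerading as an ODR violation) is a realistic concern at 32 bits but negligible at 64 bits for project-sized symbol counts.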
So I think this may be acceptable for an initial implementation, but it certainly seems worth trying to do better.

Sean Silva:
I know that things aren't currently apples-to-apples, but how does that compare to gold?

Peter Collingbourne:
I will measure it.

Peter Collingbourne:
For that unit_tests binary I measured the overhead at about 5 seconds (average of 10 runs). That is with debug info, of course.

Sean Silva:
W.r.t. LLD and having it always on by default (and hence making it as fast as possible), it seems like right now you are implementing the checking process with a hash table. That's simple and fine for a first implementation, but it's probably worth mentioning in a comment that the problem of checking the tables, at least from the linker's perspective, fits into a map-reduce pattern and could easily be parallelized if needed, e.g. a parallel sort to coalesce all entries for symbols of the same name, followed by a parallel forEach to check each bucket with the same symbol name (roughly speaking).

Peter Collingbourne:
Right, that's one approach. I was thinking of a simpler approach where at compile time we sort ODR names by hash and partition them using (say) the upper bits of the hash, so that at link time we can have N threads each building a hash table for a specific partition.

And of course this work can be started right after symbol resolution finishes and parallelised with the rest of the work done by the linker.

Sean Silva:
Even better than doing it faster is just doing less work. There's a lot of work that the linker is already doing that may be reusable for the ODR checking. E.g.:
- maybe we could get the coalescing step as a byproduct of our existing string deduping, which we are generally doing anyway;
- we are already coalescing symbol names for the symbol table. If the ODR table is keyed off of symbols in the binary that we are inserting into the symbol table, then I think we could do the entire ODR check with no extra "string" work on LLD's part.

I see Rui already mentioned some of this in https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4. You mentioned that not everything is necessarily directly keyed on a symbol (such as types), but I think that it would really simplify things if the check was done that way. Do you have any idea exactly how much of what we want to check is not keyed on symbols? If most things are keyed on symbols, then for the things that are not we can just emit extra symbols prefixed by __clang_odr_check_ or whatever.

Peter Collingbourne:
Since the current implementation only works with records, there is basically zero overlap right now between ODR names and symbols. I suppose that I could estimate the amount of function overlap in a hypothetical implementation that computes ODR hashes of functions by comparing the number of *_odr functions after Clang has finished IRgen with the number after optimization finishes. This of course would be strictly more than functions + types.

Sean Silva:
Wouldn't any function or symbol using the record type have the type name somewhere in it? If we used an offset+length encoding (instead of offset + NUL termination) we might be able to reuse it then (at some cost in finding the reference).
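The offset+length idea works because a record's mangled name is often a substring of a longer mangled symbol already in the string table, whereas NUL-terminated references can only share whole strings (or shared tails). A toy illustration with invented names:

```python
# A mangled constructor already present in the object file's string table.
strtab = b"_ZN3foo3BarC2Ev\x00"

def find_ref(table: bytes, needle: bytes):
    """Return an (offset, length) reference into the table, if the bytes
    occur anywhere; None otherwise."""
    off = table.find(needle)
    return None if off < 0 else (off, len(needle))

# The record's name component "3foo3Bar" sits inside the ctor's mangled
# name, so an offset+length reference can reuse those bytes...
print(find_ref(strtab, b"3foo3Bar"))   # (3, 8)
# ...whereas a NUL-terminated reference could only reuse bytes that run
# right up to a terminator, which these don't.
```

The cost Sean mentions is the substring search itself: the string table builder has to find these embedded occurrences rather than just deduplicating whole entries.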
Peter Collingbourne:
That may be possible with some work in the string table builder. But at that point, of course, we're not dealing with regular symbols any more. I guess we could have two ODR tables per object file: an array of (ODR hash, location) tuples for ODR names that correspond to symbol table symbols (i.e. Rui's proposal on the Chromium bug), and an array of (ODR name, ODR hash, location) tuples for all other ODR names. I guess if we wanted a "low overhead" mode we could just omit the second table or put fewer symbols in it.

Sean Silva:
With debug info, surely there is some sort of string representing the record name or something like that, no?

Peter Collingbourne:
Not the record name on its own (they do appear, but a bit awkwardly -- each namespace component is stored in a separate string), but if the record has at least one member function, the mangled type name will appear somewhere in .debug_str, so we could in principle reuse that with the offset/length trick.

Sean Silva:
I guess we may have to make our "low-overhead" user-facing behavior a bit more nuanced, e.g.:
1. does this feature bloat object files significantly?
2. does this feature slow down link times significantly?

Intuitively, it seems like we should be able to get 1 when debug info happens to be enabled (not sure about split DWARF?) and possibly in all cases at the cost of complexity. We may be able to get 2 in all cases with proper design.

Peter Collingbourne:
I think that would be my rough assessment as well. I think we have a good shot at 1 for all cases with some of the ideas that have been mentioned already.
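The two-table split Peter describes could be modeled roughly like this (a hypothetical sketch; the field names and shapes are invented, not the prototype's actual on-disk format):

```python
from typing import NamedTuple

class SymbolKeyedEntry(NamedTuple):
    """The ODR name IS the symbol, so only an index into the object file's
    regular symbol table is stored alongside the hash."""
    symtab_index: int
    odr_hash: int

class NamedEntry(NamedTuple):
    """For ODR names with no symbol-table counterpart (e.g. record types),
    the name -- or its 64-bit hash -- must be carried explicitly."""
    odr_name: bytes
    odr_hash: int

symtab = [b"_Z1fv"]                      # regular symbol table
table1 = [SymbolKeyedEntry(0, 0xAAAA)]   # keyed off existing symbols: cheap
table2 = [NamedEntry(b"3Foo", 0xBBBB)]   # everything else; a "low overhead"
                                         # mode could omit or thin this table

# Resolving the ODR name of a symbol-keyed entry needs no extra strings:
print(symtab[table1[0].symtab_index])    # b'_Z1fv'
```

The point of the split is that table 1 adds almost no string data, so trimming or dropping table 2 is what a "low overhead" mode would buy.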
If we can avoid creating dependencies on DWARF, I think that would be ideal -- I'd ideally like this to work for COFF as well, where you'd typically expect to find CodeView in object files. If I were to try this, the first thing I would try is hash/compression combined with the two ODR tables (no reuse for non-symbol ODR names to start with, as compression may be enough on its own).

Peter Collingbourne:
I developed a second prototype which uses hash/compression with no attempt to reuse. It is available here: https://github.com/pcc/llvm-project/tree/odr-checker2

For Chromium the object file size overhead was 536566007 bytes, or in relative terms about 25%, or about 4% with debug info. I measured perf overhead for unit_tests at about 6%, but after I moved the checker onto another thread, the overhead disappeared into the noise.

David Blaikie:
Still seems like quite a big increase.

Any chance of compression?

Peter Collingbourne:
That was with compression -- the implementation compresses the parts of the ODR table that aren't hashes (aside from the header and the Clang version, which is a small fixed cost), as well as the string table. The hashes were left uncompressed because they are in the critical path of the linker and because I imagine that they wouldn't really be that compressible.

David Blaikie:
I'd be a bit surprised if they weren't especially compressible.

Peter Collingbourne:
Maybe I'm wrong, but my intuition about compression is that it works best when the data contains repeated patterns. If we use a hash function with good dispersion, then I'd expect each hash to have little in common with other hashes.

David Blaikie:
And how much of the size increase is the compressed data vs. the uncompressed data?

Peter Collingbourne:
The ratio was roughly 60% compressed data to 40% uncompressed data.
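Peter's intuition is easy to sanity-check: a hash with good dispersion is statistically close to random bytes, which general-purpose compressors cannot shrink, whereas mangled-name string data is highly repetitive. A quick illustration (my own sketch, not taken from the prototype; the name is invented):

```python
import os
import zlib

# 64 KiB of random bytes stands in for well-dispersed 64-bit ODR hashes.
hashes = os.urandom(64 * 1024)
# Repetitive mangled names stand in for the ODR string table.
names = b"_ZN7content15WebContentsImpl4InitEv" * 2000

# Random data does not compress (the output is even slightly larger)...
print(len(zlib.compress(hashes)) / len(hashes))   # ~1.0
# ...while the name data shrinks dramatically.
print(len(zlib.compress(names)) / len(names))     # well under 0.05
```

This also explains the later gunzip observation: decompressing the string data is cheap to justify, but compressing the hash arrays would cost decompression time at link time for essentially no size savings.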
David Blaikie:
Is it still in the hot path when parallelized?

Peter Collingbourne:
Not right now according to my benchmarking, but decompression could push it into the critical path if it ends up taking longer than the rest of the work done by the linker after symbol resolution. On the same machine that I used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is roughly what I'd expect the hashes to look like) takes around 1.1s, i.e. a not insignificant fraction of lld's runtime.

So I think the remaining gains would come either through limiting the number of ODR table entries, or through reuse of data.

Limiting might be something to explore -- one possibility is that we could limit the ODR table entries to the declarations that are "used" by a particular translation unit (it appears that Clang tracks something like that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly what we need -- I think we would basically need to test for reference reachability from the functions/globals that are IRgen'd).

David Blaikie:
Currently it has every type and function that is in the AST? Yeah, that's a lot -- perhaps it should be more like the things that go in the DWARF? (Though we would need to add some cases there, since the DWARF logic already relies on the ODR to not emit duplicates in some cases.)

Peter Collingbourne:
Just every record declaration -- Clang only supports ODR hashes for record declarations right now. I understand that function declarations (including function bodies) are still a work in progress.

I think it should indeed be roughly the things that go in the DWARF. I think that at one point I observed that every record declaration, even unused ones, was going into the DWARF, but I might have been mistaken because I can no longer reproduce that.
I'll take a closer look to see if I can reuse whatever logic presumably already exists for DWARF.

In terms of reuse, it seems that of the 536566007 bytes of overhead, 319309579 were the compressed part of the ODR tables. So even if we achieved 100% sharing,

David Blaikie:
100% sharing? You mean if all the data were compressed, and assuming the hashes were compressible at the same ratio as the other data?

Peter Collingbourne:
Sorry, I mean if 100% of the data in the compressed part of the ODR table could be eliminated by reusing data stored elsewhere (e.g. in the object file string table or in the DWARF). With the current scheme, I think that our minimum achievable overhead would be ~15% (no debug info) or ~2% (with debug info).

David Blaikie:
Could this go into .dwo files with Fission and be checked by dwp instead, perhaps?

Peter Collingbourne:
I think it could also work that way, yes.

I'm reasonably happy with these figures, at least for a first implementation. We may be able to do even better for file size with reuse, but I'd leave that for version 2.

David Blaikie:
What's the story with compatibility between versions, then? Is there a version header?

Peter Collingbourne:
Yes, the header contains a version number.

David Blaikie:
Will old formats be supported by lld indefinitely? Not at all?

Peter Collingbourne:
I think we should drop support for old formats when we introduce a new format. My understanding is that the ODR hash can change whenever Clang changes (the implementation only performs ODR checking if all ODR tables were produced by the same revision of Clang), so there wouldn't seem to be a huge benefit in keeping support for old formats around.

David Blaikie:
I imagine it's possible people aren't necessarily going to rev lld in exact lock-step with clang, but I could be wrong.
(Certainly binutils ld and gold aren't released or kept in lock-step with GCC, for example.)

Peter Collingbourne:
That's certainly possible, but I'd say that the bar for dropping backwards compatibility is lower because ODR tables are not required for correctness. We could keep compatibility with the last version or so if it isn't too burdensome, or otherwise print a warning.

David Blaikie:
They aren't required for correctness, but upgrading your compiler or linker and then silently losing ODR checking would be bad (or even not silently losing it, but having no choice except to rev both to keep the functionality and hold the ODR-cleanliness bar). It's the sort of thing where, if you lost the checking and then gained it back again later, the regression cleanup would be annoying and an impediment to using the feature.

Peter Collingbourne:
Makes sense, I guess. I'd be fine with a policy where the Nth open source release should be able to read ODR tables produced by the N-1th and possibly the N-2th release.

David Blaikie:
Still strikes me as a bit awkward -- I wonder how that compares to other (similar or different) linker features.

Any idea what Daniel Jasper & co have been working on w.r.t. ODR checking, and how this feature integrates (or doesn't) with their work? I imagine they might be working on something more like a Clang Tooling style approach, but I'm not sure.

Peter Collingbourne:
I'm not aware of any work like that, only of Richard Trieu's efforts for modules that I'm piggybacking on.
David Blaikie:
+Djasper -- perhaps you could provide some context on other ODR detection efforts?

Sean Silva:
The issue of retaining the ODR check for functions even if they get inlined may inherently pose an extra cost that can't be folded into existing work the linker is doing, so there might be a reason for clang to have a default mode that has practically no linking overhead and one that does more thorough checking but imposes extra linking overhead. Think of something like a crazy boost library with thousands of functions that get inlined away, but which have gigantic mangled names and so are precisely the ones that are going to impose extra cost on the linker. Simply due to the extra volume of strings that the linker would need to look at, I don't think there's a way to include checking of all inlined functions "for free" at the linker level using the symbol approach.

I guess those inlined functions would still have those symbol names in debug info (I think?), so piggybacking on the string deduplication we're already doing might make it possible to fold away the work in that case (but then again, it would still impose extra cost with split DWARF...).

Anyway, let's wait and see what the actual performance numbers are.
Peter Collingbourne (original RFC, Jun 6):
Hi all,

I'd like to propose an ODR checker feature for Clang and LLD. The feature would be similar to gold's --detect-odr-violations feature, but better: we can rely on integration with clang to avoid relying on debug info and to perform more precise matching.

The basic idea is that we use clang's ability to create ODR hashes for declarations. ODR hashes are computed using all information about a declaration that is ODR-relevant. If the flag -fdetect-odr-violations is passed, Clang will store the ODR hashes in a so-called ODR table in each object file. Each ODR table will contain a mapping from mangled declaration names to ODR hashes. At link time, the linker will read the ODR tables and report any mismatches.

To make this work:
- LLVM will be extended with the ability to represent ODR tables in the IR and emit them to object files
- Clang will be extended with the ability to emit ODR tables using ODR hashes
- LLD will be extended to read ODR tables from object files

I have implemented a prototype of this feature. It is available here: https://github.com/pcc/llvm-project/tree/odr-checker and some results from applying it to chromium are here: crbug.com/726071
As you can see, it did indeed find a number of real ODR violations in Chromium, including some that wouldn't be detectable using debug info.
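The scheme described above -- per-translation-unit tables mapping mangled names to ODR hashes, merged and checked by the linker -- can be sketched in miniature. This is a toy model, not the prototype's format: the real ODR hash is computed over the ODR-relevant parts of the AST, not over source text.

```python
import hashlib

def odr_hash(definition_text: str) -> int:
    # Stand-in for Clang's ODR hash over the ODR-relevant parts of a decl.
    return int.from_bytes(
        hashlib.sha256(definition_text.encode()).digest()[:8], "little")

def odr_table(decls: dict) -> dict:
    # One table per translation unit: mangled name -> ODR hash.
    return {name: odr_hash(body) for name, body in decls.items()}

def link_check(tables: dict) -> list:
    # Linker-side merge: the same mangled name carrying different ODR
    # hashes in two object files is an ODR violation.
    seen, violations = {}, []
    for obj, table in tables.items():
        for name, h in table.items():
            prev = seen.setdefault(name, (h, obj))
            if prev[0] != h:
                violations.append((name, prev[1], obj))
    return violations

# a.o and b.o disagree about struct S: a classic ODR violation.
tables = {
    "a.o": odr_table({"_ZTS1S": "struct S { int x; };"}),
    "b.o": odr_table({"_ZTS1S": "struct S { long x; };"}),
}
print(link_check(tables))   # [('_ZTS1S', 'a.o', 'b.o')]
```

Identical definitions hash identically, so the common case (the same header included everywhere) produces no reports and costs only the table merge.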
If you're interested in what the format of the ODR table would look like, that prototype shows pretty much what I had in mind, but I expect many other aspects of the implementation to change as it is upstreamed.

Thanks,
--
Peter
Peter Collingbourne via llvm-dev
2017-Jun-14 20:41 UTC
[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD
On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at gmail.com> wrote:> > > On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at pcc.me.uk> > wrote: > >> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at gmail.com> >> wrote: >> >>> >>> >>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at pcc.me.uk> >>> wrote: >>> >>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at gmail.com> >>>> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at pcc.me.uk> >>>>> wrote: >>>>> >>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at gmail.com> >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev < >>>>>>> cfe-dev at lists.llvm.org> wrote: >>>>>>> >>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne < >>>>>>>> peter at pcc.me.uk> wrote: >>>>>>>> >>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne < >>>>>>>>>> peter at pcc.me.uk> wrote: >>>>>>>>>> >>>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva < >>>>>>>>>>> chisophugis at gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Very nice and simple implementation! >>>>>>>>>>>> >>>>>>>>>>>> Do you have any statistics on how large these odr tables are >>>>>>>>>>>> compared to other object file data? I assume that if these tables contain >>>>>>>>>>>> full mangled symbol names, they could end up being very large and may want >>>>>>>>>>>> to share the symbol name strings with the overall string table in the .o >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Looking at Chromium's object files it looks like the total size >>>>>>>>>>> of the odrtabs is about 50% of the total size of the object files, which >>>>>>>>>>> isn't great. 
The current implementation only looks at records, so I imagine >>>>>>>>>>> that it would be hard to share any of the strings that I'm currently >>>>>>>>>>> creating. (I guess it's possible that some types will have a mangled vtable >>>>>>>>>>> name in the string table, so we may be able to share a little that way.) >>>>>>>>>>> Note however that this was without debug info. >>>>>>>>>>> >>>>>>>>>>> One option for reducing size would be to >>>>>>>>>>> 1) store hashes of ODR names in ODR tables, per Rui's suggestion >>>>>>>>>>> (alongside a reference to the name itself in the string table) >>>>>>>>>>> 2) compress the string table for the ODR names with a standard >>>>>>>>>>> compression algorithm like gzip. >>>>>>>>>>> This wouldn't seem to affect link time performance much because >>>>>>>>>>> I think we should only need to look at the strings if we see a ODR name >>>>>>>>>>> hash match together with an ODR hash mismatch, which would mean an ODR >>>>>>>>>>> violation with a high probability (i.e. unless there was an ODR name hash >>>>>>>>>>> collision, we have found an ODR violation). If we don't expect a lot of >>>>>>>>>>> sharing with regular string tables (see below), it seems even more >>>>>>>>>>> reasonable. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Neat observation! >>>>>>>>>> >>>>>>>>>> FWIW, it is a birthday problem type situation though, so for a >>>>>>>>>> 32-bit hash, we would expect a collision in about 1 in 2^16 distinct hashes >>>>>>>>>> (and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes >>>>>>>>>> might be preferable. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Oh right, good point, using a 64-bit hash does seem like a good >>>>>>>>> idea here. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Also, do you have any numbers on the performance of your initial >>>>>>>>>>>> implementation? 
>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I measured the link time for chromium's unit_tests (the largest >>>>>>>>>>> single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR >>>>>>>>>>> checks. So about 30% overhead, but in absolute terms it doesn't seem too >>>>>>>>>>> bad. So I think this may be acceptable for an initial implementation, but >>>>>>>>>>> it certainly seems worth trying to do better. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I know that things aren't currently apples-to-apples, but how >>>>>>>>>> does that compare to gold? >>>>>>>>>> >>>>>>>>> >>>>>>>>> I will measure it. >>>>>>>>> >>>>>>>> >>>>>>>> For that unit_tests binary I measured the overhead at about 5 >>>>>>>> seconds (average of 10 runs). That is with debug info, of course. >>>>>>>> >>>>>>>> W.r.t. LLD and having it always on by default (and hence making it >>>>>>>>>>>> as fast as possible), it seems like right now you are implementing the >>>>>>>>>>>> checking process with a hash table. That's simple and fine for a first >>>>>>>>>>>> implementation, but it's probably worth mentioning in a comment the problem >>>>>>>>>>>> of checking the tables, at least from the linker's perspective, does fit >>>>>>>>>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g. >>>>>>>>>>>> a parallel sort to coalesce all entries for symbols of the same name >>>>>>>>>>>> followed by a parallel forEach to check each bucket with the same symbol >>>>>>>>>>>> name (roughly speaking). >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Right, that's one approach. I was thinking of a simpler approach >>>>>>>>>>> where at compile time we sort ODR names by hash and partition them using >>>>>>>>>>> (say) the upper bits of the hash, so that at link time we can have N >>>>>>>>>>> threads each building a hash table for a specific partition. 
>>>>>>>>>>> >>>>>>>>>>> And of course this work can be started right after symbol >>>>>>>>>>> resolution finishes and parallelised with the rest of the work done by the >>>>>>>>>>> linker. >>>>>>>>>>> >>>>>>>>>>> Even better than doing it faster is just doing less work. >>>>>>>>>>>> There's a lot of work that the linker is already doing that may be reusable >>>>>>>>>>>> for the ODR checking. >>>>>>>>>>>> E.g. >>>>>>>>>>>> - maybe we could get the coalescing step as a byproduct of our >>>>>>>>>>>> existing string deduping, which we are generally doing anyway. >>>>>>>>>>>> - we are already coalescing symbol names for the symbol table. >>>>>>>>>>>> If the ODR table is keyed off of symbols in the binary that we are >>>>>>>>>>>> inserting into the symbol table, then I think we could do the entire ODR >>>>>>>>>>>> check with no extra "string" work on LLD's part. >>>>>>>>>>>> >>>>>>>>>>>> I see Rui already mentioned some of this in >>>>>>>>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4 >>>>>>>>>>>> . >>>>>>>>>>>> You mentioned that not everything is necessarily directly keyed >>>>>>>>>>>> on a symbol (such as types), but I think that it would really simplify >>>>>>>>>>>> things if the check was done as such. Do you have any idea exactly how much >>>>>>>>>>>> of the things that we want to check are not keyed on symbols? If most >>>>>>>>>>>> things are keyed on symbols, for the things we are not we can just emit >>>>>>>>>>>> extra symbols prefixed by __clang_odr_check_ or whatever. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Since the current implementation only works with records there >>>>>>>>>>> is basically zero overlap right now between ODR names and symbols. I >>>>>>>>>>> suppose that I could estimate the amount of function overlap in a >>>>>>>>>>> hypothetical implementation that computes ODR hashes of functions by >>>>>>>>>>> comparing the number of *_odr functions after clang has finished IRgen with >>>>>>>>>>> the number after optimization finishes. 
This of course would be strictly >>>>>>>>>>> more than functions + types. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Wouldn't any function or symbol using the record type have the >>>>>>>>>> type name somewhere in it? If we used an offset+length encoding (instead of >>>>>>>>>> offset + NUL termination) we might be able to reuse it then (at some cost >>>>>>>>>> in finding the reference). >>>>>>>>>> >>>>>>>>> >>>>>>>>> That may be possible with some work in the string table builder. >>>>>>>>> But at that point of course we're not dealing with regular symbols any >>>>>>>>> more. I guess we could have two ODR tables per object file: an array of >>>>>>>>> (ODR hash, location) tuples for ODR names that correspond to symbol table >>>>>>>>> symbols (i.e. Rui's proposal on the chromium bug), and an array of (ODR >>>>>>>>> name, ODR hash, location) tuples for all other ODR names. I guess if we >>>>>>>>> wanted a "low overhead" mode we could just omit the second table or put >>>>>>>>> fewer symbols in it. >>>>>>>>> >>>>>>>>> With debug info surely there is some sort of string representing >>>>>>>>>> the record name or something like that, no? >>>>>>>>>> >>>>>>>>> >>>>>>>>> Not the record name on its own (they do appear but a bit awkwardly >>>>>>>>> -- each namespace component is stored in a separate string), but if the >>>>>>>>> record has at least one member function the mangled type name will appear >>>>>>>>> somewhere in .debug_str, so we could in principle reuse that with the >>>>>>>>> offset/length trick. >>>>>>>>> >>>>>>>>> I guess we may have to have our "low-overhead" user-facing >>>>>>>>>> behavior be a bit more nuanced. E.g.: >>>>>>>>>> 1. does this feature bloat object files significantly >>>>>>>>>> 2. does this feature slow down link times significantly >>>>>>>>>> >>>>>>>>>> Intuitively, it seems like we should be able to get 1. when debug >>>>>>>>>> info happens to be enabled (not sure about split dwarf?) and possibly in >>>>>>>>>> all cases at the cost of complexity. 
We may be able to get 2. in all cases >>>>>>>>>> with proper design. >>>>>>>>>> >>>>>>>>> >>>>>>>>> I think that would be my rough assessment as well. I think we have >>>>>>>>> a good shot at 1 for all cases with some of the ideas that have been >>>>>>>>> mentioned already. If we can avoid creating dependencies on DWARF I think >>>>>>>>> that would be ideal -- I'd ideally like this to work for COFF as well, >>>>>>>>> where you'd typically expect to find CodeView in object files. If I were to >>>>>>>>> try this I think the first thing that I would try is hash/compression >>>>>>>>> combined with the two ODR tables (no reuse for non-symbol ODR names to >>>>>>>>> start with, as compression may be enough on its own). >>>>>>>>> >>>>>>>> >>>>>>>> I developed a second prototype which uses hash/compression with no >>>>>>>> attempt to reuse. It is available here: https://github.com/pcc/ >>>>>>>> llvm-project/tree/odr-checker2 >>>>>>>> >>>>>>>> For Chromium the object file size overhead was 536566007 bytes, or >>>>>>>> in relative terms about 25%, or about 4% with debug info. I measured perf >>>>>>>> overhead for unit_tests at about 6%, but after I moved the checker onto >>>>>>>> another thread, the overhead disappeared into the noise. >>>>>>>> >>>>>>> >>>>>>> Still seems like quite a big increase. >>>>>>> >>>>>>> Any chance of compression? >>>>>>> >>>>>> >>>>>> That was with compression -- the implementation compresses the parts >>>>>> of the ODR table that aren't hashes (aside from the header and the Clang >>>>>> version, which is a small fixed cost), as well as the string table. The >>>>>> hashes were left uncompressed because they are in the critical path of the >>>>>> linker and because I imagine that they wouldn't really be that compressible. >>>>>> >>>>> >>>>> I'd be a bit surprised if they weren't especially compressible - >>>>> >>>> >>>> Maybe I'm wrong, but my intuition about compression is that it works >>>> best when the data contains repeated patterns. 
If we use a hash function >>>> with good dispersion then I'd expect each hash to have little in common >>>> with other hashes. >>>> >>>> >>>>> and how much of the size increase is the compressed data V the >>>>> uncompressed data? >>>>> >>>> >>>> The ratio was roughly 60% compressed data to 40% uncompressed data. >>>> >>>> >>>>> Is it still in the hot path when parallelized? >>>>> >>>> >>>> Not right now according to my benchmarking, but decompression could >>>> push it into the critical path if it ends up taking longer than the rest of >>>> the work done by the linker after symbol resolution. On the same machine >>>> that I used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is >>>> roughly what I'd expect the hashes to look like) takes around 1.1s, i.e. a >>>> not insignificant fraction of lld's runtime. >>>> >>>> >>>>> So I think the remaining gains would either be through limiting the >>>>>> number of ODR table entries, or through reuse of data. >>>>>> >>>>>> Limiting might be something to explore -- one possibility is that we >>>>>> could limit the ODR table entries to the declarations that are "used" by a >>>>>> particular translation unit (it appears that Clang tracks something like >>>>>> that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly >>>>>> what we need -- I think we would basically need to test for reference >>>>>> reachability from the functions/globals that are IRgen'd). >>>>>> >>>>> >>>>> Currently it has every type and function that is in the AST? Yeah, >>>>> that's a lot - perhaps it should be more like the things that go in the >>>>> DWARF? (though would need to add some cases there - since the DWARF logic >>>>> already relies on the ODR to not emit duplicates in some cases) >>>>> >>>> >>>> Just every record declaration -- Clang only supports ODR hashes for >>>> record declarations right now. I understand that function declarations >>>> (including function bodies) are still works in progress. 
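The intuition above — that well-dispersed hashes barely compress while string-heavy table data compresses well — is easy to sanity-check with zlib. A rough sketch; random bytes stand in for hash data and a repeated mangled name stands in for the string-table portion:

```python
import os
import zlib

# Well-dispersed hash bytes behave like random data and are essentially
# incompressible; repetitive string data (closer to the non-hash parts of
# an ODR table) compresses to a small fraction of its size.
random_like_hashes = os.urandom(1 << 20)  # stand-in for 1 MiB of 64-bit hashes
repetitive = b"_ZN2ns10SomeRecordE\0" * 50000

c_hashes = zlib.compress(random_like_hashes, 9)
c_strings = zlib.compress(repetitive, 9)
print(len(c_hashes) / len(random_like_hashes))  # close to (or above) 1.0
print(len(c_strings) / len(repetitive))         # a tiny fraction
```

This also matches the /dev/urandom gunzip measurement above: decompressing data that didn't shrink is pure overhead on the link's critical path.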
>>>> >>>> I think it should indeed just be roughly the things that go in the >>>> DWARF. I think that at one point I observed that every record declaration, >>>> even unused ones, were going into the DWARF, but I might have been mistaken >>>> because I can no longer reproduce that. I'll take a closer look to see if I >>>> can reuse what logic presumably already exists for DWARF. >>>> >>>> In terms of reuse, it seems that of the 536566007 bytes of >>>>>> overhead, 319309579 were the compressed part of the ODR tables. So even if >>>>>> we achieved 100% sharing, >>>>>> >>>>> >>>>> 100% sharing? You mean if all the data were compressed, and assuming >>>>> the hashes were compressible at the same ratio as the other data? >>>>> >>>> >>>> Sorry, I mean if 100% of the data in the compressed part of the ODR >>>> table could be eliminated by reusing data stored elsewhere (e.g. in the >>>> object file string table or in the DWARF). >>>> >>>> with the current scheme I think that our minimum achievable overhead >>>>>> would be ~15% (no debug info) or ~2% (with debug info). >>>>>> >>>>>> >>>>>>> Could this go into .dwo files with Fission and be checked by dwp >>>>>>> instead, perhaps? >>>>>>> >>>>>> >>>>>> I think it could also work that way, yes. >>>>>> >>>>>> I'm reasonably happy with these figures, at least for a first >>>>>>>> implementation. We may be able to do even better for file size with reuse, >>>>>>>> but I'd leave that for version 2. >>>>>>>> >>>>>>> >>>>>>> What's the story with compatibility between versions, then? Is there >>>>>>> a version header? >>>>>>> >>>>>> >>>>>> Yes, the header contains a version number. >>>>>> >>>>>> >>>>>>> Will old formats be supported by lld indefinitely? Not at all? >>>>>>> >>>>>> >>>>>> I think we should drop support for old formats when we introduce a >>>>>> new format. 
My understanding is that the ODR hash can change whenever Clang >>>>>> changes (the implementation only performs ODR checking if all ODR tables >>>>>> were produced by the same revision of Clang), so there wouldn't seem to be >>>>>> a huge benefit in keeping support for old formats around. >>>>>> >>>>> >>>>> I imagine it's possible people aren't necessarily going to rev lld in >>>>> exact lock-step with clang, but I could be wrong. (certainly binutils ld or >>>>> gold aren't released/kept in lock-step with GCC, for example) >>>>> >>>> >>>> That's certainly possible, but I'd say that the bar for dropping >>>> backwards compatibility is lower because ODR tables are not required for >>>> correctness. We could keep compatibility with the last version or so if it >>>> isn't too burdensome, or otherwise print a warning. >>>> >>> >>> They aren't required for correctness, but upgrading your compiler or >>> linker then silently losing ODR checking would be bad (or even not silently >>> losing it, but having no choice but to rev both to keep the functionality & >>> hold the ODR-cleanliness bar) - it's the sort of thing where if you lost >>> the checking, then gained it back again later, the regression cleanup would >>> be annoying/an impediment to using the feature. >>> >> >> Makes sense I guess. I'd be fine with a policy where the Nth open source >> release should be able to read ODR tables produced by the N-1th and >> possibly the N-2th release. >> > > Still strikes me as a bit awkward - wonder how that compared to other > (similar or different) linker features. >I think the most similar existing feature is .gdb_index. They have already gone through a few format revisions: https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html and have deprecated/removed support for older formats. 
Because the requirements for ODR tables are simpler than those for .gdb_index, I'd expect us to converge on a final format sooner, so in practice the window of compatibility would end up being longer than a year. Peter> >> Any idea what Daniel Jasper & co have been working on WRT ODR checking & >>> how this feature integrates or doesn't with their work? I imagine they >>> might be working on something more like a Clang Tooling style approach, but >>> I'm not sure. >>> >> >> I'm not aware of any work like that, only of Richard Trieu's efforts for >> modules that I'm piggybacking on. >> > > +Djasper - perhaps you could provide some context on other odr detection > efforts? > > >> Peter >> >>> >>> - Dave >>> >>> >>>> >>>> Peter >>>> >>>>> >>>>> >>>>>> >>>>>> Peter >>>>>> >>>>>>> >>>>>>> - Dave >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Peter >>>>>>>> >>>>>>>>> >>>>>>>>> Peter >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- Sean Silva >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> The issue of retaining the ODR check for functions even if they >>>>>>>>>>>> get inlined may inherently pose an extra cost that can't be folded into >>>>>>>>>>>> existing work the linker is doing, so there might be a reason for clang to >>>>>>>>>>>> have a default mode that has practically no linking overhead and one that >>>>>>>>>>>> does more thorough checking but imposes extra linking overhead. Think >>>>>>>>>>>> something like a crazy boost library with thousands of functions that get >>>>>>>>>>>> inlined away, but have gigantic mangled names and so precisely are the ones >>>>>>>>>>>> that are going to impose extra cost on the linker. Simply due to the extra >>>>>>>>>>>> volume of strings that the linker would need to look at, I don't think >>>>>>>>>>>> there's a way to include checking of all inlined function "for free" at the >>>>>>>>>>>> linker level using the symbol approach. 
>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I guess those inlined functions would still have those symbol >>>>>>>>>>>> names in debug info (I think?), so piggybacking on the string deduplication >>>>>>>>>>>> we're already doing might make it possible to fold away the work in that >>>>>>>>>>>> case (but then again, would still impose extra cost with split dwarf...). >>>>>>>>>>>> >>>>>>>>>>>> Anyway, let's wait to see what the actual performance numbers >>>>>>>>>>>> are. >>>>>>>>>>>> >>>>>>>>>>>> -- Sean Silva >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via >>>>>>>>>>>> cfe-dev <cfe-dev at lists.llvm.org> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I'd like to propose an ODR checker feature for Clang and LLD. >>>>>>>>>>>>> The feature would be similar to gold's --detect-odr-violations feature, but >>>>>>>>>>>>> better: we can rely on integration with clang to avoid relying on debug >>>>>>>>>>>>> info and to perform more precise matching. >>>>>>>>>>>>> >>>>>>>>>>>>> The basic idea is that we use clang's ability to create ODR >>>>>>>>>>>>> hashes for declarations. ODR hashes are computed using all information >>>>>>>>>>>>> about a declaration that is ODR-relevant. If the flag >>>>>>>>>>>>> -fdetect-odr-violations is passed, Clang will store the ODR hashes in a >>>>>>>>>>>>> so-called ODR table in each object file. Each ODR table will contain a >>>>>>>>>>>>> mapping from mangled declaration names to ODR hashes. At link time, the >>>>>>>>>>>>> linker will read the ODR table and report any mismatches. 
>>>>>>>>>>>>> >>>>>>>>>>>>> To make this work: >>>>>>>>>>>>> - LLVM will be extended with the ability to represent ODR >>>>>>>>>>>>> tables in the IR and emit them to object files >>>>>>>>>>>>> - Clang will be extended with the ability to emit ODR tables >>>>>>>>>>>>> using ODR hashes >>>>>>>>>>>>> - LLD will be extended to read ODR tables from object files >>>>>>>>>>>>> >>>>>>>>>>>>> I have implemented a prototype of this feature. It is >>>>>>>>>>>>> available here: https://github.com/pcc/ >>>>>>>>>>>>> llvm-project/tree/odr-checker and some results from applying >>>>>>>>>>>>> it to chromium are here: crbug.com/726071 >>>>>>>>>>>>> As you can see it did indeed find a number of real ODR >>>>>>>>>>>>> violations in Chromium, including some that wouldn't be detectable using >>>>>>>>>>>>> debug info. >>>>>>>>>>>>> >>>>>>>>>>>>> If you're interested in what the format of the ODR table would >>>>>>>>>>>>> look like, that prototype shows pretty much what I had in mind, but I >>>>>>>>>>>>> expect many other aspects of the implementation to change as it is >>>>>>>>>>>>> upstreamed. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> Peter >>>>>>>>>>>>> >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> cfe-dev mailing list >>>>>>>>>>>>> cfe-dev at lists.llvm.org >>>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev -- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170614/76716b88/attachment.html>
Rui Ueyama via llvm-dev
2017-Jun-15 01:00 UTC
[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD
Is the entry in your ODR table 64-bit? Sean mentioned that this is a birthday-paradox situation, but I don't think we need hash values that large, as our aim is not to avoid all collisions. A small number of collisions is okay, as it only slightly increases false negatives. I think the hash could even be 16-bit if space saving is important. If we choose a 16-bit hash, the probability that a given ODR violation goes undetected is 1/65536, which is still quite low. On Wed, Jun 14, 2017 at 1:41 PM, Peter Collingbourne via cfe-dev < cfe-dev at lists.llvm.org> wrote:> > > On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at gmail.com> > wrote: > >> >> >> On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at pcc.me.uk> >> wrote: >> >>> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at gmail.com> >>> wrote: >>> >>>> >>>> >>>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at pcc.me.uk> >>>> wrote: >>>> >>>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at gmail.com> >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at pcc.me.uk> >>>>>> wrote: >>>>>> >>>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev < >>>>>>>> cfe-dev at lists.llvm.org> wrote: >>>>>>>> >>>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne < >>>>>>>>> peter at pcc.me.uk> wrote: >>>>>>>>> >>>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com >>>>>>>>>> > wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne < >>>>>>>>>>> peter at pcc.me.uk> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva < >>>>>>>>>>>> chisophugis at gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Very nice and simple implementation!
>>>>>>>>>>>>> >>>>>>>>>>>>> Do you have any statistics on how large these odr tables are >>>>>>>>>>>>> compared to other object file data? I assume that if these tables contain >>>>>>>>>>>>> full mangled symbol names, they could end up being very large and may want >>>>>>>>>>>>> to share the symbol name strings with the overall string table in the .o >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Looking at Chromium's object files it looks like the total size >>>>>>>>>>>> of the odrtabs is about 50% of the total size of the object files, which >>>>>>>>>>>> isn't great. The current implementation only looks at records, so I imagine >>>>>>>>>>>> that it would be hard to share any of the strings that I'm currently >>>>>>>>>>>> creating. (I guess it's possible that some types will have a mangled vtable >>>>>>>>>>>> name in the string table, so we may be able to share a little that way.) >>>>>>>>>>>> Note however that this was without debug info. >>>>>>>>>>>> >>>>>>>>>>>> One option for reducing size would be to >>>>>>>>>>>> 1) store hashes of ODR names in ODR tables, per Rui's >>>>>>>>>>>> suggestion (alongside a reference to the name itself in the string table) >>>>>>>>>>>> 2) compress the string table for the ODR names with a standard >>>>>>>>>>>> compression algorithm like gzip. >>>>>>>>>>>> This wouldn't seem to affect link time performance much because >>>>>>>>>>>> I think we should only need to look at the strings if we see a ODR name >>>>>>>>>>>> hash match together with an ODR hash mismatch, which would mean an ODR >>>>>>>>>>>> violation with a high probability (i.e. unless there was an ODR name hash >>>>>>>>>>>> collision, we have found an ODR violation). If we don't expect a lot of >>>>>>>>>>>> sharing with regular string tables (see below), it seems even more >>>>>>>>>>>> reasonable. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Neat observation! 
>>>>>>>>>>> >>>>>>>>>>> FWIW, it is a birthday problem type situation though, so for a >>>>>>>>>>> 32-bit hash, we would expect a collision in about 1 in 2^16 distinct hashes >>>>>>>>>>> (and 2^16 seems pretty easy to hit in a large project). So 64-bit hashes >>>>>>>>>>> might be preferable. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Oh right, good point, using a 64-bit hash does seem like a good >>>>>>>>>> idea here. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Also, do you have any numbers on the performance of your >>>>>>>>>>>>> initial implementation? >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I measured the link time for chromium's unit_tests (the largest >>>>>>>>>>>> single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR >>>>>>>>>>>> checks. So about 30% overhead, but in absolute terms it doesn't seem too >>>>>>>>>>>> bad. So I think this may be acceptable for an initial implementation, but >>>>>>>>>>>> it certainly seems worth trying to do better. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I know that things aren't currently apples-to-apples, but how >>>>>>>>>>> does that compare to gold? >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I will measure it. >>>>>>>>>> >>>>>>>>> >>>>>>>>> For that unit_tests binary I measured the overhead at about 5 >>>>>>>>> seconds (average of 10 runs). That is with debug info, of course. >>>>>>>>> >>>>>>>>> W.r.t. LLD and having it always on by default (and hence making it >>>>>>>>>>>>> as fast as possible), it seems like right now you are implementing the >>>>>>>>>>>>> checking process with a hash table. That's simple and fine for a first >>>>>>>>>>>>> implementation, but it's probably worth mentioning in a comment the problem >>>>>>>>>>>>> of checking the tables, at least from the linker's perspective, does fit >>>>>>>>>>>>> into a map-reduce pattern and could be easily parallelized if needed. E.g. 
>>>>>>>>>>>>> a parallel sort to coalesce all entries for symbols of the same name >>>>>>>>>>>>> followed by a parallel forEach to check each bucket with the same symbol >>>>>>>>>>>>> name (roughly speaking). >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Right, that's one approach. I was thinking of a simpler >>>>>>>>>>>> approach where at compile time we sort ODR names by hash and partition them >>>>>>>>>>>> using (say) the upper bits of the hash, so that at link time we can have N >>>>>>>>>>>> threads each building a hash table for a specific partition. >>>>>>>>>>>> >>>>>>>>>>>> And of course this work can be started right after symbol >>>>>>>>>>>> resolution finishes and parallelised with the rest of the work done by the >>>>>>>>>>>> linker. >>>>>>>>>>>> >>>>>>>>>>>> Even better than doing it faster is just doing less work. >>>>>>>>>>>>> There's a lot of work that the linker is already doing that may be reusable >>>>>>>>>>>>> for the ODR checking. >>>>>>>>>>>>> E.g. >>>>>>>>>>>>> - maybe we could get the coalescing step as a byproduct of our >>>>>>>>>>>>> existing string deduping, which we are generally doing anyway. >>>>>>>>>>>>> - we are already coalescing symbol names for the symbol table. >>>>>>>>>>>>> If the ODR table is keyed off of symbols in the binary that we are >>>>>>>>>>>>> inserting into the symbol table, then I think we could do the entire ODR >>>>>>>>>>>>> check with no extra "string" work on LLD's part. >>>>>>>>>>>>> >>>>>>>>>>>>> I see Rui already mentioned some of this in >>>>>>>>>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=726 >>>>>>>>>>>>> 071#c4. >>>>>>>>>>>>> You mentioned that not everything is necessarily directly >>>>>>>>>>>>> keyed on a symbol (such as types), but I think that it would really >>>>>>>>>>>>> simplify things if the check was done as such. Do you have any idea exactly >>>>>>>>>>>>> how much of the things that we want to check are not keyed on symbols? 
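The compile-time partitioning scheme sketched above — route entries to shards by the top bits of the name hash so N linker threads each build their own lock-free hash table — can be mocked up as follows. Shard count, field layout, and hash values are all invented for illustration:

```python
import threading

# Sketch of the sharding idea: each (name hash, ODR hash) entry goes to
# one of N partitions chosen by the top bits of the 64-bit name hash, so
# each worker thread owns one partition and needs no locking to build it.
N = 4            # number of partitions/threads; arbitrary here
BITS = 2         # 2**BITS == N

def partition(name_hash):
    return name_hash >> (64 - BITS)

def check_shard(entries, violations, lock):
    seen = {}
    for name_hash, odr_hash in entries:
        # setdefault returns the first hash seen for this name; any later
        # entry with a different ODR hash is a suspected ODR violation.
        if seen.setdefault(name_hash, odr_hash) != odr_hash:
            with lock:
                violations.append(name_hash)

entries = [(0x0123456789ABCDEF, 1), (0x0123456789ABCDEF, 2),   # mismatch
           (0xFEDCBA9876543210, 7), (0xFEDCBA9876543210, 7)]   # consistent
shards = [[] for _ in range(N)]
for e in entries:
    shards[partition(e[0])].append(e)

violations, lock = [], threading.Lock()
threads = [threading.Thread(target=check_shard, args=(s, violations, lock))
           for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print(violations)  # the one name hash whose ODR hashes disagreed
```

The lock is only taken on the (rare) violation path; the common case touches only thread-local state, which is what makes the partitioned design parallelize cleanly.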
If >>>>>>>>>>>>> most things are keyed on symbols, for the things we are not we can just >>>>>>>>>>>>> emit extra symbols prefixed by __clang_odr_check_ or whatever. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Since the current implementation only works with records there >>>>>>>>>>>> is basically zero overlap right now between ODR names and symbols. I >>>>>>>>>>>> suppose that I could estimate the amount of function overlap in a >>>>>>>>>>>> hypothetical implementation that computes ODR hashes of functions by >>>>>>>>>>>> comparing the number of *_odr functions after clang has finished IRgen with >>>>>>>>>>>> the number after optimization finishes. This of course would be strictly >>>>>>>>>>>> more than functions + types. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Wouldn't any function or symbol using the record type have the >>>>>>>>>>> type name somewhere in it? If we used an offset+length encoding (instead of >>>>>>>>>>> offset + NUL termination) we might be able to reuse it then (at some cost >>>>>>>>>>> in finding the reference). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> That may be possible with some work in the string table builder. >>>>>>>>>> But at that point of course we're not dealing with regular symbols any >>>>>>>>>> more. I guess we could have two ODR tables per object file: an array of >>>>>>>>>> (ODR hash, location) tuples for ODR names that correspond to symbol table >>>>>>>>>> symbols (i.e. Rui's proposal on the chromium bug), and an array of (ODR >>>>>>>>>> name, ODR hash, location) tuples for all other ODR names. I guess if we >>>>>>>>>> wanted a "low overhead" mode we could just omit the second table or put >>>>>>>>>> fewer symbols in it. >>>>>>>>>> >>>>>>>>>> With debug info surely there is some sort of string representing >>>>>>>>>>> the record name or something like that, no? 
>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Not the record name on its own (they do appear but a bit >>>>>>>>>> awkwardly -- each namespace component is stored in a separate string), but >>>>>>>>>> if the record has at least one member function the mangled type name will >>>>>>>>>> appear somewhere in .debug_str, so we could in principle reuse that with >>>>>>>>>> the offset/length trick. >>>>>>>>>> >>>>>>>>>> I guess we may have to have our "low-overhead" user-facing >>>>>>>>>>> behavior be a bit more nuanced. E.g.: >>>>>>>>>>> 1. does this feature bloat object files significantly >>>>>>>>>>> 2. does this feature slow down link times significantly >>>>>>>>>>> >>>>>>>>>>> Intuitively, it seems like we should be able to get 1. when >>>>>>>>>>> debug info happens to be enabled (not sure about split dwarf?) and possibly >>>>>>>>>>> in all cases at the cost of complexity. We may be able to get 2. in all >>>>>>>>>>> cases with proper design. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I think that would be my rough assessment as well. I think we >>>>>>>>>> have a good shot at 1 for all cases with some of the ideas that have been >>>>>>>>>> mentioned already. If we can avoid creating dependencies on DWARF I think >>>>>>>>>> that would be ideal -- I'd ideally like this to work for COFF as well, >>>>>>>>>> where you'd typically expect to find CodeView in object files. If I were to >>>>>>>>>> try this I think the first thing that I would try is hash/compression >>>>>>>>>> combined with the two ODR tables (no reuse for non-symbol ODR names to >>>>>>>>>> start with, as compression may be enough on its own). >>>>>>>>>> >>>>>>>>> >>>>>>>>> I developed a second prototype which uses hash/compression with no >>>>>>>>> attempt to reuse. It is available here: https://github.com/pcc/l >>>>>>>>> lvm-project/tree/odr-checker2 >>>>>>>>> >>>>>>>>> For Chromium the object file size overhead was 536566007 bytes, or >>>>>>>>> in relative terms about 25%, or about 4% with debug info. 
I measured perf >>>>>>>>> overhead for unit_tests at about 6%, but after I moved the checker onto >>>>>>>>> another thread, the overhead disappeared into the noise. >>>>>>>>> >>>>>>>> >>>>>>>> Still seems like quite a big increase. >>>>>>>> >>>>>>>> Any chance of compression? >>>>>>>> >>>>>>> >>>>>>> That was with compression -- the implementation compresses the parts >>>>>>> of the ODR table that aren't hashes (aside from the header and the Clang >>>>>>> version, which is a small fixed cost), as well as the string table. The >>>>>>> hashes were left uncompressed because they are in the critical path of the >>>>>>> linker and because I imagine that they wouldn't really be that compressible. >>>>>>> >>>>>> >>>>>> I'd be a bit surprised if they weren't especially compressible - >>>>>> >>>>> >>>>> Maybe I'm wrong, but my intuition about compression is that it works >>>>> best when the data contains repeated patterns. If we use a hash function >>>>> with good dispersion then I'd expect each hash to have little in common >>>>> with other hashes. >>>>> >>>>> >>>>>> and how much of the size increase is the compressed data V the >>>>>> uncompressed data? >>>>>> >>>>> >>>>> The ratio was roughly 60% compressed data to 40% uncompressed data. >>>>> >>>>> >>>>>> Is it still in the hot path when parallelized? >>>>>> >>>>> >>>>> Not right now according to my benchmarking, but decompression could >>>>> push it into the critical path if it ends up taking longer than the rest of >>>>> the work done by the linker after symbol resolution. On the same machine >>>>> that I used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is >>>>> roughly what I'd expect the hashes to look like) takes around 1.1s, i.e. a >>>>> not insignificant fraction of lld's runtime. >>>>> >>>>> >>>>>> So I think the remaining gains would either be through limiting the >>>>>>> number of ODR table entries, or through reuse of data. 
>>>>>>> >>>>>>> Limiting might be something to explore -- one possibility is that we >>>>>>> could limit the ODR table entries to the declarations that are "used" by a >>>>>>> particular translation unit (it appears that Clang tracks something like >>>>>>> that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly >>>>>>> what we need -- I think we would basically need to test for reference >>>>>>> reachability from the functions/globals that are IRgen'd). >>>>>>> >>>>>> >>>>>> Currently it has every type and function that is in the AST? Yeah, >>>>>> that's a lot - perhaps it should be more like the things that go in the >>>>>> DWARF? (though would need to add some cases there - since the DWARF logic >>>>>> already relies on the ODR to not emit duplicates in some cases) >>>>>> >>>>> >>>>> Just every record declaration -- Clang only supports ODR hashes for >>>>> record declarations right now. I understand that function declarations >>>>> (including function bodies) are still works in progress. >>>>> >>>>> I think it should indeed just be roughly the things that go in the >>>>> DWARF. I think that at one point I observed that every record declaration, >>>>> even unused ones, were going into the DWARF, but I might have been mistaken >>>>> because I can no longer reproduce that. I'll take a closer look to see if I >>>>> can reuse what logic presumably already exists for DWARF. >>>>> >>>>> In terms of reuse, it seems that of the 536566007 bytes of >>>>>>> overhead, 319309579 were the compressed part of the ODR tables. So even if >>>>>>> we achieved 100% sharing, >>>>>>> >>>>>> >>>>>> 100% sharing? You mean if all the data were compressed, and assuming >>>>>> the hashes were compressible at the same ratio as the other data? >>>>>> >>>>> >>>>> Sorry, I mean if 100% of the data in the compressed part of the ODR >>>>> table could be eliminated by reusing data stored elsewhere (e.g. in the >>>>> object file string table or in the DWARF). 
>>>>> >>>>> with the current scheme I think that our minimum achievable overhead >>>>>>> would be ~15% (no debug info) or ~2% (with debug info). >>>>>>> >>>>>>> >>>>>>>> Could this go into .dwo files with Fission and be checked by dwp >>>>>>>> instead, perhaps? >>>>>>>> >>>>>>> >>>>>>> I think it could also work that way, yes. >>>>>>> >>>>>>> I'm reasonably happy with these figures, at least for a first >>>>>>>>> implementation. We may be able to do even better for file size with reuse, >>>>>>>>> but I'd leave that for version 2. >>>>>>>>> >>>>>>>> >>>>>>>> What's the story with compatibility between versions, then? Is >>>>>>>> there a version header? >>>>>>>> >>>>>>> >>>>>>> Yes, the header contains a version number. >>>>>>> >>>>>>> >>>>>>>> Will old formats be supported by lld indefinitely? Not at all? >>>>>>>> >>>>>>> >>>>>>> I think we should drop support for old formats when we introduce a >>>>>>> new format. My understanding is that the ODR hash can change whenever Clang >>>>>>> changes (the implementation only performs ODR checking if all ODR tables >>>>>>> were produced by the same revision of Clang), so there wouldn't seem to be >>>>>>> a huge benefit in keeping support for old formats around. >>>>>>> >>>>>> >>>>>> I imagine it's possible people aren't necessarily going to rev lld in >>>>>> exact lock-step with clang, but I could be wrong. (certainly binutils ld or >>>>>> gold aren't released/kept in lock-step with GCC, for example) >>>>>> >>>>> >>>>> That's certainly possible, but I'd say that the bar for dropping >>>>> backwards compatibility is lower because ODR tables are not required for >>>>> correctness. We could keep compatibility with the last version or so if it >>>>> isn't too burdensome, or otherwise print a warning. 
>>>>> >>>> >>>> They aren't required for correctness, but upgrading your compiler or >>>> linker then silently losing ODR checking would be bad (or even not silently >>>> losing it, but having no choice but to rev both to keep the functionality & >>>> hold the ODR-cleanliness bar) - it's the sort of thing where if you lost >>>> the checking, then gained it back again later, the regression cleanup would >>>> be annoying/an impediment to using the feature. >>>> >>> >>> Makes sense I guess. I'd be fine with a policy where the Nth open source >>> release should be able to read ODR tables produced by the N-1th and >>> possibly the N-2th release. >>> >> >> Still strikes me as a bit awkward - wonder how that compared to other >> (similar or different) linker features. >> > > I think the most similar existing feature is .gdb_index. They have already > gone through a few format revisions: > https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html > and have deprecated/removed support for older formats. > > Because the requirements for ODR tables are simpler than those for > .gdb_index, I'd expect us to converge on a final format sooner, so in > practice the window of compatibility would end up being longer than a year. > > Peter > > >> >>> Any idea what Daniel Jasper & co have been working on WRT ODR checking & >>>> how this feature integrates or doesn't with their work? I imagine they >>>> might be working on something more like a Clang Tooling style approach, but >>>> I'm not sure. >>>> >>> >>> I'm not aware of any work like that, only of Richard Trieu's efforts for >>> modules that I'm piggybacking on. >>> >> >> +Djasper - perhaps you could provide some context on other odr detection >> efforts? 
>>>>>>>>>>>>> The issue of retaining the ODR check for functions even if they get inlined may inherently pose an extra cost that can't be folded into existing work the linker is doing, so there might be a reason for clang to have a default mode that has practically no linking overhead and one that does more thorough checking but imposes extra linking overhead. Think something like a crazy boost library with thousands of functions that get inlined away, but have gigantic mangled names and so precisely are the ones that are going to impose extra cost on the linker. Simply due to the extra volume of strings that the linker would need to look at, I don't think there's a way to include checking of all inlined functions "for free" at the linker level using the symbol approach.

>>>>>>>>>>>>> I guess those inlined functions would still have those symbol names in debug info (I think?), so piggybacking on the string deduplication we're already doing might make it possible to fold away the work in that case (but then again, would still impose extra cost with split dwarf...).

>>>>>>>>>>>>> Anyway, let's wait to see what the actual performance numbers are.

>>>>>>>>>>>>> -- Sean Silva

>>>>>>>>>>>>> On Tue, Jun 6, 2017 at 10:40 PM, Peter Collingbourne via cfe-dev <cfe-dev at lists.llvm.org> wrote:

>>>>>>>>>>>>>> Hi all,

>>>>>>>>>>>>>> I'd like to propose an ODR checker feature for Clang and LLD. The feature would be similar to gold's --detect-odr-violations feature, but better: we can rely on integration with clang to avoid relying on debug info and to perform more precise matching.

>>>>>>>>>>>>>> The basic idea is that we use clang's ability to create ODR hashes for declarations. ODR hashes are computed using all information about a declaration that is ODR-relevant. If the flag -fdetect-odr-violations is passed, Clang will store the ODR hashes in a so-called ODR table in each object file. Each ODR table will contain a mapping from mangled declaration names to ODR hashes. At link time, the linker will read the ODR tables and report any mismatches.

>>>>>>>>>>>>>> To make this work:
>>>>>>>>>>>>>> - LLVM will be extended with the ability to represent ODR tables in the IR and emit them to object files
>>>>>>>>>>>>>> - Clang will be extended with the ability to emit ODR tables using ODR hashes
>>>>>>>>>>>>>> - LLD will be extended to read ODR tables from object files

>>>>>>>>>>>>>> I have implemented a prototype of this feature. It is available here: https://github.com/pcc/llvm-project/tree/odr-checker and some results from applying it to chromium are here: crbug.com/726071
>>>>>>>>>>>>>> As you can see it did indeed find a number of real ODR violations in Chromium, including some that wouldn't be detectable using debug info.

>>>>>>>>>>>>>> If you're interested in what the format of the ODR table would look like, that prototype shows pretty much what I had in mind, but I expect many other aspects of the implementation to change as it is upstreamed.

>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Peter

_______________________________________________
cfe-dev mailing list
cfe-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
David Blaikie via llvm-dev
2017-Jun-15 01:14 UTC
[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD
On Wed, Jun 14, 2017 at 1:41 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at gmail.com> wrote:

>> On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

>>> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at gmail.com> wrote:

>>>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

>>>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie at gmail.com> wrote:

>>>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne <peter at pcc.me.uk> wrote:

>>>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie <dblaikie at gmail.com> wrote:

>>>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter Collingbourne via cfe-dev <cfe-dev at lists.llvm.org> wrote:

>>>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

>>>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean Silva <chisophugis at gmail.com> wrote:

>>>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

>>>>>>>>>>>> On Wed, Jun 7, 2017 at 12:17 AM, Sean Silva <chisophugis at gmail.com> wrote:

>>>>>>>>>>>>> Very nice and simple implementation!

>>>>>>>>>>>>> Do you have any statistics on how large these odr tables are compared to other object file data? I assume that if these tables contain full mangled symbol names, they could end up being very large and may want to share the symbol name strings with the overall string table in the .o

>>>>>>>>>>>> Looking at Chromium's object files, it looks like the total size of the odrtabs is about 50% of the total size of the object files, which isn't great. The current implementation only looks at records, so I imagine that it would be hard to share any of the strings that I'm currently creating. (I guess it's possible that some types will have a mangled vtable name in the string table, so we may be able to share a little that way.) Note however that this was without debug info.

>>>>>>>>>>>> One option for reducing size would be to 1) store hashes of ODR names in ODR tables, per Rui's suggestion (alongside a reference to the name itself in the string table), and 2) compress the string table for the ODR names with a standard compression algorithm like gzip. This wouldn't seem to affect link time performance much, because I think we should only need to look at the strings if we see an ODR name hash match together with an ODR hash mismatch, which would mean an ODR violation with high probability (i.e. unless there was an ODR name hash collision, we have found an ODR violation). If we don't expect a lot of sharing with regular string tables (see below), it seems even more reasonable.

>>>>>>>>>>> Neat observation!

>>>>>>>>>>> FWIW, it is a birthday problem type situation though, so for a 32-bit hash we would expect a collision at around 2^16 distinct hashes (and 2^16 seems pretty easy to hit in a large project).
>>>>>>>>>>> So 64-bit hashes might be preferable.

>>>>>>>>>> Oh right, good point, using a 64-bit hash does seem like a good idea here.

>>>>>>>>>>>>> Also, do you have any numbers on the performance of your initial implementation?

>>>>>>>>>>>> I measured the link time for chromium's unit_tests (the largest single binary in chromium) at 5.05s without ODR checks and 6.61s with ODR checks. So about 30% overhead, but in absolute terms it doesn't seem too bad. So I think this may be acceptable for an initial implementation, but it certainly seems worth trying to do better.

>>>>>>>>>>> I know that things aren't currently apples-to-apples, but how does that compare to gold?

>>>>>>>>>> I will measure it.

>>>>>>>>> For that unit_tests binary I measured the overhead at about 5 seconds (average of 10 runs). That is with debug info, of course.

>>>>>>>>>>>>> W.r.t. LLD and having it always on by default (and hence making it as fast as possible), it seems like right now you are implementing the checking process with a hash table. That's simple and fine for a first implementation, but it's probably worth mentioning in a comment that the problem of checking the tables, at least from the linker's perspective, fits into a map-reduce pattern and could be easily parallelized if needed. E.g. a parallel sort to coalesce all entries for symbols of the same name, followed by a parallel forEach to check each bucket with the same symbol name (roughly speaking).

>>>>>>>>>>>> Right, that's one approach. I was thinking of a simpler approach where at compile time we sort ODR names by hash and partition them using (say) the upper bits of the hash, so that at link time we can have N threads each building a hash table for a specific partition.

>>>>>>>>>>>> And of course this work can be started right after symbol resolution finishes and parallelised with the rest of the work done by the linker.

>>>>>>>>>>>>> Even better than doing it faster is just doing less work. There's a lot of work that the linker is already doing that may be reusable for the ODR checking. E.g.:
>>>>>>>>>>>>> - maybe we could get the coalescing step as a byproduct of our existing string deduping, which we are generally doing anyway.
>>>>>>>>>>>>> - we are already coalescing symbol names for the symbol table. If the ODR table is keyed off of symbols in the binary that we are inserting into the symbol table, then I think we could do the entire ODR check with no extra "string" work on LLD's part.

>>>>>>>>>>>>> I see Rui already mentioned some of this in https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4.

>>>>>>>>>>>>> You mentioned that not everything is necessarily directly keyed on a symbol (such as types), but I think that it would really simplify things if the check was done as such. Do you have any idea exactly how much of the things that we want to check are not keyed on symbols? If most things are keyed on symbols, then for the things that are not we can just emit extra symbols prefixed by __clang_odr_check_ or whatever.
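The partitioning idea described above can be sketched concretely. This is a toy model, not the prototype's actual format: `name_hash64`, the table shapes, and the example mangled names are all made up for illustration.

```python
# Toy model of the scheme discussed above: each object file's ODR table
# is modeled as {mangled_name: odr_hash}.  Entries are bucketed by the
# upper bits of a 64-bit name hash so that each partition can be checked
# independently (e.g. one thread per partition) at link time.
import hashlib

NUM_PARTITIONS = 8                            # power of two
PART_BITS = NUM_PARTITIONS.bit_length() - 1   # log2(NUM_PARTITIONS)

def name_hash64(mangled_name: str) -> int:
    # 64-bit name hash; per the birthday-problem discussion above, a
    # 32-bit hash would expect a collision at around 2**16 names.
    h = hashlib.blake2b(mangled_name.encode(), digest_size=8).digest()
    return int.from_bytes(h, "little")

def partition_of(name_hash: int) -> int:
    # Pick a partition from the upper bits of the hash.
    return name_hash >> (64 - PART_BITS)

def check_partition(entries):
    """Flag name hashes seen with two different ODR hashes --
    with high probability an ODR violation."""
    seen, mismatches = {}, set()
    for name_hash, odr_hash in entries:
        if seen.setdefault(name_hash, odr_hash) != odr_hash:
            mismatches.add(name_hash)
    return mismatches

def check_odr(odr_tables):
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for table in odr_tables:
        for name, odr_hash in table.items():
            nh = name_hash64(name)
            parts[partition_of(nh)].append((nh, odr_hash))
    # Each check_partition call is independent and could run on its
    # own thread; here they run sequentially for simplicity.
    bad = set()
    for p in parts:
        bad |= check_partition(p)
    return bad

# Two TUs that disagree about struct S's definition:
tu1 = {"_ZTS1S": 0xAAAA, "_ZTS1T": 0x1234}
tu2 = {"_ZTS1S": 0xBBBB, "_ZTS1T": 0x1234}
assert check_odr([tu1, tu2]) == {name_hash64("_ZTS1S")}
```

With 64-bit name hashes, a name-hash match combined with an ODR-hash mismatch indicates a violation with high probability, per the birthday-bound discussion above.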
>>>>>>>>>>>> Since the current implementation only works with records, there is basically zero overlap right now between ODR names and symbols. I suppose that I could estimate the amount of function overlap in a hypothetical implementation that computes ODR hashes of functions by comparing the number of *_odr functions after clang has finished IRgen with the number after optimization finishes. This of course would be strictly more than functions + types.

>>>>>>>>>>> Wouldn't any function or symbol using the record type have the type name somewhere in it? If we used an offset+length encoding (instead of offset + NUL termination) we might be able to reuse it then (at some cost in finding the reference).

>>>>>>>>>> That may be possible with some work in the string table builder. But at that point of course we're not dealing with regular symbols any more. I guess we could have two ODR tables per object file: an array of (ODR hash, location) tuples for ODR names that correspond to symbol table symbols (i.e. Rui's proposal on the chromium bug), and an array of (ODR name, ODR hash, location) tuples for all other ODR names. I guess if we wanted a "low overhead" mode we could just omit the second table or put fewer symbols in it.

>>>>>>>>>>> With debug info surely there is some sort of string representing the record name or something like that, no?

>>>>>>>>>> Not the record name on its own (they do appear, but a bit awkwardly -- each namespace component is stored in a separate string), but if the record has at least one member function the mangled type name will appear somewhere in .debug_str, so we could in principle reuse that with the offset/length trick.

>>>>>>>>>>> I guess we may have to have our "low-overhead" user-facing behavior be a bit more nuanced. E.g.:
>>>>>>>>>>> 1. does this feature bloat object files significantly
>>>>>>>>>>> 2. does this feature slow down link times significantly

>>>>>>>>>>> Intuitively, it seems like we should be able to get 1 when debug info happens to be enabled (not sure about split dwarf?) and possibly in all cases at the cost of complexity. We may be able to get 2 in all cases with proper design.

>>>>>>>>>> I think that would be my rough assessment as well. I think we have a good shot at 1 for all cases with some of the ideas that have been mentioned already. If we can avoid creating dependencies on DWARF I think that would be ideal -- I'd ideally like this to work for COFF as well, where you'd typically expect to find CodeView in object files. If I were to try this, I think the first thing that I would try is hash/compression combined with the two ODR tables (no reuse for non-symbol ODR names to start with, as compression may be enough on its own).

>>>>>>>>> I developed a second prototype which uses hash/compression with no attempt to reuse. It is available here: https://github.com/pcc/llvm-project/tree/odr-checker2

>>>>>>>>> For Chromium the object file size overhead was 536566007 bytes, or in relative terms about 25%, or about 4% with debug info. I measured perf overhead for unit_tests at about 6%, but after I moved the checker onto another thread, the overhead disappeared into the noise.

>>>>>>>> Still seems like quite a big increase.

>>>>>>>> Any chance of compression?

>>>>>>> That was with compression -- the implementation compresses the parts of the ODR table that aren't hashes (aside from the header and the Clang version, which is a small fixed cost), as well as the string table. The hashes were left uncompressed because they are in the critical path of the linker and because I imagine that they wouldn't really be that compressible.

>>>>>> I'd be a bit surprised if they weren't especially compressible -

>>>>> Maybe I'm wrong, but my intuition about compression is that it works best when the data contains repeated patterns. If we use a hash function with good dispersion, then I'd expect each hash to have little in common with other hashes.

>>>>>> and how much of the size increase is the compressed data vs. the uncompressed data?

>>>>> The ratio was roughly 60% compressed data to 40% uncompressed data.

>>>>>> Is it still in the hot path when parallelized?

>>>>> Not right now according to my benchmarking, but decompression could push it into the critical path if it ends up taking longer than the rest of the work done by the linker after symbol resolution. On the same machine that I used for benchmarking, gunzip'ing 200MB of /dev/urandom (which is roughly what I'd expect the hashes to look like) takes around 1.1s, i.e. a not insignificant fraction of lld's runtime.

>>>>>>> So I think the remaining gains would either be through limiting the number of ODR table entries, or through reuse of data.
>>>>>>> Limiting might be something to explore -- one possibility is that we could limit the ODR table entries to the declarations that are "used" by a particular translation unit (it appears that Clang tracks something like that in Decl::Used/Decl::Referenced, but I'm not sure if that is exactly what we need -- I think we would basically need to test for reference reachability from the functions/globals that are IRgen'd).

>>>>>> Currently it has every type and function that is in the AST? Yeah, that's a lot - perhaps it should be more like the things that go in the DWARF? (though we would need to add some cases there - since the DWARF logic already relies on the ODR to not emit duplicates in some cases)

>>>>> Just every record declaration -- Clang only supports ODR hashes for record declarations right now. I understand that function declarations (including function bodies) are still a work in progress.

>>>>> I think it should indeed just be roughly the things that go in the DWARF. I think that at one point I observed that every record declaration, even unused ones, was going into the DWARF, but I might have been mistaken because I can no longer reproduce that. I'll take a closer look to see if I can reuse the logic that presumably already exists for DWARF.

>>>>>>> In terms of reuse, it seems that of the 536566007 bytes of overhead, 319309579 were the compressed part of the ODR tables. So even if we achieved 100% sharing,

>>>>>> 100% sharing? You mean if all the data were compressed, and assuming the hashes were compressible at the same ratio as the other data?

>>>>> Sorry, I mean if 100% of the data in the compressed part of the ODR table could be eliminated by reusing data stored elsewhere (e.g. in the object file string table or in the DWARF).
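The offset+length reuse idea from earlier in the thread (referencing an ODR name that already occurs inside some other string, such as a mangled vtable symbol) can be sketched as follows. The API and the naive O(n*m) substring search are hypothetical; a real string table builder would need something smarter, but the sharing idea is the same.

```python
# Sketch of (offset, length) string references: adding a string costs
# nothing if it already occurs as a substring of the blob, e.g. an ODR
# name inside a previously added mangled vtable symbol.  With
# NUL-terminated entries this sharing would not be possible, since the
# shared name is not followed by a NUL.
class StringTable:
    def __init__(self):
        self.blob = ""

    def add(self, s: str):
        pos = self.blob.find(s)
        if pos < 0:             # not already present: append it
            pos = len(self.blob)
            self.blob += s
        return (pos, len(s))    # (offset, length) reference

    def get(self, ref):
        off, n = ref
        return self.blob[off:off + n]

st = StringTable()
r1 = st.add("_ZTV7MyClass")   # vtable symbol enters the table first
r2 = st.add("7MyClass")       # reused as a substring: blob does not grow
assert st.get(r2) == "7MyClass" and len(st.blob) == len("_ZTV7MyClass")
```

The cost Sean mentions ("at some cost in finding the reference") shows up here as the substring search in `add`.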
>>>>>>> with the current scheme I think that our minimum achievable overhead would be ~15% (no debug info) or ~2% (with debug info).

>>>>>>>> Could this go into .dwo files with Fission and be checked by dwp instead, perhaps?

>>>>>>> I think it could also work that way, yes.

>>>>>>>>> I'm reasonably happy with these figures, at least for a first implementation. We may be able to do even better for file size with reuse, but I'd leave that for version 2.

>>>>>>>> What's the story with compatibility between versions, then? Is there a version header?

>>>>>>> Yes, the header contains a version number.

>>>>>>>> Will old formats be supported by lld indefinitely? Not at all?

>>>>>>> I think we should drop support for old formats when we introduce a new format. My understanding is that the ODR hash can change whenever Clang changes (the implementation only performs ODR checking if all ODR tables were produced by the same revision of Clang), so there wouldn't seem to be a huge benefit in keeping support for old formats around.

>>>>>> I imagine it's possible people aren't necessarily going to rev lld in exact lock-step with clang, but I could be wrong. (certainly binutils ld or gold aren't released/kept in lock-step with GCC, for example)

>>>>> That's certainly possible, but I'd say that the bar for dropping backwards compatibility is lower because ODR tables are not required for correctness. We could keep compatibility with the last version or so if it isn't too burdensome, or otherwise print a warning.
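A version field in the table header makes the N-1 compatibility policy discussed above cheap to implement. A sketch with an entirely made-up header layout (a magic string plus a little-endian u16 version; the prototype's real header may differ):

```python
# Hypothetical sketch of version gating for an ODR table header: the
# linker fully checks the current version, warns but still reads tables
# one version behind, and skips anything else (losing ODR checking for
# those objects rather than failing the link).
import struct

MAGIC = b"ODRT"            # made-up magic for illustration
CURRENT_VERSION = 2

def classify_table(blob: bytes) -> str:
    if len(blob) < 6 or blob[:4] != MAGIC:
        return "not-an-odr-table"
    (version,) = struct.unpack_from("<H", blob, 4)
    if version == CURRENT_VERSION:
        return "check"
    if version == CURRENT_VERSION - 1:
        return "warn-and-check"    # N-1 release: readable, but warn
    return "skip"                  # too old or too new: no checking

assert classify_table(MAGIC + struct.pack("<H", 2)) == "check"
assert classify_table(MAGIC + struct.pack("<H", 1)) == "warn-and-check"
assert classify_table(MAGIC + struct.pack("<H", 9)) == "skip"
```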
>>>> They aren't required for correctness, but upgrading your compiler or linker and then silently losing ODR checking would be bad (or even not silently losing it, but having no choice but to rev both to keep the functionality & hold the ODR-cleanliness bar) - it's the sort of thing where if you lost the checking, then gained it back again later, the regression cleanup would be annoying/an impediment to using the feature.

>>> Makes sense I guess. I'd be fine with a policy where the Nth open source release should be able to read ODR tables produced by the N-1th and possibly the N-2th release.

>> Still strikes me as a bit awkward - I wonder how that compares to other (similar or different) linker features.

> I think the most similar existing feature is .gdb_index. They have already gone through a few format revisions: https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html and have deprecated/removed support for older formats.

Not sure it's quite the same - that's an output of the linker & produces a performance degradation in GDB when not present. Here the discussion is about a format input to the linker that produces warnings, potentially errors (-Werror), for users, so they might want to maintain compatibility for it. Seems like a higher bar.

Feels to me like a bit more time could be spent on a design/size impact/etc to give a better chance of ongoing stability (and/or backcompat at least) for users. But just a thought.

- Dave

> Because the requirements for ODR tables are simpler than those for .gdb_index, I'd expect us to converge on a final format sooner, so in practice the window of compatibility would end up being longer than a year.

> Peter

>>>> Any idea what Daniel Jasper & co have been working on WRT ODR checking & how this feature integrates or doesn't with their work? I imagine they might be working on something more like a Clang Tooling style approach, but I'm not sure.

>>> I'm not aware of any work like that, only of Richard Trieu's efforts for modules that I'm piggybacking on.

>> +Djasper - perhaps you could provide some context on other odr detection efforts?

>>> Peter