thr3ads.net - llvm dev - [llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD [Jun 2017]

If this information is useful, please help other people find it:
Share via:

David Blaikie via llvm-dev

2017-Jun-15 01:14 UTC

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Wed, Jun 14, 2017 at 1:41 PM Peter Collingbourne <peter at pcc.me.uk>
wrote:
> On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at
gmail.com>
> wrote:
>
>>
>>
>> On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at
pcc.me.uk>
>> wrote:
>>
>>> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne <peter
at pcc.me.uk>
>>>> wrote:
>>>>
>>>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie <dblaikie
at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne
<peter at pcc.me.uk>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie
<dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter
Collingbourne via cfe-dev <
>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter
Collingbourne <
>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM, Sean
Silva <chisophugis at gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM,
Peter Collingbourne <
>>>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 7, 2017 at 12:17
AM, Sean Silva <
>>>>>>>>>>>> chisophugis at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Very nice and simple
implementation!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you have any statistics
on how large these odr tables are
>>>>>>>>>>>>> compared to other object
file data? I assume that if these tables contain
>>>>>>>>>>>>> full mangled symbol names,
they could end up being very large and may want
>>>>>>>>>>>>> to share the symbol name
strings with the overall string table in the .o
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at Chromium's
object files it looks like the total size
>>>>>>>>>>>> of the odrtabs is about 50% of
the total size of the object files, which
>>>>>>>>>>>> isn't great. The current
implementation only looks at records, so I imagine
>>>>>>>>>>>> that it would be hard to share
any of the strings that I'm currently
>>>>>>>>>>>> creating. (I guess it's
possible that some types will have a mangled vtable
>>>>>>>>>>>> name in the string table, so we
may be able to share a little that way.)
>>>>>>>>>>>> Note however that this was
without debug info.
>>>>>>>>>>>>
>>>>>>>>>>>> One option for reducing size
would be to
>>>>>>>>>>>> 1) store hashes of ODR names in
ODR tables, per Rui's
>>>>>>>>>>>> suggestion (alongside a
reference to the name itself in the string table)
>>>>>>>>>>>> 2) compress the string table
for the ODR names with a standard
>>>>>>>>>>>> compression algorithm like
gzip.
>>>>>>>>>>>> This wouldn't seem to
affect link time performance much because
>>>>>>>>>>>> I think we should only need to
look at the strings if we see a ODR name
>>>>>>>>>>>> hash match together with an ODR
hash mismatch, which would mean an ODR
>>>>>>>>>>>> violation with a high
probability (i.e. unless there was an ODR name hash
>>>>>>>>>>>> collision, we have found an ODR
violation). If we don't expect a lot of
>>>>>>>>>>>> sharing with regular string
tables (see below), it seems even more
>>>>>>>>>>>> reasonable.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Neat observation!
>>>>>>>>>>>
>>>>>>>>>>> FWIW, it is a birthday problem type
situation though, so for a
>>>>>>>>>>> 32-bit hash, we would expect a
collision in about 1 in 2^16 distinct hashes
>>>>>>>>>>> (and 2^16 seems pretty easy to hit
in a large project). So 64-bit hashes
>>>>>>>>>>> might be preferable.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Oh right, good point, using a 64-bit
hash does seem like a good
>>>>>>>>>> idea here.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Also, do you have any numbers
on the performance of your
>>>>>>>>>>>>> initial implementation?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I measured the link time for
chromium's unit_tests (the largest
>>>>>>>>>>>> single binary in chromium) at
5.05s without ODR checks and 6.61s with ODR
>>>>>>>>>>>> checks. So about 30% overhead,
but in absolute terms it doesn't seem too
>>>>>>>>>>>> bad. So I think this may be
acceptable for an initial implementation, but
>>>>>>>>>>>> it certainly seems worth trying
to do better.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I know that things aren't
currently apples-to-apples, but how
>>>>>>>>>>> does that compare to gold?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I will measure it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For that unit_tests binary I measured the
overhead at about 5
>>>>>>>>> seconds (average of 10 runs). That is with
debug info, of course.
>>>>>>>>>
>>>>>>>>> W.r.t. LLD and having it always on by
default (and hence making it
>>>>>>>>>>>>> as fast as possible), it
seems like right now you are implementing the
>>>>>>>>>>>>> checking process with a
hash table. That's simple and fine for a first
>>>>>>>>>>>>> implementation, but
it's probably worth mentioning in a comment the problem
>>>>>>>>>>>>> of checking the tables, at
least from the linker's perspective, does fit
>>>>>>>>>>>>> into a map-reduce pattern
and could be easily parallelized if needed. E.g.
>>>>>>>>>>>>> a parallel sort to coalesce
all entries for symbols of the same name
>>>>>>>>>>>>> followed by a parallel
forEach to check each bucket with the same symbol
>>>>>>>>>>>>> name (roughly speaking).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Right, that's one approach.
I was thinking of a simpler
>>>>>>>>>>>> approach where at compile time
we sort ODR names by hash and partition them
>>>>>>>>>>>> using (say) the upper bits of
the hash, so that at link time we can have N
>>>>>>>>>>>> threads each building a hash
table for a specific partition.
>>>>>>>>>>>>
>>>>>>>>>>>> And of course this work can be
started right after symbol
>>>>>>>>>>>> resolution finishes and
parallelised with the rest of the work done by the
>>>>>>>>>>>> linker.
>>>>>>>>>>>>
>>>>>>>>>>>> Even better than doing it
faster is just doing less work.
>>>>>>>>>>>>> There's a lot of work
that the linker is already doing that may be reusable
>>>>>>>>>>>>> for the ODR checking.
>>>>>>>>>>>>> E.g.
>>>>>>>>>>>>> - maybe we could get the
coalescing step as a byproduct of our
>>>>>>>>>>>>> existing string deduping,
which we are generally doing anyway.
>>>>>>>>>>>>> - we are already coalescing
symbol names for the symbol table.
>>>>>>>>>>>>> If the ODR table is keyed
off of symbols in the binary that we are
>>>>>>>>>>>>> inserting into the symbol
table, then I think we could do the entire ODR
>>>>>>>>>>>>> check with no extra
"string" work on LLD's part.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see Rui already mentioned
some of this in
>>>>>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id=726071#c4
>>>>>>>>>>>>> .
>>>>>>>>>>>>> You mentioned that not
everything is necessarily directly
>>>>>>>>>>>>> keyed on a symbol (such as
types), but I think that it would really
>>>>>>>>>>>>> simplify things if the
check was done as such. Do you have any idea exactly
>>>>>>>>>>>>> how much of the things that
we want to check are not keyed on symbols? If
>>>>>>>>>>>>> most things are keyed on
symbols, for the things we are not we can just
>>>>>>>>>>>>> emit extra symbols prefixed
by __clang_odr_check_ or whatever.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Since the current
implementation only works with records there
>>>>>>>>>>>> is basically zero overlap right
now between ODR names and symbols. I
>>>>>>>>>>>> suppose that I could estimate
the amount of function overlap in a
>>>>>>>>>>>> hypothetical implementation
that computes ODR hashes of functions by
>>>>>>>>>>>> comparing the number of *_odr
functions after clang has finished IRgen with
>>>>>>>>>>>> the number after optimization
finishes. This of course would be strictly
>>>>>>>>>>>> more than functions + types.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Wouldn't any function or symbol
using the record type have the
>>>>>>>>>>> type name somewhere in it? If we
used an offset+length encoding (instead of
>>>>>>>>>>> offset + NUL termination) we might
be able to reuse it then (at some cost
>>>>>>>>>>> in finding the reference).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That may be possible with some work in
the string table builder.
>>>>>>>>>> But at that point of course we're
not dealing with regular symbols any
>>>>>>>>>> more. I guess we could have two ODR
tables per object file: an array of
>>>>>>>>>> (ODR hash, location) tuples for ODR
names that correspond to symbol table
>>>>>>>>>> symbols (i.e. Rui's proposal on the
chromium bug), and an array of (ODR
>>>>>>>>>> name, ODR hash, location) tuples for
all other ODR names. I guess if we
>>>>>>>>>> wanted a "low overhead" mode
we could just omit the second table or put
>>>>>>>>>> fewer symbols in it.
>>>>>>>>>>
>>>>>>>>>> With debug info surely there is some
sort of string representing
>>>>>>>>>>> the record name or something like
that, no?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Not the record name on its own (they do
appear but a bit
>>>>>>>>>> awkwardly -- each namespace component
is stored in a separate string), but
>>>>>>>>>> if the record has at least one member
function the mangled type name will
>>>>>>>>>> appear somewhere in .debug_str, so we
could in principle reuse that with
>>>>>>>>>> the offset/length trick.
>>>>>>>>>>
>>>>>>>>>> I guess we may have to have our
"low-overhead" user-facing
>>>>>>>>>>> behavior be a bit more nuanced.
E.g.:
>>>>>>>>>>> 1. does this feature bloat object
files significantly
>>>>>>>>>>> 2. does this feature slow down link
times significantly
>>>>>>>>>>>
>>>>>>>>>>> Intuitively, it seems like we
should be able to get 1. when
>>>>>>>>>>> debug info happens to be enabled
(not sure about split dwarf?) and possibly
>>>>>>>>>>> in all cases at the cost of
complexity. We may be able to get 2. in all
>>>>>>>>>>> cases with proper design.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think that would be my rough
assessment as well. I think we
>>>>>>>>>> have a good shot at 1 for all cases
with some of the ideas that have been
>>>>>>>>>> mentioned already. If we can avoid
creating dependencies on DWARF I think
>>>>>>>>>> that would be ideal -- I'd ideally
like this to work for COFF as well,
>>>>>>>>>> where you'd typically expect to
find CodeView in object files. If I were to
>>>>>>>>>> try this I think the first thing that I
would try is hash/compression
>>>>>>>>>> combined with the two ODR tables (no
reuse for non-symbol ODR names to
>>>>>>>>>> start with, as compression may be
enough on its own).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I developed a second prototype which uses
hash/compression with no
>>>>>>>>> attempt to reuse. It is available here:
>>>>>>>>>
https://github.com/pcc/llvm-project/tree/odr-checker2
>>>>>>>>>
>>>>>>>>> For Chromium the object file size overhead
was 536566007 bytes, or
>>>>>>>>> in relative terms about 25%, or about 4%
with debug info. I measured perf
>>>>>>>>> overhead for unit_tests at about 6%, but
after I moved the checker onto
>>>>>>>>> another thread, the overhead disappeared
into the noise.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Still seems like quite a big increase.
>>>>>>>>
>>>>>>>> Any chance of compression?
>>>>>>>>
>>>>>>>
>>>>>>> That was with compression -- the implementation
compresses the parts
>>>>>>> of the ODR table that aren't hashes (aside from
the header and the Clang
>>>>>>> version, which is a small fixed cost), as well as
the string table. The
>>>>>>> hashes were left uncompressed because they are in
the critical path of the
>>>>>>> linker and because I imagine that they wouldn't
really be that compressible.
>>>>>>>
>>>>>>
>>>>>> I'd be a bit surprised if they weren't
especially compressible -
>>>>>>
>>>>>
>>>>> Maybe I'm wrong, but my intuition about compression is
that it works
>>>>> best when the data contains repeated patterns. If we use a
hash function
>>>>> with good dispersion then I'd expect each hash to have
little in common
>>>>> with other hashes.
>>>>>
>>>>>
>>>>>> and how much of the size increase is the compressed
data V the
>>>>>> uncompressed data?
>>>>>>
>>>>>
>>>>> The ratio was roughly 60% compressed data to 40%
uncompressed data.
>>>>>
>>>>>
>>>>>> Is it still in the hot path when parallelized?
>>>>>>
>>>>>
>>>>> Not right now according to my benchmarking, but
decompression could
>>>>> push it into the critical path if it ends up taking longer
than the rest of
>>>>> the work done by the linker after symbol resolution. On the
same machine
>>>>> that I used for benchmarking, gunzip'ing 200MB of
/dev/urandom (which is
>>>>> roughly what I'd expect the hashes to look like) takes
around 1.1s, i.e. a
>>>>> not insignificant fraction of lld's runtime.
>>>>>
>>>>>
>>>>>> So I think the remaining gains would either be through
limiting the
>>>>>>> number of ODR table entries, or through reuse of
data.
>>>>>>>
>>>>>>> Limiting might be something to explore -- one
possibility is that we
>>>>>>> could limit the ODR table entries to the
declarations that are "used" by a
>>>>>>> particular translation unit (it appears that Clang
tracks something like
>>>>>>> that in Decl::Used/Decl::Referenced, but I'm
not sure if that is exactly
>>>>>>> what we need -- I think we would basically need to
test for reference
>>>>>>> reachability from the functions/globals that are
IRgen'd).
>>>>>>>
>>>>>>
>>>>>> Currently it has every type and function that is in the
AST? Yeah,
>>>>>> that's a lot - perhaps it should be more like the
things that go in the
>>>>>> DWARF? (though would need to add some cases there -
since the DWARF logic
>>>>>> already relies on the ODR to not emit duplicates in
some cases)
>>>>>>
>>>>>
>>>>> Just every record declaration -- Clang only supports ODR
hashes for
>>>>> record declarations right now. I understand that function
declarations
>>>>> (including function bodies) are still works in progress.
>>>>>
>>>>> I think it should indeed just be roughly the things that go
in the
>>>>> DWARF. I think that at one point I observed that every
record declaration,
>>>>> even unused ones, were going into the DWARF, but I might
have been mistaken
>>>>> because I can no longer reproduce that. I'll take a
closer look to see if I
>>>>> can reuse what logic presumably already exists for DWARF.
>>>>>
>>>>> In terms of reuse, it seems that of the 536566007 bytes of
>>>>>>> overhead, 319309579 were the compressed part of the
ODR tables. So even if
>>>>>>> we achieved 100% sharing,
>>>>>>>
>>>>>>
>>>>>> 100% sharing? You mean if all the data were compressed,
and assuming
>>>>>> the hashes were compressible at the same ratio as the
other data?
>>>>>>
>>>>>
>>>>> Sorry, I mean if 100% of the data in the compressed part of
the ODR
>>>>> table could be eliminated by reusing data stored elsewhere
(e.g. in the
>>>>> object file string table or in the DWARF).
>>>>>
>>>>> with the current scheme I think that our minimum achievable
overhead
>>>>>>> would be ~15% (no debug info) or ~2% (with debug
info).
>>>>>>>
>>>>>>>
>>>>>>>> Could this go into .dwo files with Fission and
be checked by dwp
>>>>>>>> instead, perhaps?
>>>>>>>>
>>>>>>>
>>>>>>> I think it could also work that way, yes.
>>>>>>>
>>>>>>> I'm reasonably happy with these figures, at
least for a first
>>>>>>>>> implementation. We may be able to do even
better for file size with reuse,
>>>>>>>>> but I'd leave that for version 2.
>>>>>>>>>
>>>>>>>>
>>>>>>>> What's the story with compatibility between
versions, then? Is
>>>>>>>> there a version header?
>>>>>>>>
>>>>>>>
>>>>>>> Yes, the header contains a version number.
>>>>>>>
>>>>>>>
>>>>>>>> Will old formats be supported by lld
indefinitely? Not at all?
>>>>>>>>
>>>>>>>
>>>>>>> I think we should drop support for old formats when
we introduce a
>>>>>>> new format. My understanding is that the ODR hash
can change whenever Clang
>>>>>>> changes (the implementation only performs ODR
checking if all ODR tables
>>>>>>> were produced by the same revision of Clang), so
there wouldn't seem to be
>>>>>>> a huge benefit in keeping support for old formats
around.
>>>>>>>
>>>>>>
>>>>>> I imagine it's possible people aren't
necessarily going to rev lld in
>>>>>> exact lock-step with clang, but I could be wrong.
(certainly binutils ld or
>>>>>> gold aren't released/kept in lock-step with GCC,
for example)
>>>>>>
>>>>>
>>>>> That's certainly possible, but I'd say that the bar
for dropping
>>>>> backwards compatibility is lower because ODR tables are not
required for
>>>>> correctness. We could keep compatibility with the last
version or so if it
>>>>> isn't too burdensome, or otherwise print a warning.
>>>>>
>>>>
>>>> They aren't required for correctness, but upgrading your
compiler or
>>>> linker then silently losing ODR checking would be bad (or even
not silently
>>>> losing it, but having no choice but to rev both to keep the
functionality &
>>>> hold the ODR-cleanliness bar) - it's the sort of thing
where if you lost
>>>> the checking, then gained it back again later, the regression
cleanup would
>>>> be annoying/an impediment to using the feature.
>>>>
>>>
>>> Makes sense I guess. I'd be fine with a policy where the Nth
open source
>>> release should be able to read ODR tables produced by the N-1th and
>>> possibly the N-2th release.
>>>
>>
>> Still strikes me as a bit awkward - wonder how that compared to other
>> (similar or different) linker features.
>>
>
> I think the most similar existing feature is .gdb_index. They have already
> gone through a few format revisions:
> https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html
> and have deprecated/removed support for older formats.
>
Not sure it's quite the same - that's an output of the linker &
produces a
performance degredation in GDB when not present. Here the discussion is a
format input to the linker that produces warnings, potentially errors
(Werror) for users that they might want to maintain compatibility for.
Seems like a higher bar.

Feels to me like a bit more time could be spent on a design/size impact/etc
to give a better chance of ongoing stability (and/or backcompat at least)
for users. But just a thought.

- Dave

>
> Because the requirements for ODR tables are simpler than those for
> .gdb_index, I'd expect us to converge on a final format sooner, so in
> practice the window of compatibility would end up being longer than a year.
>
> Peter
>
>
>>
>>> Any idea what Daniel Jasper & co have been working on WRT ODR
checking &
>>>> how this feature integrates or doesn't with their work? I
imagine they
>>>> might be working on something more like a Clang Tooling style
approach, but
>>>> I'm not sure.
>>>>
>>>
>>> I'm not aware of any work like that, only of Richard
Trieu's efforts for
>>> modules that I'm piggybacking on.
>>>
>>
>> +Djasper - perhaps you could provide some context on other odr
detection
>> efforts?
>>
>>
>>> Peter
>>>
>>>>
>>>> - Dave
>>>>
>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>>
>>>>>>>> - Dave
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The issue of retaining the
ODR check for functions even if
>>>>>>>>>>>>> they get inlined may
inherently pose an extra cost that can't be folded
>>>>>>>>>>>>> into existing work the
linker is doing, so there might be a reason for
>>>>>>>>>>>>> clang to have a default
mode that has practically no linking overhead and
>>>>>>>>>>>>> one that does more thorough
checking but imposes extra linking overhead.
>>>>>>>>>>>>> Think something like a
crazy boost library with thousands of functions that
>>>>>>>>>>>>> get inlined away, but have
gigantic mangled names and so precisely are the
>>>>>>>>>>>>> ones that are going to
impose extra cost on the linker. Simply due to the
>>>>>>>>>>>>> extra volume of strings
that the linker would need to look at, I don't
>>>>>>>>>>>>> think there's a way to
include checking of all inlined function "for free"
>>>>>>>>>>>>> at the linker level using
the symbol approach.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> I guess those inlined functions
would still have those symbol
>>>>>>>>>>>>> names in debug info (I
think?), so piggybacking on the string deduplication
>>>>>>>>>>>>> we're already doing
might make it possible to fold away the work in that
>>>>>>>>>>>>> case (but then again, would
still impose extra cost with split dwarf...).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anyway, let's wait to
see what the actual performance numbers
>>>>>>>>>>>>> are.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 6, 2017 at
10:40 PM, Peter Collingbourne via
>>>>>>>>>>>>> cfe-dev <cfe-dev at
lists.llvm.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to propose
an ODR checker feature for Clang and LLD.
>>>>>>>>>>>>>> The feature would be
similar to gold's --detect-odr-violations feature, but
>>>>>>>>>>>>>> better: we can rely on
integration with clang to avoid relying on debug
>>>>>>>>>>>>>> info and to perform
more precise matching.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The basic idea is that
we use clang's ability to create ODR
>>>>>>>>>>>>>> hashes for
declarations. ODR hashes are computed using all information
>>>>>>>>>>>>>> about a declaration
that is ODR-relevant. If the flag
>>>>>>>>>>>>>> -fdetect-odr-violations
is passed, Clang will store the ODR hashes in a
>>>>>>>>>>>>>> so-called ODR table in
each object file. Each ODR table will contain a
>>>>>>>>>>>>>> mapping from mangled
declaration names to ODR hashes. At link time, the
>>>>>>>>>>>>>> linker will read the
ODR table and report any mismatches.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To make this work:
>>>>>>>>>>>>>> - LLVM will be extended
with the ability to represent ODR
>>>>>>>>>>>>>> tables in the IR and
emit them to object files
>>>>>>>>>>>>>> - Clang will be
extended with the ability to emit ODR tables
>>>>>>>>>>>>>> using ODR hashes
>>>>>>>>>>>>>> - LLD will be extended
to read ODR tables from object files
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have implemented a
prototype of this feature. It is
>>>>>>>>>>>>>> available here:
>>>>>>>>>>>>>>
https://github.com/pcc/llvm-project/tree/odr-checker and
>>>>>>>>>>>>>> some results from
applying it to chromium are here:
>>>>>>>>>>>>>> crbug.com/726071
>>>>>>>>>>>>>> As you can see it did
indeed find a number of real ODR
>>>>>>>>>>>>>> violations in Chromium,
including some that wouldn't be detectable using
>>>>>>>>>>>>>> debug info.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you're
interested in what the format of the ODR table
>>>>>>>>>>>>>> would look like, that
prototype shows pretty much what I had in mind, but I
>>>>>>>>>>>>>> expect many other
aspects of the implementation to change as it is
>>>>>>>>>>>>>> upstreamed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>>>>> cfe-dev at
lists.llvm.org
>>>>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Peter
>>>>>>>>>
_______________________________________________
>>>>>>>>> cfe-dev mailing list
>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Peter
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>
>
> --
> --
> Peter
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170615/66a682c3/attachment.html>

James Henderson via llvm-dev

2017-Jun-15 08:14 UTC

head link

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

I agree with Dave here. In the environment I work in, we regularly see
users ship objects and static archives to other users, and then never
update them, even though the linker moves on. If they did this with an
object file that had contents (such as the ODR information) that were later
warned about due to older versions being deprecated, the user linking that
object would probably get quite annoyed, especially as in such cases, the
original producer may no longer be able/willing to update the
object/archive.

James

On 15 June 2017 at 02:14, David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
>
>
> On Wed, Jun 14, 2017 at 1:41 PM Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
>
>> On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at
gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at
pcc.me.uk>
>>> wrote:
>>>
>>>> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie <dblaikie at
gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne
<peter at pcc.me.uk>
>>>>> wrote:
>>>>>
>>>>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie
<dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter Collingbourne
<peter at pcc.me.uk>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David Blaikie
<dblaikie at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter
Collingbourne via cfe-dev <
>>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM, Peter
Collingbourne <
>>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM,
Sean Silva <
>>>>>>>>>>> chisophugis at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 7, 2017 at 4:31 PM,
Peter Collingbourne <
>>>>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 7, 2017 at
12:17 AM, Sean Silva <
>>>>>>>>>>>>> chisophugis at
gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Very nice and simple
implementation!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Do you have any
statistics on how large these odr tables are
>>>>>>>>>>>>>> compared to other
object file data? I assume that if these tables contain
>>>>>>>>>>>>>> full mangled symbol
names, they could end up being very large and may want
>>>>>>>>>>>>>> to share the symbol
name strings with the overall string table in the .o
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking at Chromium's
object files it looks like the total
>>>>>>>>>>>>> size of the odrtabs is
about 50% of the total size of the object files,
>>>>>>>>>>>>> which isn't great. The
current implementation only looks at records, so I
>>>>>>>>>>>>> imagine that it would be
hard to share any of the strings that I'm
>>>>>>>>>>>>> currently creating. (I
guess it's possible that some types will have a
>>>>>>>>>>>>> mangled vtable name in the
string table, so we may be able to share a
>>>>>>>>>>>>> little that way.) Note
however that this was without debug info.
>>>>>>>>>>>>>
>>>>>>>>>>>>> One option for reducing
size would be to
>>>>>>>>>>>>> 1) store hashes of ODR
names in ODR tables, per Rui's
>>>>>>>>>>>>> suggestion (alongside a
reference to the name itself in the string table)
>>>>>>>>>>>>> 2) compress the string
table for the ODR names with a standard
>>>>>>>>>>>>> compression algorithm like
gzip.
>>>>>>>>>>>>> This wouldn't seem to
affect link time performance much
>>>>>>>>>>>>> because I think we should
only need to look at the strings if we see a ODR
>>>>>>>>>>>>> name hash match together
with an ODR hash mismatch, which would mean an ODR
>>>>>>>>>>>>> violation with a high
probability (i.e. unless there was an ODR name hash
>>>>>>>>>>>>> collision, we have found an
ODR violation). If we don't expect a lot of
>>>>>>>>>>>>> sharing with regular string
tables (see below), it seems even more
>>>>>>>>>>>>> reasonable.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Neat observation!
>>>>>>>>>>>>
>>>>>>>>>>>> FWIW, it is a birthday problem
type situation though, so for a
>>>>>>>>>>>> 32-bit hash, we would expect a
collision in about 1 in 2^16 distinct hashes
>>>>>>>>>>>> (and 2^16 seems pretty easy to
hit in a large project). So 64-bit hashes
>>>>>>>>>>>> might be preferable.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Oh right, good point, using a
64-bit hash does seem like a good
>>>>>>>>>>> idea here.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, do you have any
numbers on the performance of your
>>>>>>>>>>>>>> initial implementation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I measured the link time
for chromium's unit_tests (the
>>>>>>>>>>>>> largest single binary in
chromium) at 5.05s without ODR checks and 6.61s
>>>>>>>>>>>>> with ODR checks. So about
30% overhead, but in absolute terms it doesn't
>>>>>>>>>>>>> seem too bad. So I think
this may be acceptable for an initial
>>>>>>>>>>>>> implementation, but it
certainly seems worth trying to do better.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I know that things aren't
currently apples-to-apples, but how
>>>>>>>>>>>> does that compare to gold?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I will measure it.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For that unit_tests binary I measured
the overhead at about 5
>>>>>>>>>> seconds (average of 10 runs). That is
with debug info, of course.
>>>>>>>>>>
>>>>>>>>>> W.r.t. LLD and having it always on by
default (and hence making
>>>>>>>>>>>>>> it as fast as
possible), it seems like right now you are implementing the
>>>>>>>>>>>>>> checking process with a
hash table. That's simple and fine for a first
>>>>>>>>>>>>>> implementation, but
it's probably worth mentioning in a comment the problem
>>>>>>>>>>>>>> of checking the tables,
at least from the linker's perspective, does fit
>>>>>>>>>>>>>> into a map-reduce
pattern and could be easily parallelized if needed. E.g.
>>>>>>>>>>>>>> a parallel sort to
coalesce all entries for symbols of the same name
>>>>>>>>>>>>>> followed by a parallel
forEach to check each bucket with the same symbol
>>>>>>>>>>>>>> name (roughly
speaking).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Right, that's one
approach. I was thinking of a simpler
>>>>>>>>>>>>> approach where at compile
time we sort ODR names by hash and partition them
>>>>>>>>>>>>> using (say) the upper bits
of the hash, so that at link time we can have N
>>>>>>>>>>>>> threads each building a
hash table for a specific partition.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And of course this work can
be started right after symbol
>>>>>>>>>>>>> resolution finishes and
parallelised with the rest of the work done by the
>>>>>>>>>>>>> linker.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even better than doing it
faster is just doing less work.
>>>>>>>>>>>>>> There's a lot of
work that the linker is already doing that may be reusable
>>>>>>>>>>>>>> for the ODR checking.
>>>>>>>>>>>>>> E.g.
>>>>>>>>>>>>>> - maybe we could get
the coalescing step as a byproduct of
>>>>>>>>>>>>>> our existing string
deduping, which we are generally doing anyway.
>>>>>>>>>>>>>> - we are already
coalescing symbol names for the symbol
>>>>>>>>>>>>>> table. If the ODR table
is keyed off of symbols in the binary that we are
>>>>>>>>>>>>>> inserting into the
symbol table, then I think we could do the entire ODR
>>>>>>>>>>>>>> check with no extra
"string" work on LLD's part.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I see Rui already
mentioned some of this in
>>>>>>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id>>>>>>>>>>>>>>
726071#c4.
>>>>>>>>>>>>>> You mentioned that not
everything is necessarily directly
>>>>>>>>>>>>>> keyed on a symbol (such
as types), but I think that it would really
>>>>>>>>>>>>>> simplify things if the
check was done as such. Do you have any idea exactly
>>>>>>>>>>>>>> how much of the things
that we want to check are not keyed on symbols? If
>>>>>>>>>>>>>> most things are keyed
on symbols, for the things we are not we can just
>>>>>>>>>>>>>> emit extra symbols
prefixed by __clang_odr_check_ or whatever.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since the current
implementation only works with records there
>>>>>>>>>>>>> is basically zero overlap
right now between ODR names and symbols. I
>>>>>>>>>>>>> suppose that I could
estimate the amount of function overlap in a
>>>>>>>>>>>>> hypothetical implementation
that computes ODR hashes of functions by
>>>>>>>>>>>>> comparing the number of
*_odr functions after clang has finished IRgen with
>>>>>>>>>>>>> the number after
optimization finishes. This of course would be strictly
>>>>>>>>>>>>> more than functions +
types.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Wouldn't any function or
symbol using the record type have the
>>>>>>>>>>>> type name somewhere in it? If
we used an offset+length encoding (instead of
>>>>>>>>>>>> offset + NUL termination) we
might be able to reuse it then (at some cost
>>>>>>>>>>>> in finding the reference).
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That may be possible with some work
in the string table builder.
>>>>>>>>>>> But at that point of course
we're not dealing with regular symbols any
>>>>>>>>>>> more. I guess we could have two ODR
tables per object file: an array of
>>>>>>>>>>> (ODR hash, location) tuples for ODR
names that correspond to symbol table
>>>>>>>>>>> symbols (i.e. Rui's proposal on
the chromium bug), and an array of (ODR
>>>>>>>>>>> name, ODR hash, location) tuples
for all other ODR names. I guess if we
>>>>>>>>>>> wanted a "low overhead"
mode we could just omit the second table or put
>>>>>>>>>>> fewer symbols in it.
>>>>>>>>>>>
>>>>>>>>>>> With debug info surely there is
some sort of string representing
>>>>>>>>>>>> the record name or something
like that, no?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Not the record name on its own
(they do appear but a bit
>>>>>>>>>>> awkwardly -- each namespace
component is stored in a separate string), but
>>>>>>>>>>> if the record has at least one
member function the mangled type name will
>>>>>>>>>>> appear somewhere in .debug_str, so
we could in principle reuse that with
>>>>>>>>>>> the offset/length trick.
>>>>>>>>>>>
>>>>>>>>>>> I guess we may have to have our
"low-overhead" user-facing
>>>>>>>>>>>> behavior be a bit more nuanced.
E.g.:
>>>>>>>>>>>> 1. does this feature bloat
object files significantly
>>>>>>>>>>>> 2. does this feature slow down
link times significantly
>>>>>>>>>>>>
>>>>>>>>>>>> Intuitively, it seems like we
should be able to get 1. when
>>>>>>>>>>>> debug info happens to be
enabled (not sure about split dwarf?) and possibly
>>>>>>>>>>>> in all cases at the cost of
complexity. We may be able to get 2. in all
>>>>>>>>>>>> cases with proper design.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think that would be my rough
assessment as well. I think we
>>>>>>>>>>> have a good shot at 1 for all cases
with some of the ideas that have been
>>>>>>>>>>> mentioned already. If we can avoid
creating dependencies on DWARF I think
>>>>>>>>>>> that would be ideal -- I'd
ideally like this to work for COFF as well,
>>>>>>>>>>> where you'd typically expect to
find CodeView in object files. If I were to
>>>>>>>>>>> try this I think the first thing
that I would try is hash/compression
>>>>>>>>>>> combined with the two ODR tables
(no reuse for non-symbol ODR names to
>>>>>>>>>>> start with, as compression may be
enough on its own).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I developed a second prototype which
uses hash/compression with
>>>>>>>>>> no attempt to reuse. It is available
here:
>>>>>>>>>>
https://github.com/pcc/llvm-project/tree/odr-checker2
>>>>>>>>>>
>>>>>>>>>> For Chromium the object file size
overhead was 536566007 bytes,
>>>>>>>>>> or in relative terms about 25%, or
about 4% with debug info. I measured
>>>>>>>>>> perf overhead for unit_tests at about
6%, but after I moved the checker
>>>>>>>>>> onto another thread, the overhead
disappeared into the noise.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Still seems like quite a big increase.
>>>>>>>>>
>>>>>>>>> Any chance of compression?
>>>>>>>>>
>>>>>>>>
>>>>>>>> That was with compression -- the implementation
compresses the
>>>>>>>> parts of the ODR table that aren't hashes
(aside from the header and the
>>>>>>>> Clang version, which is a small fixed cost), as
well as the string table.
>>>>>>>> The hashes were left uncompressed because they
are in the critical path of
>>>>>>>> the linker and because I imagine that they
wouldn't really be that
>>>>>>>> compressible.
>>>>>>>>
>>>>>>>
>>>>>>> I'd be a bit surprised if they weren't
especially compressible -
>>>>>>>
>>>>>>
>>>>>> Maybe I'm wrong, but my intuition about compression
is that it works
>>>>>> best when the data contains repeated patterns. If we
use a hash function
>>>>>> with good dispersion then I'd expect each hash to
have little in common
>>>>>> with other hashes.
>>>>>>
>>>>>>
>>>>>>> and how much of the size increase is the compressed
data V the
>>>>>>> uncompressed data?
>>>>>>>
>>>>>>
>>>>>> The ratio was roughly 60% compressed data to 40%
uncompressed data.
>>>>>>
>>>>>>
>>>>>>> Is it still in the hot path when parallelized?
>>>>>>>
>>>>>>
>>>>>> Not right now according to my benchmarking, but
decompression could
>>>>>> push it into the critical path if it ends up taking
longer than the rest of
>>>>>> the work done by the linker after symbol resolution. On
the same machine
>>>>>> that I used for benchmarking, gunzip'ing 200MB of
/dev/urandom (which is
>>>>>> roughly what I'd expect the hashes to look like)
takes around 1.1s, i.e. a
>>>>>> not insignificant fraction of lld's runtime.
>>>>>>
>>>>>>
>>>>>>> So I think the remaining gains would either be
through limiting the
>>>>>>>> number of ODR table entries, or through reuse
of data.
>>>>>>>>
>>>>>>>> Limiting might be something to explore -- one
possibility is that
>>>>>>>> we could limit the ODR table entries to the
declarations that are "used" by
>>>>>>>> a particular translation unit (it appears that
Clang tracks something like
>>>>>>>> that in Decl::Used/Decl::Referenced, but
I'm not sure if that is exactly
>>>>>>>> what we need -- I think we would basically need
to test for reference
>>>>>>>> reachability from the functions/globals that
are IRgen'd).
>>>>>>>>
>>>>>>>
>>>>>>> Currently it has every type and function that is in
the AST? Yeah,
>>>>>>> that's a lot - perhaps it should be more like
the things that go in the
>>>>>>> DWARF? (though would need to add some cases there -
since the DWARF logic
>>>>>>> already relies on the ODR to not emit duplicates in
some cases)
>>>>>>>
>>>>>>
>>>>>> Just every record declaration -- Clang only supports
ODR hashes for
>>>>>> record declarations right now. I understand that
function declarations
>>>>>> (including function bodies) are still works in
progress.
>>>>>>
>>>>>> I think it should indeed just be roughly the things
that go in the
>>>>>> DWARF. I think that at one point I observed that every
record declaration,
>>>>>> even unused ones, were going into the DWARF, but I
might have been mistaken
>>>>>> because I can no longer reproduce that. I'll take a
closer look to see if I
>>>>>> can reuse what logic presumably already exists for
DWARF.
>>>>>>
>>>>>> In terms of reuse, it seems that of the 536566007 bytes
of
>>>>>>>> overhead, 319309579 were the compressed part of
the ODR tables. So even if
>>>>>>>> we achieved 100% sharing,
>>>>>>>>
>>>>>>>
>>>>>>> 100% sharing? You mean if all the data were
compressed, and assuming
>>>>>>> the hashes were compressible at the same ratio as
the other data?
>>>>>>>
>>>>>>
>>>>>> Sorry, I mean if 100% of the data in the compressed
part of the ODR
>>>>>> table could be eliminated by reusing data stored
elsewhere (e.g. in the
>>>>>> object file string table or in the DWARF).
>>>>>>
>>>>>> with the current scheme I think that our minimum
achievable overhead
>>>>>>>> would be ~15% (no debug info) or ~2% (with
debug info).
>>>>>>>>
>>>>>>>>
>>>>>>>>> Could this go into .dwo files with Fission
and be checked by dwp
>>>>>>>>> instead, perhaps?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think it could also work that way, yes.
>>>>>>>>
>>>>>>>> I'm reasonably happy with these figures, at
least for a first
>>>>>>>>>> implementation. We may be able to do
even better for file size with reuse,
>>>>>>>>>> but I'd leave that for version 2.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What's the story with compatibility
between versions, then? Is
>>>>>>>>> there a version header?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, the header contains a version number.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Will old formats be supported by lld
indefinitely? Not at all?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think we should drop support for old formats
when we introduce a
>>>>>>>> new format. My understanding is that the ODR
hash can change whenever Clang
>>>>>>>> changes (the implementation only performs ODR
checking if all ODR tables
>>>>>>>> were produced by the same revision of Clang),
so there wouldn't seem to be
>>>>>>>> a huge benefit in keeping support for old
formats around.
>>>>>>>>
>>>>>>>
>>>>>>> I imagine it's possible people aren't
necessarily going to rev lld
>>>>>>> in exact lock-step with clang, but I could be
wrong. (certainly binutils ld
>>>>>>> or gold aren't released/kept in lock-step with
GCC, for example)
>>>>>>>
>>>>>>
>>>>>> That's certainly possible, but I'd say that the
bar for dropping
>>>>>> backwards compatibility is lower because ODR tables are
not required for
>>>>>> correctness. We could keep compatibility with the last
version or so if it
>>>>>> isn't too burdensome, or otherwise print a warning.
>>>>>>
>>>>>
>>>>> They aren't required for correctness, but upgrading
your compiler or
>>>>> linker then silently losing ODR checking would be bad (or
even not silently
>>>>> losing it, but having no choice but to rev both to keep the
functionality &
>>>>> hold the ODR-cleanliness bar) - it's the sort of thing
where if you lost
>>>>> the checking, then gained it back again later, the
regression cleanup would
>>>>> be annoying/an impediment to using the feature.
>>>>>
>>>>
>>>> Makes sense I guess. I'd be fine with a policy where the
Nth open
>>>> source release should be able to read ODR tables produced by
the N-1th and
>>>> possibly the N-2th release.
>>>>
>>>
>>> Still strikes me as a bit awkward - wonder how that compared to
other
>>> (similar or different) linker features.
>>>
>>
>> I think the most similar existing feature is .gdb_index. They have
>> already gone through a few format revisions:
>> https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html
>> and have deprecated/removed support for older formats.
>>
>
> Not sure it's quite the same - that's an output of the linker &
produces a
> performance degredation in GDB when not present. Here the discussion is a
> format input to the linker that produces warnings, potentially errors
> (Werror) for users that they might want to maintain compatibility for.
> Seems like a higher bar.
>
> Feels to me like a bit more time could be spent on a design/size
> impact/etc to give a better chance of ongoing stability (and/or backcompat
> at least) for users. But just a thought.
>
> - Dave
>
>
>>
>> Because the requirements for ODR tables are simpler than those for
>> .gdb_index, I'd expect us to converge on a final format sooner, so
in
>> practice the window of compatibility would end up being longer than a
year.
>>
>> Peter
>>
>>
>>>
>>>> Any idea what Daniel Jasper & co have been working on WRT
ODR checking
>>>>> & how this feature integrates or doesn't with their
work? I imagine they
>>>>> might be working on something more like a Clang Tooling
style approach, but
>>>>> I'm not sure.
>>>>>
>>>>
>>>> I'm not aware of any work like that, only of Richard
Trieu's efforts
>>>> for modules that I'm piggybacking on.
>>>>
>>>
>>> +Djasper - perhaps you could provide some context on other odr
detection
>>> efforts?
>>>
>>>
>>>> Peter
>>>>
>>>>>
>>>>> - Dave
>>>>>
>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Dave
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The issue of retaining
the ODR check for functions even if
>>>>>>>>>>>>>> they get inlined may
inherently pose an extra cost that can't be folded
>>>>>>>>>>>>>> into existing work the
linker is doing, so there might be a reason for
>>>>>>>>>>>>>> clang to have a default
mode that has practically no linking overhead and
>>>>>>>>>>>>>> one that does more
thorough checking but imposes extra linking overhead.
>>>>>>>>>>>>>> Think something like a
crazy boost library with thousands of functions that
>>>>>>>>>>>>>> get inlined away, but
have gigantic mangled names and so precisely are the
>>>>>>>>>>>>>> ones that are going to
impose extra cost on the linker. Simply due to the
>>>>>>>>>>>>>> extra volume of strings
that the linker would need to look at, I don't
>>>>>>>>>>>>>> think there's a way
to include checking of all inlined function "for free"
>>>>>>>>>>>>>> at the linker level
using the symbol approach.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> I guess those inlined
functions would still have those symbol
>>>>>>>>>>>>>> names in debug info (I
think?), so piggybacking on the string deduplication
>>>>>>>>>>>>>> we're already doing
might make it possible to fold away the work in that
>>>>>>>>>>>>>> case (but then again,
would still impose extra cost with split dwarf...).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyway, let's wait
to see what the actual performance numbers
>>>>>>>>>>>>>> are.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 6, 2017 at
10:40 PM, Peter Collingbourne via
>>>>>>>>>>>>>> cfe-dev <cfe-dev at
lists.llvm.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to
propose an ODR checker feature for Clang and
>>>>>>>>>>>>>>> LLD. The feature
would be similar to gold's --detect-odr-violations
>>>>>>>>>>>>>>> feature, but
better: we can rely on integration with clang to avoid relying
>>>>>>>>>>>>>>> on debug info and
to perform more precise matching.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The basic idea is
that we use clang's ability to create ODR
>>>>>>>>>>>>>>> hashes for
declarations. ODR hashes are computed using all information
>>>>>>>>>>>>>>> about a declaration
that is ODR-relevant. If the flag
>>>>>>>>>>>>>>>
-fdetect-odr-violations is passed, Clang will store the ODR hashes in a
>>>>>>>>>>>>>>> so-called ODR table
in each object file. Each ODR table will contain a
>>>>>>>>>>>>>>> mapping from
mangled declaration names to ODR hashes. At link time, the
>>>>>>>>>>>>>>> linker will read
the ODR table and report any mismatches.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To make this work:
>>>>>>>>>>>>>>> - LLVM will be
extended with the ability to represent ODR
>>>>>>>>>>>>>>> tables in the IR
and emit them to object files
>>>>>>>>>>>>>>> - Clang will be
extended with the ability to emit ODR tables
>>>>>>>>>>>>>>> using ODR hashes
>>>>>>>>>>>>>>> - LLD will be
extended to read ODR tables from object files
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have implemented
a prototype of this feature. It is
>>>>>>>>>>>>>>> available here:
https://github.com/pcc/
>>>>>>>>>>>>>>>
llvm-project/tree/odr-checker and some results from
>>>>>>>>>>>>>>> applying it to
chromium are here: crbug.com/726071
>>>>>>>>>>>>>>> As you can see it
did indeed find a number of real ODR
>>>>>>>>>>>>>>> violations in
Chromium, including some that wouldn't be detectable using
>>>>>>>>>>>>>>> debug info.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If you're
interested in what the format of the ODR table
>>>>>>>>>>>>>>> would look like,
that prototype shows pretty much what I had in mind, but I
>>>>>>>>>>>>>>> expect many other
aspects of the implementation to change as it is
>>>>>>>>>>>>>>> upstreamed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>> cfe-dev mailing
list
>>>>>>>>>>>>>>> cfe-dev at
lists.llvm.org
>>>>>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> Peter
>>>>>>>>>>
_______________________________________________
>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> --
>>>>>>>> Peter
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Peter
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>
>>
>> --
>> --
>> Peter
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170615/5b7afa0f/attachment-0001.html>

Peter Collingbourne via llvm-dev

2017-Jun-15 17:20 UTC

head link

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

On Thu, Jun 15, 2017 at 1:14 AM, James Henderson <
jh7370.2008 at my.bristol.ac.uk> wrote:
> I agree with Dave here. In the environment I work in, we regularly see
> users ship objects and static archives to other users, and then never
> update them, even though the linker moves on. If they did this with an
> object file that had contents (such as the ODR information) that were later
> warned about due to older versions being deprecated, the user linking that
> object would probably get quite annoyed, especially as in such cases, the
> original producer may no longer be able/willing to update the
> object/archive.
>
I think this issue can be addressed using a combination of documentation
and a linker flag.

As I mentioned earlier, the ODR checker won't work if the object files were
built by multiple different versions of Clang, which is because ODR hashes
are computed from Clang's AST, which can change at any time. You can
imagine defining ODR hashes so that they can be computed independently of
the version of clang, or even independently of the compiler (so gcc could
generate them for example), and I have a few preliminary ideas for how you
could do that, but I think that should be outside the scope of the first
implementation.

So in your scenario, right now we would be unable to diagnose all ODR
violations if the user linking the object is also including ODR information
in their own object files, so we would probably need to warn anyway. So
what I think we should do is document that -fdiagnose-odr-violations should
not be used if you plan to ship the resulting object file. We can consider
removing this restriction later, if we ever stabilise ODR hashes (and I
think that if/when we do that, that would be the right time to consider
supporting old format versions indefinitely).

But that alone wouldn't address the scenario where someone doesn't read
the
documentation and ships the -fdiagnose-odr-violations object files anyway.
What I think we should do to support that scenario is provide a linker flag
that disables the ODR checker. Any diagnostics about version mismatches
would include the name of the flag in the output.

Peter


> James
>
> On 15 June 2017 at 02:14, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Wed, Jun 14, 2017 at 1:41 PM Peter Collingbourne <peter at
pcc.me.uk>
>> wrote:
>>
>>> On Wed, Jun 14, 2017 at 12:47 PM, David Blaikie <dblaikie at
gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Jun 13, 2017, 11:30 PM Peter Collingbourne <peter at
pcc.me.uk>
>>>> wrote:
>>>>
>>>>> On Tue, Jun 13, 2017 at 11:06 PM, David Blaikie
<dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 13, 2017 at 10:05 PM Peter Collingbourne
<peter at pcc.me.uk>
>>>>>> wrote:
>>>>>>
>>>>>>> On Tue, Jun 13, 2017 at 8:48 PM, David Blaikie
<dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 13, 2017 at 8:43 PM Peter
Collingbourne <
>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Jun 13, 2017 at 7:54 PM, David
Blaikie <dblaikie at gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 13, 2017 at 6:34 PM Peter
Collingbourne via cfe-dev <
>>>>>>>>>> cfe-dev at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 7, 2017 at 11:28 PM,
Peter Collingbourne <
>>>>>>>>>>> peter at pcc.me.uk> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 7, 2017 at 8:06 PM,
Sean Silva <
>>>>>>>>>>>> chisophugis at gmail.com>
wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jun 7, 2017 at 4:31
PM, Peter Collingbourne <
>>>>>>>>>>>>> peter at pcc.me.uk>
wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Jun 7, 2017 at
12:17 AM, Sean Silva <
>>>>>>>>>>>>>> chisophugis at
gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Very nice and
simple implementation!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do you have any
statistics on how large these odr tables are
>>>>>>>>>>>>>>> compared to other
object file data? I assume that if these tables contain
>>>>>>>>>>>>>>> full mangled symbol
names, they could end up being very large and may want
>>>>>>>>>>>>>>> to share the symbol
name strings with the overall string table in the .o
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking at
Chromium's object files it looks like the total
>>>>>>>>>>>>>> size of the odrtabs is
about 50% of the total size of the object files,
>>>>>>>>>>>>>> which isn't great.
The current implementation only looks at records, so I
>>>>>>>>>>>>>> imagine that it would
be hard to share any of the strings that I'm
>>>>>>>>>>>>>> currently creating. (I
guess it's possible that some types will have a
>>>>>>>>>>>>>> mangled vtable name in
the string table, so we may be able to share a
>>>>>>>>>>>>>> little that way.) Note
however that this was without debug info.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One option for reducing
size would be to
>>>>>>>>>>>>>> 1) store hashes of ODR
names in ODR tables, per Rui's
>>>>>>>>>>>>>> suggestion (alongside a
reference to the name itself in the string table)
>>>>>>>>>>>>>> 2) compress the string
table for the ODR names with a
>>>>>>>>>>>>>> standard compression
algorithm like gzip.
>>>>>>>>>>>>>> This wouldn't seem
to affect link time performance much
>>>>>>>>>>>>>> because I think we
should only need to look at the strings if we see a ODR
>>>>>>>>>>>>>> name hash match
together with an ODR hash mismatch, which would mean an ODR
>>>>>>>>>>>>>> violation with a high
probability (i.e. unless there was an ODR name hash
>>>>>>>>>>>>>> collision, we have
found an ODR violation). If we don't expect a lot of
>>>>>>>>>>>>>> sharing with regular
string tables (see below), it seems even more
>>>>>>>>>>>>>> reasonable.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Neat observation!
>>>>>>>>>>>>>
>>>>>>>>>>>>> FWIW, it is a birthday
problem type situation though, so for a
>>>>>>>>>>>>> 32-bit hash, we would
expect a collision in about 1 in 2^16 distinct hashes
>>>>>>>>>>>>> (and 2^16 seems pretty easy
to hit in a large project). So 64-bit hashes
>>>>>>>>>>>>> might be preferable.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Oh right, good point, using a
64-bit hash does seem like a good
>>>>>>>>>>>> idea here.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, do you have any
numbers on the performance of your
>>>>>>>>>>>>>>> initial
implementation?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I measured the link
time for chromium's unit_tests (the
>>>>>>>>>>>>>> largest single binary
in chromium) at 5.05s without ODR checks and 6.61s
>>>>>>>>>>>>>> with ODR checks. So
about 30% overhead, but in absolute terms it doesn't
>>>>>>>>>>>>>> seem too bad. So I
think this may be acceptable for an initial
>>>>>>>>>>>>>> implementation, but it
certainly seems worth trying to do better.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know that things
aren't currently apples-to-apples, but how
>>>>>>>>>>>>> does that compare to gold?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I will measure it.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> For that unit_tests binary I
measured the overhead at about 5
>>>>>>>>>>> seconds (average of 10 runs). That
is with debug info, of course.
>>>>>>>>>>>
>>>>>>>>>>> W.r.t. LLD and having it always on
by default (and hence making
>>>>>>>>>>>>>>> it as fast as
possible), it seems like right now you are implementing the
>>>>>>>>>>>>>>> checking process
with a hash table. That's simple and fine for a first
>>>>>>>>>>>>>>> implementation, but
it's probably worth mentioning in a comment the problem
>>>>>>>>>>>>>>> of checking the
tables, at least from the linker's perspective, does fit
>>>>>>>>>>>>>>> into a map-reduce
pattern and could be easily parallelized if needed. E.g.
>>>>>>>>>>>>>>> a parallel sort to
coalesce all entries for symbols of the same name
>>>>>>>>>>>>>>> followed by a
parallel forEach to check each bucket with the same symbol
>>>>>>>>>>>>>>> name (roughly
speaking).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Right, that's one
approach. I was thinking of a simpler
>>>>>>>>>>>>>> approach where at
compile time we sort ODR names by hash and partition them
>>>>>>>>>>>>>> using (say) the upper
bits of the hash, so that at link time we can have N
>>>>>>>>>>>>>> threads each building a
hash table for a specific partition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And of course this work
can be started right after symbol
>>>>>>>>>>>>>> resolution finishes and
parallelised with the rest of the work done by the
>>>>>>>>>>>>>> linker.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Even better than doing
it faster is just doing less work.
>>>>>>>>>>>>>>> There's a lot
of work that the linker is already doing that may be reusable
>>>>>>>>>>>>>>> for the ODR
checking.
>>>>>>>>>>>>>>> E.g.
>>>>>>>>>>>>>>> - maybe we could
get the coalescing step as a byproduct of
>>>>>>>>>>>>>>> our existing string
deduping, which we are generally doing anyway.
>>>>>>>>>>>>>>> - we are already
coalescing symbol names for the symbol
>>>>>>>>>>>>>>> table. If the ODR
table is keyed off of symbols in the binary that we are
>>>>>>>>>>>>>>> inserting into the
symbol table, then I think we could do the entire ODR
>>>>>>>>>>>>>>> check with no extra
"string" work on LLD's part.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see Rui already
mentioned some of this in
>>>>>>>>>>>>>>>
https://bugs.chromium.org/p/chromium/issues/detail?id=726
>>>>>>>>>>>>>>> 071#c4.
>>>>>>>>>>>>>>> You mentioned that
not everything is necessarily directly
>>>>>>>>>>>>>>> keyed on a symbol
(such as types), but I think that it would really
>>>>>>>>>>>>>>> simplify things if
the check was done as such. Do you have any idea exactly
>>>>>>>>>>>>>>> how much of the
things that we want to check are not keyed on symbols? If
>>>>>>>>>>>>>>> most things are
keyed on symbols, for the things we are not we can just
>>>>>>>>>>>>>>> emit extra symbols
prefixed by __clang_odr_check_ or whatever.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since the current
implementation only works with records
>>>>>>>>>>>>>> there is basically zero
overlap right now between ODR names and symbols. I
>>>>>>>>>>>>>> suppose that I could
estimate the amount of function overlap in a
>>>>>>>>>>>>>> hypothetical
implementation that computes ODR hashes of functions by
>>>>>>>>>>>>>> comparing the number of
*_odr functions after clang has finished IRgen with
>>>>>>>>>>>>>> the number after
optimization finishes. This of course would be strictly
>>>>>>>>>>>>>> more than functions +
types.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Wouldn't any function
or symbol using the record type have the
>>>>>>>>>>>>> type name somewhere in it?
If we used an offset+length encoding (instead of
>>>>>>>>>>>>> offset + NUL termination)
we might be able to reuse it then (at some cost
>>>>>>>>>>>>> in finding the reference).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That may be possible with some
work in the string table
>>>>>>>>>>>> builder. But at that point of
course we're not dealing with regular symbols
>>>>>>>>>>>> any more. I guess we could have
two ODR tables per object file: an array of
>>>>>>>>>>>> (ODR hash, location) tuples for
ODR names that correspond to symbol table
>>>>>>>>>>>> symbols (i.e. Rui's
proposal on the chromium bug), and an array of (ODR
>>>>>>>>>>>> name, ODR hash, location)
tuples for all other ODR names. I guess if we
>>>>>>>>>>>> wanted a "low
overhead" mode we could just omit the second table or put
>>>>>>>>>>>> fewer symbols in it.
>>>>>>>>>>>>
>>>>>>>>>>>> With debug info surely there is
some sort of string
>>>>>>>>>>>>> representing the record
name or something like that, no?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Not the record name on its own
(they do appear but a bit
>>>>>>>>>>>> awkwardly -- each namespace
component is stored in a separate string), but
>>>>>>>>>>>> if the record has at least one
member function the mangled type name will
>>>>>>>>>>>> appear somewhere in .debug_str,
so we could in principle reuse that with
>>>>>>>>>>>> the offset/length trick.
>>>>>>>>>>>>
>>>>>>>>>>>> I guess we may have to have our
"low-overhead" user-facing
>>>>>>>>>>>>> behavior be a bit more
nuanced. E.g.:
>>>>>>>>>>>>> 1. does this feature bloat
object files significantly
>>>>>>>>>>>>> 2. does this feature slow
down link times significantly
>>>>>>>>>>>>>
>>>>>>>>>>>>> Intuitively, it seems like
we should be able to get 1. when
>>>>>>>>>>>>> debug info happens to be
enabled (not sure about split dwarf?) and possibly
>>>>>>>>>>>>> in all cases at the cost of
complexity. We may be able to get 2. in all
>>>>>>>>>>>>> cases with proper design.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think that would be my rough
assessment as well. I think we
>>>>>>>>>>>> have a good shot at 1 for all
cases with some of the ideas that have been
>>>>>>>>>>>> mentioned already. If we can
avoid creating dependencies on DWARF I think
>>>>>>>>>>>> that would be ideal -- I'd
ideally like this to work for COFF as well,
>>>>>>>>>>>> where you'd typically
expect to find CodeView in object files. If I were to
>>>>>>>>>>>> try this I think the first
thing that I would try is hash/compression
>>>>>>>>>>>> combined with the two ODR
tables (no reuse for non-symbol ODR names to
>>>>>>>>>>>> start with, as compression may
be enough on its own).
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I developed a second prototype
which uses hash/compression with
>>>>>>>>>>> no attempt to reuse. It is
available here:
>>>>>>>>>>>
https://github.com/pcc/llvm-project/tree/odr-checker2
>>>>>>>>>>>
>>>>>>>>>>> For Chromium the object file size
overhead was 536566007 bytes,
>>>>>>>>>>> or in relative terms about 25%, or
about 4% with debug info. I measured
>>>>>>>>>>> perf overhead for unit_tests at
about 6%, but after I moved the checker
>>>>>>>>>>> onto another thread, the overhead
disappeared into the noise.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Still seems like quite a big increase.
>>>>>>>>>>
>>>>>>>>>> Any chance of compression?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> That was with compression -- the
implementation compresses the
>>>>>>>>> parts of the ODR table that aren't
hashes (aside from the header and the
>>>>>>>>> Clang version, which is a small fixed
cost), as well as the string table.
>>>>>>>>> The hashes were left uncompressed because
they are in the critical path of
>>>>>>>>> the linker and because I imagine that they
wouldn't really be that
>>>>>>>>> compressible.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'd be a bit surprised if they weren't
especially compressible -
>>>>>>>>
>>>>>>>
>>>>>>> Maybe I'm wrong, but my intuition about
compression is that it works
>>>>>>> best when the data contains repeated patterns. If
we use a hash function
>>>>>>> with good dispersion then I'd expect each hash
to have little in common
>>>>>>> with other hashes.
>>>>>>>
>>>>>>>
>>>>>>>> and how much of the size increase is the
compressed data V the
>>>>>>>> uncompressed data?
>>>>>>>>
>>>>>>>
>>>>>>> The ratio was roughly 60% compressed data to 40%
uncompressed data.
>>>>>>>
>>>>>>>
>>>>>>>> Is it still in the hot path when parallelized?
>>>>>>>>
>>>>>>>
>>>>>>> Not right now according to my benchmarking, but
decompression could
>>>>>>> push it into the critical path if it ends up taking
longer than the rest of
>>>>>>> the work done by the linker after symbol
resolution. On the same machine
>>>>>>> that I used for benchmarking, gunzip'ing 200MB
of /dev/urandom (which is
>>>>>>> roughly what I'd expect the hashes to look
like) takes around 1.1s, i.e. a
>>>>>>> not insignificant fraction of lld's runtime.
>>>>>>>
>>>>>>>
>>>>>>>> So I think the remaining gains would either be
through limiting the
>>>>>>>>> number of ODR table entries, or through
reuse of data.
>>>>>>>>>
>>>>>>>>> Limiting might be something to explore --
one possibility is that
>>>>>>>>> we could limit the ODR table entries to the
declarations that are "used" by
>>>>>>>>> a particular translation unit (it appears
that Clang tracks something like
>>>>>>>>> that in Decl::Used/Decl::Referenced, but
I'm not sure if that is exactly
>>>>>>>>> what we need -- I think we would basically
need to test for reference
>>>>>>>>> reachability from the functions/globals
that are IRgen'd).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Currently it has every type and function that
is in the AST? Yeah,
>>>>>>>> that's a lot - perhaps it should be more
like the things that go in the
>>>>>>>> DWARF? (though would need to add some cases
there - since the DWARF logic
>>>>>>>> already relies on the ODR to not emit
duplicates in some cases)
>>>>>>>>
>>>>>>>
>>>>>>> Just every record declaration -- Clang only
supports ODR hashes for
>>>>>>> record declarations right now. I understand that
function declarations
>>>>>>> (including function bodies) are still works in
progress.
>>>>>>>
>>>>>>> I think it should indeed just be roughly the things
that go in the
>>>>>>> DWARF. I think that at one point I observed that
every record declaration,
>>>>>>> even unused ones, were going into the DWARF, but I
might have been mistaken
>>>>>>> because I can no longer reproduce that. I'll
take a closer look to see if I
>>>>>>> can reuse what logic presumably already exists for
DWARF.
>>>>>>>
>>>>>>> In terms of reuse, it seems that of the 536566007
bytes of
>>>>>>>>> overhead, 319309579 were the compressed
part of the ODR tables. So even if
>>>>>>>>> we achieved 100% sharing,
>>>>>>>>>
>>>>>>>>
>>>>>>>> 100% sharing? You mean if all the data were
compressed, and
>>>>>>>> assuming the hashes were compressible at the
same ratio as the other data?
>>>>>>>>
>>>>>>>
>>>>>>> Sorry, I mean if 100% of the data in the compressed
part of the ODR
>>>>>>> table could be eliminated by reusing data stored
elsewhere (e.g. in the
>>>>>>> object file string table or in the DWARF).
>>>>>>>
>>>>>>> with the current scheme I think that our minimum
achievable overhead
>>>>>>>>> would be ~15% (no debug info) or ~2% (with
debug info).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Could this go into .dwo files with
Fission and be checked by dwp
>>>>>>>>>> instead, perhaps?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think it could also work that way, yes.
>>>>>>>>>
>>>>>>>>> I'm reasonably happy with these
figures, at least for a first
>>>>>>>>>>> implementation. We may be able to
do even better for file size with reuse,
>>>>>>>>>>> but I'd leave that for version
2.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What's the story with compatibility
between versions, then? Is
>>>>>>>>>> there a version header?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, the header contains a version number.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Will old formats be supported by lld
indefinitely? Not at all?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should drop support for old
formats when we introduce a
>>>>>>>>> new format. My understanding is that the
ODR hash can change whenever Clang
>>>>>>>>> changes (the implementation only performs
ODR checking if all ODR tables
>>>>>>>>> were produced by the same revision of
Clang), so there wouldn't seem to be
>>>>>>>>> a huge benefit in keeping support for old
formats around.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I imagine it's possible people aren't
necessarily going to rev lld
>>>>>>>> in exact lock-step with clang, but I could be
wrong. (certainly binutils ld
>>>>>>>> or gold aren't released/kept in lock-step
with GCC, for example)
>>>>>>>>
>>>>>>>
>>>>>>> That's certainly possible, but I'd say that
the bar for dropping
>>>>>>> backwards compatibility is lower because ODR tables
are not required for
>>>>>>> correctness. We could keep compatibility with the
last version or so if it
>>>>>>> isn't too burdensome, or otherwise print a
warning.
>>>>>>>
>>>>>>
>>>>>> They aren't required for correctness, but upgrading
your compiler or
>>>>>> linker then silently losing ODR checking would be bad
(or even not silently
>>>>>> losing it, but having no choice but to rev both to keep
the functionality &
>>>>>> hold the ODR-cleanliness bar) - it's the sort of
thing where if you lost
>>>>>> the checking, then gained it back again later, the
regression cleanup would
>>>>>> be annoying/an impediment to using the feature.
>>>>>>
>>>>>
>>>>> Makes sense I guess. I'd be fine with a policy where
the Nth open
>>>>> source release should be able to read ODR tables produced
by the N-1th and
>>>>> possibly the N-2th release.
>>>>>
>>>>
>>>> Still strikes me as a bit awkward - wonder how that compared to
other
>>>> (similar or different) linker features.
>>>>
>>>
>>> I think the most similar existing feature is .gdb_index. They have
>>> already gone through a few format revisions:
>>> https://sourceware.org/gdb/onlinedocs/gdb/Index-Section-Format.html
>>> and have deprecated/removed support for older formats.
>>>
>>
>> Not sure it's quite the same - that's an output of the linker
& produces
>> a performance degredation in GDB when not present. Here the discussion
is a
>> format input to the linker that produces warnings, potentially errors
>> (Werror) for users that they might want to maintain compatibility for.
>> Seems like a higher bar.
>>
>> Feels to me like a bit more time could be spent on a design/size
>> impact/etc to give a better chance of ongoing stability (and/or
backcompat
>> at least) for users. But just a thought.
>>
>> - Dave
>>
>>
>>>
>>> Because the requirements for ODR tables are simpler than those for
>>> .gdb_index, I'd expect us to converge on a final format sooner,
so in
>>> practice the window of compatibility would end up being longer than
a year.
>>>
>>> Peter
>>>
>>>
>>>>
>>>>> Any idea what Daniel Jasper & co have been working on
WRT ODR checking
>>>>>> & how this feature integrates or doesn't with
their work? I imagine they
>>>>>> might be working on something more like a Clang Tooling
style approach, but
>>>>>> I'm not sure.
>>>>>>
>>>>>
>>>>> I'm not aware of any work like that, only of Richard
Trieu's efforts
>>>>> for modules that I'm piggybacking on.
>>>>>
>>>>
>>>> +Djasper - perhaps you could provide some context on other odr
>>>> detection efforts?
>>>>
>>>>
>>>>> Peter
>>>>>
>>>>>>
>>>>>> - Dave
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> - Dave
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The issue of
retaining the ODR check for functions even if
>>>>>>>>>>>>>>> they get inlined
may inherently pose an extra cost that can't be folded
>>>>>>>>>>>>>>> into existing work
the linker is doing, so there might be a reason for
>>>>>>>>>>>>>>> clang to have a
default mode that has practically no linking overhead and
>>>>>>>>>>>>>>> one that does more
thorough checking but imposes extra linking overhead.
>>>>>>>>>>>>>>> Think something
like a crazy boost library with thousands of functions that
>>>>>>>>>>>>>>> get inlined away,
but have gigantic mangled names and so precisely are the
>>>>>>>>>>>>>>> ones that are going
to impose extra cost on the linker. Simply due to the
>>>>>>>>>>>>>>> extra volume of
strings that the linker would need to look at, I don't
>>>>>>>>>>>>>>> think there's a
way to include checking of all inlined function "for free"
>>>>>>>>>>>>>>> at the linker level
using the symbol approach.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I guess those inlined
functions would still have those symbol
>>>>>>>>>>>>>>> names in debug info
(I think?), so piggybacking on the string deduplication
>>>>>>>>>>>>>>> we're already
doing might make it possible to fold away the work in that
>>>>>>>>>>>>>>> case (but then
again, would still impose extra cost with split dwarf...).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Anyway, let's
wait to see what the actual performance
>>>>>>>>>>>>>>> numbers are.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Sean Silva
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jun 6, 2017
at 10:40 PM, Peter Collingbourne via
>>>>>>>>>>>>>>> cfe-dev <cfe-dev
at lists.llvm.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to
propose an ODR checker feature for Clang and
>>>>>>>>>>>>>>>> LLD. The
feature would be similar to gold's --detect-odr-violations
>>>>>>>>>>>>>>>> feature, but
better: we can rely on integration with clang to avoid relying
>>>>>>>>>>>>>>>> on debug info
and to perform more precise matching.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The basic idea
is that we use clang's ability to create ODR
>>>>>>>>>>>>>>>> hashes for
declarations. ODR hashes are computed using all information
>>>>>>>>>>>>>>>> about a
declaration that is ODR-relevant. If the flag
>>>>>>>>>>>>>>>>
-fdetect-odr-violations is passed, Clang will store the ODR hashes in a
>>>>>>>>>>>>>>>> so-called ODR
table in each object file. Each ODR table will contain a
>>>>>>>>>>>>>>>> mapping from
mangled declaration names to ODR hashes. At link time, the
>>>>>>>>>>>>>>>> linker will
read the ODR table and report any mismatches.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To make this
work:
>>>>>>>>>>>>>>>> - LLVM will be
extended with the ability to represent ODR
>>>>>>>>>>>>>>>> tables in the
IR and emit them to object files
>>>>>>>>>>>>>>>> - Clang will be
extended with the ability to emit ODR
>>>>>>>>>>>>>>>> tables using
ODR hashes
>>>>>>>>>>>>>>>> - LLD will be
extended to read ODR tables from object files
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have
implemented a prototype of this feature. It is
>>>>>>>>>>>>>>>> available here:
https://github.com/pcc/l
>>>>>>>>>>>>>>>>
lvm-project/tree/odr-checker and some results from
>>>>>>>>>>>>>>>> applying it to
chromium are here: crbug.com/726071
>>>>>>>>>>>>>>>> As you can see
it did indeed find a number of real ODR
>>>>>>>>>>>>>>>> violations in
Chromium, including some that wouldn't be detectable using
>>>>>>>>>>>>>>>> debug info.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you're
interested in what the format of the ODR table
>>>>>>>>>>>>>>>> would look
like, that prototype shows pretty much what I had in mind, but I
>>>>>>>>>>>>>>>> expect many
other aspects of the implementation to change as it is
>>>>>>>>>>>>>>>> upstreamed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>>>>>>> cfe-dev mailing
list
>>>>>>>>>>>>>>>> cfe-dev at
lists.llvm.org
>>>>>>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> --
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> --
>>>>>>>>>>> Peter
>>>>>>>>>>>
_______________________________________________
>>>>>>>>>>> cfe-dev mailing list
>>>>>>>>>>> cfe-dev at lists.llvm.org
>>>>>>>>>>>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> --
>>>>>>> Peter
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Peter
>>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20170615/b06a970b/attachment.html>

Seemingly Similar Threads

Search for more apparently analagous threads

llvm dev - Jun 2017 - [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

[llvm-dev] [cfe-dev] RFC: ODR checker for Clang and LLD

Seemingly Similar Threads