thr3ads.net - llvm dev - [llvm-dev] Proposal for address-significance tables for --icf=safe [May 2018]

If this information is useful, please help other people find it:
Share via:

Peter Collingbourne via llvm-dev

2018-May-22 22:06 UTC

[llvm-dev] Proposal for address-significance tables for --icf=safe

Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
to sections which can be safely merged according to the guarantees provided
by the language. It works using a set of heuristics (symbol name matching
and relocation scanning). That's not only imprecise but it only works with
certain languages and is slow due to the need to demangle symbols and scan
relocations. It's also redundant with the (local_)unnamed_addr analysis
already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by
adding a section to each object file containing the indexes of the symbols
which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of
--icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at
0.08%. That's almost nothing, and I think it's small enough that we can
turn it on by default.

I've uploaded a patch series for this feature here:
https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180522/0df20999/attachment.html>

Peter Smith via llvm-dev

2018-May-23 10:25 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works
with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we
can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Hans Wennborg via llvm-dev

2018-May-23 14:44 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

On Wed, May 23, 2018 at 12:06 AM, Peter Collingbourne via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works
with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we
can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
Very nice!

I was going to ask how this plays with object files compiled with
something other than LLVM, but now I see you assume all symbols are
address-significant if there's no address-significance table in the
file. It all seems very sensible to me :-)

James Y Knight via llvm-dev

2018-May-23 15:15 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
> only to sections which can be safely merged according to the guarantees
> provided by the language. It works using a set of heuristics (symbol name
> matching and relocation scanning). That's not only imprecise but it
only
> works with certain languages and is slow due to the need to demangle
> symbols and scan relocations. It's also redundant with the
> (local_)unnamed_addr analysis already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works
> by adding a section to each object file containing the indexes of the
> symbols which are address-significant (i.e. not (local_)unnamed_addr in
IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we
can
> turn it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
This sounds like a nice idea, but it'd be great to put in some effort to
see if we can get this done in a cross-toolchain collaborative manner,
instead of llvm-specific. The need is clearly generic, after all.

I'm a bit worried of the scheme of emitting symbol indexes into a
section. AFAIK there is nothing else in ELF which puts symbol indexes in
data at the moment (only in relocations and section headers). In
particular, if anyone were to use a tool which rewrites the symbol table,
it'll break things, unless that tool knows about this special section.

I wonder if it's possible to put the data in the symbol table.
Unfortunately, there's not a whole lot of available space there...

The "st_other" field has some space available -- only the bottom 2
bits are
currently used in general for visibility, so one could imagine adding a
flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this
field are widely used by a variety of architecture-specific things, which
makes that rather more complicated. MIPS uses almost all the remaining bits
in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the
local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what
I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for
additional function-type and linkage annotations. m68k uses bits 7-8 for
identifying "far" functions and interrupt handlers.

So...that might be viable -- just barely --  but even after suggesting
that, I'd not really want to argue for it. =)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/770c44ab/attachment.html>

Peter Collingbourne via llvm-dev

2018-May-23 19:06 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

On Wed, May 23, 2018 at 8:15 AM, James Y Knight <jyknight at google.com>
wrote:
> On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
>> only to sections which can be safely merged according to the guarantees
>> provided by the language. It works using a set of heuristics (symbol
name
>> matching and relocation scanning). That's not only imprecise but it
only
>> works with certain languages and is slow due to the need to demangle
>> symbols and scan relocations. It's also redundant with the
>> (local_)unnamed_addr analysis already performed by LLVM.
>>
>> I implemented an alternative to this approach in clang and lld. It
works
>> by adding a section to each object file containing the indexes of the
>> symbols which are address-significant (i.e. not (local_)unnamed_addr in
IR).
>>
>> I used this implementation to link clang with release+asserts with each
>> of --icf={none,safe,all}. The binary sizes were:
>>
>> none: 109407184
>> safe: 108534736 (-0.8%)
>> all: 107281360 (-2%)
>>
>> I measured the object file overhead of these sections in my clang build
>> at 0.08%. That's almost nothing, and I think it's small enough
that we can
>> turn it on by default.
>>
>> I've uploaded a patch series for this feature here:
>> https://github.com/pcc/llvm-project/tree/llvm-addrsig
>> I intend to start sending it for review soon.
>>
>
> This sounds like a nice idea, but it'd be great to put in some effort
to
> see if we can get this done in a cross-toolchain collaborative manner,
> instead of llvm-specific. The need is clearly generic, after all.
>
>Peter Smith has suggested making a proposal to generic-abi and that
certainly seems reasonable, but there seems to be a practical problem
there: the generic-abi is unmaintained and there doesn't seem to be an
authority responsible for assigning section numbers (see the recent
SHT_RELR thread for example). Since this proposal requires a new section
number I would suggest that we make the proposal to generic-abi,
incorporate any design feedback from there and proceed with a section
number in the LLVM namespace until the generic-abi gets a maintainer, at
which point we can change lld to accept both section numbers or maybe just
the generic one (since reading the section is optional).

I'm a bit worried of the scheme of emitting symbol indexes into
a> section. AFAIK there is nothing else in ELF which puts symbol indexes in
> data at the moment (only in relocations and section headers). In
> particular, if anyone were to use a tool which rewrites the symbol table,
> it'll break things, unless that tool knows about this special section.
>
The design accounts for this :)

To begin with, in practice I don't think we can get this right for every
conceivable tool, because the tool could put something in the object file
that would invalidate the metadata by making a symbol address-significant.
For example, I can use ld -r to combine a metadata-containing object file
defining a function foo with a non-metadata-containing object file defining
a function bar that returns the address of foo, which would invalidate the
metadata for foo. So I think the best that we can hope for is to arrange
for most tools to "naturally" invalidate the metadata.

There turns out to be a way to do this: most tools will reset the sh_link
field of an unrecognized section to zero if that sh_link field points to
the .symtab section. GNU objcopy, ld.bfd -r, ld.gold -r and ld.lld -r all
do this. (It looks like llvm-objcopy will preserve the sh_link, but we can
fix that.) So what we can do is make the sh_link in our section point to
.symtab and use sh_link=0 as a signal that a tool has operated on the
object file, and therefore ignore the section. This resetting of sh_link
for unrecognized sections doesn't appear to be required by the generic-abi,
so this is probably something that we'd want to bring up there in addition
to the section itself. Also, this should hopefully become a non-problem
once this proposal makes its way into either the generic ABI or GNU-gABI
and tools learn about the section.

> I wonder if it's possible to put the data in the symbol table.
> Unfortunately, there's not a whole lot of available space there...
>
> The "st_other" field has some space available -- only the bottom
2 bits
> are currently used in general for visibility, so one could imagine adding a
> flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this
> field are widely used by a variety of architecture-specific things, which
> makes that rather more complicated. MIPS uses almost all the remaining bits
> in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the
> local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what
> I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for
> additional function-type and linkage annotations. m68k uses bits 7-8 for
> identifying "far" functions and interrupt handlers.
>
> So...that might be viable -- just barely --  but even after suggesting
> that, I'd not really want to argue for it. =)
>
>I thought about using st_other for this, but I came to the same conclusion
that it would probably be too hard to stake out a bit with the
architecture-specific things going on there. From the looks of things it
looks like MIPS is (somewhat gratuitously in the case of STO_MIPS_MIPS16)
using all of the "unused" bits: http://llvm-cs.pcc.me.uk
/include/llvm/BinaryFormat/ELF.h#554
So if we did something with st_other we'd probably need to exclude MIPS and
maybe other architectures and get tools to do the right thing (which seems
harder than with the sh_link trick).

Thanks,
-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/139a5f60/attachment.html>

Peter Collingbourne via llvm-dev

2018-May-23 19:16 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

On Wed, May 23, 2018 at 3:25 AM, Peter Smith <peter.smith at linaro.org>
wrote:
> Hello,
>
> I think that the approach of using a section to record address
> significance is a good one. I'm guessing it will have its own section
> type and format? If it does would it make sense to try and submit this
> to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
> could be potentially useful for other linkers, for example gold?
>
Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence
of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI,
and I will send a proposal to generic-abi. As I mentioned in my reply to
James Knight, I don't think we should block on getting a section number
assignment, but we can at least incorporate any design feedback from that
proposal.

Peter


> Happy to help out with reviews.
>
> Peter
>
>
> On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Hi all,
> >
> > Context: ld.gold has an --icf=safe flag which is intended to apply ICF
> only
> > to sections which can be safely merged according to the guarantees
> provided
> > by the language. It works using a set of heuristics (symbol name
matching
> > and relocation scanning). That's not only imprecise but it only
works
> with
> > certain languages and is slow due to the need to demangle symbols and
> scan
> > relocations. It's also redundant with the (local_)unnamed_addr
analysis
> > already performed by LLVM.
> >
> > I implemented an alternative to this approach in clang and lld. It
works
> by
> > adding a section to each object file containing the indexes of the
> symbols
> > which are address-significant (i.e. not (local_)unnamed_addr in IR).
> >
> > I used this implementation to link clang with release+asserts with
each
> of
> > --icf={none,safe,all}. The binary sizes were:
> >
> > none: 109407184
> > safe: 108534736 (-0.8%)
> > all: 107281360 (-2%)
> >
> > I measured the object file overhead of these sections in my clang
build
> at
> > 0.08%. That's almost nothing, and I think it's small enough
that we can
> turn
> > it on by default.
> >
> > I've uploaded a patch series for this feature here:
> > https://github.com/pcc/llvm-project/tree/llvm-addrsig
> > I intend to start sending it for review soon.
> >
> > Thanks,
> > --
> > --
> > Peter
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
>


-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/a7a7420a/attachment.html>

bd1976 llvm via llvm-dev

2018-May-31 20:52 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

Hi Peter, This is a great proposal, thanks!.

If you were worried about making the abi change have you
thought about just going for an array of symbol names
or hashes of symbol names where any matching symbol is
considered address significant? This would sidestep the
problem of keeping the symbol table indices in sync.

It would be pessimistic for local symbols if the input
SHT_ADDRSIG sections were combined by e.g. "ld -r" but
in my experience this should not have too much of an
impact on --icf.

Might be worth considering in the short term whilst you
work on getting gabi acceptance.

On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
> only to sections which can be safely merged according to the guarantees
> provided by the language. It works using a set of heuristics (symbol name
> matching and relocation scanning). That's not only imprecise but it
only
> works with certain languages and is slow due to the need to demangle
> symbols and scan relocations. It's also redundant with the
> (local_)unnamed_addr analysis already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works
> by adding a section to each object file containing the indexes of the
> symbols which are address-significant (i.e. not (local_)unnamed_addr in
IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we
can
> turn it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180531/77e6efce/attachment.html>

Peter Collingbourne via llvm-dev

2018-May-31 21:05 UTC

head link

[llvm-dev] Proposal for address-significance tables for --icf=safe

The hash approach was suggested by others as well, but I think for now we
can use the sh_link field until the tools are updated -- that was the
recommended approach on the generic-abi thread as well. Keep in mind that
updating the gABI is really orthogonal to the compatibility issue: even
with an updated gABI we'd still have the practical problem of needing to
deal with old tools somehow.

Agreed that we'd be fine pessimising ld -r -- the main thing that I'm
concerned about is avoiding miscompiles caused by shuffling symbols.

Peter

On Thu, May 31, 2018 at 1:52 PM, bd1976 llvm <bd1976llvm at gmail.com>
wrote:
> Hi Peter, This is a great proposal, thanks!.
>
> If you were worried about making the abi change have you
> thought about just going for an array of symbol names
> or hashes of symbol names where any matching symbol is
> considered address significant? This would sidestep the
> problem of keeping the symbol table indices in sync.
>
> It would be pessimistic for local symbols if the input
> SHT_ADDRSIG sections were combined by e.g. "ld -r" but
> in my experience this should not have too much of an
> impact on --icf.
>
> Might be worth considering in the short term whilst you
> work on getting gabi acceptance.
>
> On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> Context: ld.gold has an --icf=safe flag which is intended to apply ICF
>> only to sections which can be safely merged according to the guarantees
>> provided by the language. It works using a set of heuristics (symbol
name
>> matching and relocation scanning). That's not only imprecise but it
only
>> works with certain languages and is slow due to the need to demangle
>> symbols and scan relocations. It's also redundant with the
>> (local_)unnamed_addr analysis already performed by LLVM.
>>
>> I implemented an alternative to this approach in clang and lld. It
works
>> by adding a section to each object file containing the indexes of the
>> symbols which are address-significant (i.e. not (local_)unnamed_addr in
IR).
>>
>> I used this implementation to link clang with release+asserts with each
>> of --icf={none,safe,all}. The binary sizes were:
>>
>> none: 109407184
>> safe: 108534736 (-0.8%)
>> all: 107281360 (-2%)
>>
>> I measured the object file overhead of these sections in my clang build
>> at 0.08%. That's almost nothing, and I think it's small enough
that we can
>> turn it on by default.
>>
>> I've uploaded a patch series for this feature here:
>> https://github.com/pcc/llvm-project/tree/llvm-addrsig
>> I intend to start sending it for review soon.
>>
>> Thanks,
>> --
>> --
>> Peter
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>

-- 
-- 
Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180531/c59bda9f/attachment.html>

Reasonably Related Threads

Search for more apparently analagous threads

llvm dev - May 2018 - Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

[llvm-dev] Proposal for address-significance tables for --icf=safe

Reasonably Related Threads