Peter Collingbourne via llvm-dev
2018-May-22 22:06 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
Hi all, Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM. I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR). I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were: none: 109407184 safe: 108534736 (-0.8%) all: 107281360 (-2%) I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default. I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig I intend to start sending it for review soon. Thanks, -- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180522/0df20999/attachment.html>
Peter Smith via llvm-dev
2018-May-23 10:25 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
Hello, I think that the approach of using a section to record address significance is a good one. I'm guessing it will have its own section type and format? If it does would it make sense to try and submit this to the GABI https://groups.google.com/forum/#!forum/generic-abi as it could be potentially useful for other linkers, for example gold? Happy to help out with reviews. Peter On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi all, > > Context: ld.gold has an --icf=safe flag which is intended to apply ICF only > to sections which can be safely merged according to the guarantees provided > by the language. It works using a set of heuristics (symbol name matching > and relocation scanning). That's not only imprecise but it only works with > certain languages and is slow due to the need to demangle symbols and scan > relocations. It's also redundant with the (local_)unnamed_addr analysis > already performed by LLVM. > > I implemented an alternative to this approach in clang and lld. It works by > adding a section to each object file containing the indexes of the symbols > which are address-significant (i.e. not (local_)unnamed_addr in IR). > > I used this implementation to link clang with release+asserts with each of > --icf={none,safe,all}. The binary sizes were: > > none: 109407184 > safe: 108534736 (-0.8%) > all: 107281360 (-2%) > > I measured the object file overhead of these sections in my clang build at > 0.08%. That's almost nothing, and I think it's small enough that we can turn > it on by default. > > I've uploaded a patch series for this feature here: > https://github.com/pcc/llvm-project/tree/llvm-addrsig > I intend to start sending it for review soon. > > Thanks, > -- > -- > Peter > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Hans Wennborg via llvm-dev
2018-May-23 14:44 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
On Wed, May 23, 2018 at 12:06 AM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:> Hi all, > > Context: ld.gold has an --icf=safe flag which is intended to apply ICF only > to sections which can be safely merged according to the guarantees provided > by the language. It works using a set of heuristics (symbol name matching > and relocation scanning). That's not only imprecise but it only works with > certain languages and is slow due to the need to demangle symbols and scan > relocations. It's also redundant with the (local_)unnamed_addr analysis > already performed by LLVM. > > I implemented an alternative to this approach in clang and lld. It works by > adding a section to each object file containing the indexes of the symbols > which are address-significant (i.e. not (local_)unnamed_addr in IR). > > I used this implementation to link clang with release+asserts with each of > --icf={none,safe,all}. The binary sizes were: > > none: 109407184 > safe: 108534736 (-0.8%) > all: 107281360 (-2%) > > I measured the object file overhead of these sections in my clang build at > 0.08%. That's almost nothing, and I think it's small enough that we can turn > it on by default. > > I've uploaded a patch series for this feature here: > https://github.com/pcc/llvm-project/tree/llvm-addrsig > I intend to start sending it for review soon.Very nice! I was going to ask how this plays with object files compiled with something other than LLVM, but now I see you assume all symbols are address-significant if there's no address-significance table in the file. It all seems very sensible to me :-)
James Y Knight via llvm-dev
2018-May-23 15:15 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi all, > > Context: ld.gold has an --icf=safe flag which is intended to apply ICF > only to sections which can be safely merged according to the guarantees > provided by the language. It works using a set of heuristics (symbol name > matching and relocation scanning). That's not only imprecise but it only > works with certain languages and is slow due to the need to demangle > symbols and scan relocations. It's also redundant with the > (local_)unnamed_addr analysis already performed by LLVM. > > I implemented an alternative to this approach in clang and lld. It works > by adding a section to each object file containing the indexes of the > symbols which are address-significant (i.e. not (local_)unnamed_addr in IR). > > I used this implementation to link clang with release+asserts with each of > --icf={none,safe,all}. The binary sizes were: > > none: 109407184 > safe: 108534736 (-0.8%) > all: 107281360 (-2%) > > I measured the object file overhead of these sections in my clang build at > 0.08%. That's almost nothing, and I think it's small enough that we can > turn it on by default. > > I've uploaded a patch series for this feature here: > https://github.com/pcc/llvm-project/tree/llvm-addrsig > I intend to start sending it for review soon. >This sounds like a nice idea, but it'd be great to put in some effort to see if we can get this done in a cross-toolchain collaborative manner, instead of llvm-specific. The need is clearly generic, after all. I'm a bit worried of the scheme of emitting symbol indexes into a section. AFAIK there is nothing else in ELF which puts symbol indexes in data at the moment (only in relocations and section headers). In particular, if anyone were to use a tool which rewrites the symbol table, it'll break things, unless that tool knows about this special section. I wonder if it's possible to put the data in the symbol table. Unfortunately, there's not a whole lot of available space there... The "st_other" field has some space available -- only the bottom 2 bits are currently used in general for visibility, so one could imagine adding a flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this field are widely used by a variety of architecture-specific things, which makes that rather more complicated. MIPS uses almost all the remaining bits in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for additional function-type and linkage annotations. m68k uses bits 7-8 for identifying "far" functions and interrupt handlers. So...that might be viable -- just barely -- but even after suggesting that, I'd not really want to argue for it. =) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/770c44ab/attachment.html>
Peter Collingbourne via llvm-dev
2018-May-23 19:06 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
On Wed, May 23, 2018 at 8:15 AM, James Y Knight <jyknight at google.com> wrote:> On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi all, >> >> Context: ld.gold has an --icf=safe flag which is intended to apply ICF >> only to sections which can be safely merged according to the guarantees >> provided by the language. It works using a set of heuristics (symbol name >> matching and relocation scanning). That's not only imprecise but it only >> works with certain languages and is slow due to the need to demangle >> symbols and scan relocations. It's also redundant with the >> (local_)unnamed_addr analysis already performed by LLVM. >> >> I implemented an alternative to this approach in clang and lld. It works >> by adding a section to each object file containing the indexes of the >> symbols which are address-significant (i.e. not (local_)unnamed_addr in IR). >> >> I used this implementation to link clang with release+asserts with each >> of --icf={none,safe,all}. The binary sizes were: >> >> none: 109407184 >> safe: 108534736 (-0.8%) >> all: 107281360 (-2%) >> >> I measured the object file overhead of these sections in my clang build >> at 0.08%. That's almost nothing, and I think it's small enough that we can >> turn it on by default. >> >> I've uploaded a patch series for this feature here: >> https://github.com/pcc/llvm-project/tree/llvm-addrsig >> I intend to start sending it for review soon. >> > > This sounds like a nice idea, but it'd be great to put in some effort to > see if we can get this done in a cross-toolchain collaborative manner, > instead of llvm-specific. The need is clearly generic, after all. > >Peter Smith has suggested making a proposal to generic-abi and that certainly seems reasonable, but there seems to be a practical problem there: the generic-abi is unmaintained and there doesn't seem to be an authority responsible for assigning section numbers (see the recent SHT_RELR thread for example). Since this proposal requires a new section number I would suggest that we make the proposal to generic-abi, incorporate any design feedback from there and proceed with a section number in the LLVM namespace until the generic-abi gets a maintainer, at which point we can change lld to accept both section numbers or maybe just the generic one (since reading the section is optional). I'm a bit worried of the scheme of emitting symbol indexes into a> section. AFAIK there is nothing else in ELF which puts symbol indexes in > data at the moment (only in relocations and section headers). In > particular, if anyone were to use a tool which rewrites the symbol table, > it'll break things, unless that tool knows about this special section. >The design accounts for this :) To begin with, in practice I don't think we can get this right for every conceivable tool, because the tool could put something in the object file that would invalidate the metadata by making a symbol address-significant. For example, I can use ld -r to combine a metadata-containing object file defining a function foo with a non-metadata-containing object file defining a function bar that returns the address of foo, which would invalidate the metadata for foo. So I think the best that we can hope for is to arrange for most tools to "naturally" invalidate the metadata. There turns out to be a way to do this: most tools will reset the sh_link field of an unrecognized section to zero if that sh_link field points to the .symtab section. GNU objcopy, ld.bfd -r, ld.gold -r and ld.lld -r all do this. (It looks like llvm-objcopy will preserve the sh_link, but we can fix that.) So what we can do is make the sh_link in our section point to .symtab and use sh_link=0 as a signal that a tool has operated on the object file, and therefore ignore the section. This resetting of sh_link for unrecognized sections doesn't appear to be required by the generic-abi, so this is probably something that we'd want to bring up there in addition to the section itself. Also, this should hopefully become a non-problem once this proposal makes its way into either the generic ABI or GNU-gABI and tools learn about the section.> I wonder if it's possible to put the data in the symbol table. > Unfortunately, there's not a whole lot of available space there... > > The "st_other" field has some space available -- only the bottom 2 bits > are currently used in general for visibility, so one could imagine adding a > flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this > field are widely used by a variety of architecture-specific things, which > makes that rather more complicated. MIPS uses almost all the remaining bits > in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the > local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what > I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for > additional function-type and linkage annotations. m68k uses bits 7-8 for > identifying "far" functions and interrupt handlers. > > So...that might be viable -- just barely -- but even after suggesting > that, I'd not really want to argue for it. =) > >I thought about using st_other for this, but I came to the same conclusion that it would probably be too hard to stake out a bit with the architecture-specific things going on there. From the looks of things it looks like MIPS is (somewhat gratuitously in the case of STO_MIPS_MIPS16) using all of the "unused" bits: http://llvm-cs.pcc.me.uk /include/llvm/BinaryFormat/ELF.h#554 So if we did something with st_other we'd probably need to exclude MIPS and maybe other architectures and get tools to do the right thing (which seems harder than with the sh_link trick). Thanks, -- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/139a5f60/attachment.html>
Peter Collingbourne via llvm-dev
2018-May-23 19:16 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <peter.smith at linaro.org> wrote:> Hello, > > I think that the approach of using a section to record address > significance is a good one. I'm guessing it will have its own section > type and format? If it does would it make sense to try and submit this > to the GABI https://groups.google.com/forum/#!forum/generic-abi as it > could be potentially useful for other linkers, for example gold? >Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant). I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal. Peter> Happy to help out with reviews. > > Peter > > > On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > Hi all, > > > > Context: ld.gold has an --icf=safe flag which is intended to apply ICF > only > > to sections which can be safely merged according to the guarantees > provided > > by the language. It works using a set of heuristics (symbol name matching > > and relocation scanning). That's not only imprecise but it only works > with > > certain languages and is slow due to the need to demangle symbols and > scan > > relocations. It's also redundant with the (local_)unnamed_addr analysis > > already performed by LLVM. > > > > I implemented an alternative to this approach in clang and lld. It works > by > > adding a section to each object file containing the indexes of the > symbols > > which are address-significant (i.e. not (local_)unnamed_addr in IR). > > > > I used this implementation to link clang with release+asserts with each > of > > --icf={none,safe,all}. The binary sizes were: > > > > none: 109407184 > > safe: 108534736 (-0.8%) > > all: 107281360 (-2%) > > > > I measured the object file overhead of these sections in my clang build > at > > 0.08%. That's almost nothing, and I think it's small enough that we can > turn > > it on by default. > > > > I've uploaded a patch series for this feature here: > > https://github.com/pcc/llvm-project/tree/llvm-addrsig > > I intend to start sending it for review soon. > > > > Thanks, > > -- > > -- > > Peter > > > > _______________________________________________ > > LLVM Developers mailing list > > llvm-dev at lists.llvm.org > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > > >-- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180523/a7a7420a/attachment.html>
bd1976 llvm via llvm-dev
2018-May-31 20:52 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
Hi Peter, This is a great proposal, thanks!. If you were worried about making the abi change have you thought about just going for an array of symbol names or hashes of symbol names where any matching symbol is considered address significant? This would sidestep the problem of keeping the symbol table indices in sync. It would be pessimistic for local symbols if the input SHT_ADDRSIG sections were combined by e.g. "ld -r" but in my experience this should not have too much of an impact on --icf. Might be worth considering in the short term whilst you work on getting gabi acceptance. On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi all, > > Context: ld.gold has an --icf=safe flag which is intended to apply ICF > only to sections which can be safely merged according to the guarantees > provided by the language. It works using a set of heuristics (symbol name > matching and relocation scanning). That's not only imprecise but it only > works with certain languages and is slow due to the need to demangle > symbols and scan relocations. It's also redundant with the > (local_)unnamed_addr analysis already performed by LLVM. > > I implemented an alternative to this approach in clang and lld. It works > by adding a section to each object file containing the indexes of the > symbols which are address-significant (i.e. not (local_)unnamed_addr in IR). > > I used this implementation to link clang with release+asserts with each of > --icf={none,safe,all}. The binary sizes were: > > none: 109407184 > safe: 108534736 (-0.8%) > all: 107281360 (-2%) > > I measured the object file overhead of these sections in my clang build at > 0.08%. That's almost nothing, and I think it's small enough that we can > turn it on by default. > > I've uploaded a patch series for this feature here: > https://github.com/pcc/llvm-project/tree/llvm-addrsig > I intend to start sending it for review soon. > > Thanks, > -- > -- > Peter > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180531/77e6efce/attachment.html>
Peter Collingbourne via llvm-dev
2018-May-31 21:05 UTC
[llvm-dev] Proposal for address-significance tables for --icf=safe
The hash approach was suggested by others as well, but I think for now we can use the sh_link field until the tools are updated -- that was the recommended approach on the generic-abi thread as well. Keep in mind that updating the gABI is really orthogonal to the compatibility issue: even with an updated gABI we'd still have the practical problem of needing to deal with old tools somehow. Agreed that we'd be fine pessimising ld -r -- the main thing that I'm concerned about is avoiding miscompiles caused by shuffling symbols. Peter On Thu, May 31, 2018 at 1:52 PM, bd1976 llvm <bd1976llvm at gmail.com> wrote:> Hi Peter, This is a great proposal, thanks!. > > If you were worried about making the abi change have you > thought about just going for an array of symbol names > or hashes of symbol names where any matching symbol is > considered address significant? This would sidestep the > problem of keeping the symbol table indices in sync. > > It would be pessimistic for local symbols if the input > SHT_ADDRSIG sections were combined by e.g. "ld -r" but > in my experience this should not have too much of an > impact on --icf. > > Might be worth considering in the short term whilst you > work on getting gabi acceptance. > > On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi all, >> >> Context: ld.gold has an --icf=safe flag which is intended to apply ICF >> only to sections which can be safely merged according to the guarantees >> provided by the language. It works using a set of heuristics (symbol name >> matching and relocation scanning). That's not only imprecise but it only >> works with certain languages and is slow due to the need to demangle >> symbols and scan relocations. It's also redundant with the >> (local_)unnamed_addr analysis already performed by LLVM. >> >> I implemented an alternative to this approach in clang and lld. It works >> by adding a section to each object file containing the indexes of the >> symbols which are address-significant (i.e. not (local_)unnamed_addr in IR). >> >> I used this implementation to link clang with release+asserts with each >> of --icf={none,safe,all}. The binary sizes were: >> >> none: 109407184 >> safe: 108534736 (-0.8%) >> all: 107281360 (-2%) >> >> I measured the object file overhead of these sections in my clang build >> at 0.08%. That's almost nothing, and I think it's small enough that we can >> turn it on by default. >> >> I've uploaded a patch series for this feature here: >> https://github.com/pcc/llvm-project/tree/llvm-addrsig >> I intend to start sending it for review soon. >> >> Thanks, >> -- >> -- >> Peter >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> >-- -- Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180531/c59bda9f/attachment.html>
Possibly Parallel Threads
- Proposal for address-significance tables for --icf=safe
- Proposal for address-significance tables for --icf=safe
- Proposal for address-significance tables for --icf=safe
- Proposal for address-significance tables for --icf=safe
- Proposal for address-significance tables for --icf=safe