> Cool!
>
> So the thin archive is a divergence from the standard ar file (although it's
> compatible with GNU). Is there any room to push it further? Last time I ran
> the linker with profiling enabled, it spends a good amount of time just to
> find the terminating nul character in the archive file symbol table. If we
> store string length for each symbol, the linker can read archive files
> faster.

We can probably do it, yes.

Take a look at the BSD format (used on OS X; I just implemented it in
LLVM).

It is a bit better in that the symbol table is organized as a series
of offset pairs: one to the member, one to the string table.

This already improves handling in the traditional Unix linker model,
where we scan each member to see if it should be included in the link.
Once we find out it is to be included, it is really fast to scan past
the member without looking for NULs, as one has to do in the GNU
format.

That doesn't help with COFF, where we do a single pass anyway, but there
is more that we could benefit from in the BSD format. I think that in
practice the string table is in order, so that we can compute the
string size by looking at the next symbol table entry. I will give that
a try.

Another reason to come up with a thin BSD format variant :-)

Cheers,
Rafael

P.S.: While testing the thin archive format I noticed that the thin
.lib files were a lot bigger than what I was getting on Linux. It
turns out it was because cl.exe was producing .obj files with a *lot*
more symbols than clang on Linux. Trying clang on Windows showed that
cl.exe was not dropping what we call linkonce_odr, but clang on
Windows still produces more symbols than clang on Linux.
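The offset-pair layout described above can be sketched as follows. This is a minimal illustration, not LLVM's reader: it assumes a little-endian table with 32-bit fields, laid out as a byte count for the pair array, the pairs themselves (string table offset first, then member offset), a byte count for the string table, and then the NUL-terminated names. The function names and the sample member offsets are hypothetical.

```python
import struct

def parse_bsd_symtab(data: bytes):
    """Split a simplified BSD-style symbol table into offset pairs and names.

    Assumed layout (little-endian, 32-bit fields):
      u32  size of the pair array in bytes
      (u32 string-table offset, u32 member offset) repeated
      u32  size of the string table in bytes
      NUL-terminated symbol names
    """
    (pairs_size,) = struct.unpack_from("<I", data, 0)
    pairs = [
        struct.unpack_from("<II", data, 4 + i)
        for i in range(0, pairs_size, 8)
    ]
    (strtab_size,) = struct.unpack_from("<I", data, 4 + pairs_size)
    strtab_start = 4 + pairs_size + 4
    strtab = data[strtab_start:strtab_start + strtab_size]
    return pairs, strtab

def build_example() -> bytes:
    """Build a tiny table with two symbols (made-up member offsets)."""
    names = [b"foo", b"barbaz"]
    member_offsets = [96, 480]
    strtab = b"".join(n + b"\0" for n in names)
    pairs = b""
    strx = 0
    for name, member_off in zip(names, member_offsets):
        pairs += struct.pack("<II", strx, member_off)
        strx += len(name) + 1  # skip the name and its NUL terminator
    return (struct.pack("<I", len(pairs)) + pairs +
            struct.pack("<I", len(strtab)) + strtab)

pairs, strtab = parse_bsd_symtab(build_example())
```

Because every entry carries a member offset, a linker that decides to pull in a member can skip ahead by comparing offsets in the fixed-size pairs, without touching the string table at all.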
On Wed, Jul 22, 2015 at 5:32 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:

> That doesn't help with COFF were we do a single pass anyway, but there
> is more that we could benefit from the BSD format. I think that in
> practice the string table in in order, so that we can compute the
> string size by looking at the next member. I will give that a try.

Even if the string table is in order in practice, you have to search
for NUL characters byte-by-byte unless it's really guaranteed to be
in order, no?
> Even if the string table is in in-order in practice, you have to search
> for NUL characters byte-by-byte unless it's really guaranteed to be
> in-order, no?

No. Let's say we are at symbol N and want to find its size. Each symbol
in the BSD format is represented by a pair of offsets: one to the
member and one to the string table. We should be able to compute the
symbol name size as the difference between the current symbol's string
table offset and the next symbol's string table offset.

Cheers,
Rafael
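A minimal sketch of that offset-difference idea, under the same assumption the thread discusses: names sit in the string table in the same order as the symbol entries, with nothing between one name's NUL terminator and the start of the next name. The function name, the `(name_offset, member_offset)` tuple order, and the sample data are illustrative, not LLVM's API. The last symbol has no successor, so this sketch uses the end of the string table and strips trailing NULs (terminator plus any padding).

```python
def symbol_names(pairs, strtab: bytes):
    """Recover (name, member_offset) without scanning for NUL per symbol.

    pairs  : list of (name_offset, member_offset), in string-table order
    strtab : the raw string table bytes
    """
    result = []
    for i, (name_off, member_off) in enumerate(pairs):
        if i + 1 < len(pairs):
            # The next name starts right after this name's NUL terminator.
            end = pairs[i + 1][0] - 1
        else:
            # Last symbol: fall back to the table end, minus NUL padding.
            end = len(strtab.rstrip(b"\0"))
        result.append((strtab[name_off:end].decode(), member_off))
    return result

# Hypothetical table: two names followed by two bytes of padding.
example_strtab = b"foo\0barbaz\0\0\0"
example_pairs = [(0, 96), (4, 480)]
names = symbol_names(example_pairs, example_strtab)
```

If the ordering is not guaranteed by the format, a reader would need to verify it (for example by checking that the name offsets are strictly increasing) and fall back to a NUL search otherwise.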