kyra via llvm-dev
2017-Oct-10 19:41 UTC
[llvm-dev] Make LLD output COFF relocatable object file (like ELF's -r does). How much work is required to implement this?
On 10/10/2017 9:00 PM, Rui Ueyama wrote:
> I'm not sure if I understand correctly. If my understanding is correct,
> you are saying that GHC can link either .o or .so at runtime, which
> sounds a bit odd because .o is not designed for dynamic linking. Am I
> missing something?

Yes, the GHC runtime linker *does* link .o files, not only performing all necessary relocations but also creating trampolines for "far" code to satisfy the "small" memory model.

> I also do not understand why only static libraries need a "compile/link
> pass" -- they at least don't need a compile pass, as they contain
> compiled .o files, and they indeed need a link pass, but that's also
> true for a single big .o file generated by -r, no? After all, in order
> to link against a .a file, I think you need to pull out a .o file from
> the .a and do whatever you need to do to link a single big .o file.

I don't quite understand this. The idea is that when creating a package you should *at the very least* provide a static library a client can statically link against. You may optionally create a shared library for a client to link against, but to do so you should *recompile* the whole package because things differ now (this is how GHC works); you can't simply link all your existing object code (what you produced the static library from) into this shared library. But if you want to provide a single prelinked *.o file (for GHC runtime linker consumption), you don't need to perform any extra compile step; you simply link all your object files (exactly those that went into the package's static library) into this *.o file with 'ld -r'.

> IIUC, GHC is faster when handling .a files compared to a prelinked big
> .o file, even if they contain the same binary code/data. But it sounds
> like an artifact of the current implementation of GHC, because, in
> theory, there's no reason the former is much more inefficient than the
> latter. If that's the case, doesn't it make more sense to improve GHC?

No. The GHC **runtime** linker is much slower when handling *.a files than when linking an already prelinked big *.o file (and this is exactly the culprit of this whole story), since it goes through the whole archive and links each object module separately, doing all resolutions, relocations, and trampolines.

There is, perhaps, some confusion about what the GHC *runtime* linker is. The GHC runtime linker comes into play either when GHC is used interactively, or when GHC encounters code it has to execute at compile time (Template Haskell/quasiquotations). Thus the GHC compiler must link some external code during its own run time.

HTH.

Cheers,
Kyra
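[Editor's note: to make the "trampolines for far code" point concrete, below is a minimal, hypothetical C++ sketch of the jump-island technique a runtime linker can use under the x86-64 "small" code model. It is not GHC's actual implementation; the function names are invented, and it assumes a trampoline slot has already been allocated within 2GB of the loaded code and made executable.]

#include <cstdint>
#include <cstring>

// When object code built for the "small" code model calls a symbol that
// ends up farther than +/-2GB away, the 32-bit PC-relative displacement
// cannot reach it.  A runtime linker can emit a small trampoline ("jump
// island") near the loaded code and point the relocation at it instead.
//
// The trampoline encodes:  movabs r11, <target> ; jmp r11
static uint8_t *emitTrampoline(uint8_t *islandSlot, uint64_t target) {
  static const uint8_t stub[] = {
      0x49, 0xBB, 0, 0, 0, 0, 0, 0, 0, 0, // movabs r11, imm64
      0x41, 0xFF, 0xE3                    // jmp    r11
  };
  std::memcpy(islandSlot, stub, sizeof(stub));
  std::memcpy(islandSlot + 2, &target, sizeof(target)); // patch the imm64
  return islandSlot;
}

// Resolve a 32-bit PC-relative fixup, routing through a trampoline when the
// real target is out of rel32 range.
static void applyPCRel32(uint8_t *fixupAddr, uint64_t target, int64_t addend,
                         uint8_t *islandSlot) {
  int64_t delta =
      (int64_t)target + addend - (int64_t)(uintptr_t)fixupAddr;
  if (delta < INT32_MIN || delta > INT32_MAX) {
    uint8_t *island = emitTrampoline(islandSlot, target);
    delta = (int64_t)(uintptr_t)island + addend - (int64_t)(uintptr_t)fixupAddr;
  }
  int32_t disp = (int32_t)delta;
  std::memcpy(fixupAddr, &disp, sizeof(disp));
}

[This is the kind of per-call-site work the GHC runtime linker has to repeat for every object module it pulls out of an archive.]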
Rui Ueyama via llvm-dev
2017-Oct-10 20:01 UTC
[llvm-dev] Make LLD output COFF relocatable object file (like ELF's -r does). How much work is required to implement this?
On Tue, Oct 10, 2017 at 12:41 PM, kyra <kyrab at mail.ru> wrote:
> On 10/10/2017 9:00 PM, Rui Ueyama wrote:
>> I'm not sure if I understand correctly. If my understanding is correct,
>> you are saying that GHC can link either .o or .so at runtime, which
>> sounds a bit odd because .o is not designed for dynamic linking. Am I
>> missing something?
>
> Yes, the GHC runtime linker *does* link .o files, not only performing
> all necessary relocations but also creating trampolines for "far" code
> to satisfy the "small" memory model.
>
>> I also do not understand why only static libraries need a "compile/link
>> pass" -- they at least don't need a compile pass, as they contain
>> compiled .o files, and they indeed need a link pass, but that's also
>> true for a single big .o file generated by -r, no? After all, in order
>> to link against a .a file, I think you need to pull out a .o file from
>> the .a and do whatever you need to do to link a single big .o file.
>
> I don't quite understand this. The idea is that when creating a package
> you should *at the very least* provide a static library a client can
> statically link against. You may optionally create a shared library for
> a client to link against, but to do so you should *recompile* the whole
> package because things differ now (this is how GHC works); you can't
> simply link all your existing object code (what you produced the static
> library from) into this shared library. But if you want to provide a
> single prelinked *.o file (for GHC runtime linker consumption), you
> don't need to perform any extra compile step; you simply link all your
> object files (exactly those that went into the package's static library)
> into this *.o file with 'ld -r'.
>
>> IIUC, GHC is faster when handling .a files compared to a prelinked big
>> .o file, even if they contain the same binary code/data. But it sounds
>> like an artifact of the current implementation of GHC, because, in
>> theory, there's no reason the former is much more inefficient than the
>> latter. If that's the case, doesn't it make more sense to improve GHC?
>
> No. The GHC **runtime** linker is much slower when handling *.a files
> than when linking an already prelinked big *.o file (and this is exactly
> the culprit of this whole story), since it goes through the whole
> archive and links each object module separately, doing all resolutions,
> relocations, and trampolines.

Looks like I still do not understand why a .a can be much slower than a prelinked .o. As far as I understand, "ld -r" doesn't reduce the amount of data that much. It doesn't reduce the number of relocations, as relocations in input object files are basically passed through to the output. It doesn't reduce the number of symbols that much, as the combined object file contains a union of all symbols that appeared in the input files. So, I think the amount of data in a .a is essentially the same as in a prelinked .o. I wonder what can make a difference in speed.

> There is, perhaps, some confusion about what the GHC *runtime* linker
> is. The GHC runtime linker comes into play either when GHC is used
> interactively, or when GHC encounters code it has to execute at compile
> time (Template Haskell/quasiquotations). Thus the GHC compiler must link
> some external code during its own run time.
>
> HTH.
>
> Cheers,
> Kyra
Reid Kleckner via llvm-dev
2017-Oct-10 21:20 UTC
[llvm-dev] Make LLD output COFF relocatable object file (like ELF's -r does). How much work is required to implement this?
On Tue, Oct 10, 2017 at 1:01 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> No. The GHC **runtime** linker is much slower when handling *.a files
>> than when linking an already prelinked big *.o file (and this is
>> exactly the culprit of this whole story), since it goes through the
>> whole archive and links each object module separately, doing all
>> resolutions, relocations, and trampolines.
>
> Looks like I still do not understand why a .a can be much slower than a
> prelinked .o. As far as I understand, "ld -r" doesn't reduce the amount
> of data that much. It doesn't reduce the number of relocations, as
> relocations in input object files are basically passed through to the
> output. It doesn't reduce the number of symbols that much, as the
> combined object file contains a union of all symbols that appeared in
> the input files. So, I think the amount of data in a .a is essentially
> the same as in a prelinked .o. I wonder what can make a difference in
> speed.

I can't speak for Haskell, but ld -r can be useful for speeding up C++ links, because it acts as a pre-merging step for duplicate comdats. Consider a library that uses many instantiations of the same template with the same type. An archive will contain many copies of the template, but the single relocatable object file produced by ld -r will only contain one.
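[Editor's note: a small, hypothetical C++ illustration of that pre-merging effect; the file and function names are made up. Each translation unit that uses std::vector<int> emits its own comdat copies of the instantiated member functions, so each member .o of an archive carries a duplicate, whereas the single object produced by 'ld -r' keeps only one copy of each.]

// widget.cpp -- one of many translation units in the library
#include <vector>

// Instantiates std::vector<int>: push_back, the destructor, and the
// reallocation helpers are emitted here as comdat (linkonce) definitions.
std::vector<int> makeWidgetIds() {
  std::vector<int> ids;
  ids.push_back(42);
  return ids;
}

// gadget.cpp -- another translation unit, producing the same
// std::vector<int> comdats all over again in its own .o file.
#include <vector>

std::vector<int> makeGadgetIds() {
  std::vector<int> ids;
  ids.push_back(7);
  return ids;
}

[An archive built from these two files contains two copies of every std::vector<int> comdat; prelinking them with 'ld -r' folds the duplicates to one, so the final link, or a runtime linker, has correspondingly less to resolve.]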
kyra via llvm-dev
2017-Oct-10 21:21 UTC
[llvm-dev] Make LLD output COFF relocatable object file (like ELF's -r does). How much work is required to implement this?
On 10/10/2017 11:01 PM, Rui Ueyama wrote:
> Looks like I still do not understand why a .a can be much slower than a
> prelinked .o. As far as I understand, "ld -r" doesn't reduce the amount
> of data that much. It doesn't reduce the number of relocations, as
> relocations in input object files are basically passed through to the
> output. It doesn't reduce the number of symbols that much, as the
> combined object file contains a union of all symbols that appeared in
> the input files. So, I think the amount of data in a .a is essentially
> the same as in a prelinked .o. I wonder what can make a difference in
> speed.

Ah, good point. Only now have I realized that my perception of link times was formed when no '-split-sections' option existed. The corresponding option was '-split-objs', and a typical package's static library contained thousands of object modules. For example:

The latest official GHC 8.2.1 release "base" package's static library, built with '-split-objs', contains 25631 object modules. The static library size is 28MB; the prelinked object file size is 15MB.

My own custom-built GHC ghc-8.3.20170619 "base" package's static library, built with '-split-sections' (instead of '-split-objs'), contains only 228 object modules. The static library size is 22MB; the prelinked object file size is 15MB.

Thus, when working with '-split-sections' libraries we won't, perhaps, see such big differences in link times (remember we mean the GHC runtime linker here) between these libraries and their prelinked object counterparts. Thus, perhaps, having a '-r' option in COFF LLD is becoming much less important than I thought before.

Cheers,
Kyra