Sam Clegg via llvm-dev
2017-Jul-12 18:31 UTC
[llvm-dev] [LLD] Adding WebAssembly support to lld
On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Sorry for the belated response. I was on vacation last week. A couple of
> thoughts on this patch and the story of WebAssembly linking.

And I'm about to be on (mostly) vacation for the next 3 weeks :)

> - This patch adds wasm support as yet another major architecture besides
> ELF and COFF. That is fine and actually aligned with the design principle of
> the current lld. Wasm probably differs from both more than ELF differs from
> COFF, and the reason why we separated COFF and ELF was that they are
> different enough that it is easier to handle them separately than to write a
> complex compatibility layer for the two. So I think that is the right design
> choice. That being said, some files are unnecessarily copied to all targets.
> In particular, Error.{cpp,h} and Memory.{h,cpp} need to be merged because
> they are mostly identical.

I concur. However, would you accept the wasm port landing first, and
then factoring some kind of library out of the 3 backends after that?
Personally I would prefer to land the initial version without
touching the ELF/COFF backends and refactor in a second pass.

> - I can imagine that you would eventually want to support two modes of wasm
> object files. In one form, object files are represented in the compact
> format using LEB128 encoding, and the linker has to decode and re-encode
> LEB128 instruction streams. In the other form, they are still in LEB128 but
> use the full 5 bytes for 4-byte numbers, so that you can just concatenate
> them without decoding/re-encoding. Which mode do you want to make the
> default? The latter should be much faster than the former (or the former is
> probably unnecessarily slow), and because the regular compile-link-run cycle
> is very important for developers, I'd guess that making the latter the
> default is a reasonable choice, although this patch implements the former.
> What do you think about it?

Yes, currently relocatable wasm files (as produced by clang) use
fixed-width LEB128 (padded to five bytes) for any relocation targets. This
allows the linker to trivially apply relocations and blindly
concatenate data and code sections. We specifically avoid any
instruction decoding in the linker. The plan is to add an optional
pass over the generated code section of an executable file to compress
the relocation targets to their normal LEB128 size. Whether or not to
make this the default is TBD.

> - Storing the length and a hash value for each symbol in the symbol table
> may speed up linking. We've learned that finding terminating NULs and
> computing hash values for symbols is a time-consuming process in the linker.

Yes, I imagine we could even share some of the core symbol table code
via the above-mentioned library?

> On Thu, Jul 6, 2017 at 3:38 PM, Rafael Avila de Espindola via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Dan Gohman <sunfish at mozilla.com> writes:
>>
>> >> Sorry, I meant why that didn't work with ELF (or what else didn't).
>> >
>> > The standard executable WebAssembly format does not use ELF, for
>> > numerous reasons, most visibly that ELF is designed for sparse
>> > decoding -- headers contain offsets to arbitrary points in the file,
>> > while WebAssembly's format is designed for streaming decoding. Also,
>> > as Sam mentioned, there are a lot of conceptual differences. In ELF,
>> > virtual addresses are a pervasive organizing principle; in
>> > WebAssembly, it's possible to think about various index spaces as
>> > virtual address spaces, but not all address-space-oriented
>> > assumptions apply.
>>
>> I can see why you would want your own format for distribution. My
>> question was really about using ELF for the .o files.
>>
>> > It would also be possible for WebAssembly to use ELF ET_REL files just
>> > for linking, however telling LLVM and other tools to target ELF tends
>> > to lead them to assume that the final output is ELF and rely on
>> > ELF-specific features.
>>
>> Things like "the dynamic linker implements copy relocations"?
>>
>> Cheers,
>> Rafael
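Editor's illustration of the padded encoding Sam describes in this message: a minimal sketch in Python, not code from the patch. The function names (encode_uleb128, encode_padded_uleb128, decode_uleb128, apply_relocation) are hypothetical and chosen only for this example. It assumes unsigned 32-bit values and a fixed five-byte field, and shows why a linker can patch such a field in place without decoding the surrounding instructions.

```python
from typing import Tuple


def encode_uleb128(value: int) -> bytes:
    """Minimal (variable-width) unsigned LEB128 encoding."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_padded_uleb128(value: int, width: int = 5) -> bytes:
    """Fixed-width unsigned LEB128: always `width` bytes, padded with
    continuation bytes, so the field size never depends on the value."""
    if not 0 <= value < (1 << 32):
        raise ValueError("this sketch assumes unsigned 32-bit values")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; returns (value, bytes consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos - offset


def apply_relocation(section: bytearray, offset: int, value: int) -> None:
    """Overwrite the five-byte field at `offset`; nothing else moves."""
    section[offset:offset + 5] = encode_padded_uleb128(value)


# `call <index>; end` with a placeholder index (0x10 = call, 0x0b = end).
code = bytearray(b"\x10" + encode_padded_uleb128(0) + b"\x0b")
apply_relocation(code, 1, 7)
print(code.hex())
```

Both encodings decode to the same value; the padded form only costs extra bytes, which is what the optional compression pass would recover.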
Rui Ueyama via llvm-dev
2017-Jul-12 22:23 UTC
[llvm-dev] [LLD] Adding WebAssembly support to lld
On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <sbc at chromium.org> wrote:
> On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Sorry for the belated response. I was on vacation last week. A couple of
> > thoughts on this patch and the story of WebAssembly linking.
>
> And I'm about to be on (mostly) vacation for the next 3 weeks :)
>
> > - This patch adds wasm support as yet another major architecture besides
> > ELF and COFF. That is fine and actually aligned with the design principle
> > of the current lld. Wasm probably differs from both more than ELF differs
> > from COFF, and the reason why we separated COFF and ELF was that they are
> > different enough that it is easier to handle them separately than to
> > write a complex compatibility layer for the two. So I think that is the
> > right design choice. That being said, some files are unnecessarily copied
> > to all targets. In particular, Error.{cpp,h} and Memory.{h,cpp} need to
> > be merged because they are mostly identical.
>
> I concur. However, would you accept the wasm port landing first, and
> then factoring some kind of library out of the 3 backends after that?
> Personally I would prefer to land the initial version without
> touching the ELF/COFF backends and refactor in a second pass.

Yes, we can do that later.

> > - I can imagine that you would eventually want to support two modes of
> > wasm object files. In one form, object files are represented in the
> > compact format using LEB128 encoding, and the linker has to decode and
> > re-encode LEB128 instruction streams. In the other form, they are still
> > in LEB128 but use the full 5 bytes for 4-byte numbers, so that you can
> > just concatenate them without decoding/re-encoding. Which mode do you
> > want to make the default? The latter should be much faster than the
> > former (or the former is probably unnecessarily slow), and because the
> > regular compile-link-run cycle is very important for developers, I'd
> > guess that making the latter the default is a reasonable choice, although
> > this patch implements the former. What do you think about it?
>
> Yes, currently relocatable wasm files (as produced by clang) use
> fixed-width LEB128 (padded to five bytes) for any relocation targets. This
> allows the linker to trivially apply relocations and blindly
> concatenate data and code sections. We specifically avoid any
> instruction decoding in the linker. The plan is to add an optional
> pass over the generated code section of an executable file to compress
> the relocation targets to their normal LEB128 size. Whether or not to
> make this the default is TBD.

Does this strategy make sense?

 - make compilers always emit fixed-width LEB128, so that linkers can link
   object files just by concatenating them and applying relocations,
 - make the linker emit fixed-width LEB128 by default as well, so that it
   can create executables as fast as it can, and
 - write an optional re-encoder which decodes and re-encodes fixed-width
   LEB128 to "compress" the final output.

The third one can be an internal linker pass which is invoked when you pass
-O1 or something to the linker, but conceptually it is separate from the
"main" linker.

The rationale behind this strategy is that

 - Developers usually want to create outputs as fast as linkers can.
   Creating final executables for shipping is (probably by far) less
   frequent. I also expect that, if wasm is successful, you'll be compiling
   and linking large programs using wasm as a target (on a successful
   platform, people start doing something incredible/crazy in general), so
   the toolchain performance will matter. You want to optimize it for the
   regular compile-link-debug cycle.
 - Creating an output just by concatenating input file sections is, I
   believe, easier than decoding and re-encoding LEB128 fields. So I think
   we want to construct the linker based on that design, instead of directly
   emitting variable-size LEB128 fields.
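Editor's illustration of the third step in Rui's strategy, the optional re-encoder: a rough sketch in Python under stated assumptions, not the pass that lld would implement. It assumes the linker still knows the byte offset of every padded five-byte field (for example from the relocation records), and the names compress_padded_fields, read_uleb128 and write_uleb128 are hypothetical. A real pass would also have to rewrite every enclosing size prefix (function body sizes, the section size), which this sketch leaves out.

```python
from typing import Iterable


def read_uleb128(buf: bytes, offset: int) -> int:
    """Decode the unsigned LEB128 value starting at `offset`."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result


def write_uleb128(value: int) -> bytes:
    """Encode `value` in the shortest unsigned LEB128 form."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compress_padded_fields(code: bytes, field_offsets: Iterable[int],
                           width: int = 5) -> bytes:
    """Copy `code`, shrinking each padded field to its minimal encoding.

    Bytes between fields are copied verbatim, so no instruction decoding
    is needed as long as the field offsets are known.
    """
    out = bytearray()
    cursor = 0
    for off in sorted(field_offsets):
        out += code[cursor:off]                        # untouched bytes
        out += write_uleb128(read_uleb128(code, off))  # compact field
        cursor = off + width                           # skip padded field
    out += code[cursor:]
    return bytes(out)


# `call 3; end` with a padded index shrinks from 7 bytes to 3.
padded = b"\x10" + b"\x83\x80\x80\x80\x00" + b"\x0b"
print(compress_padded_fields(padded, [1]).hex())
```

Keeping this as a separate pass matches the point made in the message: the default link stays a plain copy-and-patch, and only a size-optimized build pays for the extra walk over the output.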
Sam Clegg via llvm-dev
2017-Jul-12 23:36 UTC
[llvm-dev] [LLD] Adding WebAssembly support to lld
On Wed, Jul 12, 2017 at 3:23 PM, Rui Ueyama <ruiu at google.com> wrote:
> On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <sbc at chromium.org> wrote:
>>
>> Yes, currently relocatable wasm files (as produced by clang) use
>> fixed-width LEB128 (padded to five bytes) for any relocation targets. This
>> allows the linker to trivially apply relocations and blindly
>> concatenate data and code sections. We specifically avoid any
>> instruction decoding in the linker. The plan is to add an optional
>> pass over the generated code section of an executable file to compress
>> the relocation targets to their normal LEB128 size. Whether or not to
>> make this the default is TBD.
>
> Does this strategy make sense?
>
> - make compilers always emit fixed-width LEB128, so that linkers can link
>   object files just by concatenating them and applying relocations,
> - make the linker emit fixed-width LEB128 by default as well, so that it
>   can create executables as fast as it can, and
> - write an optional re-encoder which decodes and re-encodes fixed-width
>   LEB128 to "compress" the final output.
>
> The third one can be an internal linker pass which is invoked when you pass
> -O1 or something to the linker, but conceptually it is separate from the
> "main" linker.

IIUC that is exactly the strategy I am suggesting. Perhaps my
description of it was less clear. The current implementation does this,
with the caveat that the final (optional) compression phase is not yet
implemented :)
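Editor's illustration of the default fast path that both messages agree on, linking by concatenation plus in-place patching: a toy sketch in Python, not the structure of the actual lld patch. The types InputChunk and Reloc, the link function, and the symbol names are all hypothetical; the only idea taken from the thread is that input bytes are copied unchanged and each padded five-byte field is overwritten at its rebased offset.

```python
from dataclasses import dataclass
from typing import Dict, List


def padded_uleb128(value: int, width: int = 5) -> bytes:
    """Fixed-width unsigned LEB128 (continuation bit on all but last byte)."""
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        out.append(byte | 0x80 if i < width - 1 else byte)
    return bytes(out)


@dataclass
class Reloc:
    offset: int  # offset of the padded field inside its own input chunk
    symbol: str  # hypothetical symbol whose final index goes there


@dataclass
class InputChunk:
    data: bytes
    relocs: List[Reloc]


def link(chunks: List[InputChunk], symbol_values: Dict[str, int]) -> bytes:
    """Concatenate chunks and patch every relocation at its new offset."""
    out = bytearray()
    for chunk in chunks:
        base = len(out)      # where this chunk lands in the output
        out += chunk.data    # blind copy, no instruction decoding
        for reloc in chunk.relocs:
            pos = base + reloc.offset
            out[pos:pos + 5] = padded_uleb128(symbol_values[reloc.symbol])
    return bytes(out)


# Two inputs, each `call <placeholder>; end` (0x10 = call, 0x0b = end).
a = InputChunk(b"\x10" + padded_uleb128(0) + b"\x0b", [Reloc(1, "foo")])
b = InputChunk(b"\x10" + padded_uleb128(0) + b"\x0b", [Reloc(1, "bar")])
image = link([a, b], {"foo": 2, "bar": 300})
print(image.hex())
```

Because every field keeps its five-byte width, the output offset of each chunk is known as soon as the preceding chunks have been copied, which is what makes the single-pass copy possible.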