thr3ads.net - llvm dev - [llvm-dev] [LLD] Adding WebAssembly support to lld [Jul 2017]

Sam Clegg via llvm-dev

2017-Jul-12 18:31 UTC

[llvm-dev] [LLD] Adding WebAssembly support to lld

On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Sorry for the belated response. I was on vacation last week. A couple of
> thoughts on this patch and the story of webassembly linking.

And I'm about to be on (mostly) vacation for the next 3 weeks :)

> - This patch adds wasm support as yet another major architecture besides
> ELF and COFF. That is fine and actually aligned with the design principle
> of the current lld. Wasm is probably more different from either than ELF
> is from COFF, and the reason why we separated COFF and ELF was because they
> are different enough that it is easier to handle them separately rather
> than writing a complex compatibility layer for the two. So that is, I
> think, the right design choice. That being said, some files are
> unnecessarily copied to all targets. In particular, Error.{cpp,h} and
> Memory.{h,cpp} need to be merged because they are mostly identical.

I concur.  However, would you accept the wasm port landing first, and
then factoring some kind of library out of the 3 backends after that?
Personally I would prefer to land the initial version without
touching the ELF/COFF backends and refactor in a second pass.

> - I can imagine that you would eventually want to support two modes of wasm
> object files. In one form, object files are represented in the compact
> format using LEB128 encoding, and the linker has to decode and re-encode
> LEB128 instruction streams. In the other form, they are still in LEB128 but
> use the full 5 bytes for 4-byte numbers, so that you can just concatenate
> them without decoding/re-encoding. Which mode do you want to make the
> default? The latter should be much faster than the former (or the former is
> probably unnecessarily slow), and because the regular compile-link-run cycle
> is very important for developers, I'd guess that making the latter the
> default is a reasonable choice, although this patch implements the former.
> What do you think about it?

Yes, currently relocatable wasm files (as produced by clang) use fixed
width LEB128 (padded to five bytes) for any relocation targets.  This
allows the linker to trivially apply relocations and blindly
concatenate data and code sections.  We specifically avoid any
instruction decoding in the linker.  The plan is to add an optional
pass over the generated code section of an executable file to compress
the relocation targets to their normal LEB128 size.  Whether or not to
make this the default is TBD.
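
To make the fixed-width scheme concrete, here is a minimal Python sketch of the idea, not lld's actual code. The function names are hypothetical. The bytes 0x10 and 0x0b are the wasm `call` and `end` opcodes, used only as filler around a placeholder index. Because the field is always five bytes, patching it never shifts the bytes around it.

```python
def encode_uleb128(value: int) -> bytes:
    """Minimal-length unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uleb128_padded(value: int) -> bytes:
    """Unsigned LEB128 for a 32-bit value, always padded to five bytes."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    # Set the continuation bit on the first four bytes even when the
    # remaining bits are zero, so the field width never changes.
    return bytes(((value >> (7 * i)) & 0x7F) | (0x80 if i < 4 else 0)
                 for i in range(5))


def apply_relocation(section: bytearray, offset: int, value: int) -> None:
    """Patch a padded field in place; surrounding bytes do not move."""
    section[offset:offset + 5] = encode_uleb128_padded(value)


# `call <placeholder>` followed by `end`; the index is filled in at link time.
code = bytearray(b"\x10" + encode_uleb128_padded(0) + b"\x0b")
apply_relocation(code, 1, 300)
print(code.hex())                 # 10ac828080000b
print(encode_uleb128(300).hex())  # ac02
```

The padded form spends five bytes where the minimal form needs two, which is the size cost the optional compression pass would recover.
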
> - Storing the length and a hash value for each symbol in the symbol table
> may speed up linking. We've learned that finding terminating NULs and
> computing hash values for symbols is time-consuming process in the linker.
Yes, I imagine we could even share some of the core symbol table code
via the above mentioned library?
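
As a rough illustration of the symbol-table suggestion, the sketch below stores each name with its length and a precomputed hash, so a reader never scans for a terminating NUL or rehashes the string. The record layout (u32 length, u32 hash, name bytes) and the use of CRC32 as the hash are assumptions made for this example only; neither comes from the patch or from any wasm object format.

```python
import struct
import zlib


def pack_symbol(name: bytes) -> bytes:
    """Hypothetical record: u32 length, u32 hash, then the raw name bytes."""
    return struct.pack("<II", len(name), zlib.crc32(name)) + name


def read_symbols(blob: bytes):
    """Yield (hash, name) pairs without scanning for NULs or rehashing."""
    pos = 0
    while pos < len(blob):
        length, name_hash = struct.unpack_from("<II", blob, pos)
        pos += 8  # skip the two u32 header fields
        yield name_hash, blob[pos:pos + length]
        pos += length


blob = pack_symbol(b"main") + pack_symbol(b"memcpy")
symbols = list(read_symbols(blob))
print([name for _, name in symbols])  # [b'main', b'memcpy']
```
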
>
>
>
> On Thu, Jul 6, 2017 at 3:38 PM, Rafael Avila de Espindola via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Dan Gohman <sunfish at mozilla.com> writes:
>>
>> >> Sorry, I meant why that didn't work with ELF (or what else didn't).
>> >>
>> >
>> > The standard executable WebAssembly format does not use ELF, for
>> > numerous reasons, most visibly that ELF is designed for sparse decoding
>> > -- headers contain offsets to arbitrary points in the file, while
>> > WebAssembly's format is designed for streaming decoding. Also, as Sam
>> > mentioned, there are a lot of conceptual differences. In ELF, virtual
>> > addresses are a pervasive organizing principle; in WebAssembly, it's
>> > possible to think about various index spaces as virtual address spaces,
>> > but not all address-space-oriented assumptions apply.
>>
>> I can see why you would want your own format for distribution. My
>> question was really about using ELF for the .o files.
>>
>> > It would also be possible for WebAssembly to use ELF ET_REL files just
>> > for linking, however telling LLVM and other tools to target ELF tends to
>> > lead them to assume that the final output is ELF and rely on
>> > ELF-specific features.
>>
>> Things like "the dynamic linker implements copy relocations"?
>>
>> Cheers,
>> Rafael
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Rui Ueyama via llvm-dev

2017-Jul-12 22:23 UTC

[llvm-dev] [LLD] Adding WebAssembly support to lld

On Wed, Jul 12, 2017 at 11:31 AM, Sam Clegg <sbc at chromium.org> wrote:
> On Mon, Jul 10, 2017 at 4:13 PM, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> > Sorry for the belated response. I was on vacation last week. A couple of
> > thoughts on this patch and the story of webassembly linking.
>
> And I'm about to be on (mostly) vacation for the next 3 weeks :)
>
> > - This patch adds wasm support as yet another major architecture besides
> > ELF and COFF. That is fine and actually aligned with the design principle
> > of the current lld. Wasm is probably more different from either than ELF
> > is from COFF, and the reason why we separated COFF and ELF was because
> > they are different enough that it is easier to handle them separately
> > rather than writing a complex compatibility layer for the two. So that
> > is, I think, the right design choice. That being said, some files are
> > unnecessarily copied to all targets. In particular, Error.{cpp,h} and
> > Memory.{h,cpp} need to be merged because they are mostly identical.
>
> I concur.  However, would you accept the wasm port landing first, and
> then factoring some kind of library out of the 3 backends after that?
>  Personally I would prefer to land the initial version without
> touching the ELF/COFF backends and refactor in a second pass.

Yes, we can do that later.

> > - I can imagine that you would eventually want to support two modes of
> > wasm object files. In one form, object files are represented in the
> > compact format using LEB128 encoding, and the linker has to decode and
> > re-encode LEB128 instruction streams. In the other form, they are still
> > in LEB128 but use the full 5 bytes for 4-byte numbers, so that you can
> > just concatenate them without decoding/re-encoding. Which mode do you
> > want to make the default? The latter should be much faster than the
> > former (or the former is probably unnecessarily slow), and because the
> > regular compile-link-run cycle is very important for developers, I'd
> > guess that making the latter the default is a reasonable choice, although
> > this patch implements the former. What do you think about it?
>
> Yes, currently relocatable wasm files (as produced by clang) use fixed
> width LEB128 (padded to five bytes) for any relocation targets.  This
> allows the linker to trivially apply relocations and blindly
> concatenate data and code sections.  We specifically avoid any
> instruction decoding in the linker.  The plan is to add an optional
> pass over the generated code section of an executable file to compress
> the relocation targets to their normal LEB128 size.  Whether or not to
> make this the default is TBD.

Does this strategy make sense?

 - make compilers always emit fixed-width LEB128, so that linkers can link
them just by concatenating them and applying relocations,
 - make the linker emit fixed-width LEB128 by default as well, so that it
can create executables as fast as it can, and
 - write an optional re-encoder which decodes and re-encodes fixed-width
LEB128 to "compress" the final output.

The third one can be an internal linker pass which is invoked when you pass
-O1 or something to the linker, but conceptually it is separated from the
"main" linker.
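
A minimal Python sketch of what such an optional re-encoder could look like, under stated assumptions: the offsets of the padded fields are already known (for example from relocation records), so no instruction decoding is needed. The function names are hypothetical. A real pass would also have to rewrite any enclosing size fields, such as function body and section sizes, which this sketch ignores.

```python
def decode_uleb128(data: bytes, pos: int):
    """Decode one unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # no continuation bit: this was the last byte
            return result, pos


def encode_uleb128(value: int) -> bytes:
    """Minimal-length unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compress_fields(code: bytes, field_offsets) -> bytes:
    """Re-encode each padded field at its minimal size; copy the rest."""
    out = bytearray()
    pos = 0
    for offset in sorted(field_offsets):
        out += code[pos:offset]                   # untouched bytes
        value, pos = decode_uleb128(code, offset)
        out += encode_uleb128(value)              # shrunken field
    out += code[pos:]
    return bytes(out)


# `call 3` with a five-byte padded index, then `end`.
padded = b"\x10\x83\x80\x80\x80\x00\x0b"
print(compress_fields(padded, [1]).hex())  # 10030b
```
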

The rationale behind this strategy is that

- Developers usually want to create outputs as fast as linkers can.
Creating final executables for shipping is (probably by far) less frequent.
I also expect that, if wasm is successful, you'll be compiling and
linking large programs using wasm as a target (on a successful platform,
people start doing something incredible/crazy in general), so the toolchain
performance will matter. You want to optimize it for the regular
compile-link-debug cycle.
- Creating an output just by concatenating input file sections is, I believe,
easier than decoding and re-encoding LEB128 fields. So I think we want to
construct the linker based on that design, instead of directly emitting
variable-size LEB128 fields.
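
The concatenation-based design can be sketched in a few lines of Python. This is a toy model, not the patch: each "object" is a code blob plus a list of (offset, symbol) relocations, and `symbol_index` is a hypothetical table of already-resolved indices. Linking is then an append plus a fixed-width patch at `base + offset`.

```python
def pad5(value: int) -> bytes:
    """Unsigned LEB128 padded to five bytes (32-bit values only)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected an unsigned 32-bit value")
    return bytes(((value >> (7 * i)) & 0x7F) | (0x80 if i < 4 else 0)
                 for i in range(5))


def link(objects, symbol_index) -> bytes:
    """Concatenate code blobs and patch each relocation in place."""
    output = bytearray()
    for code, relocations in objects:
        base = len(output)       # where this input lands in the output
        output += code
        for offset, symbol in relocations:
            at = base + offset   # rebase the input-relative offset
            output[at:at + 5] = pad5(symbol_index[symbol])
    return bytes(output)


# Two inputs, each `call <placeholder>; end` with one relocation at offset 1.
obj_a = (b"\x10" + pad5(0) + b"\x0b", [(1, "foo")])
obj_b = (b"\x10" + pad5(0) + b"\x0b", [(1, "bar")])
linked = link([obj_a, obj_b], {"foo": 7, "bar": 200})
print(linked.hex())  # 108780808000 0b 10c881808000 0b, without the spaces
```

No input byte outside the patched fields is inspected, which is what keeps this path simpler than emitting variable-size fields directly.
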


Sam Clegg via llvm-dev

2017-Jul-12 23:36 UTC

[llvm-dev] [LLD] Adding WebAssembly support to lld

On Wed, Jul 12, 2017 at 3:23 PM, Rui Ueyama <ruiu at google.com> wrote:
> Does this strategy make sense?
>
>  - make compilers always emit fixed-width LEB128, so that linkers can link
> them just by concatenating them and applying relocations,
>  - make the linker emit fixed-width LEB128 by default as well, so that it
> can create executables as fast as it can, and
>  - write an optional re-encoder which decodes and re-encodes fixed-width
> LEB128 to "compress" the final output.
>
> The third one can be an internal linker pass which is invoked when you pass
> -O1 or something to the linker, but conceptually it is separated from the
> "main" linker.

IIUC that is exactly the strategy I am suggesting.  Perhaps my
description of it was less clear.  The current implementation does this,
with the caveat that the final (optional) compression phase is not yet
implemented :)
