Ömer Sinan Ağacan via llvm-dev
2020-Aug-05 04:05 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
Hi,
Sorry if you've seen this message before on llvm.discourse.group or
elsewhere --
I've been trying to get to the bottom of this for a while now and asked
about
this in a few different platforms before.
I'm currently trying to debug a bug in a LLVM-generated Wasm code. The bug
could
be in the code that generates LLVM (rustc) or in the LLVM, I'm not sure yet.
LLVM IR and Wasm can be seen in [1].
The problem is this line:
(import "GOT.func"
"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
(global (;3;) (mut i32)))
The same symbol is already imported from "env" in the same module:
(import "env"
"_ZN5core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
(func (;4;) (type 1)))
So there's no need to import it from "GOT.func" and I want to get
rid of that
"GOT.func" import.
This LLVM IR is generated when compiling Rust code to a "staticlib",
which is
supposed to include *all* dependencies of the code so that it'll be linkable
with code for other languages. Because of the "GOT.func" import this
module is
not linkable, it needs to resolve that "GOT.func" import in runtime
using
dynamic linking for Wasm [2].
I'm trying to understand whether this is a rustc bug or an LLVM bug. I'm
using
LLVM 10 downloaded from the official web page and rustc nightly. I can build
LLVM from source and use it, but I don't have any experience in LLVM code
base.
Questions:
- Given a reference to a symbol, how does LLVM decide how to import it?
Currently I see these uses of the problematic symbol in LLVM IR:
- `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
to i8*), i8** %11, align 4`
- `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
to i8*), i8** %14, align 4`
- `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
to i8*), i8** %17, align 4`
- `declare zeroext i1
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"(i32*
noalias readonly align 4 dereferenceable(4), %"core::fmt::Formatter"*
align 4 dereferenceable(36)) unnamed_addr #1`
First three look very similar so I'm guessing the first three are causing
one
of those imports, and the last one is causing the other import, but I'm
not
sure which one is generating which import. Any ideas?
- Any suggestions on how to debug this? Just knowing which line in the LLVM IR
listed above causes this "GOT.func" import would be helpful.
Thanks,
Ömer
[1]: https://gist.github.com/osa1/4c672fe8998c8e8768cf9f7c014c61d8
[2]:
https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md
Thomas Lively via llvm-dev
2020-Aug-05 21:53 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
+Sam Clegg <sbc at google.com> Is the expert on this dynamic linking stuff. A lot of us WebAssembly toolchain folks have been hanging out in the WebAssembly Discord <https://discord.gg/6UCUaMP>, especially on the #emscripten channel, so that would be another good place to ask future WebAssembly-specific questions. On Tue, Aug 4, 2020 at 9:06 PM Ömer Sinan Ağacan via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Hi, > > Sorry if you've seen this message before on llvm.discourse.group or > elsewhere -- > I've been trying to get to the bottom of this for a while now and asked > about > this in a few different platforms before. > > I'm currently trying to debug a bug in a LLVM-generated Wasm code. The bug > could > be in the code that generates LLVM (rustc) or in the LLVM, I'm not sure > yet. > LLVM IR and Wasm can be seen in [1]. > > The problem is this line: > > (import "GOT.func" > > "_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" > (global (;3;) (mut i32))) > > The same symbol is already imported from "env" in the same module: > > (import "env" > > "_ZN5core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" > (func (;4;) (type 1))) > > So there's no need to import it from "GOT.func" and I want to get rid of > that > "GOT.func" import. > > This LLVM IR is generated when compiling Rust code to a "staticlib", which > is > supposed to include *all* dependencies of the code so that it'll be > linkable > with code for other languages. Because of the "GOT.func" import this > module is > not linkable, it needs to resolve that "GOT.func" import in runtime using > dynamic linking for Wasm [2]. > > I'm trying to understand whether this is a rustc bug or an LLVM bug. I'm > using > LLVM 10 downloaded from the official web page and rustc nightly. I can > build > LLVM from source and use it, but I don't have any experience in LLVM code > base. > Questions: > > - Given a reference to a symbol, how does LLVM decide how to import it? > Currently I see these uses of the problematic symbol in LLVM IR: > > - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* > > @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" > to i8*), i8** %11, align 4` > > - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* > > @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" > to i8*), i8** %14, align 4` > > - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* > > @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" > to i8*), i8** %17, align 4` > > - `declare zeroext i1 > > @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"(i32* > noalias readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* > align 4 dereferenceable(36)) unnamed_addr #1` > > First three look very similar so I'm guessing the first three are > causing one > of those imports, and the last one is causing the other import, but I'm > not > sure which one is generating which import. Any ideas? > > - Any suggestions on how to debug this? Just knowing which line in the > LLVM IR > listed above causes this "GOT.func" import would be helpful. > > Thanks, > > Ömer > > [1]: https://gist.github.com/osa1/4c672fe8998c8e8768cf9f7c014c61d8 > [2]: > https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200805/5f89e5bf/attachment.html>
Ömer Sinan Ağacan via llvm-dev
2020-Aug-06 14:29 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
Thanks, I joined Discord as well.
Here's an update: I was able to come up with a tiny C program which is
compiled
to a similar "GOT.func" import. C code:
__attribute__ ((visibility("default")))
int c_fn_2(int x, int y)
{
return x + y;
}
__attribute__ ((visibility("default")))
int (*c_fn(void)) (int x, int y)
{
return &c_fn_2;
}
Compile with:
clang-10 -fPIC --target=wasm32-unknown-emscripten test.c -c -o test.o -O
wasm2wat test.o > test.wat
Generated wat:
(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func (result i32)))
(import "env" "__linear_memory" (memory (;0;) 0))
(import "env" "__indirect_function_table" (table (;0;)
0 funcref))
(import "GOT.func" "c_fn_2" (global (;0;) (mut i32)))
(func $c_fn_2 (type 0) (param i32 i32) (result i32)
local.get 1
local.get 0
i32.add)
(func $c_fn (type 1) (result i32)
global.get 0))
Now if I remove the visibility attribute in `c_fn_2` the GOT.func import
disappears:
(module
(type (;0;) (func (param i32 i32) (result i32)))
(type (;1;) (func (result i32)))
(import "env" "__linear_memory" (memory (;0;) 0))
(import "env" "__indirect_function_table" (table (;0;)
0 funcref))
(import "env" "__table_base" (global (;0;) i32))
(func $c_fn_2 (type 0) (param i32 i32) (result i32)
local.get 1
local.get 0
i32.add)
(func $c_fn (type 1) (result i32)
global.get 0
i32.const 0
i32.add))
Comparing these two programs I think I see a potential answer to my question of
why this GOT.func import is needed. I can't find documentation how what
these
attributes mean exactly (the closest one I could find is [1]), but I think
without `visibility("default")` the symbol is not visible in other
compilation
units (here I think "compilation unit" means a module in Wasm, though
it may
also be a C compilation unit, I'm not sure), in other words it's
DSO-local. When
it's DSO-local the importing modules do not need to know about the
function's
table index as they can never refer to it directly (e.g. the C expression
`c_fn_2` doesn't make sense in the importing modules).
When we add `visibility("default")` and make c_fn_2 visible in other
modules we
need to be able to compare return value of `c_fn` with the value of symbol
`c_fn_2`, as that pointer equality must hold according to C standard.
The approach taken here is we expect the symbol's table index to be defined
in
the *loading module*, and import that index with that `GOT.func` import. The
host (e.g. wasmtime or the browser) is then responsible for generating a table
index for this function in the loading module and providing that
"GOT.func"
import in load time. In importing module code we use that index for c_fn_2, so
if the importing module does something like `c_fn_2 == c_fn()` that's
evaluated
to `true` as expected.
Can anyone deny or confirm that this is indeed the reason for that `GOT.func`
import?
Thanks,
Ömer
[1]: https://clang.llvm.org/docs/LTOVisibility.html
Thomas Lively <tlively at google.com>, 6 Ağu 2020 Per, 00:54 tarihinde
şunu yazdı:>
> +Sam Clegg Is the expert on this dynamic linking stuff.
>
> A lot of us WebAssembly toolchain folks have been hanging out in the
WebAssembly Discord, especially on the #emscripten channel, so that would be
another good place to ask future WebAssembly-specific questions.
>
> On Tue, Aug 4, 2020 at 9:06 PM Ömer Sinan Ağacan via llvm-dev <llvm-dev
at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> Sorry if you've seen this message before on llvm.discourse.group or
elsewhere --
>> I've been trying to get to the bottom of this for a while now and
asked about
>> this in a few different platforms before.
>>
>> I'm currently trying to debug a bug in a LLVM-generated Wasm code.
The bug could
>> be in the code that generates LLVM (rustc) or in the LLVM, I'm not
sure yet.
>> LLVM IR and Wasm can be seen in [1].
>>
>> The problem is this line:
>>
>> (import "GOT.func"
>>
"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
>> (global (;3;) (mut i32)))
>>
>> The same symbol is already imported from "env" in the same
module:
>>
>> (import "env"
>>
"_ZN5core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
>> (func (;4;) (type 1)))
>>
>> So there's no need to import it from "GOT.func" and I
want to get rid of that
>> "GOT.func" import.
>>
>> This LLVM IR is generated when compiling Rust code to a
"staticlib", which is
>> supposed to include *all* dependencies of the code so that it'll be
linkable
>> with code for other languages. Because of the "GOT.func"
import this module is
>> not linkable, it needs to resolve that "GOT.func" import in
runtime using
>> dynamic linking for Wasm [2].
>>
>> I'm trying to understand whether this is a rustc bug or an LLVM
bug. I'm using
>> LLVM 10 downloaded from the official web page and rustc nightly. I can
build
>> LLVM from source and use it, but I don't have any experience in
LLVM code base.
>> Questions:
>>
>> - Given a reference to a symbol, how does LLVM decide how to import it?
>> Currently I see these uses of the problematic symbol in LLVM IR:
>>
>> - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
>>
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
>> to i8*), i8** %11, align 4`
>>
>> - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
>>
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
>> to i8*), i8** %14, align 4`
>>
>> - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)*
>>
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"
>> to i8*), i8** %17, align 4`
>>
>> - `declare zeroext i1
>>
@"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"(i32*
>> noalias readonly align 4 dereferenceable(4),
%"core::fmt::Formatter"*
>> align 4 dereferenceable(36)) unnamed_addr #1`
>>
>> First three look very similar so I'm guessing the first three are
causing one
>> of those imports, and the last one is causing the other import, but
I'm not
>> sure which one is generating which import. Any ideas?
>>
>> - Any suggestions on how to debug this? Just knowing which line in the
LLVM IR
>> listed above causes this "GOT.func" import would be
helpful.
>>
>> Thanks,
>>
>> Ömer
>>
>> [1]: https://gist.github.com/osa1/4c672fe8998c8e8768cf9f7c014c61d8
>> [2]:
https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev