Ömer Sinan Ağacan via llvm-dev
2020-Aug-05 04:05 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
Hi,

Sorry if you've seen this message before on llvm.discourse.group or elsewhere -- I've been trying to get to the bottom of this for a while now and have asked about it on a few different platforms.

I'm currently trying to debug a bug in LLVM-generated Wasm code. The bug could be in the code that generates the LLVM IR (rustc) or in LLVM itself; I'm not sure yet. The LLVM IR and Wasm can be seen in [1].

The problem is this line:

    (import "GOT.func" "_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" (global (;3;) (mut i32)))

The same symbol is already imported from "env" in the same module:

    (import "env" "_ZN5core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" (func (;4;) (type 1)))

So there's no need to import it from "GOT.func", and I want to get rid of that "GOT.func" import.

This LLVM IR is generated when compiling Rust code to a "staticlib", which is supposed to include *all* dependencies of the code so that it'll be linkable with code for other languages. Because of the "GOT.func" import this module is not linkable: the import has to be resolved at runtime using dynamic linking for Wasm [2].

I'm trying to understand whether this is a rustc bug or an LLVM bug. I'm using LLVM 10 downloaded from the official web page and rustc nightly. I can build LLVM from source and use it, but I don't have any experience with the LLVM code base.

Questions:

- Given a reference to a symbol, how does LLVM decide how to import it? Currently I see these uses of the problematic symbol in the LLVM IR:

  - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" to i8*), i8** %11, align 4`

  - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" to i8*), i8** %14, align 4`

  - `store i8* bitcast (i1 (i32*, %"core::fmt::Formatter"*)* @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E" to i8*), i8** %17, align 4`

  - `declare zeroext i1 @"_ZN4core3fmt3num3imp52_$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h9ba9fea9cadf7bd5E"(i32* noalias readonly align 4 dereferenceable(4), %"core::fmt::Formatter"* align 4 dereferenceable(36)) unnamed_addr #1`

  The first three look very similar, so I'm guessing they cause one of the imports and the last one causes the other, but I'm not sure which one generates which import. Any ideas?

- Any suggestions on how to debug this? Just knowing which line in the LLVM IR listed above causes this "GOT.func" import would be helpful.

Thanks,

Ömer

[1]: https://gist.github.com/osa1/4c672fe8998c8e8768cf9f7c014c61d8
[2]: https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md
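For readers trying to map the IR lines above onto the two imports, a reduced C sketch of the two kinds of use may help. It is an illustration under assumptions, not the rustc output: `display_fmt`, `fmt_fn`, `store_fmt_ptr`, `call_fmt_directly` and `check_uses_agree` are hypothetical names, and the function is defined locally only so the file builds on its own (in the IR above the symbol is only a `declare`). The three `store` instructions use the symbol's address as a value, as `store_fmt_ptr` does here. No direct call appears in the IR lines quoted above, so `call_fmt_directly` is included only for contrast.

```c
#include <assert.h>

/* Hypothetical function-pointer type standing in for the Rust fmt signature. */
typedef int (*fmt_fn)(const int *value);

/* Hypothetical stand-in for the Display::fmt symbol. It is defined here only
 * so that the file is self-contained; in the Rust case it is external. */
__attribute__((visibility("default")))
int display_fmt(const int *value)
{
    return *value >= 0;
}

/* Address-taken use: the function's address is stored as data, which is what
 * the three `store i8* bitcast (...)` lines in the IR do. */
void store_fmt_ptr(fmt_fn *slot)
{
    *slot = &display_fmt;
}

/* Direct call: the function is only called and its address is never
 * materialized as a value. */
int call_fmt_directly(const int *value)
{
    return display_fmt(value);
}

/* Both kinds of use must reach the same function. */
void check_uses_agree(void)
{
    int v = 7;
    fmt_fn slot = 0;

    store_fmt_ptr(&slot);
    assert(slot == &display_fmt);
    assert(slot(&v) == call_fmt_directly(&v));
}
```

Going by the C reproduction later in this thread, where `return &c_fn_2` on a default-visibility function ends up reading a "GOT.func" global, the address-taken use is the likelier source of the "GOT.func" import. Building two variants of a file like this, one with only the address-taken use and one with only the direct call, and comparing the wasm2wat output would be one way to check that.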
Thomas Lively via llvm-dev
2020-Aug-05 21:53 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
+Sam Clegg <sbc at google.com> is the expert on this dynamic linking stuff.

A lot of us WebAssembly toolchain folks have been hanging out in the WebAssembly Discord <https://discord.gg/6UCUaMP>, especially on the #emscripten channel, so that would be another good place to ask future WebAssembly-specific questions.
Ömer Sinan Ağacan via llvm-dev
2020-Aug-06 14:29 UTC
[llvm-dev] Debugging a potential bug when generating wasm32
Thanks, I joined Discord as well.

Here's an update: I was able to come up with a tiny C program which is compiled to a similar "GOT.func" import.

C code:

    __attribute__ ((visibility("default")))
    int c_fn_2(int x, int y)
    {
        return x + y;
    }

    __attribute__ ((visibility("default")))
    int (*c_fn(void)) (int x, int y)
    {
        return &c_fn_2;
    }

Compile with:

    clang-10 -fPIC --target=wasm32-unknown-emscripten test.c -c -o test.o -O
    wasm2wat test.o > test.wat

Generated wat:

    (module
      (type (;0;) (func (param i32 i32) (result i32)))
      (type (;1;) (func (result i32)))
      (import "env" "__linear_memory" (memory (;0;) 0))
      (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
      (import "GOT.func" "c_fn_2" (global (;0;) (mut i32)))
      (func $c_fn_2 (type 0) (param i32 i32) (result i32)
        local.get 1
        local.get 0
        i32.add)
      (func $c_fn (type 1) (result i32)
        global.get 0))

Now if I remove the visibility attribute on `c_fn_2`, the GOT.func import disappears:

    (module
      (type (;0;) (func (param i32 i32) (result i32)))
      (type (;1;) (func (result i32)))
      (import "env" "__linear_memory" (memory (;0;) 0))
      (import "env" "__indirect_function_table" (table (;0;) 0 funcref))
      (import "env" "__table_base" (global (;0;) i32))
      (func $c_fn_2 (type 0) (param i32 i32) (result i32)
        local.get 1
        local.get 0
        i32.add)
      (func $c_fn (type 1) (result i32)
        global.get 0
        i32.const 0
        i32.add))

Comparing these two programs, I think I see a potential answer to my question of why this GOT.func import is needed. I can't find documentation on what these attributes mean exactly (the closest I could find is [1]), but I think that without `visibility("default")` the symbol is not visible in other compilation units (here I think "compilation unit" means a module in Wasm, though it may also be a C compilation unit, I'm not sure); in other words, it's DSO-local. When it's DSO-local, the importing modules do not need to know the function's table index, as they can never refer to it directly (e.g. the C expression `c_fn_2` doesn't make sense in the importing modules).

When we add `visibility("default")` and make c_fn_2 visible in other modules, we need to be able to compare the return value of `c_fn` with the value of the symbol `c_fn_2`, as that pointer equality must hold according to the C standard. The approach taken here is that we expect the symbol's table index to be defined in the *loading module*, and import that index with that `GOT.func` import. The host (e.g. wasmtime or the browser) is then responsible for generating a table index for this function in the loading module and providing that "GOT.func" import at load time. In the importing module's code we use that index for c_fn_2, so if the importing module does something like `c_fn_2 == c_fn()`, that evaluates to `true` as expected.

Can anyone confirm or deny that this is indeed the reason for that `GOT.func` import?

Thanks,

Ömer

[1]: https://clang.llvm.org/docs/LTOVisibility.html
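The pointer-equality requirement described in the last message can be sketched in plain C. This is a single-file illustration under assumptions: in the real scenario the comparison would live in a separate importing module, and `check_pointer_identity` is a hypothetical helper added here, not part of the original test program.

```c
#include <assert.h>

/* Same two functions as in the test program from the thread. */
__attribute__((visibility("default")))
int c_fn_2(int x, int y)
{
    return x + y;
}

__attribute__((visibility("default")))
int (*c_fn(void))(int x, int y)
{
    return &c_fn_2;
}

/* Hypothetical caller-side check: the address obtained through c_fn() must
 * compare equal to the address obtained by naming c_fn_2 directly. */
void check_pointer_identity(void)
{
    int (*via_call)(int, int) = c_fn();
    int (*via_name)(int, int) = &c_fn_2;

    assert(via_call == via_name);
    assert(via_call(2, 3) == 5);
}
```

In a single native translation unit this holds trivially. If the reasoning in the message is right, the point of the "GOT.func" import would be to keep the same comparison true when `c_fn_2` and the code comparing against it end up in different Wasm modules, each of which needs to agree on one table index for the function.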