Rui Ueyama via llvm-dev
2017-Apr-25 04:40 UTC
[llvm-dev] [LLD] Linking static library does not resolve symbols as gold/ld
Hi Martin,

Thank you for sending the script. I can reproduce the issue with it. It looks like the program crashes when it tries to call std::vector<sometype>'s ctor from a static initializer. I don't fully understand what is causing the issue yet, but here are my observations.

- Since you are creating a temporary object file using `ld.gold -r`, your object file contains multiple weak definitions with the same name, as two or more input files for `ld.gold -r` contain the same template instantiations. This is not immediately an error, and LLD should pick one of them for each unique name, but this might not be working well.
- If you create a temporary object file using `ld.lld -r`, it should work. I don't know why, though.

I'll continue investigating.

On Sat, Apr 15, 2017 at 3:10 PM, Martin Richtarsky <s at martinien.de> wrote:

> Hi Rui,
>
> I finally managed to come up with a reduced example, please find it
> attached. You need to have GOLDPATH and LLDPATH set to point to the
> respective linkers.
>
> What happens in build.sh is that an object file is partially linked ("-u")
> with gold first, then this is linked with lld to another object file for
> the final executable. The resulting executable 'repro' then crashes during
> static initialization.
>
> The following changes make it work:
> 1) Using ld instead of gold for the first step
> 2) Using ld or gold for the second step
>
> 2) makes me think there must be something those linkers are doing, but lld
> is not, that makes the whole thing work. But note that the crash happens
> in a constructor. I found this for the "-u" option in the ld manpage here:
>
> https://linux.die.net/man/1/ld
>
> "When linking C++ programs, this option will not resolve references to
> constructors; to do that, use -Ur."
>
> However, gold does not know that option (and ld already works without it)
>
> Any idea what is going wrong here?
>
> Thanks and best regards
> Martin
>
> > Hi Martin,
> >
> > It's hard to tell what is wrong only with the information. If that is an
> > open-source program, can you give me a link to that so that I can try? If
> > that's a proprietary software you cannot share with me, you might want to
> > produce small reproducible test case.
> >
> > On Thu, Mar 23, 2017 at 1:10 AM, Martin Richtarsky <s at martinien.de> wrote:
> >
> >> Hi Rui,
> >>
> >> fyi I'm still working on a reproducer I can share.
> >>
> >> >> Here is the relevant output:
> >> >>
> >> >> 0000000000013832 <func()>:
> >> >>    13832:  55                    push   %rbp
> >> >>    13833:  48 89 e5              mov    %rsp,%rbp
> >> >>    13836:  53                    push   %rbx
> >> >>    13837:  48 83 ec 18           sub    $0x18,%rsp
> >> >>    1383b:  48 89 7d e8           mov    %rdi,-0x18(%rbp)
> >> >>    1383f:  48 8b 45 e8           mov    -0x18(%rbp),%rax
> >> >>    13843:  48 89 c7              mov    %rax,%rdi
> >> >>    13846:  e8 00 00 00 00        callq  1384b <func()+0x19>
> >> >>                  13847: R_X86_64_PLT32 std::vector<record, std::allocator<record> >::vector()-0x4
> >> >> ....
> >> >>
> >> >
> >> > This seems a bit odd. You have type `record` and instantiate std::vector
> >> > with `record`. Usually the instantiated template function is in the same
> >> > compilation unit, and the relocation type is R_X86_64_PC32, not
> >> > R_X86_64_PLT32.
> >>
> >> It seems to me R_X86_64_PLT32 is not so unusual in this case, e.g. -fPIC
> >> already produces this relocation:
> >>
> >> $ cat example.cpp
> >> #include <vector>
> >> #include <string>
> >>
> >> class PropertyReader
> >> {
> >> public:
> >>     struct record
> >>     {
> >>         std::string a;
> >>         std::string b;
> >>     };
> >>     PropertyReader();
> >> private:
> >>     std::vector<record> records;
> >> };
> >>
> >> PropertyReader::PropertyReader() : records()
> >> {
> >> }
> >>
> >> $ g++ -fPIC -c example.cpp -o example.o
> >> $ objdump -d -r -C example.o
> >> ...
> >> 0000000000000000 <PropertyReader::PropertyReader()>:
> >>    0:  55                    push   %rbp
> >>    1:  48 89 e5              mov    %rsp,%rbp
> >>    4:  48 83 ec 10           sub    $0x10,%rsp
> >>    8:  48 89 7d f8           mov    %rdi,-0x8(%rbp)
> >>    c:  48 8b 45 f8           mov    -0x8(%rbp),%rax
> >>   10:  48 89 c7              mov    %rax,%rdi
> >>   13:  e8 00 00 00 00        callq  18 <PropertyReader::PropertyReader()+0x18>
> >>                  14: R_X86_64_PLT32 std::vector<PropertyReader::record, std::allocator<PropertyReader::record> >::vector()-0x4
> >>   18:  90                    nop
> >>   19:  c9                    leaveq
> >>   1a:  c3                    retq
> >> ...
> >>
> >> But linking such an object file with lld does not produce the original
> >> error so something else is going on.
> >>
> >> >> Let me know if more is needed.
> >> >>
> >> >> I recall that this object file is created in a bit unusual way, something
> >> >> like partially linking several other object files together into this one,
> >> >> but I will have to dig deeper to say for sure.
> >> >>
> >> >
> >> > Yes, it looks like the object file is created in an unusual way, and that
> >> > revealed a subtle difference between ld.gold and ld.lld. I want to know
> >> > more about that.
> >> >
> >> >
> >> >> Best regards
> >> >> Martin
> >> >>
> >> >> Rui Ueyama wrote:
> >> >> > Compilers don't know about functions that are not defined in the same
> >> >> > compilation unit, so they leave call instruction operands as zero (because
> >> >> > they can't compute any absolute nor relative address of the destinations),
> >> >> > and let linkers fix the address by binary patching.
> >> >> >
> >> >> > So, what you are seeing is likely a bug of LLD that it fails to fix the
> >> >> > address for some reason.
> >> >> >
> >> >> > Can you dump that function with `objdump -d -r that-file.o`? With the -r
> >> >> > option, objdump prints out relocation records. Relocation records are the
> >> >> > information that linkers use to fix addresses.
> >> >> >
> >> >> > On Wed, Mar 15, 2017 at 9:25 AM, Martin Richtarsky <s at martinien.de> wrote:
> >> >> >
> >> >> >> Hi all,
> >> >> >>
> >> >> >> I'm currently trying out lld on a large project. We are currently using
> >> >> >> gold (and used GNU ld before that).
> >> >> >>
> >> >> >> I have come across a few minor issues but could workaround them:
> >> >> >> - Missing support for --defsym=symbol1=symbol2,
> >> >> >>   --warn-unknown-eh-frame-section, --exclude-libs
> >> >> >>
> >> >> >> There are two other issues which are more critical, one of which is
> >> >> >> currently blocking me, so I would like to find a solution for this one
> >> >> >> first.
> >> >> >>
> >> >> >> I have a static library that is linked into an executable. The binary
> >> >> >> produced by lld crashes, while the gold version runs fine.
> >> >> >>
> >> >> >> The difference is in the call instructions below. The original object file
> >> >> >> from the archive has an address of zero in the call instruction:
> >> >> >>
> >> >> >> 0000000000013832 <func>:
> >> >> >>       13832:  55                    push   %rbp
> >> >> >>       13833:  48 89 e5              mov    %rsp,%rbp
> >> >> >>       13836:  53                    push   %rbx
> >> >> >>       13837:  48 83 ec 18           sub    $0x18,%rsp
> >> >> >>       1383b:  48 89 7d e8           mov    %rdi,-0x18(%rbp)
> >> >> >>       1383f:  48 8b 45 e8           mov    -0x18(%rbp),%rax
> >> >> >>       13843:  48 89 c7              mov    %rax,%rdi
> >> >> >>    -> 13846:  e8 00 00 00 00        callq  1384b <func+0x19>
> >> >> >>       1384b:  48 8b 45 e8           mov    -0x18(%rbp),%rax
> >> >> >>
> >> >> >> gdb displays this as a jump to the next instruction:
> >> >> >>
> >> >> >>    0x0000000000013832 <+0>:   push   %rbp
> >> >> >>    0x0000000000013833 <+1>:   mov    %rsp,%rbp
> >> >> >>    0x0000000000013836 <+4>:   push   %rbx
> >> >> >>    0x0000000000013837 <+5>:   sub    $0x18,%rsp
> >> >> >>    0x000000000001383b <+9>:   mov    %rdi,-0x18(%rbp)
> >> >> >>    0x000000000001383f <+13>:  mov    -0x18(%rbp),%rax
> >> >> >>    0x0000000000013843 <+17>:  mov    %rax,%rdi
> >> >> >>    0x0000000000013846 <+20>:  callq  0x1384b <func()+25>
> >> >> >>    0x000000000001384b <+25>:  mov    -0x18(%rbp),%rax
> >> >> >>
> >> >> >> However, in the executable linked by gold, the calls are magically
> >> >> >> resolved:
> >> >> >>
> >> >> >>    0x000000000018b44e <+0>:   push   %rbp
> >> >> >>    0x000000000018b44f <+1>:   mov    %rsp,%rbp
> >> >> >>    0x000000000018b452 <+4>:   push   %rbx
> >> >> >>    0x000000000018b453 <+5>:   sub    $0x18,%rsp
> >> >> >>    0x000000000018b457 <+9>:   mov    %rdi,-0x18(%rbp)
> >> >> >>    0x000000000018b45b <+13>:  mov    -0x18(%rbp),%rax
> >> >> >>    0x000000000018b45f <+17>:  mov    %rax,%rdi
> >> >> >>    0x000000000018b462 <+20>:  callq  0x68568c <std::vector<record, std::allocator<record> >::vector()>
> >> >> >>    0x000000000018b467 <+25>:  mov    -0x18(%rbp),%rax
> >> >> >>
> >> >> >> Even more interesting, several such call instructions with argument 0 are
> >> >> >> resolved to different functions. So somewhere there must be information
> >> >> >> stored to what functions they resolve to.
> >> >> >>
> >> >> >> lld produces this code:
> >> >> >>
> >> >> >>    0x00005555559f304e <+0>:   push   %rbp
> >> >> >>    0x00005555559f304f <+1>:   mov    %rsp,%rbp
> >> >> >>    0x00005555559f3052 <+4>:   push   %rbx
> >> >> >>    0x00005555559f3053 <+5>:   sub    $0x18,%rsp
> >> >> >>    0x00005555559f3057 <+9>:   mov    %rdi,-0x18(%rbp)
> >> >> >>    0x00005555559f305b <+13>:  mov    -0x18(%rbp),%rax
> >> >> >>    0x00005555559f305f <+17>:  mov    %rax,%rdi
> >> >> >>    0x00005555559f3062 <+20>:  callq  0x555555554000
> >> >> >>    0x00005555559f3067 <+25>:  mov    -0x18(%rbp),%rax
> >> >> >>
> >> >> >> 0x555555554000 is the start of the mapped region of the executable, so it
> >> >> >> seems lld just adds the argument 0 to that without doing any relocation
> >> >> >> processing.
> >> >> >>
> >> >> >> Is this a known limitation of lld?
> >> >> >>
> >> >> >> Thanks and best regards,
> >> >> >> Martin
> >> >> >>
> >> >> > _______________________________________________
> >> >> > LLVM Developers mailing list
> >> >> > llvm-dev at lists.llvm.org
> >> >> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
> >> --
> >> http://www.martinien.de/
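
For readers following the disassembly quoted in this message, here is a rough sketch of the arithmetic a linker applies to a `call rel32` operand. It assumes the usual x86-64 psABI formula for PC-relative relocations (S + A - P; for R_X86_64_PLT32 the PLT address L takes the place of S, unless the linker can call the symbol directly). The helper names `pc32_value`, `call_target` and `patch_rel32` are made up for illustration and are not taken from LLD or gold. The addresses come from the dumps quoted above.

```python
import struct


def pc32_value(sym_addr: int, addend: int, place: int) -> int:
    """Value stored for a PC-relative 32-bit relocation: S + A - P.

    sym_addr (S) is the final address of the target symbol, addend (A) is
    the relocation addend (-4 in the dumps above), and place (P) is the
    address of the 4-byte field being patched.
    """
    value = sym_addr + addend - place
    if not -(2 ** 31) <= value < 2 ** 31:
        raise OverflowError("relocation does not fit in a signed 32-bit field")
    return value


def call_target(place: int, disp32: int) -> int:
    """Address a `call rel32` transfers to, as a disassembler shows it.

    The CPU adds the displacement to the address of the next instruction,
    which is the end of the 4-byte field (place + 4).
    """
    return place + 4 + disp32


def patch_rel32(code: bytearray, offset: int, value: int) -> None:
    """Write a signed little-endian 32-bit value into code at offset."""
    struct.pack_into("<i", code, offset, value)


if __name__ == "__main__":
    # Unrelocated object file: the field at 0x13847 is still zero, so the
    # call appears to go to the next instruction.
    print(hex(call_target(0x13847, 0)))

    # gold's output: the field at 0x18b463 is patched so that the call
    # reaches std::vector<record>::vector() at 0x68568c.
    disp = pc32_value(0x68568c, -4, 0x18b463)
    code = bytearray.fromhex("e800000000")
    patch_rel32(code, 1, disp)
    print(code.hex(), hex(call_target(0x18b463, disp)))
```

This only illustrates why an unpatched `e8 00 00 00 00` disassembles as a call to the following instruction, and what a correctly applied relocation looks like. It says nothing about why lld ended up with a call to the image base.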
Rafael Avila de Espindola via llvm-dev
2017-May-15 20:31 UTC
[llvm-dev] [LLD] Linking static library does not resolve symbols as gold/ld
Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> writes:

> I'll continue investigating.

I reduced this to just

-----------------------------------
        .globl _start
_start:
        callq foo

        .section .text.bar,"axG",@progbits,abc,comdat

        .section .text.foo,"axG",@progbits,xyz,comdat
        .global foo
foo:
        mov $60, %rax
        mov $0, %rdi
        syscall
-----------------------------------

The original .o contains

  [ 1] .group  GROUP  0000000000000000 000040 000008 04  10  6  4
  [ 2] .group  GROUP  0000000000000000 000048 000008 04  10  7  4

     6: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 abc
     7: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    2 xyz

I.e., the sh_info points to the symbols where the names are to be found. But the .o produced by gold has:

  [ 1] abc     GROUP  0000000000000000 000040 000008 04   9  1  4
  [ 2] xyz     GROUP  0000000000000000 000048 000008 04   9  2  4

     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2

I.e., the sh_info points to the sections themselves.

I will report the relevant bugs.

Cheers,
Rafael
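
To make the difference between the two layouts concrete, here is a small sketch of how a reader of the object file might derive a group's signature name in both cases. It is only an illustration under stated assumptions: the `Symbol` class and `group_signature` function are hypothetical, the tables are hand-copied from the readelf output above rather than parsed from a real file, and the fallback to the section name for a nameless STT_SECTION symbol is one plausible way to handle gold's output, not necessarily what any linker does.

```python
from dataclasses import dataclass

# Symbol type values from the ELF specification.
STT_NOTYPE = 0
STT_SECTION = 3


@dataclass
class Symbol:
    name: str    # empty for section symbols, which carry no st_name
    type: int    # STT_* value
    shndx: int   # index of the section the symbol is defined in


def group_signature(sh_info: int, symbols: dict, section_names: dict) -> str:
    """Return the signature name of a SHT_GROUP section.

    sh_info is the symbol-table index of the signature symbol. If that
    symbol is a nameless section symbol (the gold -r layout above), fall
    back to the name of the section it refers to. This fallback is an
    assumption made for this sketch.
    """
    sym = symbols[sh_info]
    if sym.type == STT_SECTION and not sym.name:
        return section_names[sym.shndx]
    return sym.name


# Layout of the original .o: sh_info points at named NOTYPE symbols.
orig_sections = {1: ".group", 2: ".group"}
orig_symbols = {6: Symbol("abc", STT_NOTYPE, 1), 7: Symbol("xyz", STT_NOTYPE, 2)}

# Layout produced by gold -r: sh_info points at section symbols, and the
# group sections themselves are named after the signature.
gold_sections = {1: "abc", 2: "xyz"}
gold_symbols = {1: Symbol("", STT_SECTION, 1), 2: Symbol("", STT_SECTION, 2)}

if __name__ == "__main__":
    print(group_signature(6, orig_symbols, orig_sections),
          group_signature(7, orig_symbols, orig_sections))
    print(group_signature(1, gold_symbols, gold_sections),
          group_signature(2, gold_symbols, gold_sections))
```

A reader that only looks at the symbol's own name would see two groups with an empty signature in the gold layout, so the `abc` and `xyz` groups would look identical and one could be discarded as a duplicate COMDAT.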