Rui Ueyama via llvm-dev
2017-Oct-14 01:10 UTC
[llvm-dev] Weak undefined symbols and dynamic libraries
lld, like other linkers, works hard to make weak undefined symbols work across ELF binary boundaries when dynamic linking is involved, but unless everything is compiled with -fPIC, they don't actually work as people would expect. I think most people do not know this, and even for those who know ELF well, the current half-broken behavior is confusing and not useful. So I'd like to propose that we simplify it.

Let me explain why it is half-broken. Assume that we have foo.c with the following contents:

    __attribute__((weak)) void weakfn(void);
    int main() { if (weakfn) weakfn(); }

It is intended to call weakfn only when the function is defined. If you link foo.o against a shared library providing a definition of weakfn, the symbol is added to the executable's dynamic symbol table as a weak undefined symbol.

    # Create a shared library
    $ printf '#include <stdio.h>\nvoid weakfn(void) { puts("hello"); }\n' | clang -xc -o bar.so -shared -fPIC -

    # Link foo.o and bar.so to create an executable
    $ clang -c foo.c
    $ clang foo.o bar.so
    $ LD_LIBRARY_PATH=. ./a.out
    hello

Looks good so far. weakfn is in the dynamic symbol table as a weak undefined symbol.

    $ objdump --dynamic-syms a.out | grep weak
    0000000000400500  w   DF *UND*  0000000000000000  weakfn

But is it really weak? Not really. If we remove the symbol from bar.so, the main executable starts to crash.

    $ clang -xc -o bar.so -shared -fPIC /dev/null
    $ LD_LIBRARY_PATH=. ./a.out
    Segmentation fault (core dumped)

This is because weakfn is always resolved to its PLT entry's address in the main executable. Since the PLT slot address is not zero, weakfn in `if (weakfn) weakfn()` is always called even if the real weakfn is missing. If weakfn is missing, its PLT entry jumps to address zero, so calling the function causes a crash.

We cannot avoid this if we are creating a non-PIC binary, because for non-PIC code, function addresses need to be known at link-time. For imported functions, we use their PLT addresses as their symbol values.
A dynamic weak undefined symbol is not representable in non-PIC code. If we are linking position-independent code, weak undefined symbols work fine. In this case, function addresses are read from GOT slots, and their values can be zero or non-zero depending on the result of load-time symbol resolution.

I think the current behavior is bad. I'd like to propose the following changes:

1. If a linker is creating a non-PIC ELF binary, and if it finds a DSO symbol foo for an undefined weak symbol foo, then it adds foo as a *strong* undefined symbol to the dynamic symbol table. This prevents the above crash because the program fails to start if foo is not found at load-time, instead of crashing at run-time.

2. If a linker is creating a non-PIC ELF binary, and if it *cannot* find a DSO symbol foo for an undefined weak symbol foo, then it *does not* add foo to the dynamic symbol table, and it sets foo's value to zero.

In other words, my suggestion is to make the linker not try too hard for weak undefined symbols in non-PIC. In non-PIC, if weak undefined symbols cannot be resolved at link-time, their values should be set to zero. Otherwise, they should be handled as if they were regular undefined symbols. I believe this simplifies both the semantics and the implementation of the linker.

What do you think?
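The two proposed rules can be read as a small decision function. The C sketch below is only an illustration of the proposal, not lld's implementation: the names `weak_undef`, `weak_action`, and `decide_weak_undef` are hypothetical, and `output_is_pic` is an assumed flag standing for "the linker is producing position-independent output".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one weak undefined symbol seen by the linker. */
struct weak_undef {
    const char *name;   /* symbol name, e.g. "weakfn" */
    bool found_in_dso;  /* true if a shared library on the link line defines it */
};

/* Hypothetical outcomes, one per case discussed in the proposal. */
enum weak_action {
    WEAK_KEEP_DYNAMIC,   /* PIC output: keep a weak undefined dynamic symbol */
    WEAK_EXPORT_STRONG,  /* rule 1: add a strong undefined dynamic symbol */
    WEAK_RESOLVE_ZERO    /* rule 2: no dynamic symbol, value is zero */
};

/* Sketch of the proposed policy for a single weak undefined symbol. */
enum weak_action decide_weak_undef(const struct weak_undef *sym, bool output_is_pic)
{
    assert(sym != NULL);

    /* PIC: addresses are read from GOT slots, so the loader may leave them zero. */
    if (output_is_pic)
        return WEAK_KEEP_DYNAMIC;

    /* Non-PIC: the address must be fixed at link time. */
    return sym->found_in_dso ? WEAK_EXPORT_STRONG : WEAK_RESOLVE_ZERO;
}
```

For the foo.c example linked as a non-PIC executable against a bar.so that defines weakfn, this sketch returns `WEAK_EXPORT_STRONG`, so a later bar.so without weakfn would make the program fail at load time rather than crash inside main.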
Rafael Avila de Espindola via llvm-dev
2017-Oct-14 06:27 UTC
[llvm-dev] Weak undefined symbols and dynamic libraries
Rui Ueyama <ruiu at google.com> writes:

> I think the current behavior is bad. I'd like to propose the following
> changes:
>
> 1. If a linker is creating a non-PIC ELF binary, and if it finds a DSO
> symbol foo for an undefined weak symbol foo, then it adds foo as a *strong*
> undefined symbol to the dynamic symbol table. This prevents the above crash
> because the program fails to start if foo is not found at load-time,
> instead of crashing at run-time.
>
> 2. If a linker is creating a non-PIC ELF binary, and if it *cannot* find a
> DSO symbol foo for an undefined weak symbol foo, then it *does not* add foo to
> the dynamic symbol table, and it sets foo's value to zero.

I would not phrase this as pic/non-pic. From the linker's point of view there are just relocations. I assume then that the intention is:

-----------------------------------------------------------------
Sometimes a linker has to create a symbol in the main binary so that it is preempted from a shared library at runtime. That symbol is then used with a copy relocation if it is an object or a special plt entry if it is a function.

If the symbol in question was a weak undefined:

* If the symbol was found in a .so, the resulting undefined reference will be strong.
* If the symbol was not found in a .so, it is resolved to 0 and there is no undefined reference.

If no relocation requires the symbol to be preempted to the main executable (all relocations use a got, for example) then there will still be a weak undefined reference, since the dynamic linker will be able to handle the symbol existing or not.
-----------------------------------------------------------------

I agree that that is probably a good change.

Cheers,
Rafael
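The relocation-based wording above can also be sketched in C. This is a simplified model under stated assumptions, not how any linker actually classifies relocations: the two-way split in `reloc_class` and the names `must_live_in_executable` and `classify_weak_undef` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-way classification of a relocation against the symbol. */
enum reloc_class {
    RELOC_THROUGH_GOT,     /* value is loaded from a GOT slot at run time */
    RELOC_NEEDS_LOCAL_DEF  /* needs a copy relocation or a special PLT entry */
};

/* Hypothetical outcomes matching the three cases in the text. */
enum reloc_outcome {
    OUTCOME_WEAK_UNDEF,    /* weak undefined reference stays */
    OUTCOME_STRONG_UNDEF,  /* found in a .so: the reference becomes strong */
    OUTCOME_ZERO           /* not found in a .so: resolved to 0 */
};

/* True if any relocation forces the symbol into the main executable. */
bool must_live_in_executable(const enum reloc_class *relocs, size_t count)
{
    assert(relocs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (relocs[i] == RELOC_NEEDS_LOCAL_DEF)
            return true;
    }
    return false;
}

/* Decide the fate of one weak undefined symbol from its relocations. */
enum reloc_outcome classify_weak_undef(const enum reloc_class *relocs,
                                       size_t count, bool found_in_so)
{
    /* All references go through the GOT: the dynamic linker can cope. */
    if (!must_live_in_executable(relocs, count))
        return OUTCOME_WEAK_UNDEF;

    return found_in_so ? OUTCOME_STRONG_UNDEF : OUTCOME_ZERO;
}
```

Note that `classify_weak_undef` has to look at every relocation for the symbol before it can answer, which is the main practical difference from a decision based on output type alone.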
Rui Ueyama via llvm-dev
2017-Oct-15 21:55 UTC
[llvm-dev] Weak undefined symbols and dynamic libraries
On Fri, Oct 13, 2017 at 11:27 PM, Rafael Avila de Espindola <rafael.espindola at gmail.com> wrote:

> Rui Ueyama <ruiu at google.com> writes:
>
> > I think the current behavior is bad. I'd like to propose the following
> > changes:
> >
> > 1. If a linker is creating a non-PIC ELF binary, and if it finds a DSO
> > symbol foo for an undefined weak symbol foo, then it adds foo as a *strong*
> > undefined symbol to the dynamic symbol table. This prevents the above crash
> > because the program fails to start if foo is not found at load-time,
> > instead of crashing at run-time.
> >
> > 2. If a linker is creating a non-PIC ELF binary, and if it *cannot* find a
> > DSO symbol foo for an undefined weak symbol foo, then it *does not* add foo to
> > the dynamic symbol table, and it sets foo's value to zero.
>
> I would not phrase this as pic/non-pic. From the linker point of view
> there are just relocations. I assume then that the intention is:

We have -shared/-pie options, so my intention was to use these flags. We could use relocations to decide whether we should export a weak undefined symbol or not, but I think there are a few issues with that:

1. We cannot make a decision until we visit all relocations, but we need a decision beforehand in order to create GOT entries or report errors.

2. Sometimes we could get mixed signals. For example, if one object file contains a direct reference to a weak symbol, and another object file contains a GOTPCREL reference to the same symbol, they are somewhat conflicting.

So, just using the -pie/-shared flags is simple, I guess?

> -----------------------------------------------------------------
> Sometimes a linker has to create a symbol in the main binary so that it
> is preempted from a shared library at runtime. That symbol is then used
> with a copy relocation if it is an object or a special plt entry if it
> is a function.
>
> If the symbol in question was a weak undefined:
>
> * If the symbol was found in a .so the resulting undefined reference
>   will be strong.
> * If the symbol was not found in a .so, it is resolved to 0 and there is
>   no undefined reference.
>
> If no relocation requires the symbol to be preempted to the main
> executable (all relocations use a got for example) then there will still
> be a weak undefined reference since the dynamic linker will be able to
> handle the symbol existing or not.
> -----------------------------------------------------------------
>
> I agree that that is probably a good change.
>
> Cheers,
> Rafael
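The two concerns in this reply (a decision is needed before all relocations are seen, and references can disagree) can be made concrete with a short C sketch. It is a rough illustration only: `ref_kind`, `link_is_pic`, and `has_mixed_signals` are hypothetical names, and treating "PIC output" as "-shared or -pie was given" is an assumption taken from the reply's wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kinds of reference to one weak symbol. */
enum ref_kind {
    REF_DIRECT,  /* direct reference, address fixed at link time */
    REF_GOT      /* GOT-based reference such as GOTPCREL */
};

/* Flag-based policy: the answer is known before any relocation is visited. */
bool link_is_pic(bool opt_shared, bool opt_pie)
{
    return opt_shared || opt_pie;
}

/* Relocation-based policy: needs a full scan of the references, and the
 * scan can find both kinds of reference to the same symbol. */
bool has_mixed_signals(const enum ref_kind *refs, size_t count)
{
    assert(refs != NULL || count == 0);

    bool saw_direct = false;
    bool saw_got = false;
    for (size_t i = 0; i < count; i++) {
        if (refs[i] == REF_DIRECT)
            saw_direct = true;
        else
            saw_got = true;
    }
    return saw_direct && saw_got;
}
```

With the flag-based check there is a single answer for the whole link, whereas `has_mixed_signals` can only report a conflict after every object file has been read.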