Martin Ramsdale via llvm-dev
2021-May-17 10:53 UTC
[llvm-dev] Potential bug in LLD when wrapping symbols
Hi,

There appear to be some bugs in LLD's handling of the '--wrap' option that
result in unexpected "undefined symbols" in the output.

I've broken this report into { 1. Issues; 2. Reproduction; 3. Reproduction
Tests; 4. Side Observations }.

Please can somebody take a look at {1,2,3} to confirm if these are genuine
issues?

Many Thanks,
Martin

1) Issues:
=========

1.1) Using '--wrap x' results in undefined references to 'x' even when
     '__real_x' isn't used in the source code.
  - i.e. The symbol table implies there is a dependency where one doesn't exist
  - Present in at least LLD-9, LLD-11

1.2) Using '--wrap x' where 'x' is in
     "llvm-project/compiler-rt/lib/dfsan/done_abilist.txt"* results in
     undefined references even when 'x' is never used
  - i.e. The symbol table implies there is a dependency where one doesn't
    exist, and exposes internal details of the linker implementation
  - *There is a strong correlation with the symbols on this list, but this has
    NOT been proven as the root cause(!)
  - Present in at least LLD-9, LLD-11

Whilst investigating the above I also came across the following issue that
appears to be resolved in LLD-11, but I am recording it here in case it is
relevant, or helps anybody else who encounters the issue:

1.3) Using '--wrap x' results in undefined references to '__real_x'
  - i.e. The symbol table implies there is a dependency where one doesn't exist
  - Present in at least LLD-9, and resolved via https://reviews.llvm.org/D34993

2) Reproduction
==============

2.1) Source setup
-----------------

Source file, foo.c, which has function calls to:

a) bar_fn_not_wrapped - A regular function, no wrapping
  - Expect undefined symbols: bar_fn_not_wrapped
  - Unexpected undefined symbols: __wrap_bar_fn_not_wrapped,
    __real_bar_fn_not_wrapped

b) bar_fn_wrapped - In the tests we'll wrap this call
  - Expect undefined symbols: __wrap_bar_fn_wrapped
  - Unexpected undefined symbols: bar_fn_wrapped, __real_bar_fn_wrapped

c) gettimeofday - In the tests we'll wrap this call
  - Expect undefined symbols: __wrap_gettimeofday
  - Unexpected undefined symbols: gettimeofday, __real_gettimeofday

Also note foo.c does NOT call the following functions:

d) sigaction - In the tests we'll wrap this call
  - Expected undefined symbols: <none>
  - Unexpected undefined symbols: sigaction, __wrap_sigaction,
    __real_sigaction

e) bar_fn_other - In the tests we'll wrap this call
  - Expected undefined symbols: <none>
  - Unexpected undefined symbols: bar_fn_other, __wrap_bar_fn_other,
    __real_bar_fn_other

This is all summarized in the table below, and we'll use this to compare
against in the tests:

+--------------------+-----------------+-------------------+-----------------------------------+
| Symbol x           | x used in foo.c | Wrapped in tests? | Expect undefined symbol to ... ?  |
|                    |                 |                   | x         | __wrap_x  | __real_x  |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+
| bar_fn_not_wrapped | Y               | N                 | Y         | N         | N         |
| bar_fn_wrapped     | Y               | Y                 | N         | Y         | N         |
| gettimeofday       | Y               | Y                 | N         | Y         | N         |
| sigaction          | N               | Y                 | N         | N         | N         |
| bar_fn_other       | N               | Y                 | N         | N         | N         |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+

2.2) foo.c source code
----------------------

/* --- foo.c start --- */
#include <sys/time.h>

void bar_fn_not_wrapped(void);
void bar_fn_wrapped(void);

void foo_fn (void)
{
    struct timeval tv;
    struct timezone tz;

    bar_fn_wrapped();
    bar_fn_not_wrapped();
    (void)gettimeofday(&tv, &tz);
}
/* --- foo.c end --- */

2.3) Test setup
---------------

For each compiler/linker combination below we'll run the following command:

  $ <compiler> [ -fuse-ld=<optional-linker-choice> ] -fPIC -shared foo.c -Wl,--wrap=sigaction \
      -Wl,--wrap=gettimeofday -Wl,--wrap=bar_fn_wrapped -Wl,--wrap=bar_fn_other -o libfoo.so

And search for the interesting symbols using:

  $ nm -D libfoo.so --undefined-only | grep -E "(sig|get|bar)" | tr -s ' ' | sed 's/^/ /'

The compiler/linkers used are:
  a) gcc 4.7.0, gnu-ld
  b) clang-9, gnu-ld
  c) clang-9, llvm-lld-9
  d) clang-11, llvm-lld-11

3) Reproduction Tests:
=====================

NB: Bad results are highlighted with *Y* or *N*

a) gcc 4.7.0, gnu-ld

    U bar_fn_not_wrapped
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday

+--------------------+-----------------+-------------------+-----------------------------------+
| Symbol x           | x used in foo.c | Wrapped in tests? | Undefined symbol to ... ?         |
|                    |                 |                   | x         | __wrap_x  | __real_x  |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+
| bar_fn_not_wrapped | Y               | N                 | Y         | N         | N         |
| bar_fn_wrapped     | Y               | Y                 | N         | Y         | N         |
| gettimeofday       | Y               | Y                 | N         | Y         | N         |
| sigaction          | N               | Y                 | N         | N         | N         |
| bar_fn_other       | N               | Y                 | N         | N         | N         |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+

b) clang-9, gnu-ld

    U bar_fn_not_wrapped
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday

+--------------------+-----------------+-------------------+-----------------------------------+
| Symbol x           | x used in foo.c | Wrapped in tests? | Undefined symbol to ... ?         |
|                    |                 |                   | x         | __wrap_x  | __real_x  |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+
| bar_fn_not_wrapped | Y               | N                 | Y         | N         | N         |
| bar_fn_wrapped     | Y               | Y                 | N         | Y         | N         |
| gettimeofday       | Y               | Y                 | N         | Y         | N         |
| sigaction          | N               | Y                 | N         | N         | N         |
| bar_fn_other       | N               | Y                 | N         | N         | N         |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+

c) clang-9, llvm-lld-9

    U bar_fn_not_wrapped
    U bar_fn_wrapped
    U gettimeofday
    U __real_bar_fn_wrapped
    U __real_gettimeofday
    w __real_sigaction
    w sigaction
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday
    U __wrap_sigaction

+--------------------+-----------------+-------------------+-----------------------------------+
| Symbol x           | x used in foo.c | Wrapped in tests? | Undefined symbol to ... ?         |
|                    |                 |                   | x         | __wrap_x  | __real_x  |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+
| bar_fn_not_wrapped | Y               | N                 | Y         | N         | N         |
| bar_fn_wrapped     | Y               | Y                 | *Y*       | Y         | *Y*       |
| gettimeofday       | Y               | Y                 | *Y*       | Y         | *Y*       |
| sigaction          | N               | Y                 | *Y*       | *Y*       | *Y*       |
| bar_fn_other       | N               | Y                 | N         | N         | N         |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+

d) clang-11, llvm-lld-11

    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday
    U __wrap_sigaction
    U bar_fn_not_wrapped
    U bar_fn_wrapped
    U gettimeofday
    w sigaction

+--------------------+-----------------+-------------------+-----------------------------------+
| Symbol x           | x used in foo.c | Wrapped in tests? | Undefined symbol to ... ?         |
|                    |                 |                   | x         | __wrap_x  | __real_x  |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+
| bar_fn_not_wrapped | Y               | N                 | Y         | N         | N         |
| bar_fn_wrapped     | Y               | Y                 | *Y*       | Y         | N         |
| gettimeofday       | Y               | Y                 | *Y*       | Y         | N         |
| sigaction          | N               | Y                 | *Y*       | *Y*       | N         |
| bar_fn_other       | N               | Y                 | N         | N         | N         |
+--------------------+-----------------+-------------------+-----------+-----------+-----------+

4) Side observations:
====================

A few observations made whilst investigating the main issues. It's likely that
these won't be critical to the main report, but they are left here in case they
aid discussion on this topic:

a) I can't find much online regarding this behaviour. One interesting
   reference, although it doesn't explain the above, is at
   http://maskray.me/blog/2020-12-19-lld-and-gnu-linker-incompatibilities:
  - """
    Semantics of --wrap: GNU ld and LLD have slightly different --wrap
    semantics. I use "slightly" because in most use cases users will not
    observe a difference. In GNU ld, --wrap only applies to undefined symbols.
    In LLD, --wrap happens after all other symbol resolution steps. The
    implementation is to mangle the symbol table of each object file
    (foo -> __wrap_foo; __real_foo -> foo) so that all relocations to foo or
    __real_foo will be redirected. The LLD semantics have the advantage that
    non-LTO, LTO and relocatable link behaviors are consistent. I filed
    https://sourceware.org/bugzilla/show_bug.cgi?id=26358 for GNU ld.
    """
  - Looking at the corresponding bug
    https://sourceware.org/bugzilla/show_bug.cgi?id=26358, the suggestion is
    that LLD is more consistent, but when I tried the steps for their partial
    linking example, I saw different behaviour.

b) I wonder if the run-time behaviour with these unexpected undefined symbols
   is impacted? e.g. what happens with -z,now? (Not investigated)

c) I wonder if the build-link-time behaviour is impacted? e.g. what happens
   when linking with a library dependency that has one of these missing symbol
   dependencies? (Not investigated)
Martin Ramsdale via llvm-dev
2021-May-17 11:49 UTC
[llvm-dev] Potential bug in LLD when wrapping symbols
[ Self-reply to improve formatting due to line-length wrapping ]

Hi,

There appear to be some bugs in LLD's handling of the '--wrap' option that
result in unexpected "undefined symbols" in the output.

I've broken this report into { 1. Issues; 2. Reproduction; 3. Reproduction
Tests; 4. Side Observations }.

Please can somebody take a look at {1,2,3} to confirm if these are genuine
issues?

Many Thanks,
Martin

1) Issues:
=========

1.1) Using '--wrap x' results in undefined references to 'x' even when
     '__real_x' isn't used in the source code.
  - i.e. The symbol table implies there is a dependency where one doesn't exist
  - Present in at least LLD-9, LLD-11

1.2) Using '--wrap x' where 'x' is in
     "llvm-project/compiler-rt/lib/dfsan/done_abilist.txt"* results in
     undefined references even when 'x' is never used
  - i.e. The symbol table implies there is a dependency where one doesn't
    exist, and exposes internal details of the linker implementation
  - *There is a strong correlation with the symbols on this list, but this has
    NOT been proven as the root cause(!)
  - Present in at least LLD-9, LLD-11

Whilst investigating the above I also came across the following issue that
appears to be resolved in LLD-11, but I am recording it here in case it is
relevant, or helps anybody else who encounters the issue:

1.3) Using '--wrap x' results in undefined references to '__real_x'
  - i.e. The symbol table implies there is a dependency where one doesn't exist
  - Present in at least LLD-9, and resolved via https://reviews.llvm.org/D34993

2) Reproduction
==============

2.1) Source setup
-----------------

Source file, foo.c, which has function calls to:

a) bar_fn_not_wrapped - A regular function, no wrapping
  - Expect undefined symbols: bar_fn_not_wrapped
  - Unexpected undefined symbols: __wrap_bar_fn_not_wrapped,
    __real_bar_fn_not_wrapped

b) bar_fn_wrapped - In the tests we'll wrap this call
  - Expect undefined symbols: __wrap_bar_fn_wrapped
  - Unexpected undefined symbols: bar_fn_wrapped, __real_bar_fn_wrapped

c) gettimeofday - In the tests we'll wrap this call
  - Expect undefined symbols: __wrap_gettimeofday
  - Unexpected undefined symbols: gettimeofday, __real_gettimeofday

Also note foo.c does NOT call the following functions:

d) sigaction - In the tests we'll wrap this call
  - Expected undefined symbols: <none>
  - Unexpected undefined symbols: sigaction, __wrap_sigaction,
    __real_sigaction

e) bar_fn_other - In the tests we'll wrap this call
  - Expected undefined symbols: <none>
  - Unexpected undefined symbols: bar_fn_other, __wrap_bar_fn_other,
    __real_bar_fn_other

This is all summarized in the table below, and we'll use this to compare
against in the tests:

+--------------------+------+----------+--------------------------------+
| Symbol x           | Used | Wrapped? | Expect ...?                    |
|                    |      |          | x        | __wrap_x | __real_x |
+--------------------+------+----------+----------+----------+----------+
| bar_fn_not_wrapped | Y    | N        | Y        | N        | N        |
| bar_fn_wrapped     | Y    | Y        | N        | Y        | N        |
| gettimeofday       | Y    | Y        | N        | Y        | N        |
| sigaction          | N    | Y        | N        | N        | N        |
| bar_fn_other       | N    | Y        | N        | N        | N        |
+--------------------+------+----------+----------+----------+----------+

2.2) foo.c source code
----------------------

/* --- foo.c start --- */
#include <sys/time.h>

void bar_fn_not_wrapped(void);
void bar_fn_wrapped(void);

void foo_fn (void)
{
    struct timeval tv;
    struct timezone tz;

    bar_fn_wrapped();
    bar_fn_not_wrapped();
    (void)gettimeofday(&tv, &tz);
}
/* --- foo.c end --- */

2.3) Test setup
---------------

For each compiler/linker combination below we'll run the following command:

  $ <compiler> [ -fuse-ld=<optional-linker-choice> ] -fPIC -shared foo.c -Wl,--wrap=sigaction \
      -Wl,--wrap=gettimeofday -Wl,--wrap=bar_fn_wrapped -Wl,--wrap=bar_fn_other -o libfoo.so

And search for the interesting symbols using:

  $ nm -D libfoo.so --undefined-only | grep -E "(sig|get|bar)" | tr -s ' ' | sed 's/^/ /'

The compiler/linkers used are:
  a) gcc 4.7.0, gnu-ld
  b) clang-9, gnu-ld
  c) clang-9, llvm-lld-9
  d) clang-11, llvm-lld-11

3) Reproduction Tests:
=====================

NB: Bad results are highlighted with *Y* or *N*

a) gcc 4.7.0, gnu-ld

    U bar_fn_not_wrapped
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday

+--------------------+------+----------+--------------------------------+
| Symbol x           | Used | Wrapped? | Undefined symbol ...?          |
|                    |      |          | x        | __wrap_x | __real_x |
+--------------------+------+----------+----------+----------+----------+
| bar_fn_not_wrapped | Y    | N        | Y        | N        | N        |
| bar_fn_wrapped     | Y    | Y        | N        | Y        | N        |
| gettimeofday       | Y    | Y        | N        | Y        | N        |
| sigaction          | N    | Y        | N        | N        | N        |
| bar_fn_other       | N    | Y        | N        | N        | N        |
+--------------------+------+----------+----------+----------+----------+

b) clang-9, gnu-ld

    U bar_fn_not_wrapped
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday

+--------------------+------+----------+--------------------------------+
| Symbol x           | Used | Wrapped? | Undefined symbol ...?          |
|                    |      |          | x        | __wrap_x | __real_x |
+--------------------+------+----------+----------+----------+----------+
| bar_fn_not_wrapped | Y    | N        | Y        | N        | N        |
| bar_fn_wrapped     | Y    | Y        | N        | Y        | N        |
| gettimeofday       | Y    | Y        | N        | Y        | N        |
| sigaction          | N    | Y        | N        | N        | N        |
| bar_fn_other       | N    | Y        | N        | N        | N        |
+--------------------+------+----------+----------+----------+----------+

c) clang-9, llvm-lld-9

    U bar_fn_not_wrapped
    U bar_fn_wrapped
    U gettimeofday
    U __real_bar_fn_wrapped
    U __real_gettimeofday
    w __real_sigaction
    w sigaction
    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday
    U __wrap_sigaction

+--------------------+------+----------+--------------------------------+
| Symbol x           | Used | Wrapped? | Undefined symbol ...?          |
|                    |      |          | x        | __wrap_x | __real_x |
+--------------------+------+----------+----------+----------+----------+
| bar_fn_not_wrapped | Y    | N        | Y        | N        | N        |
| bar_fn_wrapped     | Y    | Y        | *Y*      | Y        | *Y*      |
| gettimeofday       | Y    | Y        | *Y*      | Y        | *Y*      |
| sigaction          | N    | Y        | *Y*      | *Y*      | *Y*      |
| bar_fn_other       | N    | Y        | N        | N        | N        |
+--------------------+------+----------+----------+----------+----------+

d) clang-11, llvm-lld-11

    U __wrap_bar_fn_wrapped
    U __wrap_gettimeofday
    U __wrap_sigaction
    U bar_fn_not_wrapped
    U bar_fn_wrapped
    U gettimeofday
    w sigaction

+--------------------+------+----------+--------------------------------+
| Symbol x           | Used | Wrapped? | Undefined symbol ...?          |
|                    |      |          | x        | __wrap_x | __real_x |
+--------------------+------+----------+----------+----------+----------+
| bar_fn_not_wrapped | Y    | N        | Y        | N        | N        |
| bar_fn_wrapped     | Y    | Y        | *Y*      | Y        | N        |
| gettimeofday       | Y    | Y        | *Y*      | Y        | N        |
| sigaction          | N    | Y        | *Y*      | *Y*      | N        |
| bar_fn_other       | N    | Y        | N        | N        | N        |
+--------------------+------+----------+----------+----------+----------+

4) Side observations:
====================

A few observations made whilst investigating the main issues. It's likely that
these won't be critical to the main report, but they are left here in case they
aid discussion on this topic:

a) I can't find much online regarding this behaviour. One interesting
   reference, although it doesn't explain the above, is at
   http://maskray.me/blog/2020-12-19-lld-and-gnu-linker-incompatibilities:
  - """
    Semantics of --wrap: GNU ld and LLD have slightly different --wrap
    semantics. I use "slightly" because in most use cases users will not
    observe a difference. In GNU ld, --wrap only applies to undefined symbols.
    In LLD, --wrap happens after all other symbol resolution steps. The
    implementation is to mangle the symbol table of each object file
    (foo -> __wrap_foo; __real_foo -> foo) so that all relocations to foo or
    __real_foo will be redirected. The LLD semantics have the advantage that
    non-LTO, LTO and relocatable link behaviors are consistent. I filed
    https://sourceware.org/bugzilla/show_bug.cgi?id=26358 for GNU ld.
    """
  - Looking at the corresponding bug
    https://sourceware.org/bugzilla/show_bug.cgi?id=26358, the suggestion is
    that LLD is more consistent, but when I tried the steps for their partial
    linking example, I saw different behaviour.

b) I wonder if the run-time behaviour with these unexpected undefined symbols
   is impacted? e.g. what happens with -z,now? (Not investigated)

c) I wonder if the build-link-time behaviour is impacted? e.g. what happens
   when linking with a library dependency that has one of these missing symbol
   dependencies? (Not investigated)
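
For anyone reproducing issue 1.1, the following is a minimal sketch of wrapper
definitions that could be linked together with foo.c. It is not taken from the
tests above: the file name wrappers.c, the two counters and the wrapper bodies
are all assumptions made for illustration. What matters is that neither wrapper
refers to __real_bar_fn_wrapped or __real_gettimeofday, so nothing in the
sources asks for the original 'x' symbols.

```c
/* wrappers.c: hypothetical companion to foo.c, for illustration only.
 * Meant to be linked with -Wl,--wrap=bar_fn_wrapped -Wl,--wrap=gettimeofday.
 * Neither wrapper calls __real_x, so the sources never reference 'x' itself. */
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical counters so a caller can observe that the wrappers ran. */
int wrap_bar_calls = 0;
int wrap_gettimeofday_calls = 0;

/* Prototypes for the wrapper entry points that --wrap redirects calls to. */
void __wrap_bar_fn_wrapped(void);
int __wrap_gettimeofday(struct timeval *tv, void *tz);

/* Replaces bar_fn_wrapped(): records the call and does nothing else. */
void __wrap_bar_fn_wrapped(void)
{
    ++wrap_bar_calls;
}

/* Replaces gettimeofday(): reports a fixed time of zero and ignores tz. */
int __wrap_gettimeofday(struct timeval *tv, void *tz)
{
    (void)tz;
    ++wrap_gettimeofday_calls;
    if (tv != NULL) {
        tv->tv_sec = 0;
        tv->tv_usec = 0;
    }
    return 0;
}
```

Assuming wrappers.c is added to the command in 2.3, the expectation from 2.1
would be that bar_fn_not_wrapped is the only symbol of interest left undefined.
This has not been checked against each linker listed above.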