Jan Mikkelsen
2015-Jul-15 23:18 UTC
amd64 kernel dynamic linking allows extern references to statics
> On 15 Jul 2015, at 11:27 pm, Konstantin Belousov <kostikbel at gmail.com> wrote: > > On Wed, Jul 15, 2015 at 06:17:20PM +1000, Jan Mikkelsen wrote: >> Hi, >> >> (All on 10.2-BETA1.) >> >> I noticed that the latest patch in the bug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594> works on amd64 but fails to load zfs.ko on i386 with a symbol not found error. >> >> Looking at the patch, there is one file that has ???extern int zio_use_uma??? to reference a variable that is declared elsewhere as ???static int zio_use_uma???. To me this obviously should not work. However it does work on amd64 but fails on i386. >> >> Below is a small test case that reproduces the problem. The generated kernel module loads on amd64 but fails on i386. On amd64 one compilation unit is accessing a static in from another compilation unit by declaring the variable ???extern???. >> >> I haven???t looked further to attempt to find the bug. However, it looks like a Bad Thing??? to me. >> > > I am not sure that this is fixable. Issue is that amd64 modules are > relinked object files, and they might have unresolved relocations against > local symbols. Change like the following probably fix your test case, > but also quite possible would break legitimate local references. > > diff --git a/sys/kern/link_elf_obj.c b/sys/kern/link_elf_obj.c > index 021381d..6fa5276 100644 > --- a/sys/kern/link_elf_obj.c > +++ b/sys/kern/link_elf_obj.c > @@ -1096,7 +1096,8 @@ link_elf_lookup_symbol(linker_file_t lf, const char *name, c_linker_sym_t *sym) > > for (i = 0, symp = ef->ddbsymtab; i < ef->ddbsymcnt; i++, symp++) { > strp = ef->ddbstrtab + symp->st_name; > - if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0) { > + if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0 && > + ELF_ST_BIND(symp->st_info) != STB_LOCAL) { > *sym = (c_linker_sym_t) symp; > return 0; > }I don?t know why there could be an unresolved relocation against a local symbol. My (admittedly trivial) tests with nm and static functions/variables have not led to a ?U? record. Under what circumstances would this happen? Separately to this is that this behaviour also defeats multiple definition checking at link time. Modifying my test slightly to have one compilation unit with ?static int testvar?, one compilation unit with ?int testvar? and one with ?extern int testvar? gives nm output like this from the .ko file: 00000000000000d0 d testvar 00000000000000c0 d testvar It is unclear which instance of testvar the ?extern? declaration used for resolution. Simple testing seems to show it was the one with the non-static storage class. However, a piece of code depending on the previous resolution to local names could break by the addition a file with a name clash. Also, not no ?U? record ? the ?extern int testvar? declaration has been resolved at this point. Making both definitions static gives this: U testvar 00000000000000c0 d testvar 00000000000000d0 d testvar Which testvar will I get at load time? Who knows? Adding a file with a static variable with a name clash in another compilation unit can change the behaviour of a module. The big question for me is: under which legitimate case is this feature necessary? The downsides seem significant. Regards, Jan.
Konstantin Belousov
2015-Jul-16 13:01 UTC
amd64 kernel dynamic linking allows extern references to statics
On Thu, Jul 16, 2015 at 09:18:15AM +1000, Jan Mikkelsen wrote:> > > On 15 Jul 2015, at 11:27 pm, Konstantin Belousov <kostikbel at gmail.com> wrote: > > > > On Wed, Jul 15, 2015 at 06:17:20PM +1000, Jan Mikkelsen wrote: > >> Hi, > >> > >> (All on 10.2-BETA1.) > >> > >> I noticed that the latest patch in the bug https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594 <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594> works on amd64 but fails to load zfs.ko on i386 with a symbol not found error. > >> > >> Looking at the patch, there is one file that has ???extern int zio_use_uma??? to reference a variable that is declared elsewhere as ???static int zio_use_uma???. To me this obviously should not work. However it does work on amd64 but fails on i386. > >> > >> Below is a small test case that reproduces the problem. The generated kernel module loads on amd64 but fails on i386. On amd64 one compilation unit is accessing a static in from another compilation unit by declaring the variable ???extern???. > >> > >> I haven???t looked further to attempt to find the bug. However, it looks like a Bad Thing??? to me. > >> > > > > I am not sure that this is fixable. Issue is that amd64 modules are > > relinked object files, and they might have unresolved relocations against > > local symbols. Change like the following probably fix your test case, > > but also quite possible would break legitimate local references. > > > > diff --git a/sys/kern/link_elf_obj.c b/sys/kern/link_elf_obj.c > > index 021381d..6fa5276 100644 > > --- a/sys/kern/link_elf_obj.c > > +++ b/sys/kern/link_elf_obj.c > > @@ -1096,7 +1096,8 @@ link_elf_lookup_symbol(linker_file_t lf, const char *name, c_linker_sym_t *sym) > > > > for (i = 0, symp = ef->ddbsymtab; i < ef->ddbsymcnt; i++, symp++) { > > strp = ef->ddbstrtab + symp->st_name; > > - if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0) { > > + if (symp->st_shndx != SHN_UNDEF && strcmp(name, strp) == 0 && > > + ELF_ST_BIND(symp->st_info) != STB_LOCAL) { > > *sym = (c_linker_sym_t) symp; > > return 0; > > } > > I don???t know why there could be an unresolved relocation against a local symbol. My (admittedly trivial) tests with nm and static functions/variables have not led to a ???U??? record. Under what circumstances would this happen?Assembler cannot know the final location for any non-local data, so any variable, global or static, requires a relocation in the object file for the final code to work properly. Look at the most trivial example static int a; int f(int x) { return (a + x); } int *g(void) { return (&a); } The g() function is needed for prevent too smart gcc optimizer from noting that nothing could modify a and replace its uses with the literal 0. The generated object file, after disassembly, is 0000000000000000 <f>: 0: 89 f8 mov %edi,%eax 2: 03 05 00 00 00 00 add 0x0(%rip),%eax # 8 <f+0x8> 4: R_X86_64_PC32 .bss-0x4 8: c3 retq The relocation is X86_64_PC32, which takes a symbol as displacement. In this case, compiler decided to use the segment base for the symbol, but it is allowed to use the local symbol directly. Only linker during the final link (not the incremental -r link) can fill the word which provides displacement for the rip-relative addressing to fetch a value.> > Separately to this is that this behaviour also defeats multiple definition checking at link time. Modifying my test slightly to have one compilation unit with ???static int testvar???, one compilation unit with ???int testvar??? and one with ???extern int testvar??? gives nm output like this from the .ko file: > > 00000000000000d0 d testvar > 00000000000000c0 d testvar > > It is unclear which instance of testvar the ???extern??? declaration used for resolution. Simple testing seems to show it was the one with the non-static storage class. However, a piece of code depending on the previous resolution to local names could break by the addition a file with a name clash. Also, not no ???U??? record ??? the ???extern int testvar??? declaration has been resolved at this point. > > Making both definitions static gives this: > > U testvar > 00000000000000c0 d testvar > 00000000000000d0 d testvar > > Which testvar will I get at load time? Who knows? Adding a file with a static variable with a name clash in another compilation unit can change the behaviour of a module. > > The big question for me is: under which legitimate case is this feature necessary? The downsides seem significant. >Well, might be that adding a check for multiple definitions of the same symbol into the link_elf_obj.c is reasonable, I do not have very strong opinion against it. But also, I do not see a way to do it in non-O(n^2) time, which is probably not the best thing to put into kernel.