John McCall via llvm-dev
2016-Mar-08 00:09 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
> On Mar 4, 2016, at 2:48 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
>
> There's a subtlety about this aspect of the ABI that I should call attention to. The virtual function references can only be resolved directly by the static linker if they are defined in the same executable/DSO as the virtual table. I expect this to be the overwhelmingly common case, as classes are normally wholly defined within a single executable or DSO, so our implementation should be optimized around that case.
>
> If we expected cross-DSO references to be relatively common, we could make vtable entries be relative to GOT entries, but that would introduce an additional level of indirection and additional relocations, probably costing us more in binary size and memory bandwidth than the current ABI.
>
> However, it is technically possible to split the implementation of a class's virtual functions between DSOs, and there are more practical cases where we might expect to see cross-DSO references:
>
> - one DSO could derive from a class defined in another DSO, and only override some of its virtual functions
> - the vtable could contain a reference to __cxa_pure_virtual which would be defined by the standard library
>
> We can handle these cases by having the vtable refer to a PLT entry for each function that is not defined within the module. This can be done by using a specific type of relative relocation that refers directly to the symbol if defined within the current module, or to a PLT entry if not. This is the same type of relocation that is needed to implement relative branches on x86, so I'd expect it to be generally available on that architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH, COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1], which is essentially the same thing as a PLT entry). It is also present on ARM (R_ARM_PREL31, which was apparently added to support unwind tables).
>
> We still need some way to create PLT relocations in the vtable's initializer without breaking the semantics of a load from the vtable. Rafael and I discussed this and we believe that if the target function is unnamed_addr, this indicates that the function's address isn't observable (this is true for virtual functions, as it isn't possible to take their address), and so it could be substituted with the address of a PLT entry.

This seems like the best way to handle it.

It would be nice if this could be requested of an arbitrary function without having to rely on it being unnamed_addr; that is, it would be nice to have "the address of an unnamed_addr function in this linkage unit which is equivalent when called to this other function that's not necessarily within this linkage unit". It's easy to make an unnamed_addr wrapper function, and maybe we could teach the backend to peephole that to a PLT function reference, but (1) that sounds like some serious backend heroics and (2) it wouldn't work for variadic functions.

> One complication is that on ELF the linker will still create a PLT entry if the symbol has default visibility, in order to support symbol interposition. We can mitigate that by using protected visibility for virtual functions if they would otherwise receive default visibility.

To clarify, this is just a performance problem and doesn't actually break semantics or feasibility, right?

John.

> Thanks,
> Peter
>
> [1] http://llvm.org/klaus/lld/blob/master/COFF/InputFiles.cpp#L-305
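As a rough illustration of the relative layout quoted above (this is not code from the thread), the sketch below models one linkage unit as a flat address space. It packs the vtable as two 8-byte slots followed by signed 32-bit offsets measured from @A_vtable + 16, and substitutes a PLT stub address for a function that is defined elsewhere. Every address, the `SYMBOLS` and `PLT_STUBS` tables, and the helper names are hypothetical and chosen only for the example.

```python
import struct

# Hypothetical addresses inside one linkage unit (illustration only).
SYMBOLS = {"A::rtti": 0x3000, "A::f": 0x1000, "A::g": 0x1040}
# Hypothetical PLT stubs for functions defined in some other DSO.
PLT_STUBS = {"__cxa_pure_virtual": 0x0800}

VTABLE_ADDR = 0x2000
# The offsets are relative to @A_vtable + 16, i.e. past the two 8-byte slots.
ADDRESS_POINT = VTABLE_ADDR + 16


def resolve(name):
    """Use the symbol itself if it is local, otherwise its PLT stub."""
    return SYMBOLS[name] if name in SYMBOLS else PLT_STUBS[name]


def build_vtable(virtual_functions):
    """Pack {i8*, i8*, i32, ...}: a zero slot, RTTI, then relative offsets."""
    offsets = [resolve(f) - ADDRESS_POINT for f in virtual_functions]
    fmt = "<qq%di" % len(offsets)  # little-endian, no padding
    return struct.pack(fmt, 0, SYMBOLS["A::rtti"], *offsets)


def load_virtual(vtable_bytes, index):
    """Recover the call target stored in virtual slot `index`."""
    (offset,) = struct.unpack_from("<i", vtable_bytes, 16 + 4 * index)
    return ADDRESS_POINT + offset
```

Calling `build_vtable(["A::f", "A::g", "__cxa_pure_virtual"])` and then `load_virtual` on each slot should give back the two local function addresses and the stub address. Each entry costs 4 bytes instead of 8, and none of them needs a dynamic relocation in this model because every offset is fixed at static link time.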
Peter Collingbourne via llvm-dev
2016-Mar-08 00:22 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Mon, Mar 7, 2016 at 4:09 PM, John McCall <rjmccall at apple.com> wrote:

> On Mar 4, 2016, at 2:48 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>>> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
>>
>> There's a subtlety about this aspect of the ABI that I should call attention to. The virtual function references can only be resolved directly by the static linker if they are defined in the same executable/DSO as the virtual table. I expect this to be the overwhelmingly common case, as classes are normally wholly defined within a single executable or DSO, so our implementation should be optimized around that case.
>>
>> If we expected cross-DSO references to be relatively common, we could make vtable entries be relative to GOT entries, but that would introduce an additional level of indirection and additional relocations, probably costing us more in binary size and memory bandwidth than the current ABI.
>>
>> However, it is technically possible to split the implementation of a class's virtual functions between DSOs, and there are more practical cases where we might expect to see cross-DSO references:
>>
>> - one DSO could derive from a class defined in another DSO, and only override some of its virtual functions
>> - the vtable could contain a reference to __cxa_pure_virtual which would be defined by the standard library
>>
>> We can handle these cases by having the vtable refer to a PLT entry for each function that is not defined within the module. This can be done by using a specific type of relative relocation that refers directly to the symbol if defined within the current module, or to a PLT entry if not. This is the same type of relocation that is needed to implement relative branches on x86, so I'd expect it to be generally available on that architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH, COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1], which is essentially the same thing as a PLT entry). It is also present on ARM (R_ARM_PREL31, which was apparently added to support unwind tables).
>>
>> We still need some way to create PLT relocations in the vtable's initializer without breaking the semantics of a load from the vtable. Rafael and I discussed this and we believe that if the target function is unnamed_addr, this indicates that the function's address isn't observable (this is true for virtual functions, as it isn't possible to take their address), and so it could be substituted with the address of a PLT entry.
>
> This seems like the best way to handle it.
>
> It would be nice if this could be requested of an arbitrary function without having to rely on it being unnamed_addr; that is, it would be nice to have "the address of an unnamed_addr function in this linkage unit which is equivalent when called to this other function that's not necessarily within this linkage unit". It's easy to make an unnamed_addr wrapper function, and maybe we could teach the backend to peephole that to a PLT function reference, but (1) that sounds like some serious backend heroics and (2) it wouldn't work for variadic functions.

Yes, that seems like far too much for what is needed here.

>> One complication is that on ELF the linker will still create a PLT entry if the symbol has default visibility, in order to support symbol interposition. We can mitigate that by using protected visibility for virtual functions if they would otherwise receive default visibility.
>
> To clarify, this is just a performance problem and doesn't actually break semantics or feasibility, right?

Performance and code size (I suspect that all the extra PLT entries would negate most of the binary size savings associated with the relative ABI). It doesn't break semantics of regular C++ programs, as long as you stick to the ODR. It can only break things if a program relies on ELF symbol interposition to replace a DSO's implementation of a virtual function with something else (which I can't really imagine being very common at all, given that ELF is the only object format that could support this). I think the right response to that is "sorry, if you need symbol interposition, then go use the platform ABI".

Thanks,
--
Peter
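To make the visibility point concrete, here is a deliberately simplified model of the decision an ELF static linker makes for a PLT32-style reference from a vtable. It assumes a symbol is preemptible exactly when it is defined in a shared object with default visibility, and it ignores options such as -Bsymbolic. The `Symbol` class, `needs_plt`, and `count_plt_entries` are hypothetical names for this sketch, not part of any real linker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    defined_in_module: bool
    visibility: str = "default"  # "default", "protected", or "hidden"


def needs_plt(sym, output_is_shared=True):
    """Return True if a PLT32-style reference must go through a PLT entry.

    Simplifying assumption: only default-visibility definitions in a
    shared object can be interposed at run time.
    """
    if not sym.defined_in_module:
        return True  # defined in another DSO, e.g. __cxa_pure_virtual
    return output_is_shared and sym.visibility == "default"


def count_plt_entries(vtable_symbols, output_is_shared=True):
    """Count the distinct PLT entries one vtable would pull in."""
    return len({s.name for s in vtable_symbols if needs_plt(s, output_is_shared)})
```

Under this model, a vtable whose virtual functions are all local but default-visibility still needs one PLT entry per function in a DSO, while marking the same functions protected leaves only the genuinely external references. That is the size cost described above.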
John McCall via llvm-dev
2016-Mar-11 06:32 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
> On Mar 7, 2016, at 4:22 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> On Mon, Mar 7, 2016 at 4:09 PM, John McCall <rjmccall at apple.com> wrote:
>> On Mar 4, 2016, at 2:48 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>>>> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
>>>
>>> There's a subtlety about this aspect of the ABI that I should call attention to. The virtual function references can only be resolved directly by the static linker if they are defined in the same executable/DSO as the virtual table. I expect this to be the overwhelmingly common case, as classes are normally wholly defined within a single executable or DSO, so our implementation should be optimized around that case.
>>>
>>> If we expected cross-DSO references to be relatively common, we could make vtable entries be relative to GOT entries, but that would introduce an additional level of indirection and additional relocations, probably costing us more in binary size and memory bandwidth than the current ABI.
>>>
>>> However, it is technically possible to split the implementation of a class's virtual functions between DSOs, and there are more practical cases where we might expect to see cross-DSO references:
>>>
>>> - one DSO could derive from a class defined in another DSO, and only override some of its virtual functions
>>> - the vtable could contain a reference to __cxa_pure_virtual which would be defined by the standard library
>>>
>>> We can handle these cases by having the vtable refer to a PLT entry for each function that is not defined within the module. This can be done by using a specific type of relative relocation that refers directly to the symbol if defined within the current module, or to a PLT entry if not. This is the same type of relocation that is needed to implement relative branches on x86, so I'd expect it to be generally available on that architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH, COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1], which is essentially the same thing as a PLT entry). It is also present on ARM (R_ARM_PREL31, which was apparently added to support unwind tables).
>>>
>>> We still need some way to create PLT relocations in the vtable's initializer without breaking the semantics of a load from the vtable. Rafael and I discussed this and we believe that if the target function is unnamed_addr, this indicates that the function's address isn't observable (this is true for virtual functions, as it isn't possible to take their address), and so it could be substituted with the address of a PLT entry.
>>
>> This seems like the best way to handle it.
>>
>> It would be nice if this could be requested of an arbitrary function without having to rely on it being unnamed_addr; that is, it would be nice to have "the address of an unnamed_addr function in this linkage unit which is equivalent when called to this other function that's not necessarily within this linkage unit". It's easy to make an unnamed_addr wrapper function, and maybe we could teach the backend to peephole that to a PLT function reference, but (1) that sounds like some serious backend heroics and (2) it wouldn't work for variadic functions.
>
> Yes, that seems like far too much for what is needed here.

It's a shame, though. We play a similar trick with GOT entries, and it ends up being quite elegant, at least in IR. I think there are similar problems with trying to eliminate the fake symbol in the backend, though.

>>> One complication is that on ELF the linker will still create a PLT entry if the symbol has default visibility, in order to support symbol interposition. We can mitigate that by using protected visibility for virtual functions if they would otherwise receive default visibility.
>>
>> To clarify, this is just a performance problem and doesn't actually break semantics or feasibility, right?
>
> Performance and code size (I suspect that all the extra PLT entries would negate most of the binary size savings associated with the relative ABI). It doesn't break semantics of regular C++ programs, as long as you stick to the ODR. It can only break things if a program relies on ELF symbol interposition to replace a DSO's implementation of a virtual function with something else (which I can't really imagine being very common at all, given that ELF is the only object format that could support this). I think the right response to that is "sorry, if you need symbol interposition, then go use the platform ABI".

I mean, I've never really liked ELF's stance on symbol interposition, but taking it as given, I'm not sure I agree that it's reasonable to carve out virtual functions as a general exception. Isn't there some global flag to make (non-weak?) definitions use protected visibility by default? Wouldn't using that be an overall performance win anyway?

John.