Peter Collingbourne via llvm-dev
2016-Feb-29 21:53 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
Hi all,

I'd like to make a proposal to implement the new vtable ABI described in
PR26723, which I'll call the relative ABI. That bug gives more details and
justification for that ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
would take an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of whole-program
vcall optimization allows external code to call into translation units
compiled with -fwhole-program-vtables, but does not allow external code to
derive from classes defined in such translation units, so you could request
the current behaviour with "-fwhole-program-vtables=derive", which means
that derived classes are not allowed from outside the program. To request
the new ABI, you can specify "-fwhole-program-vtables=call,derive",
which means that calls and derived classes are both not allowed from
outside the program. "-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

I'll also make the observation that the new ABI does not require LTO or
whole-program visibility at compile time; to decide whether to use the new
ABI for a class, we just need to check that it and its bases are not in the
whole-program-vtables blacklist.

At the same time, I'd like to change how virtual calls are represented in
the IR. This is for a few reasons:

1) It would allow whole-program virtual call optimization to work well with
   the relative ABI, which would otherwise complicate the IR at call sites
   and make matching and rewriting harder.

2) It simplifies the whole-program virtual call optimization pass. Currently
   we need to walk uses in the IR in order to determine the slot and callees
   for each call site. This can all be avoided with a simpler representation.

3) It would make it easier to implement dead virtual function stripping,
   which would involve reshaping any vtable initializers and rewriting call
   sites. Implementing this correctly is harder than it needs to be because
   of the current representation.

My proposal is to add the following new intrinsics:

  i32 @llvm.vtable.slot.offset(metadata, i32)

This intrinsic takes a bitset name B and an offset I. It returns the byte
offset of the I'th virtual function pointer in each of the vtables in B.

  i8* @llvm.vtable.load(i8*, i32)

This intrinsic takes a virtual table pointer and a byte offset, and loads
a virtual function pointer from the virtual table at the given offset.

  i8* @llvm.vtable.load.relative(i8*, i32)

This intrinsic is the same as above, but it uses the relative ABI.

  {i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
  {i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)

These intrinsics would be used to implement CFI. They are similar to the
unchecked intrinsics, but if the second element of the result is non-zero,
the program may call the first element of the result as a function pointer
without causing an indirect function call to any function other than one
potentially loaded from one of the constant globals of which %name is a
member.

To minimize the impact on existing passes, the intrinsics would be lowered
early during the regular pipeline when LTO is disabled, or early in the LTO
pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset
intrinsic when LTO is disabled, as bitset information would be unavailable.
To give the optimizer permission to reshape vtable initializers for a
particular class, the vtable would be added to a special named metadata node
named 'llvm.vtable.slots'. The presence of this metadata would guarantee
that all loads beyond a given byte offset (this range would not include the
RTTI pointer, for example) are done using the above intrinsics.

We will also take advantage of the ABI break to split the class's virtual
table group at virtual table boundaries into separate globals, instead of
emitting all virtual tables in the group into a single global. This will
not only simplify the implementation of dead virtual function stripping,
but also reduce code size overhead for CFI. (CFI works best if vtables for
a base class can be laid out near vtables for derived classes; the current
ABI makes this harder to achieve.)

Example (using the relative ABI):

  struct A {
    virtual void f();
    virtual void g();
  };

  struct B {
    virtual void h();
  };

  struct C : A, B {
    virtual void f();
    virtual void g();
    virtual void h();
  };

  void fcall(A *a) {
    a->f();
  }

  void gcall(A *a) {
    a->g();
  }

  typedef void (A::*mfp)();

  mfp getmfp() {
    return &A::g;
  }

  void callmfp(A *a, mfp m) {
    (a->*m)();
  }

In IR:

  @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
  @B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
  @C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}
  @C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}

  define void @fcall(%A* %a) {
    %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 0)
    %vtable = load i8* %a
    %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
    %casted_fp = bitcast i8* %fp to void (%A*)*
    call void %casted_fp(%a)
  }

  define void @gcall(%A* %a) {
    %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
    %vtable = load i8* %a
    %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
    %casted_fp = bitcast i8* %fp to void (%A*)*
    call void %casted_fp(%a)
  }

  define {i8*, i8*} @getmfp() {
    %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
    %slotp1 = add i32 %slot, 1
    %result = insertvalue {i8*, i8*} {i8* 0, i8* 0}, 0, %slotp1
    ret {i8*, i8*} %result
  }

  define void @callmfp(%A* %a, {i8*, i8*} %m) {
    ; assuming the call is virtual and no this adjustment
    %slot = extractvalue {i8*, i8*} %m, 0
    %slotm1 = sub i32 %slot, 1
    %vtable = load i8* %a
    %fp = call i8* @llvm.vtable.load.relative(%vtable, %slotm1)
    %casted_fp = bitcast i8* %fp to void (%A*)*
    call void %casted_fp(%a)
  }

  !0 = {!"A", @A_vtable, 16}
  !1 = {!"B", @B_vtable, 16}
  !2 = {!"A", @C_vtable0, 16}
  !3 = {!"B", @C_vtable1, 16}
  !4 = {!"C", @C_vtable0, 16}
  !llvm.bitsets = {!0, !1, !2, !3, !4}

  !5 = {@A_vtable, 16}
  !6 = {@B_vtable, 16}
  !7 = {@C_vtable0, 16}
  !8 = {@C_vtable1, 16}
  !llvm.vtable.slots = {!5, !6, !7, !8}

Thanks,
--
Peter
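For concreteness, here is a rough C++ sketch of what the relative ABI's vtable
layout and load amount to at runtime, mirroring the @A_vtable example above.
The struct and helper names are illustrative assumptions (LP64 layout assumed),
not part of the proposal:

  #include <cstdint>
  #include <cstring>

  // Mirrors @A_vtable: two pointer-sized header fields (offset-to-top and the
  // RTTI pointer), then one 32-bit entry per virtual function. Each entry
  // holds the displacement from the address point (&entries[0], which is
  // @A_vtable + 16 above) to the function.
  struct RelativeVTableA {
    std::intptr_t offset_to_top;
    const void *rtti;
    std::int32_t entries[2];   // f at byte offset 0, g at byte offset 4
  };

  // What @llvm.vtable.load.relative(vtable, slot) conceptually computes: load
  // a 32-bit displacement at the given byte offset and add it back to the
  // vtable address point itself.
  inline void *relative_vtable_load(const char *address_point,
                                    std::int32_t byte_offset) {
    std::int32_t disp;
    std::memcpy(&disp, address_point + byte_offset, sizeof disp);
    return const_cast<char *>(address_point) + disp;
  }

A call site would then cast the returned pointer to the appropriate function
pointer type, matching the bitcast in @fcall above.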
Sean Silva via llvm-dev
2016-Feb-29 23:38 UTC
[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
Using relative offsets applies to more than just vtables. It would do wonders
for constant strings too.

-- Sean Silva

On Mon, Feb 29, 2016 at 1:53 PM, Peter Collingbourne via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> Hi all,
>
> I'd like to make a proposal to implement the new vtable ABI described in
> PR26723, which I'll call the relative ABI. That bug gives more details and
> justification for that ABI. [...]
Peter Collingbourne via llvm-dev
2016-Mar-01 00:29 UTC
[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
That's true, and in fact I found that many of the remaining dynamic
relocations in Chromium were for constant strings and data structures
referencing them (e.g. [1]). Unfortunately I couldn't see a good general way
to transform a program to use relative offsets automatically, as I imagine we
would need to consider every use of the data structure in the program.
Possible in theory, but very difficult to get right.

Peter

[1] https://code.google.com/p/chromium/codesearch#chromium/src/net/base/mime_util.cc&l=64&gs=cpp:net::kPrimaryMappings at chromium/../../net/base/mime_util.cc%257Cdef&gsn=kPrimaryMappings&ct=xref_usages

On Mon, Feb 29, 2016 at 03:38:35PM -0800, Sean Silva wrote:
> Using relative offsets applies to more than just vtables. It would do
> wonders for constant strings too. [...]

--
Peter
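To make the constant-strings idea concrete, here is a small sketch of my own
(the names are hypothetical, loosely modelled on the mime-mapping table cited
above) of the kind of transformation being discussed: replacing a table of
absolute string pointers, which costs one dynamic relocation per pointer in a
position-independent binary, with 32-bit offsets relative to the table itself,
which the linker can resolve statically.

  #include <cstdint>

  // Pointer-based table: each field holds an absolute address, so every
  // entry needs a dynamic relocation at load time.
  struct MappingPtr { const char *mime_type; const char *extensions; };

  // Offset-based table: each field holds a displacement from the start of
  // the table, so no dynamic relocations are required.
  struct MappingRel { std::int32_t mime_type_offset; std::int32_t extensions_offset; };

  inline const char *resolve(const MappingRel *table, std::int32_t offset) {
    return reinterpret_cast<const char *>(table) + offset;
  }

Producing those offsets automatically is exactly the hard part described
above, since every use of the data structure would need to go through
something like resolve().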
Peter Collingbourne via llvm-dev
2016-Mar-04 06:31 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
> The user interface for the new ABI would be that -fwhole-program-vtables
> would take an optional value indicating which aspects of the program have
> whole-program scope. [...] "-fwhole-program-vtables" would be short for
> "-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

Based on discussion with John McCall in PR26723, I'd like to change the user
interface for -fwhole-program-vtables, and introduce an interface specifically
to enable the relative ABI. That interface would be based on a whitelist
rather than a blacklist, and together with -fwhole-program-vtables would
enable devirtualization, virtual const prop, and virtual function stripping
for those classes.

The new user interface is as follows:

We would introduce two new attributes, [[clang::unstable_abi]] and
[[clang::stable_abi]], which would be attached to a class and would enable or
disable the unstable ABI for that class. It is an ODR violation to use
[[clang::unstable_abi]] in two translation units compiled with different
versions of Clang (we may consider extending the object format to allow a
linker to diagnose this). Specifically, mixing different head revisions or
major releases is not allowed, but mixing different point releases is fine.
The attribute __declspec(uuid()) (which is used for COM classes on Windows)
would imply [[clang::stable_abi]].

A "dynamic-introducing" class is a class that declares new virtual member
functions or virtual bases, and has no dynamic bases or virtual bases. A
class that is dynamic but not dynamic-introducing would use the same ABI as
its dynamic base classes. The compiler will diagnose if a class has two or
more dynamic bases with different ABIs, or if the bases have a different ABI
from the one explicitly specified by an attribute.

The ABI for a dynamic-introducing class is determined from the attribute, or,
if the class has no attribute, from the following flags:

- -funstable-c++-abi or -funstable-c++-abi-classes would enable the unstable
  C++ ABI for all classes (the idea being that -funstable-c++-abi would also
  cover any unrelated ABI breaks we may want to make in future).
- -funstable-c++-abi-classes=PATH would enable the unstable C++ ABI for
  dynamic-introducing classes specified in the file at PATH.

The -fwhole-program-vtables-blacklist flag would be removed, and I'm no
longer proposing that -fwhole-program-vtables would take a value. The
whole-program blacklist would be replaced by either inference based on
visibility or a new [[clang::no_whole_program]] attribute.

It is effectively an ODR violation to define a class that uses the unstable
ABI in a translation unit compiled with a different set of
-funstable-c++-abi* flags. It is also a violation for a linkage unit other
than the one compiled with -fwhole-program-vtables to define any of the
classes that use the unstable ABI.

The format of the file is a series of lines, each ending in either "*" or
"**". Preceding that is a namespace specifier (components separated by "::")
followed by "::", or the empty string to denote the global namespace. Each
entry in the list indicates that dynamic-introducing classes in that
namespace, including nested classes, classes defined in enclosed anonymous
namespaces, and classes defined within member functions of those classes, use
the unstable ABI. If the line ends in "*" this applies to the given namespace
only, while if the line ends in "**" it applies to the given namespace and
any enclosed namespaces.

In Chromium, for example, the contents of the file would look like this:

  *
  app::**
  base::**
  browser::**
  [...]
  wm::**
  zip::**

This whitelist specifies that classes defined in the global namespace as well
as in app, base, browser etc. and any enclosed namespaces would use the
unstable ABI. This list excludes std::**, so we can continue to use the
system standard library. If Chromium did start using its own copy of the
standard library, we could create another whitelist with that entry in it, or
just use the -funstable-c++-abi-classes flag.

We would also add a new warning: -Wc++-stable-abi. This would warn for any
classes defined in non-system header files that are inferred from namespaces
to use the stable ABI. This warning would be intended to be used by programs
that intend to use the unstable ABI for any non-system classes (such as
Chromium).

Control flow integrity (-fsanitize=cfi*) would also only be supported for
classes using the unstable ABI, and would require the -fwhole-program-vtables
flag unless the cross-DSO mode (-fsanitize-cfi-cross-dso) is enabled.

Next steps: I will send out a patch that implements the semantic analysis
side of this. Once that lands, follow-up patches will actually start changing
the unstable ABI.
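A short sketch of how the proposed knobs would fit together. This is proposed,
not yet implemented, syntax, and the class names and whitelist file name are
made up:

  // Explicit opt-in/opt-out with the proposed attributes.
  struct [[clang::unstable_abi]] Painter {    // always uses the unstable ABI
    virtual void paint();
  };

  struct [[clang::stable_abi]] ExportedApi {  // always uses the stable ABI
    virtual void call();
  };

  namespace base {
  // No attribute: a dynamic-introducing class in namespace base would pick up
  // the unstable ABI from a whitelist entry such as "base::**" in a file
  // passed via -funstable-c++-abi-classes=chromium_abi_whitelist.txt.
  struct RefCounted {
    virtual ~RefCounted();
  };
  }  // namespace base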
Mehdi Amini via llvm-dev
2016-Mar-04 17:32 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
> On Feb 29, 2016, at 1:53 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> [...]
>
> My proposal is to add the following new intrinsics:

Thanks, I'm really glad you're moving forward on improving the IR
representation so fast after our previous discussion. The use of these
intrinsics looks a lot more friendly to me! :) (Even if the "bitset"
terminology for representing the hierarchy in the metadata still doesn't make
sense to me.)

> i32 @llvm.vtable.slot.offset(metadata, i32)
>
> This intrinsic takes a bitset name B and an offset I. It returns the byte
> offset of the I'th virtual function pointer in each of the vtables in B.
>
> i8* @llvm.vtable.load(i8*, i32)

Why is vtable.load taking a byte offset instead of a slot index directly? The
IR could be simpler by not requiring a call to @llvm.vtable.slot.offset() for
every @llvm.vtable.load().

--
Mehdi
Peter Collingbourne via llvm-dev
2016-Mar-04 20:47 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Fri, Mar 4, 2016 at 9:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
> Thanks, I'm really glad you're moving forward on improving the IR
> representation so fast after our previous discussion. The use of these
> intrinsics looks a lot more friendly to me! :) [...]
>
> Why is vtable.load taking a byte offset instead of a slot index directly?
> The IR could be simpler by not requiring a call to
> @llvm.vtable.slot.offset() for every @llvm.vtable.load().

I decided to split these in order to support virtual member function pointers
correctly. In the Itanium ABI, member function pointers use a byte offset.
The idea is that llvm.vtable.slot.offset would be used to create a member
function pointer, while llvm.vtable.load would be used to call it (see also
the getmfp and callmfp examples).

Thanks,
Peter
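For background (my own summary, not something stated in the thread): under the
generic Itanium ABI a pointer to member function is a {ptr, adj} pair, and for
a virtual function the ptr field encodes 1 plus the vtable byte offset of its
slot. That is why getmfp and callmfp above add and subtract 1 around a byte
offset. A rough sketch of that encoding, with made-up helper names:

  #include <cstdint>

  // Generic Itanium representation of a pointer to member function.
  struct MemberFunctionPointer {
    std::intptr_t ptr;   // function address, or 1 + vtable byte offset if virtual
    std::intptr_t adj;   // amount to add to 'this' before the call
  };

  // &A::g, with g in the second slot of a relative-ABI vtable whose entries
  // are 4 bytes wide: byte offset 4, encoded as 4 + 1 = 5.
  inline MemberFunctionPointer make_virtual_mfp(std::intptr_t vtable_byte_offset) {
    return MemberFunctionPointer{vtable_byte_offset + 1, 0};
  }

  inline bool is_virtual(const MemberFunctionPointer &mfp) {
    return mfp.ptr & 1;   // odd values mark virtual functions
  }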
Peter Collingbourne via llvm-dev
2016-Mar-04 22:48 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16),
> @A::g - (@A_vtable + 16)}

There's a subtlety about this aspect of the ABI that I should call attention
to. The virtual function references can only be resolved directly by the
static linker if they are defined in the same executable/DSO as the virtual
table. I expect this to be the overwhelmingly common case, as classes are
normally wholly defined within a single executable or DSO, so our
implementation should be optimized around that case.

If we expected cross-DSO references to be relatively common, we could make
vtable entries be relative to GOT entries, but that would introduce an
additional level of indirection and additional relocations, probably costing
us more in binary size and memory bandwidth than the current ABI.

However, it is technically possible to split the implementation of a class's
virtual functions between DSOs, and there are more practical cases where we
might expect to see cross-DSO references:

- one DSO could derive from a class defined in another DSO, and only override
  some of its virtual functions
- the vtable could contain a reference to __cxa_pure_virtual, which would be
  defined by the standard library

We can handle these cases by having the vtable refer to a PLT entry for each
function that is not defined within the module. This can be done by using a
specific type of relative relocation that refers directly to the symbol if it
is defined within the current module, or to a PLT entry if not. This is the
same type of relocation that is needed to implement relative branches on x86,
so I'd expect it to be generally available on that architecture (ELF has
R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH, COFF has
IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1], which is
essentially the same thing as a PLT entry). It is also present on ARM
(R_ARM_PREL31, which was apparently added to support unwind tables).

We still need some way to create PLT relocations in the vtable's initializer
without breaking the semantics of a load from the vtable. Rafael and I
discussed this and we believe that if the target function is unnamed_addr,
this indicates that the function's address isn't observable (this is true for
virtual functions, as it isn't possible to take their address), and so it
could be substituted with the address of a PLT entry.

One complication is that on ELF the linker will still create a PLT entry if
the symbol has default visibility, in order to support symbol interposition.
We can mitigate against that by using protected visibility for virtual
functions if they would otherwise receive default visibility.

Thanks,
Peter

[1] http://llvm.org/klaus/lld/blob/master/COFF/InputFiles.cpp#L-305
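A minimal C++ sketch of the first cross-DSO case above (the class and library
names are made up): the derived class's vtable in one DSO must reference a
virtual function whose definition lives in another DSO, so that entry needs a
PLT-style relative reference.

  // libbase.so
  struct Widget {
    virtual void draw();
    virtual void resize();
  };
  void Widget::draw() {}
  void Widget::resize() {}

  // libui.so, built against libbase.so
  struct Button : Widget {
    void draw() override;   // overridden and defined in this DSO
    // resize() is inherited: Button's vtable entry for resize() must refer to
    // Widget::resize(), which is defined in libbase.so, not in this DSO.
  };
  void Button::draw() {}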
Nico Weber via llvm-dev
2016-Mar-05 03:49 UTC
[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Thu, Mar 3, 2016 at 10:31 PM, Peter Collingbourne via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> [...]
>
> In Chromium, for example, the contents of the file would look like this:
>
>   *
>   app::**
>   base::**
>   browser::**
>   [...]
>   wm::**
>   zip::**

Wouldn't we want to say "use custom ABI for everything non-exported" instead
of manually tagging everything?
John McCall via llvm-dev
2016-Mar-08 00:09 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
> On Mar 4, 2016, at 2:48 PM, Peter Collingbourne via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> [...]
>
> We still need some way to create PLT relocations in the vtable's
> initializer without breaking the semantics of a load from the vtable.
> Rafael and I discussed this and we believe that if the target function is
> unnamed_addr, this indicates that the function's address isn't observable
> (this is true for virtual functions, as it isn't possible to take their
> address), and so it could be substituted with the address of a PLT entry.

This seems like the best way to handle it.

It would be nice if this could be requested of an arbitrary function without
having to rely on it being unnamed_addr; that is, it would be nice to have
"the address of an unnamed_addr function in this linkage unit which is
equivalent when called to this other function that's not necessarily within
this linkage unit". It's easy to make an unnamed_addr wrapper function, and
maybe we could teach the backend to peephole that to a PLT function
reference, but (1) that sounds like some serious backend heroics and (2) it
wouldn't work for variadic functions.

> One complication is that on ELF the linker will still create a PLT entry
> if the symbol has default visibility, in order to support symbol
> interposition. We can mitigate against that by using protected visibility
> for virtual functions if they would otherwise receive default visibility.

To clarify, this is just a performance problem and doesn't actually break
semantics or feasibility, right?

John.
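A rough source-level picture of the wrapper idea, as a sketch of my own rather
than anything from the thread (the names are made up): the vtable-referenced
symbol is a small local function that forwards, non-virtually, to the
definition in the other linkage unit, so the relative vtable entry can point
at something local while the cross-DSO call goes through the PLT inside the
wrapper. As noted above, this style of forwarding cannot be written for
variadic functions.

  // Defined in another DSO; only a declaration is visible here.
  struct RemoteBase {
    virtual void tick(int count);
  };

  namespace {
  // Local forwarding wrapper whose address is never otherwise taken: a
  // vtable entry in this DSO can refer to it with a plain relative
  // relocation, and the call to RemoteBase::tick resolves through the PLT.
  void remote_tick_wrapper(RemoteBase *self, int count) {
    self->RemoteBase::tick(count);   // qualified, non-virtual call
  }
  }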
Joerg Sonnenberger via llvm-dev
2016-Mar-08 00:39 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Fri, Mar 04, 2016 at 02:48:28PM -0800, Peter Collingbourne via llvm-dev wrote:
> One complication is that on ELF the linker will still create a PLT entry
> if the symbol has default visibility, in order to support symbol
> interposition. We can mitigate against that by using protected visibility
> for virtual functions if they would otherwise receive default visibility.

Be very careful with using protected visibility, since recent binutils
versions have completely broken a bunch of basic use cases of protected
symbols :(

Joerg
Peter Collingbourne via llvm-dev
2016-Mar-16 23:23 UTC
[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
On Fri, Mar 4, 2016 at 2:48 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> [...]
>
> We still need some way to create PLT relocations in the vtable's
> initializer without breaking the semantics of a load from the vtable.
> Rafael and I discussed this and we believe that if the target function is
> unnamed_addr, this indicates that the function's address isn't observable
> (this is true for virtual functions, as it isn't possible to take their
> address), and so it could be substituted with the address of a PLT entry.

I've discovered a problem with this idea. Since we are using 32-bit
displacements, the offset from the vtable to the function must fit within 32
bits. This is assumed to be true in the medium code model, so long as the
displacement points to a real function address or a PLT entry. However, if we
combine a vtable load at a virtual call site, the code will evaluate the
function address to the actual address of the function via the GOT, and that
could push the displacement outside of the 32-bit boundary and cause an error
in the evaluation of the function address.

To solve this problem, I reckon that the @llvm.vtable.load.relative intrinsic
I mentioned earlier will be required for correctness, and we would have to
lower it very late, e.g. in the pre-backend passes.

Peter
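To spell out the 32-bit constraint being described, here is a small sketch of
my own (the helper name is made up): whatever address ends up in a relative
vtable entry, its distance from the vtable's address point must fit in a
signed 32-bit displacement.

  #include <cstdint>
  #include <limits>

  // The entry must satisfy: function_address - vtable_address_point fits in
  // int32_t. A PLT entry in the same module keeps the distance small; the
  // absolute, GOT-resolved address of a function in another DSO may not.
  inline bool displacement_fits(std::uintptr_t function_address,
                                std::uintptr_t address_point) {
    std::intptr_t disp = static_cast<std::intptr_t>(function_address) -
                         static_cast<std::intptr_t>(address_point);
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
  }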