thr3ads.net - llvm dev - [llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR [Feb 2016]

Peter Collingbourne via llvm-dev

2016-Feb-29 21:53 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

Hi all,

I'd like to make a proposal to implement the new vtable ABI described in
PR26723, which I'll call the relative ABI. That bug gives more details and
justification for that ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
would take an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of whole-program
vcall optimization allows external code to call into translation units
compiled with -fwhole-program-vtables, but does not allow external code to
derive from classes defined in such translation units, so you could request
the current behaviour with "-fwhole-program-vtables=derive", which means
that derived classes are not allowed from outside the program. To request
the new ABI, you can specify "-fwhole-program-vtables=call,derive",
which means that calls and derived classes are both not allowed from
outside the program. "-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

I'll also make the observation that the new ABI does not require LTO or
whole-program visibility at compile time; to decide whether to use the new
ABI for a class, we just need to check that it and its bases are not in the
whole-program-vtables blacklist.

At the same time, I'd like to change how virtual calls are represented in
the IR. This is for a few reasons:

1) Would allow whole-program virtual call optimization to work well with the
   relative ABI. With the current representation, this ABI would complicate
   the IR at call sites and make it harder to do matching and rewriting.

2) Simplifies the whole-program virtual call optimization pass. Currently we
   need to walk uses in the IR in order to determine the slot and callees for
   each call site. This can all be avoided with a simpler representation.

3) Would make it easier to implement dead virtual function stripping. This would
   involve reshaping any vtable initializers and rewriting call
   sites. Implementing this correctly is harder than it needs to be because
   of the current representation.

My proposal is to add the following new intrinsics:

i32 @llvm.vtable.slot.offset(metadata, i32)

This intrinsic takes a bitset name B and a slot index I. It returns the byte
offset of the I'th virtual function pointer in each of the vtables in B.
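
As a rough illustration of what this intrinsic would compute in the simplest
case, the sketch below assumes that slots are laid out contiguously from the
vtable's address point, 4 bytes each under the relative ABI and pointer-sized
otherwise. The names VTableABI, slotSize and slotByteOffset are hypothetical,
and the base the offset is measured from is an assumption taken from the
example later in this message, not something the RFC specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of which vtable layout a class uses.
enum class VTableABI { Classic, Relative };

// Assumed slot width: a full pointer in the classic ABI, a 32-bit
// offset in the relative ABI.
constexpr std::uint32_t slotSize(VTableABI abi) {
  return abi == VTableABI::Relative
             ? static_cast<std::uint32_t>(sizeof(std::int32_t))
             : static_cast<std::uint32_t>(sizeof(void *));
}

// Byte offset of the I'th virtual function slot, measured from the
// vtable's address point (an assumption made for this sketch).
constexpr std::uint32_t slotByteOffset(VTableABI abi, std::uint32_t index) {
  return index * slotSize(abi);
}
```

Once dead virtual function stripping reshapes a vtable, the mapping from index
to offset would no longer be this simple, which is presumably why the lookup is
left to an intrinsic.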

i8* @llvm.vtable.load(i8*, i32)

This intrinsic takes a virtual table pointer and a byte offset, and loads
a virtual function pointer from the virtual table at the given offset.

i8* @llvm.vtable.load.relative(i8*, i32)

This intrinsic is the same as above, but it uses the relative ABI.
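
To make the difference concrete, here is a small sketch of a relative load
over a toy flat address space. It assumes, as in the example IR below, that
each slot stores the 32-bit difference between the function's address and the
vtable's address point. Image, readI32, writeI32 and loadRelative are
hypothetical names, and "addresses" are just indices into a byte vector.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// A toy flat address space: "addresses" are indices into this byte image.
using Image = std::vector<std::uint8_t>;

// Reads a 32-bit signed value stored at the given address.
std::int32_t readI32(const Image &image, std::uint64_t address) {
  std::int32_t value;
  std::memcpy(&value, image.data() + address, sizeof(value));
  return value;
}

// Writes a 32-bit signed value at the given address.
void writeI32(Image &image, std::uint64_t address, std::int32_t value) {
  std::memcpy(image.data() + address, &value, sizeof(value));
}

// Models a relative vtable load: the slot holds (target - address point),
// so the target is recovered by adding the address point back.
std::uint64_t loadRelative(const Image &image, std::uint64_t addressPoint,
                           std::uint32_t byteOffset) {
  std::int64_t relative = readI32(image, addressPoint + byteOffset);
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(addressPoint) + relative);
}
```

Because the slot holds a difference rather than an absolute address, the table
needs no dynamic relocation when the image is loaded at a different base.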

{i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
{i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)

These intrinsics would be used to implement CFI. They are similar to the
unchecked intrinsics, but if the second element of the result is non-zero,
the program may call the first element of the result as a function pointer
without causing an indirect function call to any function other than one
potentially loaded from one of the constant globals of which %name is a member.
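
One way to read that guarantee is sketched below with a toy model: the check
succeeds only if the vtable pointer is a member of the named bitset, and the
returned function is then one stored in that vtable. Program, VTable and
checkedLoad are hypothetical names, and a real lowering would use the bitset
machinery rather than a std::set lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy model: a vtable maps slot byte offsets to function "addresses".
using VTable = std::map<std::uint32_t, std::uint64_t>;

struct Program {
  // Address point of each vtable -> its slots.
  std::map<std::uint64_t, VTable> vtables;
  // Bitset name -> address points that are members of the bitset.
  std::map<std::string, std::set<std::uint64_t>> bitsets;
};

// Models a checked load: returns {function, ok}. ok is true only when
// vtableAddr is a member of the named bitset and the slot exists there.
std::pair<std::uint64_t, bool> checkedLoad(const Program &program,
                                           const std::string &name,
                                           std::uint64_t vtableAddr,
                                           std::uint32_t byteOffset) {
  auto bitset = program.bitsets.find(name);
  if (bitset == program.bitsets.end() ||
      bitset->second.count(vtableAddr) == 0)
    return {0, false};
  auto vtable = program.vtables.find(vtableAddr);
  if (vtable == program.vtables.end())
    return {0, false};
  auto slot = vtable->second.find(byteOffset);
  if (slot == vtable->second.end())
    return {0, false};
  return {slot->second, true};
}
```

A caller would branch on the second element and only perform the indirect call
when it is true.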

To minimize the impact on existing passes, the intrinsics would be lowered
early during the regular pipeline when LTO is disabled, or early in the LTO
pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset
intrinsic when LTO is disabled, as bitset information would be unavailable.

To give the optimizer permission to reshape vtable initializers for a
particular class, the vtable would be added to a special named metadata node
named 'llvm.vtable.slots'. The presence of this metadata would guarantee
that all loads beyond a given byte offset (this range would not include the
RTTI pointer for example) are done using the above intrinsics.

We will also take advantage of the ABI break to split the class's virtual
table group at virtual table boundaries into separate globals instead of
emitting all virtual tables in the group into a single global. This will
not only simplify the implementation of dead virtual function stripping,
but also reduce code size overhead for CFI. (CFI works best if vtables for
a base class can be laid out near vtables for derived classes; the current
ABI makes this harder to achieve.)

Example (using the relative ABI):

struct A {
  virtual void f();
  virtual void g();
};

struct B {
  virtual void h();
};

struct C : A, B {
  virtual void f();
  virtual void g();
  virtual void h();
};

void fcall(A *a) {
  a->f();
}

void gcall(A *a) {
  a->g();
}

typedef void (A::*mfp)();

mfp getmfp() {
  return &A::g;
}

void callmfp(A *a, mfp m) {
  (a->*m)();
}

In IR:

@A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16),
                                  @A::g - (@A_vtable + 16)}
@B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
@C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16),
                                        @C::g - (@C_vtable0 + 16),
                                        @C::h - (@C_vtable0 + 16)}
@C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}

define void @fcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 0)
  %vtable = load i8* %a
  %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%a)
  ret void
}

define void @gcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
  %vtable = load i8* %a
  %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%a)
  ret void
}

define {i8*, i8*} @getmfp() {
  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
  %slotp1 = add %slot, 1
  %result = insertvalue {i8*, i8*} {i8* 0, i8* 0}, 0, %slotp1
  ret {i8*, i8*} %result
}

define void @callmfp(%A* %a, {i8*, i8*} %m) {
  ; assuming the call is virtual and no this adjustment
  %slot = extractvalue {i8*, i8*} %m, 0
  %slotm1 = sub %slot, 1
  %vtable = load i8* %a
  %fp = call i8* @llvm.vtable.load.relative(%vtable, %slotm1)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%a)
  ret void
}

!0 = {!"A", @A_vtable, 16}
!1 = {!"B", @B_vtable, 16}
!2 = {!"A", @C_vtable0, 16}
!3 = {!"B", @C_vtable1, 16}
!4 = {!"C", @C_vtable0, 16}
!llvm.bitsets = {!0, !1, !2, !3, !4}

!5 = {@A_vtable, 16}
!6 = {@B_vtable, 16}
!7 = {@C_vtable0, 16}
!8 = {@C_vtable1, 16}
!llvm.vtable.slots = {!5, !6, !7, !8}
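
The add/sub of 1 in getmfp and callmfp above follows the usual convention for
member function pointers to virtual functions, where the first word holds the
slot's byte offset plus one. A minimal sketch of that encoding, using the
hypothetical names MemberFnPtr, makeVirtualMfp, isVirtual and
virtualSlotOffset, and assuming function addresses are always even so the low
bit can act as the "virtual" tag:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a two-word member function pointer.
struct MemberFnPtr {
  std::uint64_t ptr;        // function address, or slot byte offset + 1
  std::int64_t thisAdjust;  // adjustment applied to `this`; unused here
};

// Encodes a pointer to a virtual member function, as getmfp does.
MemberFnPtr makeVirtualMfp(std::uint32_t slotByteOffset) {
  return MemberFnPtr{static_cast<std::uint64_t>(slotByteOffset) + 1, 0};
}

// Assumption: real function addresses are even, so an odd value marks a
// virtual member function.
bool isVirtual(const MemberFnPtr &m) { return (m.ptr & 1) != 0; }

// Recovers the slot byte offset, as callmfp does before the vtable load.
std::uint32_t virtualSlotOffset(const MemberFnPtr &m) {
  return static_cast<std::uint32_t>(m.ptr - 1);
}
```

The recovered offset is what would be passed to llvm.vtable.load.relative at
the call site.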

Thanks,
-- 
Peter

Sean Silva via llvm-dev

2016-Feb-29 23:38 UTC

[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

Using relative offsets applies to more than just vtables. It would do
wonders for constant strings too.

-- Sean Silva


Peter Collingbourne via llvm-dev

2016-Mar-01 00:29 UTC

[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

That's true, and in fact I found that many of the remaining dynamic
relocations in Chromium were for constant strings and data structures
referencing them (e.g. [1]). Unfortunately I couldn't see a good general way
to transform a program to use relative offsets automatically, as I imagine we
would need to consider every use of the data structure in the program.
Possible in theory, but very difficult to get right.
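
For reference, a hand-written version of such a table might look like the
sketch below. It is not Chromium's actual code: kStrings, MimeEntry, kMappings
and findMimeType are hypothetical names, and the two entries are made up for
illustration. The table stores 32-bit offsets into one string blob instead of
pointers, so it contains no addresses and needs no dynamic relocations.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// All strings live in one blob, each terminated by '\0'.
static const char kStrings[] =
    "html\0text/html\0"
    "css\0text/css\0";

// Each entry stores offsets into kStrings rather than pointers.
struct MimeEntry {
  std::uint32_t extensionOffset;
  std::uint32_t mimeTypeOffset;
};

static const MimeEntry kMappings[] = {
    {0, 5},    // "html" -> "text/html"
    {15, 19},  // "css"  -> "text/css"
};

// Every use of the table has to go through an accessor like this one,
// which turns an offset back into a pointer at run time.
const char *findMimeType(const char *extension) {
  for (const MimeEntry &entry : kMappings) {
    if (std::strcmp(kStrings + entry.extensionOffset, extension) == 0)
      return kStrings + entry.mimeTypeOffset;
  }
  return nullptr;
}
```

The accessor is the part that is hard to automate: a compiler would have to
find and rewrite every place that reads the table.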

Peter

[1] https://code.google.com/p/chromium/codesearch#chromium/src/net/base/mime_util.cc&l=64&gs=cpp:net::kPrimaryMappings@chromium/../../net/base/mime_util.cc%257Cdef&gsn=kPrimaryMappings&ct=xref_usages


Peter Collingbourne via llvm-dev

2016-Mar-04 06:31 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
> Hi all,
>
> I'd like to make a proposal to implement the new vtable ABI described in
> PR26723, which I'll call the relative ABI. That bug gives more details and
> justification for that ABI.
>
> The user interface for the new ABI would be that -fwhole-program-vtables
> would take an optional value indicating which aspects of the program have
> whole-program scope. For example, the existing implementation of
> whole-program vcall optimization allows external code to call into
> translation units compiled with -fwhole-program-vtables, but does not allow
> external code to derive from classes defined in such translation units, so
> you could request the current behaviour with
> "-fwhole-program-vtables=derive", which means that derived classes are not
> allowed from outside the program. To request the new ABI, you can specify
> "-fwhole-program-vtables=call,derive", which means that calls and derived
> classes are both not allowed from outside the program.
> "-fwhole-program-vtables" would be short for
> "-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".
>
Based on discussion with John McCall in PR26723, I’d like to change the
user interface for -fwhole-program-vtables, and introduce an interface
specifically to enable the relative ABI. That interface would be based on a
whitelist rather than a blacklist, and together with
-fwhole-program-vtables would enable devirtualization, virtual const prop,
and virtual function stripping for those classes.

The new user interface is as follows:

We would introduce two new attributes, [[clang::unstable_abi]] and
[[clang::stable_abi]], which would be attached to a class and would enable
or disable the unstable ABI for that class. It is an ODR violation to use
[[clang::unstable_abi]] in two translation units compiled with different
versions of Clang (we may consider extending the object format to allow a
linker to diagnose this). Specifically, mixing different head revisions or
major releases is not allowed, but mixing different point releases is fine.
The attribute __declspec(uuid()) (which is used for COM classes on Windows)
would imply [[clang::stable_abi]].

A “dynamic-introducing” class is a class that declares new virtual member
functions or virtual bases, and has no dynamic bases or virtual bases. A
class that is dynamic but not dynamic-introducing would use the same ABI as
its dynamic base classes. The compiler will diagnose if a class has two or
more dynamic bases with different ABIs, or if the bases have a different
ABI from the one explicitly specified by an attribute.

The ABI for a dynamic-introducing class is determined from the attribute,
or if the class has no attribute, from the following flags:

-funstable-c++-abi or -funstable-c++-abi-classes would enable the unstable
C++ ABI for all classes (the idea being that -funstable-c++-abi would also
cover any unrelated ABI breaks we may want to make in future).

-funstable-c++-abi-classes=PATH would enable the unstable C++ ABI for
dynamic-introducing classes specified in the file at PATH.

The -fwhole-program-vtables-blacklist flag would be removed, and I'm no
longer proposing that -fwhole-program-vtables would take a value. The
whole-program blacklist would be replaced by either inference based on
visibility or a new [[clang::no_whole_program]] attribute.
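
The rules above for picking a class's ABI can be sketched as a small function.
This is only one reading of the proposal: ClassABI, ABIAttr, ABIResult and
resolveClassABI are hypothetical names, fromFlags stands for whatever the
-funstable-c++-abi* flags select for the class, and a false ok models the
cases where the compiler would diagnose.

```cpp
#include <cassert>
#include <vector>

enum class ClassABI { Stable, Unstable };
enum class ABIAttr { None, Stable, Unstable };

struct ABIResult {
  bool ok;       // false where the compiler would emit a diagnostic
  ClassABI abi;  // meaningful only when ok is true
};

// dynamicBaseABIs is empty for a dynamic-introducing class.
ABIResult resolveClassABI(const std::vector<ClassABI> &dynamicBaseABIs,
                          ABIAttr attr, ClassABI fromFlags) {
  ClassABI requested = fromFlags;
  if (attr == ABIAttr::Stable) requested = ClassABI::Stable;
  if (attr == ABIAttr::Unstable) requested = ClassABI::Unstable;

  // Dynamic-introducing class: the attribute wins, otherwise the flags.
  if (dynamicBaseABIs.empty())
    return {true, requested};

  // Otherwise the class inherits its bases' ABI; all bases must agree.
  ClassABI inherited = dynamicBaseABIs.front();
  for (ClassABI abi : dynamicBaseABIs)
    if (abi != inherited)
      return {false, inherited};

  // An explicit attribute must not contradict the inherited ABI.
  if (attr != ABIAttr::None && requested != inherited)
    return {false, inherited};
  return {true, inherited};
}
```

Note that the flags only matter for dynamic-introducing classes; a derived
class follows its bases regardless of how its own translation unit is built.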

It is effectively an ODR violation to define a class that uses the unstable
ABI in a translation unit compiled with a different set of
-funstable-c++-abi* flags. It is also a violation for a linkage unit other
than the one compiled with -fwhole-program-vtables to define any of the
classes that use the unstable ABI.

The format of the file is a series of lines ending in either * or **.
Preceding that is a namespace specifier delimited by double-colons followed
by “::”, or the empty string to denote the global namespace. Each entry in
the list indicates that dynamic-introducing classes in that namespace,
including nested classes, classes defined in enclosed anonymous namespaces,
and classes defined within member functions of those classes, use the
unstable ABI. If the line ends in “*” this applies to the given namespace
only, while if the line ends in “**” it applies to the given namespace and
any enclosed namespaces.

In Chromium for example, the contents of the file would look like this:

*
app::**
base::**
browser::**
[...]
wm::**
zip::**

This whitelist specifies that classes defined in the global namespace as
well as app, base, browser etc. and any enclosed namespaces would use the
unstable ABI. This list excludes std::**, so we can continue to use the
system standard library. If Chromium did start using its own copy of the
standard library, we could create another whitelist with that entry in it,
or just use the -funstable-c++-abi-classes flag.
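
To illustrate the matching rule, here is a rough sketch of how one whitelist
line could be tested against a class's enclosing namespace. The function name
entryCoversNamespace is hypothetical, the namespace is passed as a plain
string such as "base::internal" (empty for the global namespace), and nested
classes and anonymous namespaces are not modelled.

```cpp
#include <cassert>
#include <string>

// Returns true if the whitelist line `entry` (e.g. "base::**" or "*")
// covers classes whose enclosing namespace is `ns`.
bool entryCoversNamespace(const std::string &entry, const std::string &ns) {
  bool recursive = false;
  std::string prefix = entry;
  if (prefix.size() >= 2 && prefix.compare(prefix.size() - 2, 2, "**") == 0) {
    recursive = true;  // "**": this namespace and any enclosed ones
    prefix.erase(prefix.size() - 2);
  } else if (!prefix.empty() && prefix[prefix.size() - 1] == '*') {
    prefix.erase(prefix.size() - 1);  // "*": this namespace only
  } else {
    return false;  // malformed line
  }

  // prefix is now "" (global namespace) or "a::b::"; drop the final "::".
  if (!prefix.empty()) {
    if (prefix.size() < 3 || prefix.compare(prefix.size() - 2, 2, "::") != 0)
      return false;
    prefix.erase(prefix.size() - 2);
  }

  if (ns == prefix) return true;
  if (!recursive) return false;
  if (prefix.empty()) return true;  // a bare "**" covers every namespace
  return ns.compare(0, prefix.size() + 2, prefix + "::") == 0;
}
```

With the Chromium list above, a class in base::internal would be covered by
"base::**", while one in std would match no entry and keep the stable ABI.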

We would also add a new warning: -Wc++-stable-abi. This would warn for any
classes defined in non-system header files that are inferred from
namespaces to use the stable ABI. This warning would be intended to be used
by programs that intend to use the unstable ABI for any non-system classes
(such as Chromium).

Control flow integrity (-fsanitize=cfi*) would also only be supported for
classes using the unstable ABI, and would require the
-fwhole-program-vtables flag, unless the cross-DSO mode
(-fsanitize-cfi-cross-dso) is enabled.

Next steps:

I will send out a patch that implements the semantic analysis side of this.
Once that lands, follow-up patches will actually start changing the
unstable ABI.

Mehdi Amini via llvm-dev

2016-Mar-04 17:32 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

> On Feb 29, 2016, at 1:53 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> 
> My proposal is to add the following new intrinsics:
Thanks, I'm really glad you're moving forward on improving the IR
representation so fast after our previous discussion. The use of these
intrinsics looks a lot more friendly to me! :)
(even if I still can't make sense of the "bitset" terminology used to
represent the hierarchy in the metadata part)
> 
> i32 @llvm.vtable.slot.offset(metadata, i32)
> 
> This intrinsic takes a bitset name B and an offset I. It returns the byte
> offset of the I'th virtual function pointer in each of the vtables in B.
> 
> i8* @llvm.vtable.load(i8*, i32)
Why does vtable.load take a byte offset instead of a slot index directly?
(The IR could be simpler if it did not require a call to
@llvm.vtable.slot.offset() for every @llvm.vtable.load().)

-- 
Mehdi

> This intrinsic takes a virtual table pointer and a byte offset, and loads
> a virtual function pointer from the virtual table at the given offset.
> 
> i8* @llvm.vtable.load.relative(i8*, i32)
> 
> This intrinsic is the same as above, but it uses the relative ABI.
> 
> {i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
> {i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)
> 
> These intrinsics would be used to implement CFI. They are similar to the
> unchecked intrinsics, but if the second element of the result is non-zero,
> the program may call the first element of the result as a function pointer
> without causing an indirect function call to any function other than one
> potentially loaded from one of the constant globals of which %name is a
> member.
> 
> To minimize the impact on existing passes, the intrinsics would be lowered
> early during the regular pipeline when LTO is disabled, or early in the LTO
> pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset
> intrinsic when LTO is disabled, as bitset information would be unavailable.
> 
> To give the optimizer permission to reshape vtable initializers for a
> particular class, the vtable would be added to a special named metadata node
> named 'llvm.vtable.slots'. The presence of this metadata would guarantee
> that all loads beyond a given byte offset (this range would not include the
> RTTI pointer for example) are done using the above intrinsics.
> 
> We will also take advantage of the ABI break to split the class's virtual
> table group at virtual table boundaries into separate globals instead of
> emitting all virtual tables in the group into a single global. This will
> not only simplify the implementation of dead virtual function stripping,
> but also reduce code size overhead for CFI. (CFI works best if vtables for
> a base class can be laid out near vtables for derived class; the current
> ABI makes this harder to achieve.)
> 
> Example (using the relative ABI):
> 
> struct A {
>  virtual void f();
>  virtual void g();
> };
> 
> struct B {
>  virtual void h();
> };
> 
> struct C : A, B {
>  virtual void f();
>  virtual void g();
>  virtual void h();
> };
> 
> void fcall(A *a) {
>  a->f();
> }
> 
> void gcall(A *a) {
>  a->g();
> }
> 
> typedef void (A::*mfp)();
> 
> mfp getmfp() {
>  return &A::g;
> }
> 
> void callmfp(A *a, mfp m) {
>  (a->*m)();
> }
> 
> In IR:
> 
> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
> @B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
> @C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}
> @C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}
> 
> define void @fcall(%A* %a) {
>  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 0)
>  %vtable = load i8* %a
>  %fp = i8* @llvm.vtable.load.relative(%vtable, %slot)
>  %casted_fp = bitcast i8* %fp to void (%A*)
>  call void %casted_fp(%a)
> }
> 
> define void @gcall(%A* %a) {
>  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
>  %vtable = load i8* %a
>  %fp = i8* @llvm.vtable.load.relative(%vtable, %slot)
>  %casted_fp = bitcast i8* %fp to void (%A*)
>  call void %casted_fp(%a)
> }
> 
> define {i8*, i8*} @getmfp() {
>  %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
>  %slotp1 = add %slot, 1
>  %result = insertvalue {i8*, i8*} {i8* 0, i8* 0}, 0, %slotp1
>  ret {i8*, i8*} %result
> }
> 
> define @callmfp(%A* %a, {i8*, i8*} %m) {
>  ; assuming the call is virtual and no this adjustment
>  %slot = extractvalue i8* %m, 0
>  %slotm1 = sub %slot, 1
>  %vtable = load i8* %a
>  %fp = i8* @llvm.vtable.load.relative(%vtable, %slotm1)
>  %casted_fp = bitcast i8* %fp to void (%A*)
>  call void %casted_fp(%a)
> }
> 
> !0 = {!"A", @A_vtable, 16}
> !1 = {!"B", @B_vtable, 16}
> !2 = {!"A", @C_vtable0, 16}
> !3 = {!"B", @C_vtable1, 16}
> !4 = {!"C", @C_vtable0, 16}
> !llvm.bitsets = {!0, !1, !2, !3, !4}
> 
> !5 = {@A_vtable, 16}
> !6 = {@B_vtable, 16}
> !7 = {@C_vtable0, 16}
> !8 = {@C_vtable1, 16}
> !llvm.vtable.slots = {!5, !6, !7, !8}
> 
> Thanks,
> -- 
> Peter

Peter Collingbourne via llvm-dev

2016-Mar-04 20:47 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Fri, Mar 4, 2016 at 9:32 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> > On Feb 29, 2016, at 1:53 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> >
> > Hi all,
> >
> > I'd like to make a proposal to implement the new vtable ABI described in
> > PR26723, which I'll call the relative ABI. That bug gives more details and
> > justification for that ABI.
> >
> > The user interface for the new ABI would be that -fwhole-program-vtables
> > would take an optional value indicating which aspects of the program have
> > whole-program scope. For example, the existing implementation of whole-program
> > vcall optimization allows external code to call into translation units
> > compiled with -fwhole-program-vtables, but does not allow external code to
> > derive from classes defined in such translation units, so you could request
> > the current behaviour with "-fwhole-program-vtables=derive", which means
> > that derived classes are not allowed from outside the program. To request
> > the new ABI, you can specify "-fwhole-program-vtables=call,derive",
> > which means that calls and derived classes are both not allowed from
> > outside the program. "-fwhole-program-vtables" would be short for
> > "-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".
> >
> > I'll also make the observation that the new ABI does not require LTO or
> > whole-program visibility at compile time; to decide whether to use the new
> > ABI for a class, we just need to check that it and its bases are not in the
> > whole-program-vtables blacklist.
> >
> > At the same time, I'd like to change how virtual calls are represented in
> > the IR. This is for a few reasons:
> >
> > 1) Would allow whole-program virtual call optimization to work well with the
> >   relative ABI. This ABI would complicate the IR at call sites and make it
> >   harder to do matching and rewriting.
> >
> > 2) Simplifies the whole-program virtual call optimization pass. Currently we
> >   need to walk uses in the IR in order to determine the slot and callees for
> >   each call site. This can all be avoided with a simpler representation.
> >
> > 3) Would make it easier to implement dead virtual function stripping. This would
> >   involve reshaping any vtable initializers and rewriting call
> >   sites. Implementing this correctly is harder than it needs to be because
> >   of the current representation.
> >
> > My proposal is to add the following new intrinsics:
>
> Thanks, I'm really glad you're moving forward on improving the IR
> representation so fast after our previous discussion. The use of these
> intrinsics looks a lot more friendly to me! :)
> (even if I still cannot make sense of the "bitset" terminology used to
> represent the hierarchy in the metadata part)
>
> >
> > i32 @llvm.vtable.slot.offset(metadata, i32)
> >
> > This intrinsic takes a bitset name B and an offset I. It returns the byte
> > offset of the I'th virtual function pointer in each of the vtables in B.
> >
> > i8* @llvm.vtable.load(i8*, i32)
>
> Why does vtable.load take a byte offset instead of a slot index
> directly? (The IR could be simpler if it did not require a call to
> @llvm.vtable.slot.offset() for every @llvm.vtable.load().)
>
I decided to split these in order to support virtual member function
pointers correctly. In the Itanium ABI, member function pointers use a byte
offset. The idea is that llvm.vtable.slot.offset would be used to create a
member function pointer, while llvm.vtable.load would be used to call it
(see also the getmfp and callmfp examples).
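
To make the split concrete, here is a minimal C++ sketch of that encoding. It assumes the vtable is a plain array of function pointers (the non-relative layout); the helper names slot_offset, vtable_load, make_virtual_mfp and call_virtual_mfp are hypothetical stand-ins for the proposed intrinsics and for what getmfp/callmfp do, not the actual lowering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model: a vtable is a flat array of function pointers.
using Fn = int (*)();

int f_impl() { return 1; }
int g_impl() { return 2; }

// Stand-in for llvm.vtable.slot.offset: slot index -> byte offset.
std::ptrdiff_t slot_offset(std::size_t slot_index) {
  return static_cast<std::ptrdiff_t>(slot_index * sizeof(Fn));
}

// Stand-in for llvm.vtable.load: read a function pointer at a byte offset.
Fn vtable_load(const unsigned char *vtable, std::ptrdiff_t byte_offset) {
  Fn fp;
  std::memcpy(&fp, vtable + byte_offset, sizeof(fp));
  return fp;
}

// Itanium-style virtual member function pointer: 1 plus the byte offset.
// Creating it needs only the slot offset, not a vtable pointer.
std::ptrdiff_t make_virtual_mfp(std::size_t slot_index) {
  return slot_offset(slot_index) + 1;
}

// Calling it needs only a load at a byte offset, not a slot index.
int call_virtual_mfp(const unsigned char *vtable, std::ptrdiff_t mfp) {
  return vtable_load(vtable, mfp - 1)();
}
```

The member function pointer carries a byte offset across the getmfp/callmfp boundary, so the call side has no slot index left to pass to a combined intrinsic.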

Thanks,
Peter

Peter Collingbourne via llvm-dev

2016-Mar-04 22:48 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>
> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16),
> @A::g - (@A_vtable + 16)}
>
There's a subtlety about this aspect of the ABI that I should call
attention to. The virtual function references can only be resolved directly
by the static linker if they are defined in the same executable/DSO as the
virtual table. I expect this to be the overwhelmingly common case, as
classes are normally wholly defined within a single executable or DSO, so
our implementation should be optimized around that case.
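
As a rough illustration of the layout quoted above, the following C++ sketch models a relative vtable entry as a 32-bit displacement from the vtable's address point (the "+ 16" in the initializer). The Image type, the integer "addresses" and the function names are assumptions made for this example; it is not the proposed lowering of @llvm.vtable.load.relative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flat "image": an address is a byte index into this buffer.
struct Image {
  std::vector<unsigned char> bytes;

  std::int32_t read_i32(std::uint64_t addr) const {
    std::int32_t v;
    std::memcpy(&v, bytes.data() + addr, sizeof(v));
    return v;
  }

  void write_i32(std::uint64_t addr, std::int32_t v) {
    std::memcpy(bytes.data() + addr, &v, sizeof(v));
  }
};

// Store "target - address_point" in the slot at address_point + byte_offset,
// mirroring "@A::f - (@A_vtable + 16)" in the initializer.
void emit_relative_entry(Image &img, std::uint64_t address_point,
                         std::uint64_t byte_offset, std::uint64_t target) {
  std::int64_t delta = static_cast<std::int64_t>(target) -
                       static_cast<std::int64_t>(address_point);
  img.write_i32(address_point + byte_offset, static_cast<std::int32_t>(delta));
}

// Model of a relative load: read the displacement, add the address point back.
std::uint64_t load_relative(const Image &img, std::uint64_t address_point,
                            std::uint64_t byte_offset) {
  std::int32_t delta = img.read_i32(address_point + byte_offset);
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(address_point) + delta);
}
```

Because the stored value is a difference of two addresses in the same image, the static linker can compute it without a dynamic relocation, but only when both addresses are known at link time.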

If we expected cross-DSO references to be relatively common, we could make
vtable entries be relative to GOT entries, but that would introduce an
additional level of indirection and additional relocations, probably
costing us more in binary size and memory bandwidth than the current ABI.

However, it is technically possible to split the implementation of a
class's virtual functions between DSOs, and there are more practical cases
where we might expect to see cross-DSO references:

- one DSO could derive from a class defined in another DSO, and only
override some of its virtual functions
- the vtable could contain a reference to __cxa_pure_virtual which would be
defined by the standard library

We can handle these cases by having the vtable refer to a PLT entry for
each function that is not defined within the module. This can be done by
using a specific type of relative relocation that refers directly to the
symbol if defined within the current module, or to a PLT entry if not. This
is the same type of relocation that is needed to implement relative
branches on x86, so I'd expect it to be generally available on that
architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH,
COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1],
which is essentially the same thing as a PLT entry). It is also present on
ARM (R_ARM_PREL31, which was apparently added to support unwind tables).

We still need some way to create PLT relocations in the vtable's
initializer without breaking the semantics of a load from the vtable.
Rafael and I discussed this and we believe that if the target function is
unnamed_addr, this indicates that the function's address isn't observable
(this is true for virtual functions, as it isn't possible to take their
address), and so it could be substituted with the address of a PLT entry.

One complication is that on ELF the linker will still create a PLT entry if
the symbol has default visibility, in order to support symbol
interposition. We can mitigate against that by using protected visibility
for virtual functions if they would otherwise receive default visibility.

Thanks,
Peter

[1] http://llvm.org/klaus/lld/blob/master/COFF/InputFiles.cpp#L-305

Nico Weber via llvm-dev

2016-Mar-05 03:49 UTC

[llvm-dev] [cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Thu, Mar 3, 2016 at 10:31 PM, Peter Collingbourne via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>
>> Hi all,
>>
>> I'd like to make a proposal to implement the new vtable ABI described in
>> PR26723, which I'll call the relative ABI. That bug gives more details and
>> justification for that ABI.
>>
>> The user interface for the new ABI would be that -fwhole-program-vtables
>> would take an optional value indicating which aspects of the program have
>> whole-program scope. For example, the existing implementation of whole-program
>> vcall optimization allows external code to call into translation units
>> compiled with -fwhole-program-vtables, but does not allow external code to
>> derive from classes defined in such translation units, so you could request
>> the current behaviour with "-fwhole-program-vtables=derive", which means
>> that derived classes are not allowed from outside the program. To request
>> the new ABI, you can specify "-fwhole-program-vtables=call,derive",
>> which means that calls and derived classes are both not allowed from
>> outside the program. "-fwhole-program-vtables" would be short for
>> "-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".
>>
>
> Based on discussion with John McCall in PR26723, I’d like to change the
> user interface for -fwhole-program-vtables, and introduce an interface
> specifically to enable the relative ABI. That interface would be based on a
> whitelist rather than a blacklist, and together with
> -fwhole-program-vtables would enable devirtualization, virtual const prop,
> and virtual function stripping for those classes.
>
> The new user interface is as follows:
>
> We would introduce two new attributes, [[clang::unstable_abi]] and
> [[clang::stable_abi]], which would be attached to a class and would enable
> or disable the unstable ABI for that class. It is an ODR violation to use
> [[clang::unstable_abi]] in two translation units compiled with different
> versions of Clang (we may consider extending the object format to allow a
> linker to diagnose this). Specifically, mixing different head revisions or
> major releases is not allowed, but mixing different point releases is fine.
> The attribute __declspec(uuid()) (which is used for COM classes on Windows)
> would imply [[clang::stable_abi]].
>
> A “dynamic-introducing” class is a class that declares new virtual member
> functions or virtual bases, and has no dynamic bases or virtual bases. A
> class that is dynamic but not dynamic-introducing would use the same ABI as
> its dynamic base classes. The compiler will diagnose if a class has two or
> more dynamic bases with different ABIs, or if the bases have a different
> ABI from the one explicitly specified by an attribute.
>
> The ABI for a dynamic-introducing class is determined from the attribute,
> or if the class has no attribute, from the following flags:
>
> -funstable-c++-abi or -funstable-c++-abi-classes would enable the unstable
> C++ ABI for all classes (the idea being that -funstable-c++-abi would also
> cover any unrelated ABI breaks we may want to make in future).
>
> -funstable-c++-abi-classes=PATH would enable the unstable C++ ABI for
> dynamic-introducing classes specified in the file at PATH.
>
> The -fwhole-program-vtables-blacklist flag would be removed, and I'm no
> longer proposing that -fwhole-program-vtables would take a value. The
> whole-program blacklist would be replaced by either inference based on
> visibility or a new [[clang::no_whole_program]] attribute.
>
> It is effectively an ODR violation to define a class that uses the
> unstable ABI in a translation unit compiled with a different set of
> -funstable-c++-abi* flags. It is also a violation for a linkage unit other
> than the one compiled with -fwhole-program-vtables to define any of the
> classes that use the unstable ABI.
>
> The format of the file is a series of lines ending in either * or **.
> Preceding that is a namespace specifier delimited by double-colons followed
> by “::”, or the empty string to denote the global namespace. Each entry in
> the list indicates that dynamic-introducing classes in that namespace,
> including nested classes, classes defined in enclosed anonymous namespaces,
> and classes defined within member functions of those classes, use the
> unstable ABI. If the line ends in “*” this applies to the given namespace
> only, while if the line ends in “**” it applies to the given namespace and
> any enclosed namespaces.
>
> In Chromium for example, the contents of the file would look like this:
>
> *
> app::**
> base::**
> browser::**
> [...]
> wm::**
> zip::**
>
Wouldn't we want to say "use custom ABI for everything non-exported"
instead of manually tagging everything?

>
> This whitelist specifies that classes defined in the global namespace as
> well as app, base, browser etc. and any enclosed namespaces would use the
> unstable ABI. This list excludes std::**, so we can continue to use the
> system standard library. If Chromium did start using its own copy of the
> standard library, we could create another whitelist with that entry in it,
> or just use the -funstable-c++-abi-classes flag.
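
The "*" versus "**" rule described above can be sketched in C++ as follows. This is an illustration only, not Clang code: the function names are hypothetical, and the enclosing namespace of a class is assumed to be passed as a string with a trailing "::" (or the empty string for the global namespace). Nested classes, anonymous namespaces and local classes are not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if one whitelist entry (e.g. "*", "app::*", "base::**")
// covers a class whose enclosing namespace is `ns` (e.g. "", "base::",
// "base::internal::").
bool entry_matches(const std::string &entry, const std::string &ns) {
  if (entry.empty() || entry.back() != '*')
    return false;  // malformed entry: every line must end in * or **
  bool recursive = entry.size() >= 2 && entry[entry.size() - 2] == '*';
  std::string prefix = entry.substr(0, entry.size() - (recursive ? 2 : 1));
  if (!recursive)
    return ns == prefix;  // "*": the given namespace only
  // "**": the given namespace and any enclosed namespace
  return ns.compare(0, prefix.size(), prefix) == 0;
}

// A class uses the unstable ABI if any entry in the list covers it.
bool uses_unstable_abi(const std::vector<std::string> &entries,
                       const std::string &ns) {
  for (const std::string &entry : entries)
    if (entry_matches(entry, ns))
      return true;
  return false;
}
```

With the entries "*", "base::**" and "wm::**", a class in base::internal:: is covered while a class in std:: is not, which matches the intent of leaving the system standard library on the stable ABI.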
>
> We would also add a new warning: -Wc++-stable-abi. This would warn for any
> classes defined in non-system header files that are inferred from
> namespaces to use the stable ABI. This warning would be intended to be used
> by programs that intend to use the unstable ABI for any non-system classes
> (such as Chromium).
>
> Control flow integrity (-fsanitize=cfi*) would also only be supported for
> classes using the unstable ABI, and would require the
> -fwhole-program-vtables flag, unless the cross-DSO mode
> (-fsanitize-cfi-cross-dso) is enabled.
>
> Next steps:
>
> I will send out a patch that implements the semantic analysis side of
> this. Once that lands, follow-up patches will actually start changing the
> unstable ABI.
>

John McCall via llvm-dev

2016-Mar-08 00:09 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

> On Mar 4, 2016, at 2:48 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
> 
> There's a subtlety about this aspect of the ABI that I should call
> attention to. The virtual function references can only be resolved directly
> by the static linker if they are defined in the same executable/DSO as the
> virtual table. I expect this to be the overwhelmingly common case, as
> classes are normally wholly defined within a single executable or DSO, so
> our implementation should be optimized around that case.
> 
> If we expected cross-DSO references to be relatively common, we could make
> vtable entries be relative to GOT entries, but that would introduce an
> additional level of indirection and additional relocations, probably
> costing us more in binary size and memory bandwidth than the current ABI.
> 
> However, it is technically possible to split the implementation of a
> class's virtual functions between DSOs, and there are more practical cases
> where we might expect to see cross-DSO references:
> 
> - one DSO could derive from a class defined in another DSO, and only
> override some of its virtual functions
> - the vtable could contain a reference to __cxa_pure_virtual which would be
> defined by the standard library
> 
> We can handle these cases by having the vtable refer to a PLT entry for
> each function that is not defined within the module. This can be done by
> using a specific type of relative relocation that refers directly to the
> symbol if defined within the current module, or to a PLT entry if not. This
> is the same type of relocation that is needed to implement relative
> branches on x86, so I'd expect it to be generally available on that
> architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH,
> COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1],
> which is essentially the same thing as a PLT entry). It is also present on
> ARM (R_ARM_PREL31, which was apparently added to support unwind tables).
> 
> We still need some way to create PLT relocations in the vtable's
> initializer without breaking the semantics of a load from the vtable.
> Rafael and I discussed this and we believe that if the target function is
> unnamed_addr, this indicates that the function's address isn't observable
> (this is true for virtual functions, as it isn't possible to take their
> address), and so it could be substituted with the address of a PLT entry.

This seems like the best way to handle it.

It would be nice if this could be requested of an arbitrary function without
having to rely on it being unnamed_addr; that is, it would be nice to have “the
address of an unnamed_addr function in this linkage unit which is equivalent
when called to this other function that’s not necessarily within this linkage
unit”. It's easy to make an unnamed_addr wrapper function, and maybe we
could teach the backend to peephole that to a PLT function reference, but (1)
that sounds like some serious backend heroics and (2) it wouldn’t work for
variadic functions.

> One complication is that on ELF the linker will still create a PLT entry if
> the symbol has default visibility, in order to support symbol
> interposition. We can mitigate against that by using protected visibility
> for virtual functions if they would otherwise receive default visibility.

To clarify, this is just a performance problem and doesn’t actually break
semantics or feasibility, right?

John.
> 
> Thanks,
> Peter
> 
> [1] http://llvm.org/klaus/lld/blob/master/COFF/InputFiles.cpp#L-305

Joerg Sonnenberger via llvm-dev

2016-Mar-08 00:39 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Fri, Mar 04, 2016 at 02:48:28PM -0800, Peter Collingbourne via llvm-dev wrote:
> One complication is that on ELF the linker will still create a PLT entry if
> the symbol has default visibility, in order to support symbol
> interposition. We can mitigate against that by using protected visibility
> for virtual functions if they would otherwise receive default visibility.

Be very careful with using protected visibility, since recent binutils
versions have completely broken a bunch of basic use cases of protected
symbols :(

Joerg

Peter Collingbourne via llvm-dev

2016-Mar-16 23:23 UTC

[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

On Fri, Mar 4, 2016 at 2:48 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> On Mon, Feb 29, 2016 at 1:53 PM, <> wrote:
>>
>> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16),
>> @A::g - (@A_vtable + 16)}
>>
>
> There's a subtlety about this aspect of the ABI that I should call
> attention to. The virtual function references can only be resolved directly
> by the static linker if they are defined in the same executable/DSO as the
> virtual table. I expect this to be the overwhelmingly common case, as
> classes are normally wholly defined within a single executable or DSO, so
> our implementation should be optimized around that case.
>
> If we expected cross-DSO references to be relatively common, we could make
> vtable entries be relative to GOT entries, but that would introduce an
> additional level of indirection and additional relocations, probably
> costing us more in binary size and memory bandwidth than the current ABI.
>
> However, it is technically possible to split the implementation of a
> class's virtual functions between DSOs, and there are more practical cases
> where we might expect to see cross-DSO references:
>
> - one DSO could derive from a class defined in another DSO, and only
> override some of its virtual functions
> - the vtable could contain a reference to __cxa_pure_virtual which would
> be defined by the standard library
>
> We can handle these cases by having the vtable refer to a PLT entry for
> each function that is not defined within the module. This can be done by
> using a specific type of relative relocation that refers directly to the
> symbol if defined within the current module, or to a PLT entry if not. This
> is the same type of relocation that is needed to implement relative
> branches on x86, so I'd expect it to be generally available on that
> architecture (ELF has R_{386,X86_64}_PLT32, Mach-O has X86_64_RELOC_BRANCH,
> COFF has IMAGE_REL_{AMD64,I386}_REL32, which may resolve to a thunk [1],
> which is essentially the same thing as a PLT entry). It is also present on
> ARM (R_ARM_PREL31, which was apparently added to support unwind tables).
>
> We still need some way to create PLT relocations in the vtable's
> initializer without breaking the semantics of a load from the vtable.
> Rafael and I discussed this and we believe that if the target function is
> unnamed_addr, this indicates that the function's address isn't observable
> (this is true for virtual functions, as it isn't possible to take their
> address), and so it could be substituted with the address of a PLT entry.
>
I've discovered a problem with this idea. Since we are using 32-bit
displacements, the offset from the vtable to the function must fit within
32 bits. This is assumed to be true in the medium code model, so long as
the displacement points to a real function address or a PLT entry. However,
if we combine a vtable load at a virtual call site, the code will evaluate
the function address to the actual address of the function via the GOT, and
that could push the displacement outside of the 32-bit boundary and cause
an error in the evaluation of the function address.

To solve this problem, I reckon that the @llvm.vtable.load.relative
intrinsic I mentioned earlier will be required for correctness, and we
would have to lower it very late, e.g. in the pre-backend passes.
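
The range constraint behind this problem can be checked with a few lines of C++. This is a sketch under stated assumptions: the function name is hypothetical, and the addresses in the usage note are made-up values chosen only to show a nearby PLT entry fitting and a distant definition in another DSO not fitting.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if "target - address_point" is representable as the signed
// 32-bit displacement stored in a relative vtable entry. Addresses are
// assumed to be below 2^63 so the signed subtraction cannot overflow.
bool displacement_fits_i32(std::uint64_t target, std::uint64_t address_point) {
  std::int64_t delta = static_cast<std::int64_t>(target) -
                       static_cast<std::int64_t>(address_point);
  return delta >= std::numeric_limits<std::int32_t>::min() &&
         delta <= std::numeric_limits<std::int32_t>::max();
}
```

For example, with a vtable at 0x400000, a PLT entry at 0x401000 in the same module is in range, whereas a function defined in another DSO mapped at 0x7f0000000000 is not. That is why replacing the PLT address with the function's real address at a call site can break the arithmetic.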

Peter
