thr3ads.net - llvm dev - [LLVMdev] IR extension proposal: bitset constants [Feb 2015]

If this information is useful, please help other people find it:
Share via:

Peter Collingbourne

2015-Jan-27 21:07 UTC

[LLVMdev] IR extension proposal: bitset constants

Hi all,

I would like to propose a mechanism that allows IR modules to co-operatively
build a pointer set corresponding to addresses within a given set of
globals. The specific use case I have in mind is to provide a mechanism
for a C++ program to efficiently verify (at each call site) that a vtable
pointer is in the set of valid vtable pointers for the class or its derived
classes. One way of doing this is for a toolchain component to build, for each
class, a bit set that maps to the memory region allocated for the vtables,
such that each 1 bit in the bit set maps to a valid vtable for that class,
and lay out the vtables next to each other, to minimize the total size of
the bit sets. Call sites would perform a range check followed by a load of
the appropriate bit from the bit set.

To give a concrete example, suppose we have the following classes:

struct A { virtual void f(); };
struct B : A { virtual void f(), g(); };
struct C : A { virtual void f(), h(); };

with the following vtables:

_ZTV1A = { &A::rtti, &A::f };
_ZTV1B = { &B::rtti, &B::f, &B::g };
_ZTV1C = { &C::rtti, &C::f, &C::h };

The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1],
&_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The
toolchain
would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit sets
like this:

A = {0,1,0,1,0,0,1,0}
B = {0,0,0,1,0,0,0,0}
C = {0,0,0,0,0,0,1,0}

A call site (say the static type of the call is A) will check whether the
object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
*)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry
in A's bitset, which will be 1 if the vtable is valid for A.

We are now faced with a number of implementation questions: how do we represent
these (potentially partial) bit sets in LLVM, and how do we arrange to lay
out the globals and build the complete bit sets? Remember that classes B
and C may be in different translation units, so we do not necessarily have
a complete picture of A's bit set at A's compile time.

What I propose is that the parts of the bit sets be represented as arrays
of pointers, which will be resolved through some mechanism (either at
LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
those addresses. This mechanism would also be responsible for laying out
the referenced globals (i.e. the vtables) as close together as possible. We
introduce a new flag on globals, the so-called "bitset" flag, which
causes
the global to be transformed using this mechanism. Such a global cannot be
accessed in the normal way. It may only be accessed using an intrinsic that
tests whether a given pointer is in the set.

To return to our concrete example, a translation unit defining the vtable
for A may also define the following global:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]

A translation unit defining B would define:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
@_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]

And C:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
@_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]

A later mechanism would combine the bitset globals, lay out the referenced
globals and build the bit sets. In the LTO case, the IR linker can already
do the combination step if the globals are given appending linkage. A
later compiler pass (which we will call the "bitset lowering pass")
would lay out
the globals (by combining them into a single large global, and RAUWing the
original globals with aliases that gep into the large global) and build the
bit sets. In the non-LTO case, we could also invent an object-file-specific
mechanism for the backend to tell the linker to combine and build the bit sets,
for example by placing the bitset globals in a specifically named section,
and place the referenced globals in individual sections so that the linker
can rearrange them.

As previously mentioned, an intrinsic would be used to test entries in the
bit set at call sites. For example, the relevant IR for the following C++ code:

A *some_object = ...;
some_object->f();

might look like this:

%vtable = load i8** %some_object
%p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
br i1 %p, label %continue, label %trap

In the LTO case, such intrinsic calls would be lowered by the bitset lowering
pass
into the appropriate range/bit set check code, which might look like this:
(pseudocode for 64-bit platform)

if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable
> global_end_addr(_ZBS1A) {
  %p = 0
} else {
  %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
}

In the non-LTO case, we would generate similar code, but with appropriate
relocations so that the required constants (global start/end address, bitset
address, mask, shift amount) would receive appropriate linker-determined
values.

The first step I propose to make in this direction is a patch that adds
support for the bitset flag on globals, and introduces the bitset lowering pass.
A
later step would involve building backend support, and linker support in lld.

Comments appreciated.

Thanks,
-- 
Peter

Sean Silva

2015-Jan-28 10:57 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

Is there any way to accomplish this that doesn't require changes in every
part of the toolchain?

What you describe here sounds really efficient, but for a change this
pervasive throughout the toolchain, I would at least like to see an attempt
done that does not require changing the toolchain. Then we can circle back
to changing the toolchain if that proves inadequate and with concrete
experience informing what really is going to require a toolchain change
(Maybe you've already done this? Please share.).

-- Sean Silva


On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> Hi all,
>
> I would like to propose a mechanism that allows IR modules to
> co-operatively
> build a pointer set corresponding to addresses within a given set of
> globals. The specific use case I have in mind is to provide a mechanism
> for a C++ program to efficiently verify (at each call site) that a vtable
> pointer is in the set of valid vtable pointers for the class or its derived
> classes. One way of doing this is for a toolchain component to build, for
> each
> class, a bit set that maps to the memory region allocated for the vtables,
> such that each 1 bit in the bit set maps to a valid vtable for that class,
> and lay out the vtables next to each other, to minimize the total size of
> the bit sets. Call sites would perform a range check followed by a load of
> the appropriate bit from the bit set.
>
> To give a concrete example, suppose we have the following classes:
>
> struct A { virtual void f(); };
> struct B : A { virtual void f(), g(); };
> struct C : A { virtual void f(), h(); };
>
> with the following vtables:
>
> _ZTV1A = { &A::rtti, &A::f };
> _ZTV1B = { &B::rtti, &B::f, &B::g };
> _ZTV1C = { &C::rtti, &C::f, &C::h };
>
> The set of valid vtables for static class A is {&_ZTV1A[1],
&_ZTV1B[1],
> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}.
The toolchain
> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit
> sets
> like this:
>
> A = {0,1,0,1,0,0,1,0}
> B = {0,0,0,1,0,0,0,0}
> C = {0,0,0,0,0,0,1,0}
>
> A call site (say the static type of the call is A) will check whether the
> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th
entry
> in A's bitset, which will be 1 if the vtable is valid for A.
>
> We are now faced with a number of implementation questions: how do we
> represent
> these (potentially partial) bit sets in LLVM, and how do we arrange to lay
> out the globals and build the complete bit sets? Remember that classes B
> and C may be in different translation units, so we do not necessarily have
> a complete picture of A's bit set at A's compile time.
>
> What I propose is that the parts of the bit sets be represented as arrays
> of pointers, which will be resolved through some mechanism (either at
> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
> those addresses. This mechanism would also be responsible for laying out
> the referenced globals (i.e. the vtables) as close together as possible. We
> introduce a new flag on globals, the so-called "bitset" flag,
which causes
> the global to be transformed using this mechanism. Such a global cannot be
> accessed in the normal way. It may only be accessed using an intrinsic that
> tests whether a given pointer is in the set.
>
> To return to our concrete example, a translation unit defining the vtable
> for A may also define the following global:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
>
> A translation unit defining B would define:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
>
> And C:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
>
> A later mechanism would combine the bitset globals, lay out the referenced
> globals and build the bit sets. In the LTO case, the IR linker can already
> do the combination step if the globals are given appending linkage. A
> later compiler pass (which we will call the "bitset lowering
pass") would
> lay out
> the globals (by combining them into a single large global, and RAUWing the
> original globals with aliases that gep into the large global) and build the
> bit sets. In the non-LTO case, we could also invent an object-file-specific
> mechanism for the backend to tell the linker to combine and build the bit
> sets,
> for example by placing the bitset globals in a specifically named section,
> and place the referenced globals in individual sections so that the linker
> can rearrange them.
>
> As previously mentioned, an intrinsic would be used to test entries in the
> bit set at call sites. For example, the relevant IR for the following C++
> code:
>
> A *some_object = ...;
> some_object->f();
>
> might look like this:
>
> %vtable = load i8** %some_object
> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> br i1 %p, label %continue, label %trap
>
> In the LTO case, such intrinsic calls would be lowered by the bitset
> lowering pass
> into the appropriate range/bit set check code, which might look like this:
> (pseudocode for 64-bit platform)
>
> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) ||
%vtable >
> global_end_addr(_ZBS1A) {
>   %p = 0
> } else {
>   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
> }
>
> In the non-LTO case, we would generate similar code, but with appropriate
> relocations so that the required constants (global start/end address,
> bitset
> address, mask, shift amount) would receive appropriate linker-determined
> values.
>
> The first step I propose to make in this direction is a patch that adds
> support for the bitset flag on globals, and introduces the bitset lowering
> pass. A
> later step would involve building backend support, and linker support in
> lld.
>
> Comments appreciated.
>
> Thanks,
> --
> Peter
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/d766bf84/attachment.html>

Kostya Serebryany

2015-Jan-28 17:41 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

I would start from using module-level metadata.
An IR extension might be a good idea once we show that
  - the proposed indirect call protection mechanism is efficient and
useful, and
  - there are other use cases for the IR extension.

--kcc


On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com>
wrote:
> Is there any way to accomplish this that doesn't require changes in
every
> part of the toolchain?
>
> What you describe here sounds really efficient, but for a change this
> pervasive throughout the toolchain, I would at least like to see an attempt
> done that does not require changing the toolchain. Then we can circle back
> to changing the toolchain if that proves inadequate and with concrete
> experience informing what really is going to require a toolchain change
> (Maybe you've already done this? Please share.).
>
> -- Sean Silva
>
>
> On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
>
>> Hi all,
>>
>> I would like to propose a mechanism that allows IR modules to
>> co-operatively
>> build a pointer set corresponding to addresses within a given set of
>> globals. The specific use case I have in mind is to provide a mechanism
>> for a C++ program to efficiently verify (at each call site) that a
vtable
>> pointer is in the set of valid vtable pointers for the class or its
>> derived
>> classes. One way of doing this is for a toolchain component to build,
for
>> each
>> class, a bit set that maps to the memory region allocated for the
vtables,
>> such that each 1 bit in the bit set maps to a valid vtable for that
class,
>> and lay out the vtables next to each other, to minimize the total size
of
>> the bit sets. Call sites would perform a range check followed by a load
of
>> the appropriate bit from the bit set.
>>
>> To give a concrete example, suppose we have the following classes:
>>
>> struct A { virtual void f(); };
>> struct B : A { virtual void f(), g(); };
>> struct C : A { virtual void f(), h(); };
>>
>> with the following vtables:
>>
>> _ZTV1A = { &A::rtti, &A::f };
>> _ZTV1B = { &B::rtti, &B::f, &B::g };
>> _ZTV1C = { &C::rtti, &C::f, &C::h };
>>
>> The set of valid vtables for static class A is {&_ZTV1A[1],
&_ZTV1B[1],
>> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is
{&_ZTV1C[1]}. The
>> toolchain
>> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit
>> sets
>> like this:
>>
>> A = {0,1,0,1,0,0,1,0}
>> B = {0,0,0,1,0,0,0,0}
>> C = {0,0,0,0,0,0,1,0}
>>
>> A call site (say the static type of the call is A) will check whether
the
>> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is
sizeof(void
>> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void
*)'th entry
>> in A's bitset, which will be 1 if the vtable is valid for A.
>>
>> We are now faced with a number of implementation questions: how do we
>> represent
>> these (potentially partial) bit sets in LLVM, and how do we arrange to
lay
>> out the globals and build the complete bit sets? Remember that classes
B
>> and C may be in different translation units, so we do not necessarily
have
>> a complete picture of A's bit set at A's compile time.
>>
>> What I propose is that the parts of the bit sets be represented as
arrays
>> of pointers, which will be resolved through some mechanism (either at
>> LTO time, or at regular (non-LTO) link time) to bit sets corresponding
to
>> those addresses. This mechanism would also be responsible for laying
out
>> the referenced globals (i.e. the vtables) as close together as
possible.
>> We
>> introduce a new flag on globals, the so-called "bitset" flag,
which causes
>> the global to be transformed using this mechanism. Such a global cannot
be
>> accessed in the normal way. It may only be accessed using an intrinsic
>> that
>> tests whether a given pointer is in the set.
>>
>> To return to our concrete example, a translation unit defining the
vtable
>> for A may also define the following global:
>>
>> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
>>
>> A translation unit defining B would define:
>>
>> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
>> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
>>
>> And C:
>>
>> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
>> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
>>
>> A later mechanism would combine the bitset globals, lay out the
referenced
>> globals and build the bit sets. In the LTO case, the IR linker can
already
>> do the combination step if the globals are given appending linkage. A
>> later compiler pass (which we will call the "bitset lowering
pass") would
>> lay out
>> the globals (by combining them into a single large global, and RAUWing
the
>> original globals with aliases that gep into the large global) and build
>> the
>> bit sets. In the non-LTO case, we could also invent an
>> object-file-specific
>> mechanism for the backend to tell the linker to combine and build the
bit
>> sets,
>> for example by placing the bitset globals in a specifically named
section,
>> and place the referenced globals in individual sections so that the
linker
>> can rearrange them.
>>
>> As previously mentioned, an intrinsic would be used to test entries in
the
>> bit set at call sites. For example, the relevant IR for the following
C++
>> code:
>>
>> A *some_object = ...;
>> some_object->f();
>>
>> might look like this:
>>
>> %vtable = load i8** %some_object
>> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
>> br i1 %p, label %continue, label %trap
>>
>> In the LTO case, such intrinsic calls would be lowered by the bitset
>> lowering pass
>> into the appropriate range/bit set check code, which might look like
this:
>> (pseudocode for 64-bit platform)
>>
>> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) ||
%vtable >
>> global_end_addr(_ZBS1A) {
>>   %p = 0
>> } else {
>>   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
>> }
>>
>> In the non-LTO case, we would generate similar code, but with
appropriate
>> relocations so that the required constants (global start/end address,
>> bitset
>> address, mask, shift amount) would receive appropriate
linker-determined
>> values.
>>
>> The first step I propose to make in this direction is a patch that adds
>> support for the bitset flag on globals, and introduces the bitset
>> lowering pass. A
>> later step would involve building backend support, and linker support
in
>> lld.
>>
>> Comments appreciated.
>>
>> Thanks,
>> --
>> Peter
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/2ed637b6/attachment.html>

Peter Collingbourne

2015-Jan-28 18:32 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

Hi Sean,

I agree that we should first try to accomplish this without changing every
part of the toolchain. This is why I proposed to first introduce only the IR
support and bitset lowering pass. This would provide a working implementation
that is confined to the compiler, but does require LTO.

In the long run, however, I would like to drop the dependency on LTO, in
order to permit fast incremental builds.

Once the initial approach has proven to be effective, we can think about
dropping the LTO dependency by implementing linker support. I only included
the linker implementation details to show that the design can work without
requiring LTO.

Peter

On Wed, Jan 28, 2015 at 10:57:21AM +0000, Sean Silva
wrote:> Is there any way to accomplish this that doesn't require changes in
every
> part of the toolchain?
> 
> What you describe here sounds really efficient, but for a change this
> pervasive throughout the toolchain, I would at least like to see an attempt
> done that does not require changing the toolchain. Then we can circle back
> to changing the toolchain if that proves inadequate and with concrete
> experience informing what really is going to require a toolchain change
> (Maybe you've already done this? Please share.).
> 
> -- Sean Silva
> 
> 
> On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at
pcc.me.uk>
> wrote:
> 
> > Hi all,
> >
> > I would like to propose a mechanism that allows IR modules to
> > co-operatively
> > build a pointer set corresponding to addresses within a given set of
> > globals. The specific use case I have in mind is to provide a
mechanism
> > for a C++ program to efficiently verify (at each call site) that a
vtable
> > pointer is in the set of valid vtable pointers for the class or its
derived
> > classes. One way of doing this is for a toolchain component to build,
for
> > each
> > class, a bit set that maps to the memory region allocated for the
vtables,
> > such that each 1 bit in the bit set maps to a valid vtable for that
class,
> > and lay out the vtables next to each other, to minimize the total size
of
> > the bit sets. Call sites would perform a range check followed by a
load of
> > the appropriate bit from the bit set.
> >
> > To give a concrete example, suppose we have the following classes:
> >
> > struct A { virtual void f(); };
> > struct B : A { virtual void f(), g(); };
> > struct C : A { virtual void f(), h(); };
> >
> > with the following vtables:
> >
> > _ZTV1A = { &A::rtti, &A::f };
> > _ZTV1B = { &B::rtti, &B::f, &B::g };
> > _ZTV1C = { &C::rtti, &C::f, &C::h };
> >
> > The set of valid vtables for static class A is {&_ZTV1A[1],
&_ZTV1B[1],
> > &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is
{&_ZTV1C[1]}. The toolchain
> > would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate
bit
> > sets
> > like this:
> >
> > A = {0,1,0,1,0,0,1,0}
> > B = {0,0,0,1,0,0,0,0}
> > C = {0,0,0,0,0,0,1,0}
> >
> > A call site (say the static type of the call is A) will check whether
the
> > object's vtable falls in the range [_ZTV1A, _ZTV1C], and is
sizeof(void
> > *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void
*)'th entry
> > in A's bitset, which will be 1 if the vtable is valid for A.
> >
> > We are now faced with a number of implementation questions: how do we
> > represent
> > these (potentially partial) bit sets in LLVM, and how do we arrange to
lay
> > out the globals and build the complete bit sets? Remember that classes
B
> > and C may be in different translation units, so we do not necessarily
have
> > a complete picture of A's bit set at A's compile time.
> >
> > What I propose is that the parts of the bit sets be represented as
arrays
> > of pointers, which will be resolved through some mechanism (either at
> > LTO time, or at regular (non-LTO) link time) to bit sets corresponding
to
> > those addresses. This mechanism would also be responsible for laying
out
> > the referenced globals (i.e. the vtables) as close together as
possible. We
> > introduce a new flag on globals, the so-called "bitset"
flag, which causes
> > the global to be transformed using this mechanism. Such a global
cannot be
> > accessed in the normal way. It may only be accessed using an intrinsic
that
> > tests whether a given pointer is in the set.
> >
> > To return to our concrete example, a translation unit defining the
vtable
> > for A may also define the following global:
> >
> > @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
> >
> > A translation unit defining B would define:
> >
> > @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> > @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> >
> > And C:
> >
> > @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> > @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> >
> > A later mechanism would combine the bitset globals, lay out the
referenced
> > globals and build the bit sets. In the LTO case, the IR linker can
already
> > do the combination step if the globals are given appending linkage. A
> > later compiler pass (which we will call the "bitset lowering
pass") would
> > lay out
> > the globals (by combining them into a single large global, and RAUWing
the
> > original globals with aliases that gep into the large global) and
build the
> > bit sets. In the non-LTO case, we could also invent an
object-file-specific
> > mechanism for the backend to tell the linker to combine and build the
bit
> > sets,
> > for example by placing the bitset globals in a specifically named
section,
> > and place the referenced globals in individual sections so that the
linker
> > can rearrange them.
> >
> > As previously mentioned, an intrinsic would be used to test entries in
the
> > bit set at call sites. For example, the relevant IR for the following
C++
> > code:
> >
> > A *some_object = ...;
> > some_object->f();
> >
> > might look like this:
> >
> > %vtable = load i8** %some_object
> > %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> > br i1 %p, label %continue, label %trap
> >
> > In the LTO case, such intrinsic calls would be lowered by the bitset
> > lowering pass
> > into the appropriate range/bit set check code, which might look like
this:
> > (pseudocode for 64-bit platform)
> >
> > if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) ||
%vtable >
> > global_end_addr(_ZBS1A) {
> >   %p = 0
> > } else {
> >   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >>
3]
> > }
> >
> > In the non-LTO case, we would generate similar code, but with
appropriate
> > relocations so that the required constants (global start/end address,
> > bitset
> > address, mask, shift amount) would receive appropriate
linker-determined
> > values.
> >
> > The first step I propose to make in this direction is a patch that
adds
> > support for the bitset flag on globals, and introduces the bitset
lowering
> > pass. A
> > later step would involve building backend support, and linker support
in
> > lld.
> >
> > Comments appreciated.
> >
> > Thanks,
> > --
> > Peter
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
-- 
Peter

Jim Grosbach

2015-Jan-30 02:51 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

Hi Peter,

Please forgive if this is an obvious question, but how reasonable is it for this
approach to work when not all translation units are available to be rebuilt with
the new information?

I'm very interested in what you're proposing here.

Jim

Sent from my iPhone
> On Jan 27, 2015, at 1:07 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> 
> Hi all,
> 
> I would like to propose a mechanism that allows IR modules to
co-operatively
> build a pointer set corresponding to addresses within a given set of
> globals. The specific use case I have in mind is to provide a mechanism
> for a C++ program to efficiently verify (at each call site) that a vtable
> pointer is in the set of valid vtable pointers for the class or its derived
> classes. One way of doing this is for a toolchain component to build, for
each
> class, a bit set that maps to the memory region allocated for the vtables,
> such that each 1 bit in the bit set maps to a valid vtable for that class,
> and lay out the vtables next to each other, to minimize the total size of
> the bit sets. Call sites would perform a range check followed by a load of
> the appropriate bit from the bit set.
> 
> To give a concrete example, suppose we have the following classes:
> 
> struct A { virtual void f(); };
> struct B : A { virtual void f(), g(); };
> struct C : A { virtual void f(), h(); };
> 
> with the following vtables:
> 
> _ZTV1A = { &A::rtti, &A::f };
> _ZTV1B = { &B::rtti, &B::f, &B::g };
> _ZTV1C = { &C::rtti, &C::f, &C::h };
> 
> The set of valid vtables for static class A is {&_ZTV1A[1],
&_ZTV1B[1],
> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}.
The toolchain
> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit
sets
> like this:
> 
> A = {0,1,0,1,0,0,1,0}
> B = {0,0,0,1,0,0,0,0}
> C = {0,0,0,0,0,0,1,0}
> 
> A call site (say the static type of the call is A) will check whether the
> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th
entry
> in A's bitset, which will be 1 if the vtable is valid for A.
> 
> We are now faced with a number of implementation questions: how do we
represent
> these (potentially partial) bit sets in LLVM, and how do we arrange to lay
> out the globals and build the complete bit sets? Remember that classes B
> and C may be in different translation units, so we do not necessarily have
> a complete picture of A's bit set at A's compile time.
> 
> What I propose is that the parts of the bit sets be represented as arrays
> of pointers, which will be resolved through some mechanism (either at
> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
> those addresses. This mechanism would also be responsible for laying out
> the referenced globals (i.e. the vtables) as close together as possible. We
> introduce a new flag on globals, the so-called "bitset" flag,
which causes
> the global to be transformed using this mechanism. Such a global cannot be
> accessed in the normal way. It may only be accessed using an intrinsic that
> tests whether a given pointer is in the set.
> 
> To return to our concrete example, a translation unit defining the vtable
> for A may also define the following global:
> 
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
> 
> A translation unit defining B would define:
> 
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> 
> And C:
> 
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> 
> A later mechanism would combine the bitset globals, lay out the referenced
> globals and build the bit sets. In the LTO case, the IR linker can already
> do the combination step if the globals are given appending linkage. A
> later compiler pass (which we will call the "bitset lowering
pass") would lay out
> the globals (by combining them into a single large global, and RAUWing the
> original globals with aliases that gep into the large global) and build the
> bit sets. In the non-LTO case, we could also invent an object-file-specific
> mechanism for the backend to tell the linker to combine and build the bit
sets,
> for example by placing the bitset globals in a specifically named section,
> and place the referenced globals in individual sections so that the linker
> can rearrange them.
> 
> As previously mentioned, an intrinsic would be used to test entries in the
> bit set at call sites. For example, the relevant IR for the following C++
code:
> 
> A *some_object = ...;
> some_object->f();
> 
> might look like this:
> 
> %vtable = load i8** %some_object
> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> br i1 %p, label %continue, label %trap
> 
> In the LTO case, such intrinsic calls would be lowered by the bitset
lowering pass
> into the appropriate range/bit set check code, which might look like this:
> (pseudocode for 64-bit platform)
> 
> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) ||
%vtable > global_end_addr(_ZBS1A) {
>  %p = 0
> } else {
>  %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
> }
> 
> In the non-LTO case, we would generate similar code, but with appropriate
> relocations so that the required constants (global start/end address,
bitset
> address, mask, shift amount) would receive appropriate linker-determined
> values.
> 
> The first step I propose to make in this direction is a patch that adds
> support for the bitset flag on globals, and introduces the bitset lowering
pass. A
> later step would involve building backend support, and linker support in
lld.
> 
> Comments appreciated.
> 
> Thanks,
> -- 
> Peter
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Peter Collingbourne

2015-Jan-30 03:35 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

On Thu, Jan 29, 2015 at 06:51:51PM -0800, Jim Grosbach
wrote:> Hi Peter,
> 
> Please forgive if this is an obvious question, but how reasonable is it for
this approach to work when not all translation units are available to be rebuilt
with the new information?
> 
> I'm very interested in what you're proposing here.
This technique works best if we can lay out the vtables in a specific order,
but this is not necessarily a requirement. I can imagine an extension of the
design where if the pointer is out of range we can fall back to a slow path
that compares the vtable pointer against each element of a list of valid
addresses that belong to external code.

The hard requirements (I believe) are that we have headers for all external
classes that may interact with our code, and that the external objects export
the needed vtable symbols. With those, we should be able to use Clang to
build the list of valid addresses. I believe this should work with some
implementations of the standard library, for example.

If the external library does not have all headers available, or (say) if
it exposes classes defined in an anonymous namespace, we may not be able
to identify all the valid addresses. To handle those cases we may need to
develop some kind of blacklisting mechanism.

Thanks,
-- 
Peter

Sean Silva

2015-Feb-04 00:03 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

One other thing: if this can be used for control-flow integrity, I assume
it has to have good knowledge of the available indirect call targets. Could
we also use this information for devirtualization?

-- Sean Silva

On Tue, Jan 27, 2015 at 1:07 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:
> Hi all,
>
> I would like to propose a mechanism that allows IR modules to
> co-operatively
> build a pointer set corresponding to addresses within a given set of
> globals. The specific use case I have in mind is to provide a mechanism
> for a C++ program to efficiently verify (at each call site) that a vtable
> pointer is in the set of valid vtable pointers for the class or its derived
> classes. One way of doing this is for a toolchain component to build, for
> each
> class, a bit set that maps to the memory region allocated for the vtables,
> such that each 1 bit in the bit set maps to a valid vtable for that class,
> and lay out the vtables next to each other, to minimize the total size of
> the bit sets. Call sites would perform a range check followed by a load of
> the appropriate bit from the bit set.
>
> To give a concrete example, suppose we have the following classes:
>
> struct A { virtual void f(); };
> struct B : A { virtual void f(), g(); };
> struct C : A { virtual void f(), h(); };
>
> with the following vtables:
>
> _ZTV1A = { &A::rtti, &A::f };
> _ZTV1B = { &B::rtti, &B::f, &B::g };
> _ZTV1C = { &C::rtti, &C::f, &C::h };
>
> The set of valid vtables for static class A is {&_ZTV1A[1],
&_ZTV1B[1],
> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}.
The toolchain
> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit
> sets
> like this:
>
> A = {0,1,0,1,0,0,1,0}
> B = {0,0,0,1,0,0,0,0}
> C = {0,0,0,0,0,0,1,0}
>
> A call site (say the static type of the call is A) will check whether the
> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th
entry
> in A's bitset, which will be 1 if the vtable is valid for A.
>
> We are now faced with a number of implementation questions: how do we
> represent
> these (potentially partial) bit sets in LLVM, and how do we arrange to lay
> out the globals and build the complete bit sets? Remember that classes B
> and C may be in different translation units, so we do not necessarily have
> a complete picture of A's bit set at A's compile time.
>
> What I propose is that the parts of the bit sets be represented as arrays
> of pointers, which will be resolved through some mechanism (either at
> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
> those addresses. This mechanism would also be responsible for laying out
> the referenced globals (i.e. the vtables) as close together as possible. We
> introduce a new flag on globals, the so-called "bitset" flag,
which causes
> the global to be transformed using this mechanism. Such a global cannot be
> accessed in the normal way. It may only be accessed using an intrinsic that
> tests whether a given pointer is in the set.
>
> To return to our concrete example, a translation unit defining the vtable
> for A may also define the following global:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
>
> A translation unit defining B would define:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
>
> And C:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
>
> A later mechanism would combine the bitset globals, lay out the referenced
> globals and build the bit sets. In the LTO case, the IR linker can already
> do the combination step if the globals are given appending linkage. A
> later compiler pass (which we will call the "bitset lowering
pass") would
> lay out
> the globals (by combining them into a single large global, and RAUWing the
> original globals with aliases that gep into the large global) and build the
> bit sets. In the non-LTO case, we could also invent an object-file-specific
> mechanism for the backend to tell the linker to combine and build the bit
> sets,
> for example by placing the bitset globals in a specifically named section,
> and place the referenced globals in individual sections so that the linker
> can rearrange them.
>
> As previously mentioned, an intrinsic would be used to test entries in the
> bit set at call sites. For example, the relevant IR for the following C++
> code:
>
> A *some_object = ...;
> some_object->f();
>
> might look like this:
>
> %vtable = load i8** %some_object
> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> br i1 %p, label %continue, label %trap
>
> In the LTO case, such intrinsic calls would be lowered by the bitset
> lowering pass
> into the appropriate range/bit set check code, which might look like this:
> (pseudocode for 64-bit platform)
>
> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) ||
%vtable >
> global_end_addr(_ZBS1A) {
>   %p = 0
> } else {
>   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
> }
>
> In the non-LTO case, we would generate similar code, but with appropriate
> relocations so that the required constants (global start/end address,
> bitset
> address, mask, shift amount) would receive appropriate linker-determined
> values.
>
> The first step I propose to make in this direction is a patch that adds
> support for the bitset flag on globals, and introduces the bitset lowering
> pass. A
> later step would involve building backend support, and linker support in
> lld.
>
> Comments appreciated.
>
> Thanks,
> --
> Peter
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/8bd51a04/attachment.html>

Peter Collingbourne

2015-Feb-04 00:13 UTC

head link

[LLVMdev] IR extension proposal: bitset constants

On Tue, Feb 03, 2015 at 04:03:45PM -0800, Sean Silva
wrote:> One other thing: if this can be used for control-flow integrity, I assume
> it has to have good knowledge of the available indirect call targets. Could
> we also use this information for devirtualization?
I would expect so. If a bitset contains only one element, we should be able
to teach the lowering pass to simply test that the given pointer is equal
to that element (i.e. the only valid vptr). If the later IR branches to a
trapping block if that condition is false, it should be possible for the
optimizer to deduce that the condition is true in any code that is dominated
by the branch, and from that do devirtualization.

Thanks,
-- 
Peter

llvm dev - Feb 2015 - [LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants

[LLVMdev] IR extension proposal: bitset constants