I would start from using module-level metadata. An IR extension might be a good idea once we show that - the proposed indirect call protection mechanism is efficient and useful, and - there are other use cases for the IR extension. --kcc On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com> wrote:> Is there any way to accomplish this that doesn't require changes in every > part of the toolchain? > > What you describe here sounds really efficient, but for a change this > pervasive throughout the toolchain, I would at least like to see an attempt > done that does not require changing the toolchain. Then we can circle back > to changing the toolchain if that proves inadequate and with concrete > experience informing what really is going to require a toolchain change > (Maybe you've already done this? Please share.). > > -- Sean Silva > > > On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk> > wrote: > >> Hi all, >> >> I would like to propose a mechanism that allows IR modules to >> co-operatively >> build a pointer set corresponding to addresses within a given set of >> globals. The specific use case I have in mind is to provide a mechanism >> for a C++ program to efficiently verify (at each call site) that a vtable >> pointer is in the set of valid vtable pointers for the class or its >> derived >> classes. One way of doing this is for a toolchain component to build, for >> each >> class, a bit set that maps to the memory region allocated for the vtables, >> such that each 1 bit in the bit set maps to a valid vtable for that class, >> and lay out the vtables next to each other, to minimize the total size of >> the bit sets. Call sites would perform a range check followed by a load of >> the appropriate bit from the bit set. >> >> To give a concrete example, suppose we have the following classes: >> >> struct A { virtual void f(); }; >> struct B : A { virtual void f(), g(); }; >> struct C : A { virtual void f(), h(); }; >> >> with the following vtables: >> >> _ZTV1A = { &A::rtti, &A::f }; >> _ZTV1B = { &B::rtti, &B::f, &B::g }; >> _ZTV1C = { &C::rtti, &C::f, &C::h }; >> >> The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1], >> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The >> toolchain >> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit >> sets >> like this: >> >> A = {0,1,0,1,0,0,1,0} >> B = {0,0,0,1,0,0,0,0} >> C = {0,0,0,0,0,0,1,0} >> >> A call site (say the static type of the call is A) will check whether the >> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void >> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry >> in A's bitset, which will be 1 if the vtable is valid for A. >> >> We are now faced with a number of implementation questions: how do we >> represent >> these (potentially partial) bit sets in LLVM, and how do we arrange to lay >> out the globals and build the complete bit sets? Remember that classes B >> and C may be in different translation units, so we do not necessarily have >> a complete picture of A's bit set at A's compile time. >> >> What I propose is that the parts of the bit sets be represented as arrays >> of pointers, which will be resolved through some mechanism (either at >> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to >> those addresses. This mechanism would also be responsible for laying out >> the referenced globals (i.e. the vtables) as close together as possible. >> We >> introduce a new flag on globals, the so-called "bitset" flag, which causes >> the global to be transformed using this mechanism. Such a global cannot be >> accessed in the normal way. It may only be accessed using an intrinsic >> that >> tests whether a given pointer is in the set. >> >> To return to our concrete example, a translation unit defining the vtable >> for A may also define the following global: >> >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)] >> >> A translation unit defining B would define: >> >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] >> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] >> >> And C: >> >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] >> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] >> >> A later mechanism would combine the bitset globals, lay out the referenced >> globals and build the bit sets. In the LTO case, the IR linker can already >> do the combination step if the globals are given appending linkage. A >> later compiler pass (which we will call the "bitset lowering pass") would >> lay out >> the globals (by combining them into a single large global, and RAUWing the >> original globals with aliases that gep into the large global) and build >> the >> bit sets. In the non-LTO case, we could also invent an >> object-file-specific >> mechanism for the backend to tell the linker to combine and build the bit >> sets, >> for example by placing the bitset globals in a specifically named section, >> and place the referenced globals in individual sections so that the linker >> can rearrange them. >> >> As previously mentioned, an intrinsic would be used to test entries in the >> bit set at call sites. For example, the relevant IR for the following C++ >> code: >> >> A *some_object = ...; >> some_object->f(); >> >> might look like this: >> >> %vtable = load i8** %some_object >> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone >> br i1 %p, label %continue, label %trap >> >> In the LTO case, such intrinsic calls would be lowered by the bitset >> lowering pass >> into the appropriate range/bit set check code, which might look like this: >> (pseudocode for 64-bit platform) >> >> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > >> global_end_addr(_ZBS1A) { >> %p = 0 >> } else { >> %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3] >> } >> >> In the non-LTO case, we would generate similar code, but with appropriate >> relocations so that the required constants (global start/end address, >> bitset >> address, mask, shift amount) would receive appropriate linker-determined >> values. >> >> The first step I propose to make in this direction is a patch that adds >> support for the bitset flag on globals, and introduces the bitset >> lowering pass. A >> later step would involve building backend support, and linker support in >> lld. >> >> Comments appreciated. >> >> Thanks, >> -- >> Peter >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >> > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/2ed637b6/attachment.html>
Peter Collingbourne
2015-Jan-28 18:57 UTC
[LLVMdev] IR extension proposal: bitset constants
Introducing a new module-level metadata format is just another kind of IR extension, with the disadvantage of being untyped. We should be cutting down on these types of things, not introducing a new one [1]. I feel that a global is the right design for this, as it represents data that is conceptually in the "object file". Peter [1] https://groups.google.com/forum/#!topic/llvm-dev/u4hZUWAXQuY On Wed, Jan 28, 2015 at 09:41:14AM -0800, Kostya Serebryany wrote:> I would start from using module-level metadata. > An IR extension might be a good idea once we show that > - the proposed indirect call protection mechanism is efficient and > useful, and > - there are other use cases for the IR extension. > > --kcc > > > On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com> wrote: > > > Is there any way to accomplish this that doesn't require changes in every > > part of the toolchain? > > > > What you describe here sounds really efficient, but for a change this > > pervasive throughout the toolchain, I would at least like to see an attempt > > done that does not require changing the toolchain. Then we can circle back > > to changing the toolchain if that proves inadequate and with concrete > > experience informing what really is going to require a toolchain change > > (Maybe you've already done this? Please share.). > > > > -- Sean Silva > > > > > > On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk> > > wrote: > > > >> Hi all, > >> > >> I would like to propose a mechanism that allows IR modules to > >> co-operatively > >> build a pointer set corresponding to addresses within a given set of > >> globals. The specific use case I have in mind is to provide a mechanism > >> for a C++ program to efficiently verify (at each call site) that a vtable > >> pointer is in the set of valid vtable pointers for the class or its > >> derived > >> classes. One way of doing this is for a toolchain component to build, for > >> each > >> class, a bit set that maps to the memory region allocated for the vtables, > >> such that each 1 bit in the bit set maps to a valid vtable for that class, > >> and lay out the vtables next to each other, to minimize the total size of > >> the bit sets. Call sites would perform a range check followed by a load of > >> the appropriate bit from the bit set. > >> > >> To give a concrete example, suppose we have the following classes: > >> > >> struct A { virtual void f(); }; > >> struct B : A { virtual void f(), g(); }; > >> struct C : A { virtual void f(), h(); }; > >> > >> with the following vtables: > >> > >> _ZTV1A = { &A::rtti, &A::f }; > >> _ZTV1B = { &B::rtti, &B::f, &B::g }; > >> _ZTV1C = { &C::rtti, &C::f, &C::h }; > >> > >> The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1], > >> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The > >> toolchain > >> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit > >> sets > >> like this: > >> > >> A = {0,1,0,1,0,0,1,0} > >> B = {0,0,0,1,0,0,0,0} > >> C = {0,0,0,0,0,0,1,0} > >> > >> A call site (say the static type of the call is A) will check whether the > >> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void > >> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry > >> in A's bitset, which will be 1 if the vtable is valid for A. > >> > >> We are now faced with a number of implementation questions: how do we > >> represent > >> these (potentially partial) bit sets in LLVM, and how do we arrange to lay > >> out the globals and build the complete bit sets? Remember that classes B > >> and C may be in different translation units, so we do not necessarily have > >> a complete picture of A's bit set at A's compile time. > >> > >> What I propose is that the parts of the bit sets be represented as arrays > >> of pointers, which will be resolved through some mechanism (either at > >> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to > >> those addresses. This mechanism would also be responsible for laying out > >> the referenced globals (i.e. the vtables) as close together as possible. > >> We > >> introduce a new flag on globals, the so-called "bitset" flag, which causes > >> the global to be transformed using this mechanism. Such a global cannot be > >> accessed in the normal way. It may only be accessed using an intrinsic > >> that > >> tests whether a given pointer is in the set. > >> > >> To return to our concrete example, a translation unit defining the vtable > >> for A may also define the following global: > >> > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)] > >> > >> A translation unit defining B would define: > >> > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > >> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > >> > >> And C: > >> > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > >> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > >> > >> A later mechanism would combine the bitset globals, lay out the referenced > >> globals and build the bit sets. In the LTO case, the IR linker can already > >> do the combination step if the globals are given appending linkage. A > >> later compiler pass (which we will call the "bitset lowering pass") would > >> lay out > >> the globals (by combining them into a single large global, and RAUWing the > >> original globals with aliases that gep into the large global) and build > >> the > >> bit sets. In the non-LTO case, we could also invent an > >> object-file-specific > >> mechanism for the backend to tell the linker to combine and build the bit > >> sets, > >> for example by placing the bitset globals in a specifically named section, > >> and place the referenced globals in individual sections so that the linker > >> can rearrange them. > >> > >> As previously mentioned, an intrinsic would be used to test entries in the > >> bit set at call sites. For example, the relevant IR for the following C++ > >> code: > >> > >> A *some_object = ...; > >> some_object->f(); > >> > >> might look like this: > >> > >> %vtable = load i8** %some_object > >> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone > >> br i1 %p, label %continue, label %trap > >> > >> In the LTO case, such intrinsic calls would be lowered by the bitset > >> lowering pass > >> into the appropriate range/bit set check code, which might look like this: > >> (pseudocode for 64-bit platform) > >> > >> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > > >> global_end_addr(_ZBS1A) { > >> %p = 0 > >> } else { > >> %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3] > >> } > >> > >> In the non-LTO case, we would generate similar code, but with appropriate > >> relocations so that the required constants (global start/end address, > >> bitset > >> address, mask, shift amount) would receive appropriate linker-determined > >> values. > >> > >> The first step I propose to make in this direction is a patch that adds > >> support for the bitset flag on globals, and introduces the bitset > >> lowering pass. A > >> later step would involve building backend support, and linker support in > >> lld. > >> > >> Comments appreciated. > >> > >> Thanks, > >> -- > >> Peter > >> _______________________________________________ > >> LLVM Developers mailing list > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >> > > > > > > _______________________________________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > >-- Peter
On Wed, Jan 28, 2015 at 10:57 AM, Peter Collingbourne <peter at pcc.me.uk> wrote:> Introducing a new module-level metadata format is just another kind of IR > extension, with the disadvantage of being untyped. We should be cutting > down > on these types of things, not introducing a new one [1]. >Hard to argue with that. module-level metadata is a simple (though not the only) way to make a prototype and prove that the approach works.> > I feel that a global is the right design for this, as it represents data > that is conceptually in the "object file". > > Peter > > [1] https://groups.google.com/forum/#!topic/llvm-dev/u4hZUWAXQuY > > On Wed, Jan 28, 2015 at 09:41:14AM -0800, Kostya Serebryany wrote: > > I would start from using module-level metadata. > > An IR extension might be a good idea once we show that > > - the proposed indirect call protection mechanism is efficient and > > useful, and > > - there are other use cases for the IR extension. > > > > --kcc > > > > > > On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com> > wrote: > > > > > Is there any way to accomplish this that doesn't require changes in > every > > > part of the toolchain? > > > > > > What you describe here sounds really efficient, but for a change this > > > pervasive throughout the toolchain, I would at least like to see an > attempt > > > done that does not require changing the toolchain. Then we can circle > back > > > to changing the toolchain if that proves inadequate and with concrete > > > experience informing what really is going to require a toolchain change > > > (Maybe you've already done this? Please share.). > > > > > > -- Sean Silva > > > > > > > > > On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk> > > > wrote: > > > > > >> Hi all, > > >> > > >> I would like to propose a mechanism that allows IR modules to > > >> co-operatively > > >> build a pointer set corresponding to addresses within a given set of > > >> globals. The specific use case I have in mind is to provide a > mechanism > > >> for a C++ program to efficiently verify (at each call site) that a > vtable > > >> pointer is in the set of valid vtable pointers for the class or its > > >> derived > > >> classes. One way of doing this is for a toolchain component to build, > for > > >> each > > >> class, a bit set that maps to the memory region allocated for the > vtables, > > >> such that each 1 bit in the bit set maps to a valid vtable for that > class, > > >> and lay out the vtables next to each other, to minimize the total > size of > > >> the bit sets. Call sites would perform a range check followed by a > load of > > >> the appropriate bit from the bit set. > > >> > > >> To give a concrete example, suppose we have the following classes: > > >> > > >> struct A { virtual void f(); }; > > >> struct B : A { virtual void f(), g(); }; > > >> struct C : A { virtual void f(), h(); }; > > >> > > >> with the following vtables: > > >> > > >> _ZTV1A = { &A::rtti, &A::f }; > > >> _ZTV1B = { &B::rtti, &B::f, &B::g }; > > >> _ZTV1C = { &C::rtti, &C::f, &C::h }; > > >> > > >> The set of valid vtables for static class A is {&_ZTV1A[1], > &_ZTV1B[1], > > >> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The > > >> toolchain > > >> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate > bit > > >> sets > > >> like this: > > >> > > >> A = {0,1,0,1,0,0,1,0} > > >> B = {0,0,0,1,0,0,0,0} > > >> C = {0,0,0,0,0,0,1,0} > > >> > > >> A call site (say the static type of the call is A) will check whether > the > > >> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is > sizeof(void > > >> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th > entry > > >> in A's bitset, which will be 1 if the vtable is valid for A. > > >> > > >> We are now faced with a number of implementation questions: how do we > > >> represent > > >> these (potentially partial) bit sets in LLVM, and how do we arrange > to lay > > >> out the globals and build the complete bit sets? Remember that > classes B > > >> and C may be in different translation units, so we do not necessarily > have > > >> a complete picture of A's bit set at A's compile time. > > >> > > >> What I propose is that the parts of the bit sets be represented as > arrays > > >> of pointers, which will be resolved through some mechanism (either at > > >> LTO time, or at regular (non-LTO) link time) to bit sets > corresponding to > > >> those addresses. This mechanism would also be responsible for laying > out > > >> the referenced globals (i.e. the vtables) as close together as > possible. > > >> We > > >> introduce a new flag on globals, the so-called "bitset" flag, which > causes > > >> the global to be transformed using this mechanism. Such a global > cannot be > > >> accessed in the normal way. It may only be accessed using an intrinsic > > >> that > > >> tests whether a given pointer is in the set. > > >> > > >> To return to our concrete example, a translation unit defining the > vtable > > >> for A may also define the following global: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)] > > >> > > >> A translation unit defining B would define: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > > >> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > > >> > > >> And C: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > > >> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > > >> > > >> A later mechanism would combine the bitset globals, lay out the > referenced > > >> globals and build the bit sets. In the LTO case, the IR linker can > already > > >> do the combination step if the globals are given appending linkage. A > > >> later compiler pass (which we will call the "bitset lowering pass") > would > > >> lay out > > >> the globals (by combining them into a single large global, and > RAUWing the > > >> original globals with aliases that gep into the large global) and > build > > >> the > > >> bit sets. In the non-LTO case, we could also invent an > > >> object-file-specific > > >> mechanism for the backend to tell the linker to combine and build the > bit > > >> sets, > > >> for example by placing the bitset globals in a specifically named > section, > > >> and place the referenced globals in individual sections so that the > linker > > >> can rearrange them. > > >> > > >> As previously mentioned, an intrinsic would be used to test entries > in the > > >> bit set at call sites. For example, the relevant IR for the following > C++ > > >> code: > > >> > > >> A *some_object = ...; > > >> some_object->f(); > > >> > > >> might look like this: > > >> > > >> %vtable = load i8** %some_object > > >> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone > > >> br i1 %p, label %continue, label %trap > > >> > > >> In the LTO case, such intrinsic calls would be lowered by the bitset > > >> lowering pass > > >> into the appropriate range/bit set check code, which might look like > this: > > >> (pseudocode for 64-bit platform) > > >> > > >> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > > > > >> global_end_addr(_ZBS1A) { > > >> %p = 0 > > >> } else { > > >> %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3] > > >> } > > >> > > >> In the non-LTO case, we would generate similar code, but with > appropriate > > >> relocations so that the required constants (global start/end address, > > >> bitset > > >> address, mask, shift amount) would receive appropriate > linker-determined > > >> values. > > >> > > >> The first step I propose to make in this direction is a patch that > adds > > >> support for the bitset flag on globals, and introduces the bitset > > >> lowering pass. A > > >> later step would involve building backend support, and linker support > in > > >> lld. > > >> > > >> Comments appreciated. > > >> > > >> Thanks, > > >> -- > > >> Peter > > >> _______________________________________________ > > >> LLVM Developers mailing list > > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >> > > > > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > -- > Peter >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/b2572e0b/attachment.html>
So, bitset would be a property that means : globals with the same name will append on a string of bits in the order in which they appear, and it's the job of the front end to make sure the correct order is followed in every unit, so that linking does the right job in every corner case? Could that be used for efficient global boolean flags, like LLVM's options? Even if you don't need the range check, you get the bit mask check for free. I'm trying to find other uses to help you in the case to make this into a first class citizen, not metadata. Cheers, Renato On 28 Jan 2015 19:10, "Peter Collingbourne" <peter at pcc.me.uk> wrote:> Introducing a new module-level metadata format is just another kind of IR > extension, with the disadvantage of being untyped. We should be cutting > down > on these types of things, not introducing a new one [1]. > > I feel that a global is the right design for this, as it represents data > that is conceptually in the "object file". > > Peter > > [1] https://groups.google.com/forum/#!topic/llvm-dev/u4hZUWAXQuY > > On Wed, Jan 28, 2015 at 09:41:14AM -0800, Kostya Serebryany wrote: > > I would start from using module-level metadata. > > An IR extension might be a good idea once we show that > > - the proposed indirect call protection mechanism is efficient and > > useful, and > > - there are other use cases for the IR extension. > > > > --kcc > > > > > > On Wed, Jan 28, 2015 at 2:57 AM, Sean Silva <chisophugis at gmail.com> > wrote: > > > > > Is there any way to accomplish this that doesn't require changes in > every > > > part of the toolchain? > > > > > > What you describe here sounds really efficient, but for a change this > > > pervasive throughout the toolchain, I would at least like to see an > attempt > > > done that does not require changing the toolchain. Then we can circle > back > > > to changing the toolchain if that proves inadequate and with concrete > > > experience informing what really is going to require a toolchain change > > > (Maybe you've already done this? Please share.). > > > > > > -- Sean Silva > > > > > > > > > On Tue, Jan 27, 2015 at 9:07 PM, Peter Collingbourne <peter at pcc.me.uk> > > > wrote: > > > > > >> Hi all, > > >> > > >> I would like to propose a mechanism that allows IR modules to > > >> co-operatively > > >> build a pointer set corresponding to addresses within a given set of > > >> globals. The specific use case I have in mind is to provide a > mechanism > > >> for a C++ program to efficiently verify (at each call site) that a > vtable > > >> pointer is in the set of valid vtable pointers for the class or its > > >> derived > > >> classes. One way of doing this is for a toolchain component to build, > for > > >> each > > >> class, a bit set that maps to the memory region allocated for the > vtables, > > >> such that each 1 bit in the bit set maps to a valid vtable for that > class, > > >> and lay out the vtables next to each other, to minimize the total > size of > > >> the bit sets. Call sites would perform a range check followed by a > load of > > >> the appropriate bit from the bit set. > > >> > > >> To give a concrete example, suppose we have the following classes: > > >> > > >> struct A { virtual void f(); }; > > >> struct B : A { virtual void f(), g(); }; > > >> struct C : A { virtual void f(), h(); }; > > >> > > >> with the following vtables: > > >> > > >> _ZTV1A = { &A::rtti, &A::f }; > > >> _ZTV1B = { &B::rtti, &B::f, &B::g }; > > >> _ZTV1C = { &C::rtti, &C::f, &C::h }; > > >> > > >> The set of valid vtables for static class A is {&_ZTV1A[1], > &_ZTV1B[1], > > >> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The > > >> toolchain > > >> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate > bit > > >> sets > > >> like this: > > >> > > >> A = {0,1,0,1,0,0,1,0} > > >> B = {0,0,0,1,0,0,0,0} > > >> C = {0,0,0,0,0,0,1,0} > > >> > > >> A call site (say the static type of the call is A) will check whether > the > > >> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is > sizeof(void > > >> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th > entry > > >> in A's bitset, which will be 1 if the vtable is valid for A. > > >> > > >> We are now faced with a number of implementation questions: how do we > > >> represent > > >> these (potentially partial) bit sets in LLVM, and how do we arrange > to lay > > >> out the globals and build the complete bit sets? Remember that > classes B > > >> and C may be in different translation units, so we do not necessarily > have > > >> a complete picture of A's bit set at A's compile time. > > >> > > >> What I propose is that the parts of the bit sets be represented as > arrays > > >> of pointers, which will be resolved through some mechanism (either at > > >> LTO time, or at regular (non-LTO) link time) to bit sets > corresponding to > > >> those addresses. This mechanism would also be responsible for laying > out > > >> the referenced globals (i.e. the vtables) as close together as > possible. > > >> We > > >> introduce a new flag on globals, the so-called "bitset" flag, which > causes > > >> the global to be transformed using this mechanism. Such a global > cannot be > > >> accessed in the normal way. It may only be accessed using an intrinsic > > >> that > > >> tests whether a given pointer is in the set. > > >> > > >> To return to our concrete example, a translation unit defining the > vtable > > >> for A may also define the following global: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)] > > >> > > >> A translation unit defining B would define: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > > >> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)] > > >> > > >> And C: > > >> > > >> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > > >> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)] > > >> > > >> A later mechanism would combine the bitset globals, lay out the > referenced > > >> globals and build the bit sets. In the LTO case, the IR linker can > already > > >> do the combination step if the globals are given appending linkage. A > > >> later compiler pass (which we will call the "bitset lowering pass") > would > > >> lay out > > >> the globals (by combining them into a single large global, and > RAUWing the > > >> original globals with aliases that gep into the large global) and > build > > >> the > > >> bit sets. In the non-LTO case, we could also invent an > > >> object-file-specific > > >> mechanism for the backend to tell the linker to combine and build the > bit > > >> sets, > > >> for example by placing the bitset globals in a specifically named > section, > > >> and place the referenced globals in individual sections so that the > linker > > >> can rearrange them. > > >> > > >> As previously mentioned, an intrinsic would be used to test entries > in the > > >> bit set at call sites. For example, the relevant IR for the following > C++ > > >> code: > > >> > > >> A *some_object = ...; > > >> some_object->f(); > > >> > > >> might look like this: > > >> > > >> %vtable = load i8** %some_object > > >> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone > > >> br i1 %p, label %continue, label %trap > > >> > > >> In the LTO case, such intrinsic calls would be lowered by the bitset > > >> lowering pass > > >> into the appropriate range/bit set check code, which might look like > this: > > >> (pseudocode for 64-bit platform) > > >> > > >> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > > > > >> global_end_addr(_ZBS1A) { > > >> %p = 0 > > >> } else { > > >> %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3] > > >> } > > >> > > >> In the non-LTO case, we would generate similar code, but with > appropriate > > >> relocations so that the required constants (global start/end address, > > >> bitset > > >> address, mask, shift amount) would receive appropriate > linker-determined > > >> values. > > >> > > >> The first step I propose to make in this direction is a patch that > adds > > >> support for the bitset flag on globals, and introduces the bitset > > >> lowering pass. A > > >> later step would involve building backend support, and linker support > in > > >> lld. > > >> > > >> Comments appreciated. > > >> > > >> Thanks, > > >> -- > > >> Peter > > >> _______________________________________________ > > >> LLVM Developers mailing list > > >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > >> > > > > > > > > > _______________________________________________ > > > LLVM Developers mailing list > > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > > > > > > > > -- > Peter > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150129/94319591/attachment.html>