Peter Collingbourne
2015-Jan-31 22:07 UTC
[LLVMdev] IR extension proposal: bitset constants
On Sat, Jan 31, 2015 at 11:35:01AM -0800, JF Bastien wrote:> Trying to summarize all opinions expressed here: Peter is proposing an > initial implementation that would only work with LTO. Folks seem put off by > this implementation affecting IR without having proven itself, and having > shortcomings (as Jim pointed out). Kostya proposed going through metadata > (and Chris kind of did too by mentioning tbaa), but Peter points out that > this will make the implementation trickier. > > It sounds like going through metadata for the LTO-only proof-of-concept > would be preferable, even if more complex. Peter, how more complex would > that be?I was just thinking about how the metadata-based design for this might work. We could have a named metadata global containing the list of bitset entries. Each entry would be a pair consisting of an identifier for the bitset (in this case, the mangled name of the class) and a pointer into a global (in this case, a valid vtable pointer). For example, this class hierarchy: class A { virtual void f(); }; class B : A { virtual void f(); }; class C : A { virtual void f(); }; would have these bitsets: !llvm.bitsets = !{!0, !1, !2, !3, !4} !0 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1A, i32 0, i32 2)} !1 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)} !2 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)} !3 = !{!"1B", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)} !4 = !{!"1C", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)} The LLVM linker can already merge the contents of named metadata globals. The intrinsic for reading from bitsets would have this definition: declare i1 @llvm.bitset.test(i8*, metadata) and would be called like this: %x = call i1 @llvm.bitset.test(i8* %p, metadata !{!"1A"}) The disadvantage of this design is that metadata strings always have global scope, so identically named classes in anonymous namespaces in different translation units would receive the same bitset. (We can't just use the vtable globals as bitset identifiers, because they may be erased by globaldce. There are probably various things we could do to force the globals to stick around, if we wanted to go that way.) Does that seem more reasonable to people? Thanks, -- Peter
On Sat, Jan 31, 2015 at 2:07 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:> On Sat, Jan 31, 2015 at 11:35:01AM -0800, JF Bastien wrote: > > Trying to summarize all opinions expressed here: Peter is proposing an > > initial implementation that would only work with LTO. Folks seem put off > by > > this implementation affecting IR without having proven itself, and having > > shortcomings (as Jim pointed out). Kostya proposed going through metadata > > (and Chris kind of did too by mentioning tbaa), but Peter points out that > > this will make the implementation trickier. > > > > It sounds like going through metadata for the LTO-only proof-of-concept > > would be preferable, even if more complex. Peter, how more complex would > > that be? > > I was just thinking about how the metadata-based design for this might > work. > > We could have a named metadata global containing the list of bitset > entries. > Each entry would be a pair consisting of an identifier for the bitset (in > this case, the mangled name of the class) and a pointer into a global > (in this case, a valid vtable pointer). > > For example, this class hierarchy: > > class A { virtual void f(); }; > class B : A { virtual void f(); }; > class C : A { virtual void f(); }; > > would have these bitsets: > > !llvm.bitsets = !{!0, !1, !2, !3, !4} > > !0 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1A, i32 0, i32 > 2)} > !1 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 > 2)} > !2 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 > 2)} > !3 = !{!"1B", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 > 2)} > !4 = !{!"1C", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 > 2)} > > The LLVM linker can already merge the contents of named metadata globals. > > The intrinsic for reading from bitsets would have this definition: > > declare i1 @llvm.bitset.test(i8*, metadata) > > and would be called like this: > > %x = call i1 @llvm.bitset.test(i8* %p, metadata !{!"1A"}) > > The disadvantage of this design is that metadata strings always have global > scope, so identically named classes in anonymous namespaces in different > translation units would receive the same bitset. (We can't just use the > vtable globals as bitset identifiers, because they may be erased by > globaldce. > There are probably various things we could do to force the globals to stick > around, if we wanted to go that way.) >That's pretty unfortunate. Can you mangle anonymous namespace class names further for your purpose? With the module name? Does that seem more reasonable to people?>Seems reasonable as a first step. It doesn't seem like anyone is opposed to trying it out, and adding to IR if the metadata approach shows this to be a desirable feature. The one think we need to ensure is that your metadata can be dropped by the optimizer and the code remains correct. I'm guessing no vtable would mean anything goes (not check)? That's bad security-wise, but it'll at least work. We may want to make sure nothing gets dropped through a debug flag, so that we can compile Chrome and be confident that all the checks we want are there. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150131/a7df1ae3/attachment.html>
Peter Collingbourne
2015-Jan-31 22:45 UTC
[LLVMdev] IR extension proposal: bitset constants
On Sat, Jan 31, 2015 at 02:17:22PM -0800, JF Bastien wrote:> On Sat, Jan 31, 2015 at 2:07 PM, Peter Collingbourne <peter at pcc.me.uk> > wrote: > > > On Sat, Jan 31, 2015 at 11:35:01AM -0800, JF Bastien wrote: > > > Trying to summarize all opinions expressed here: Peter is proposing an > > > initial implementation that would only work with LTO. Folks seem put off > > by > > > this implementation affecting IR without having proven itself, and having > > > shortcomings (as Jim pointed out). Kostya proposed going through metadata > > > (and Chris kind of did too by mentioning tbaa), but Peter points out that > > > this will make the implementation trickier. > > > > > > It sounds like going through metadata for the LTO-only proof-of-concept > > > would be preferable, even if more complex. Peter, how more complex would > > > that be? > > > > I was just thinking about how the metadata-based design for this might > > work. > > > > We could have a named metadata global containing the list of bitset > > entries. > > Each entry would be a pair consisting of an identifier for the bitset (in > > this case, the mangled name of the class) and a pointer into a global > > (in this case, a valid vtable pointer). > > > > For example, this class hierarchy: > > > > class A { virtual void f(); }; > > class B : A { virtual void f(); }; > > class C : A { virtual void f(); }; > > > > would have these bitsets: > > > > !llvm.bitsets = !{!0, !1, !2, !3, !4} > > > > !0 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1A, i32 0, i32 > > 2)} > > !1 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 > > 2)} > > !2 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 > > 2)} > > !3 = !{!"1B", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 > > 2)} > > !4 = !{!"1C", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 > > 2)} > > > > The LLVM linker can already merge the contents of named metadata globals. > > > > The intrinsic for reading from bitsets would have this definition: > > > > declare i1 @llvm.bitset.test(i8*, metadata) > > > > and would be called like this: > > > > %x = call i1 @llvm.bitset.test(i8* %p, metadata !{!"1A"}) > > > > The disadvantage of this design is that metadata strings always have global > > scope, so identically named classes in anonymous namespaces in different > > translation units would receive the same bitset. (We can't just use the > > vtable globals as bitset identifiers, because they may be erased by > > globaldce. > > There are probably various things we could do to force the globals to stick > > around, if we wanted to go that way.) > > > > That's pretty unfortunate. Can you mangle anonymous namespace class names > further for your purpose? With the module name?Yes, I forgot that we could do that. Mangling with the source file name would probably be enough for most purposes.> Does that seem more reasonable to people? > > > > Seems reasonable as a first step. It doesn't seem like anyone is opposed to > trying it out, and adding to IR if the metadata approach shows this to be a > desirable feature. > > The one think we need to ensure is that your metadata can be dropped by the > optimizer and the code remains correct. I'm guessing no vtable would mean > anything goes (not check)? That's bad security-wise, but it'll at least > work. We may want to make sure nothing gets dropped through a debug flag, > so that we can compile Chrome and be confident that all the checks we want > are there.So the flag would determine whether no bitset present means return 1 or return 0? Sounds reasonable. Thanks, -- Peter