Peter Collingbourne
2015-Jan-30 19:44 UTC
[LLVMdev] IR extension proposal: bitset constants
Hi Chris,

I wanted to start by explaining what I am trying to achieve and how I am trying to achieve it.

I am working towards introducing into LLVM a security mechanism, forward-edge control flow integrity (CFI), designed to mitigate vulnerabilities that allow attacks based on corrupting vtable or function pointers in memory in order to subvert a program's control flow. More details are in a paper I co-authored that was presented at USENIX Security last year [1]. As the paper notes, attackers are increasingly relying on such techniques to subvert control flow, which is why I feel it is particularly important that compilers contain practical defenses against such attacks.

One particular variant of the defense I am proposing, vtable verification, was implemented in GCC and is described in section 3 of the paper; however, it comes with significant performance overhead: more than 8% on certain Chrome browser-based benchmarks and up to around 20% on SPEC 2006 benchmarks (see Figure 2). This is likely because it searches lists of vtables to determine whether a given vtable is valid, which is a direct consequence of its avoidance of techniques that depend on changing how the program is linked.

The implementation I am proposing to contribute to LLVM focuses on performance. Building the needed data structures at link time allows us to reduce the cost of checking the validity of a vtable pointer to a few machine instructions and a single load from memory.

On Thu, Jan 29, 2015 at 11:04:41PM -0800, Chris Lattner wrote:
> I don't think that adding an IR construct for this is the way to go. You're
> making IR complicated for everyone to serve a very narrow use case (one that
> admittedly I don't really understand).
>
> Also, your patch is incomplete. Presumably you have to scatter checks for
> "if (!GV->isBitSet())" throughout the optimizer, codegen and other things
> that touch globals.

I'll address these points together, because I would like to explain why extensive support for the new construct is not required, why the maintenance burden is not as large as it might seem, and why my patch does not need to make such extensive changes to the optimizers.

The first point I'd like to make is that, from the point of view of optimizer passes other than bitset lowering, the llvm.bitset.test intrinsic has opaque semantics with regard to the content of the globals it references, so those passes cannot legally modify the contents of a bitset global. The second is that any use of a bitset global other than as an argument to the llvm.bitset.test intrinsic has undefined semantics. (This is something that can be documented better.) This means that any optimizer pass that looks through global initializers does not require any changes: any transformation it performs on IR that treats such a global as a regular global (for example, by taking its address or loading from it) is semantics-preserving, because the semantics of such IR would have been undefined anyway.

With these points in mind, I'm reasonably confident that very little code needs to care about the new flag. (I should also point out that the patch most likely works without any other optimizer changes: I have a work-in-progress patch that implements the Clang side of this, and I have successfully applied it to a large C++ codebase and found real bugs in that codebase.)

Regarding codegen: I haven't implemented codegen support for bitsets yet; the intrinsic is handled entirely by the lowering pass. I can't imagine the changes being very intrusive, though, and we can easily add a check that nothing bitset-related makes it through to codegen for the moment.
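To make this concrete, here is roughly the kind of IR a frontend might emit to guard a virtual call under this proposal. This is a sketch only: the name @__bitset_1A is illustrative, and I'm assuming the bitset global is passed directly as the intrinsic's second operand, consistent with the usage rule above that the global may appear only as an argument to the intrinsic.

  ; Hypothetical bitset global emitted by the frontend for class A.
  @__bitset_1A = external constant i8

  define void @call_f(i8** %obj) {
  entry:
    ; Load the object's vtable pointer.
    %vtable = load i8** %obj
    ; Check that it is a valid vtable pointer for static type A.
    %valid = call i1 @llvm.bitset.test(i8* %vtable, i8* @__bitset_1A)
    br i1 %valid, label %cont, label %trap

  trap:
    ; The vtable pointer was corrupted, or the object has the wrong type.
    call void @llvm.trap()
    unreachable

  cont:
    ; Safe to make the virtual call through %vtable.
    %vfpp = bitcast i8* %vtable to void (i8**)**
    %vfn = load void (i8**)** %vfpp
    call void %vfn(i8** %obj)
    ret void
  }

  declare i1 @llvm.bitset.test(i8*, i8*)
  declare void @llvm.trap()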
> The fact that this affects semantics and will only work with LTO and not
> native linkers is also really weird to me.

I agree, which is why I plan to add support to lld and perhaps other linkers, but we do have to start somewhere. Adding the functionality only to the compiler seems like a reasonable first step, even if we depend on LTO to begin with.

> Is there other precedent for that? The only cases I know that affect LTO
> add information that is safe to drop (e.g. TBAA etc).

There is the jumptable attribute, which has been used to implement a variant of CFI for indirect function calls (see section 4 of the USENIX paper), and which only works effectively on an LTO'd module. (We might end up adding native linker support for that, or something similar, as well.)

Thanks,
--
Peter

[1] http://www.pcc.me.uk/~peter/acad/usenix14.pdf
JF Bastien
2015-Jan-31 19:35 UTC
[LLVMdev] IR extension proposal: bitset constants
Trying to summarize all opinions expressed here: Peter is proposing an initial implementation that would only work with LTO. Folks seem put off by this implementation affecting IR without having proven itself, and by its shortcomings (as Jim pointed out). Kostya proposed going through metadata (and Chris kind of did too by mentioning TBAA), but Peter points out that this would make the implementation trickier.

It sounds like going through metadata for the LTO-only proof of concept would be preferable, even if more complex. Peter, how much more complex would that be?

As an interested user of this I'm happy with LTO, and I can help with the reviews of the code.
Peter Collingbourne
2015-Jan-31 22:07 UTC
[LLVMdev] IR extension proposal: bitset constants
On Sat, Jan 31, 2015 at 11:35:01AM -0800, JF Bastien wrote:
> Trying to summarize all opinions expressed here: Peter is proposing an
> initial implementation that would only work with LTO. Folks seem put off by
> this implementation affecting IR without having proven itself, and having
> shortcomings (as Jim pointed out). Kostya proposed going through metadata
> (and Chris kind of did too by mentioning tbaa), but Peter points out that
> this will make the implementation trickier.
>
> It sounds like going through metadata for the LTO-only proof-of-concept
> would be preferable, even if more complex. Peter, how much more complex
> would that be?

I was just thinking about how the metadata-based design for this might work. We could have a named metadata global containing the list of bitset entries. Each entry would be a pair consisting of an identifier for the bitset (in this case, the mangled name of the class) and a pointer into a global (in this case, a valid vtable pointer).

For example, this class hierarchy:

  class A { virtual void f(); };
  class B : A { virtual void f(); };
  class C : A { virtual void f(); };

would have these bitsets:

  !llvm.bitsets = !{!0, !1, !2, !3, !4}
  !0 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1A, i32 0, i32 2)}
  !1 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)}
  !2 = !{!"1A", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)}
  !3 = !{!"1B", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1B, i32 0, i32 2)}
  !4 = !{!"1C", i8* getelementptr inbounds ([3 x i8*]* @_ZTV1C, i32 0, i32 2)}

The LLVM linker can already merge the contents of named metadata globals.

The intrinsic for testing membership in a bitset would have this definition:

  declare i1 @llvm.bitset.test(i8*, metadata)

and would be called like this:

  %x = call i1 @llvm.bitset.test(i8* %p, metadata !{!"1A"})

The disadvantage of this design is that metadata strings always have global scope, so identically named classes in anonymous namespaces in different translation units would receive the same bitset. (We can't just use the vtable globals as bitset identifiers, because they may be erased by globaldce. There are probably various things we could do to force the globals to stick around, if we wanted to go that way.)

Does that seem more reasonable to people?

Thanks,
--
Peter
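P.S. To give a sense of why the lowered test should only cost a few instructions and one load, here is a rough sketch of the kind of code the lowering pass could emit for a bitset whose member vtables the LTO pass has laid out 8 bytes apart within one region. Everything here (names, sizes, layout) is illustrative; a real implementation would also need to verify that the offset is a multiple of the alignment (which can be folded into the shift), and would presumably inline this sequence at each call site rather than emit a function.

  ; Hypothetical combined vtable region (3 members, 8 bytes apart) and
  ; the corresponding bit vector with bits 0-2 set.
  @__vtables_1A = private constant [24 x i8] zeroinitializer
  @__bits_1A = private constant i8 7

  define i1 @__bitset_test_1A(i8* %ptr) {
  entry:
    ; Byte offset of %ptr from the start of the region. Out-of-range
    ; pointers (including anything below the base) wrap around to large
    ; values and fail the unsigned range check below.
    %ptrint = ptrtoint i8* %ptr to i64
    %diff = sub i64 %ptrint, ptrtoint ([24 x i8]* @__vtables_1A to i64)
    ; Members sit at 8-byte-spaced offsets, so offset / 8 is the bit index.
    %index = lshr i64 %diff, 3
    %inrange = icmp ult i64 %index, 3
    br i1 %inrange, label %test, label %out

  test:
    ; The single load: test the corresponding bit in the bit vector.
    %bits = load i8* @__bits_1A
    %idx8 = trunc i64 %index to i8
    %mask = shl i8 1, %idx8
    %masked = and i8 %bits, %mask
    %bit = icmp ne i8 %masked, 0
    br label %out

  out:
    %result = phi i1 [ false, %entry ], [ %bit, %test ]
    ret i1 %result
  }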