thr3ads.net - llvm dev - [LLVMdev] [RFC] Simple control-flow integrity [Feb 2014]

Tom Roeder

2014-Feb-10 23:33 UTC

[LLVMdev] [RFC] Simple control-flow integrity

Hi everyone,

I've been working on control-flow integrity (CFI) techniques over
LLVM, and I'd like to get feedback on these techniques and their
potential usefulness as a part of LLVM. I'd like to submit some
patches for this; I've implemented a version of it, and I've applied
it to large, real-world programs like Chromium to see how well it
holds up in practice.


TL;DR: my CFI pass builds jump tables and adds a fast check to
indirect calls; values that fail the check are passed to a function
defined at compile time. I have added special analysis and
source-level annotations to help deal with the problems of external
function pointers.



Details:

My current implementation works as a pass over a single module
consisting of all the code that the compiler has at LTO time. At the
IR level, this pass:

    1. creates a power-of-two sized InlineAsm jump table (or multiple
jump tables) filled with jump instructions to each address-taken
function.

    2. replaces each use of the address of such a function with a
pointer to the corresponding entry in the appropriate table. Note that
these are still valid function pointers for the purposes of external
code, since calling one jumps to the original function.

    3. adds a fast check for pointer safety at each indirect call site:

         a. It forces the pointer into the appropriate table (based on
type information), and checks to see if the pointer changed. Pointers
that were already in the right table will not change, and all other
pointers will. We rewrite the pointer either by masking and adding to
a base pointer, if we can guarantee sufficient table alignment,
otherwise by subtracting from a base, then masking, then adding back
to the base.

         b. If the pointer fails the check, it's passed to a CFI
failure function defined at compile time to handle it. By default, we
define a function written in IR; this function prints out the name of
the function in which the CFI violation happens.
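
A minimal sketch of the check in step 3, modelling addresses as plain integers. It is an illustration of the arithmetic, not the pass itself: the `JumpTable` class, its method names, and the example addresses are hypothetical, and the 8-byte entry size is taken from the x86-64 figure quoted later in the thread. It also ignores the per-type choice of table.

```python
ENTRY_SIZE = 8  # assumed bytes per jump-table entry (x86-64 figure from the thread)


def next_power_of_two(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


class JumpTable:
    """Toy model of one power-of-two sized jump table (names are hypothetical)."""

    def __init__(self, base, num_functions):
        self.base = base
        self.size = next_power_of_two(num_functions) * ENTRY_SIZE
        # Keep only offset bits that select an entry boundary inside the table.
        self.mask = (self.size - 1) & ~(ENTRY_SIZE - 1)
        # "Sufficient alignment": the base is a multiple of the table size.
        self.aligned = base % self.size == 0

    def entry(self, index):
        """Address of the table entry that jumps to function number `index`."""
        return self.base + index * ENTRY_SIZE

    def force_into_table(self, ptr):
        """Rewrite `ptr` so that it always points at some entry of this table."""
        if self.aligned:
            # Aligned base: mask, then add to the base.
            return self.base + (ptr & self.mask)
        # Otherwise: subtract the base, mask, then add the base back.
        return self.base + ((ptr - self.base) & self.mask)

    def check(self, ptr):
        """A pointer passes only if forcing it into the table leaves it unchanged."""
        return self.force_into_table(ptr) == ptr
```

With a table at `0x1000` holding three functions, `check(entry(2))` passes, while an address outside the table or in the middle of an entry is rewritten to a different value and fails. In this model the unused padding slots of the table also pass the arithmetic check; the post does not say what those slots contain.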




The biggest challenge for such an implementation is functions that are
neither declared nor defined at LTO time. Pointers to these functions
are legitimate call targets but are not in any jump table, so they
show up as false positives for the CFI check. They can occur in at
least 3 ways:

    - JIT code, like in the v8 javascript engine, can allocate and
call functions that were not defined at compile time. These functions
are not even external: they just didn’t exist at LTO time.

    - External functions can return pointers to external functions
that were not exposed at LTO time. The canonical example in this class
is dlsym, which is used extensively by many projects. Other commonly
used cases are signal/sigaction (returns the old signal handler),
XSetErrorHandler from X, and std::set_new_handler from the Standard
C++ library. But this happens with any dynamically-linked library that
has a method that returns function pointers.

    - Internal code that takes function pointer arguments can be
passed to external code and have external function pointers passed to
it as arguments. This pattern is used extensively by graphics
libraries, e.g., gtk.




I have some techniques that help handle these false positives:

    - Since CFI violations are passed to an arbitrary function, the
policy for these violations can be set at compile time. For example,
you could run the rewritten code for a while to build up a set of
known false positives, then switch to a CFI failure function that
stops the program when it sees something not allowed by the policy.
This is similar to the approach taken by, e.g., AppArmor.

    - my current CFI pass looks for special annotations added to the
source code: these are of the form
__attribute__((annotate("cfi-maybe-external"))) and
__attribute__((annotate("cfi-no-rewrite")))

         - cfi-maybe-external can be applied to pointers and variables
(llvm.ptr.annotation and llvm.var.annotation) and means that this
value sometimes stores external function pointers.

         - cfi-no-rewrite is applied to functions and means that there
are indirect calls in this function that can happen with external
function pointers. The current implementation skips rewriting for
these functions, but it could instead be used to prepopulate a list of
known potential false positives.

    - I have a separate analysis pass called ExternalFunctionAnalysis
that does a fairly naive interprocedural dataflow analysis starting
from cfi-maybe-external annotations and from all places where it can
find external function pointers coming in to the module:

          - if an external function pointer flows into a store that
doesn't flow from an annotated location, then the pass prints a
warning

          - all indirect call sites that flow from annotated
pointers/variables are not rewritten (but this could instead be used
to prepopulate a whitelist of known false positives).
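
The "learn, then enforce" policy from the first bullet could look roughly like the sketch here. This is an assumption-laden illustration, not the failure function from the patches: the `CfiPolicy` class is hypothetical, and keying the policy on the name of the calling function is a guess based on the default handler, which prints the name of the function in which the violation happens.

```python
class CfiPolicy:
    """Hypothetical CFI failure policy: learn first, then enforce."""

    def __init__(self, prepopulated=()):
        # Caller names known in advance, e.g. functions tagged cfi-no-rewrite
        # if that annotation were used to seed the list instead of skipping
        # the rewrite.
        self.allowed_callers = set(prepopulated)
        self.enforcing = False

    def enforce(self):
        """Switch from learning mode to enforcement mode."""
        self.enforcing = True

    def on_violation(self, caller_name):
        """Return True if execution may continue after a failed check."""
        if not self.enforcing:
            # Learning mode: record the false positive and carry on.
            self.allowed_callers.add(caller_name)
            return True
        # Enforcement mode: only previously recorded callers are tolerated.
        return caller_name in self.allowed_callers
```

In learning mode every violation is recorded and allowed; after `enforce()` is called, a violation in a function that was never recorded makes `on_violation` return `False`, which a real failure function would turn into stopping the program.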



As I mentioned, I've used my current implementation to build a version
of Chromium protected with this form of CFI; in the process, I added
sufficient annotations to the Chromium code base to catch all false
positives (or at least: I haven't seen any in my testing so far). I've
also tried it out with other, less immense, projects, like the SPEC
CPU2006 benchmark suite.

Please let me know what you think.

Thanks,

Tom

JF Bastien

2014-Feb-11 00:22 UTC

Hi Tom,

The PNaCl team is very interested in your work. We see 3 applications
for our purpose:
 - Augment the safety of our trusted code base, which includes OS
shims as well as NaCl validators.
 - Apply extra CFI to the PNaCl translator. The PNaCl translator
already runs inside a NaCl SFI sandbox (which includes CFI), but this
sandboxing only ensures that the code doesn't escape the sandbox; it
doesn't guarantee that the translator is doing the right thing. Adding
your CFI on top is nice because it captures some semantic information
from the original LLVM source, and makes it that much harder to
exploit a bug in LLVM to generate malicious code (which still needs to
pass NaCl validation, so it's an extra defense-in-depth layer).
 - Optionally apply your CFI to user applications which opt in.

Some of the concerns you express won't affect our platform since we
link everything statically and avoid passing raw pointers when
possible.

A few questions:
 - Have you tried running your CFI on LLVM itself? Did you need to add
any annotations?
 - What is the performance and size hit on different applications?

JF


Stephen Checkoway

2014-Feb-11 00:56 UTC

One thing to note is that this does not protect return instructions the way
traditional control-flow integrity does. One consequence of this is that
applying this transformation imposes only a very modest run-time overhead to the
protected application at the cost of not defending against, for example, buffer
overflows on the stack which modify the saved instruction pointer.

Steve

-- 
Stephen Checkoway

Tom Roeder

2014-Feb-11 01:13 UTC

On Mon, Feb 10, 2014 at 4:22 PM, JF Bastien <jfb at google.com> wrote:
> Hi Tom,
>
> A few questions:
>  - Have you tried running your CFI on LLVM itself? Did you need to add
> any annotations?

Actually, I haven't. That will depend on me being able to compile LLVM
under LTO. I'll give it a try.

>  - What is the performance and size hit on different applications?

The overhead varies a bit, but the perf hit has generally been a small
number of percent over a version compiled with LLVM LTO. For example, a
version of Chromium M31 had about a 4% perf overhead running the
dromaeo.com benchmark. The size hit mostly depends on the number of
functions and call sites; e.g., on x86-64, each function entry in the
table takes up 8 bytes, and each rewritten indirect call instruction
takes up 35 extra bytes for the pointer rewriting and the branch and
call in the case of a violation.
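
As a back-of-the-envelope illustration of those size figures, here is a small sketch. It assumes a single jump table padded to a power-of-two number of entries, as in the original post; the function name and the example counts are hypothetical and are not measurements from Chromium or any other program.

```python
TABLE_ENTRY_BYTES = 8       # per address-taken function, x86-64 (from the thread)
CALL_SITE_EXTRA_BYTES = 35  # per rewritten indirect call, x86-64 (from the thread)


def estimated_size_overhead(address_taken_functions, indirect_call_sites):
    """Rough extra code size in bytes, assuming one power-of-two sized table."""
    if address_taken_functions > 0:
        # Round the entry count up to the next power of two.
        entries = 1 << (address_taken_functions - 1).bit_length()
    else:
        entries = 0
    table_bytes = entries * TABLE_ENTRY_BYTES
    call_site_bytes = indirect_call_sites * CALL_SITE_EXTRA_BYTES
    return table_bytes + call_site_bytes


# Hypothetical program: 1000 address-taken functions, 5000 indirect call sites.
# The table is padded to 1024 entries (8192 bytes); call sites add 175000 bytes.
print(estimated_size_overhead(1000, 5000))  # prints 183192
```

Under these assumptions the per-call-site cost dominates the table itself unless a program has far more address-taken functions than indirect call sites.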

Eric Christopher

2014-Feb-11 01:19 UTC

>     1. creates a power-of-two sized InlineAsm jump table (or multiple
> jump tables) filled with jump instructions to each address-taken
> function.
>
Why inline asm? There's probably a better way to do this via lowering
your jump table in the backend etc.

-eric

Reid Kleckner

2014-Feb-11 07:51 UTC

Tom, this sounds awesome.  I'm imagining a wonderful world of CFI hardened
browsers.

On Mon, Feb 10, 2014 at 5:19 PM, Eric Christopher <echristo at gmail.com> wrote:
> >     1. creates a power-of-two sized InlineAsm jump table (or multiple
> > jump tables) filled with jump instructions to each address-taken
> > function.
> >
>
> Why inline asm? There's probably a better way to do this via lowering
> your jump table in the backend etc.
>

IIRC this came up before, and I don't think we expose anything like a jump
table at the IR level.  As an IR-to-IR transform, I think asm is the only
way to do it.

Joerg Sonnenberger

2014-Feb-11 13:07 UTC

On Mon, Feb 10, 2014 at 03:33:32PM -0800, Tom Roeder wrote:
>     3. adds a fast check for pointer safety at each indirect call site:

Why not use a Bloom filter for valid target addresses instead?

Joerg
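
For readers unfamiliar with the suggestion, a Bloom filter over valid target addresses might be sketched as follows. This is only a guess at the shape of the idea, not something proposed in detail in the thread: the `BloomFilter` class, the use of SHA-256 for hashing, and the example addresses are all assumptions, and a real check at a call site would need much cheaper hash functions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter over integer addresses (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in one Python integer

    def _positions(self, address):
        # Derive num_hashes bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{address}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, address):
        """Register a valid call target."""
        for pos in self._positions(address):
            self.bits |= 1 << pos

    def might_contain(self, address):
        """False means definitely not registered; True means probably registered."""
        return all((self.bits >> pos) & 1 for pos in self._positions(address))


valid_targets = BloomFilter(num_bits=1024, num_hashes=3)
for target in (0x401000, 0x401040, 0x4010A0):  # hypothetical function addresses
    valid_targets.add(target)
```

A registered target is never rejected, but an unregistered address can occasionally be reported as present. For CFI that means an invalid target could sometimes slip through, whereas the jump-table check in the original post is exact for pointers outside the table.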
