thr3ads.net - llvm dev - [llvm-dev] Finding all pointers to functions [Dec 2015]


John Criswell via llvm-dev

2015-Dec-23 17:35 UTC

[llvm-dev] Finding all pointers to functions

On 12/23/15 2:09 AM, Russell Wallace wrote:
> On Tue, Dec 22, 2015 at 10:55 AM, John Criswell <jtcriswel at gmail.com
> <mailto:jtcriswel at gmail.com>> wrote:
>
>     You could conservatively assume that any function that has its
>     address taken has a pointer to it that escapes into memory or
>     external code.
>
>
> Right, that's what I'm doing to start with.
>
>     To make things a little more accurate, you could scan the uses of
>     any function for which hasAddressTaken() returns true and see if
>     any of its uses escapes its function or escapes into memory or
>     external code.  I believe hasAddressTaken() returns true if the
>     function is subjected to a cast instruction, and functions are
>     often cast if they are used in a call that uses a different
>     signature than the function's declared signature.
>
>
> I'll look into that. It seems reasonable to guess that the major 
> confounding factor in many C++ programs will be references from 
> virtual function tables; there should be some way to optimize those 
> specifically.
>
>
>     To get anything more accurate, you'll need to use alias analysis
>     or points-to analysis.  DSA tracks function pointers in the heap
>     and can tell you whether the function is called from external
>     code.  However, DSA's accuracy currently suffers if it is run
>     after LLVM's optimizations, and the code needs some serious TLC.
>
>
> DSA presumably stands for data structure analysis. TLC = tender loving 
> care? Why does DSA become less accurate if run after optimization?
>
DSA was built when LLVM's optimizations maintained the type information 
on GEP and other instructions (DSA existed before LLVM was 
open-source).  As such, it uses LLVM's type information to aid in its 
type-inference which, in turn, gives it field sensitivity which, in 
turn, improves its accuracy.  Over time, LLVM optimizations have come to 
modify the type information so that it is just simple byte-level 
indexing (as opposed to array-of-structure indexing).  DSA hasn't been 
updated to handle that well.  That is why its precision is better 
pre-optimization than post-optimization.
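
To illustrate the difference between the two indexing styles, here is a minimal sketch at the source level (not actual LLVM IR). The names `Point`, `typed_field_address`, and `byte_level_field_address` are made up for this example. Both functions compute the same address, but only the first one shows an analysis which field of which array element is being touched; the second only shows a byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical structure used only for illustration.
struct Point {
    std::int32_t x;
    std::int32_t y;
};

// Array-of-structure indexing: the element index and the field are both
// visible, which is the kind of information a field-sensitive analysis uses.
std::int32_t *typed_field_address(Point *base, std::size_t index) {
    assert(base != nullptr);
    return &base[index].y;
}

// Byte-level indexing: the same address expressed as a raw byte offset.
// The structure layout is no longer visible in the computation itself.
std::int32_t *byte_level_field_address(Point *base, std::size_t index) {
    assert(base != nullptr);
    auto *bytes = reinterpret_cast<unsigned char *>(base);
    std::size_t offset = index * sizeof(Point) + offsetof(Point, y);
    return reinterpret_cast<std::int32_t *>(bytes + offset);
}
```

Calling both functions on the same array and index should yield equal pointers, which is why an optimizer is free to rewrite one form into the other.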

Just out of curiosity, what are you trying to do?  I need call graph 
analysis for C/C++ code with function pointers, and so I'm writing an 
NSF proposal to seek funding to do that (among other enhancements to my 
SVA infrastructure).  If it's something that would be useful to you (or 
other LLVM community members), it would be useful for me to know that.

Regards,

John Criswell


-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell


Russell Wallace via llvm-dev

2015-Dec-23 17:55 UTC

On Wed, Dec 23, 2015 at 5:35 PM, John Criswell <jtcriswel at gmail.com>
wrote:

> DSA was built when LLVM's optimizations maintained the type information on
> GEP and other instructions (DSA existed before LLVM was open-source).  As
> such, it uses LLVM's type information to aid in its type-inference which,
> in turn, gives it field sensitivity which, in turn, improves its accuracy.
> Over time, LLVM optimizations have come to modify the type information so
> that it is just simple byte-level indexing (as opposed to
> array-of-structure indexing).  DSA hasn't been updated to handle that
> well.  That is why its precision is better pre-optimization than
> post-optimization.

Ah! I don't suppose you could point to some examples of this? E.g. a simple
test program such that one could eyeball the intermediate code before and
after optimization?

> Just out of curiosity, what are you trying to do?  I need call graph
> analysis for C/C++ code with function pointers, and so I'm writing an NSF
> proposal to seek funding to do that (among other enhancements to my SVA
> infrastructure).  If it's something that would be useful to you (or other
> LLVM community members), it would be useful for me to know that.

SVA?

I'm trying to write a superoptimizer that can optimize code based on a
high-level understanding of what it's actually doing, so yes, call graph
analysis that can deal with function pointers does seem likely to be one of
the things that will be needed.
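
To make the problem concrete, here is a small sketch of the three situations the conservative rule quoted earlier in the thread has to tell apart. All function and variable names are invented for this example. `add_one` is only ever called directly, `twice` has its address stored in a global table (it escapes into memory), and `compare_ints` has its address handed to `std::qsort` (it escapes into external code).

```cpp
#include <cassert>
#include <cstdlib>

// Only called directly: no pointer to this function is ever created.
int add_one(int v) { return v + 1; }

// Address stored in a global table, so it escapes into memory.
int twice(int v) { return v * 2; }
using UnaryFn = int (*)(int);
UnaryFn g_table[1] = {twice};

// Address passed to std::qsort, so it escapes into external code.
int compare_ints(const void *a, const void *b) {
    int lhs = *static_cast<const int *>(a);
    int rhs = *static_cast<const int *>(b);
    return (lhs > rhs) - (lhs < rhs);
}

int run(int v) {
    int values[3] = {3, 1, 2};
    std::qsort(values, 3, sizeof(int), compare_ints);
    // Direct call, indirect call through the table, smallest sorted value.
    return add_one(v) + g_table[0](v) + values[0];
}
```

A call graph builder can resolve the call to `add_one` trivially, but the call through `g_table` and the callback invoked inside `std::qsort` are the cases that need the conservative assumption or a points-to analysis.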

John Criswell via llvm-dev

2015-Dec-23 18:26 UTC

On 12/23/15 12:55 PM, Russell Wallace wrote:
> On Wed, Dec 23, 2015 at 5:35 PM, John Criswell <jtcriswel at gmail.com
> <mailto:jtcriswel at gmail.com>> wrote:
>
>     DSA was built when LLVM's optimizations maintained the type
>     information on GEP and other instructions (DSA existed before LLVM
>     was open-source).  As such, it uses LLVM's type information to aid
>     in its type-inference which, in turn, gives it field sensitivity
>     which, in turn, improves its accuracy.  Over time, LLVM
>     optimizations have come to modify the type information so that it
>     is just simple byte-level indexing (as opposed to
>     array-of-structure indexing).  DSA hasn't been updated to handle
>     that well.  That is why its precision is better pre-optimization
>     than post-optimization.
>
>
> Ah! I don't suppose you could point to some examples of this? E.g. a 
> simple test program such that one could eyeball the intermediate code 
> before and after optimization?
Off the top of my head, no, I don't have an example, but I suspect any 
program that indexes an array inside a for loop will do.
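
A candidate test program along those lines might look like the sketch below. The names `Sample` and `total_weight` are invented here. One way to compare the intermediate code would be to compile the file twice, for example with `clang++ -S -emit-llvm -O0` and then with `clang++ -S -emit-llvm -O2`, and look at the `getelementptr` instructions in each output. Whether the optimized output actually shows byte-level indexing depends on the LLVM version and has not been verified for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical structure; the mix of int and double gives the fields
// distinct offsets, so field indexing is visible in the IR.
struct Sample {
    int id;
    double weight;
};

// Sums one field across an array of structures inside a for loop.
double total_weight(const Sample *samples, std::size_t count) {
    assert(samples != nullptr || count == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        total += samples[i].weight;
    }
    return total;
}
```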
>
>     Just out of curiosity, what are you trying to do?  I need call
>     graph analysis for C/C++ code with function pointers, and so I'm
>     writing an NSF proposal to seek funding to do that (among other
>     enhancements to my SVA infrastructure).  If it's something that
>     would be useful to you (or other LLVM community members), it would
>     be useful for me to know that.
>
>
> SVA?
Sorry.  SVA is Secure Virtual Architecture.  It's my LLVM-based 
infrastructure for controlling operating system kernel behavior via 
compiler instrumentation and hardware configuration.  I've used it to 
build a system that protects applications from a compromised operating 
system kernel as well as to enforce memory safety and control-flow 
integrity on operating system kernel code.

I need DSA for doing things like:

1) Creating an accurate call graph for kernel code to enforce better 
control-flow integrity and to test our future infrastructure for 
measuring the efficacy of defenses against code reuse attacks.

2) Analyzing the memory accesses of kernel modules to see if they modify 
kernel data structures that they should not modify (e.g., to find 
rootkits that modify the process list).

3) Optimizing run-time checks that protect kernel data structures from 
other kernel components (useful for a number of things).

In short, strong points-to and call graph analysis enable some 
interesting research projects.
>
> I'm trying to write a superoptimizer that can optimize code based on a 
> high-level understanding of what it's actually doing, so yes, call 
> graph analysis that can deal with function pointers does seem likely 
> to be one of the things that will be needed.
Nice.

One thing you might want to investigate is whether building a call graph 
analysis off of the TBAA metadata would work.  If TBAA works for lots of 
programs (I hear some non-conformant programs cause it problems), then 
using it as a springboard for analysis may be effective (as TBAA is 
already well maintained in the LLVM source tree).
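
As an illustration of what "non-conformant" can mean here, the sketch below shows one common case: reading the bits of a `float` as an integer. The function name `float_bits` is made up for this example, and the expected bit pattern assumes an IEEE-754 32-bit `float`. The `memcpy` version follows the language's aliasing rules; the pointer-cast version shown in the comment does not, and is the sort of code a type-based analysis may misjudge.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Conformant: copies the object representation byte by byte, so no
// float object is ever accessed through an integer lvalue.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Non-conformant alternative (deliberately not compiled here):
//   std::uint32_t bits = *reinterpret_cast<std::uint32_t *>(&value);
// It accesses a float through an unrelated integer type, which violates
// the aliasing rules that type-based alias analysis relies on.
```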

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell

