John Criswell via llvm-dev
2015-Dec-23 17:35 UTC
[llvm-dev] Finding all pointers to functions
On 12/23/15 2:09 AM, Russell Wallace wrote:
> On Tue, Dec 22, 2015 at 10:55 AM, John Criswell
> <jtcriswel at gmail.com> wrote:
>
>> You could conservatively assume that any function that has its
>> address taken has a pointer to it that escapes into memory or
>> external code.
>
> Right, that's what I'm doing to start with.
>
>> To make things a little more accurate, you could scan the uses of
>> any function for which hasAddressTaken() returns true and see if
>> any of its uses escapes its function or escapes into memory or
>> external code. I believe hasAddressTaken() returns true if the
>> function is subjected to a cast instruction, and functions are
>> often casted if they are used in a call that uses a different
>> signature than the function's declared signature.
>
> I'll look into that. It seems reasonable to guess that the major
> confounding factor in many C++ programs will be references from
> virtual function tables; there should be some way to optimize those
> specifically.
>
>> To get anything more accurate, you'll need to use alias analysis
>> or points-to analysis. DSA tracks function pointers in the heap
>> and can tell you whether the function is called from external
>> code. However, DSA's accuracy currently suffers if it is run
>> after LLVM's optimizations, and the code needs some serious TLC.
>
> DSA presumably stands for data structure analysis. TLC = tender loving
> care? Why does DSA become less accurate if run after optimization?

DSA was built when LLVM's optimizations maintained the type information
on GEP and other instructions (DSA existed before LLVM was open-source).
As such, it uses LLVM's type information to aid in its type-inference
which, in turn, gives it field sensitivity which, in turn, improves its
accuracy. Over time, LLVM optimizations have come to modify the type
information so that it is just simple byte-level indexing (as opposed to
array-of-structure indexing). DSA hasn't been updated to handle that
well. That is why its precision is better pre-optimization than
post-optimization.

Just out of curiosity, what are you trying to do? I need call graph
analysis for C/C++ code with function pointers, and so I'm writing an
NSF proposal to seek funding to do that (among other enhancements to my
SVA infrastructure). If it's something that would be useful to you (or
other LLVM community members), it would be useful for me to know that.

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
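The distinction drawn above, between a function that is only called directly and one whose address is taken and may escape, can be seen in a small test input. The sketch below is not from the thread: every name in it (direct_only, local_pointer, stored_in_table, cast_roundtrip, handler_table, run_all) is hypothetical, and how each case appears in the IR depends on the compiler version and optimization level.

```cpp
#include <cassert>

// Hypothetical test input; all names here are illustrative assumptions.

// Only ever called directly, so its address is never taken.
int direct_only(int x) { return x + 1; }

// Address is taken, but the pointer lives in a local variable and is
// only used as a callee within the same function.
int local_pointer(int x) { return x * 2; }

// Address is stored in a global table, so a pointer to it escapes
// into memory.
int stored_in_table(int x) { return x - 3; }

// Address is cast to a different function-pointer type and back,
// the kind of use the thread associates with cast instructions.
int cast_roundtrip(int x) { return x + 100; }

using Handler = int (*)(int);
using Generic = void (*)();

Handler handler_table[1] = { stored_in_table };

int run_all(int x) {
    assert(handler_table[0] != nullptr);

    int a = direct_only(x);              // direct call

    Handler local = local_pointer;       // address taken, stays local
    int b = local(x);

    int c = handler_table[0](x);         // call through escaped pointer

    // Casting to another function-pointer type and back before the
    // call is well-defined in C++.
    Generic g = reinterpret_cast<Generic>(&cast_roundtrip);
    int d = reinterpret_cast<Handler>(g)(x);

    return a + b + c + d;
}
```

Under the conservative rule, a tool would treat local_pointer, stored_in_table and cast_roundtrip alike as possible indirect-call targets; scanning their uses, as suggested above, is what could separate the pointer that stays local from the one stored in handler_table.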
Russell Wallace via llvm-dev
2015-Dec-23 17:55 UTC
[llvm-dev] Finding all pointers to functions
On Wed, Dec 23, 2015 at 5:35 PM, John Criswell <jtcriswel at gmail.com>
wrote:

> DSA was built when LLVM's optimizations maintained the type information on
> GEP and other instructions (DSA existed before LLVM was open-source). As
> such, it uses LLVM's type information to aid in its type-inference which,
> in turn, gives it field sensitivity which, in turn, improves its accuracy.
> Over time, LLVM optimizations have come to modify the type information so
> that it is just simple byte-level indexing (as opposed to
> array-of-structure indexing). DSA hasn't been updated to handle that
> well. That is why its precision is better pre-optimization than
> post-optimization.

Ah! I don't suppose you could point to some examples of this? E.g. a
simple test program such that one could eyeball the intermediate code
before and after optimization?

> Just out of curiosity, what are you trying to do? I need call graph
> analysis for C/C++ code with function pointers, and so I'm writing an NSF
> proposal to seek funding to do that (among other enhancements to my SVA
> infrastructure). If it's something that would be useful to you (or other
> LLVM community members), it would be useful for me to know that.

SVA?

I'm trying to write a superoptimizer that can optimize code based on a
high-level understanding of what it's actually doing, so yes, call graph
analysis that can deal with function pointers does seem likely to be one
of the things that will be needed.
John Criswell via llvm-dev
2015-Dec-23 18:26 UTC
[llvm-dev] Finding all pointers to functions
On 12/23/15 12:55 PM, Russell Wallace wrote:
> On Wed, Dec 23, 2015 at 5:35 PM, John Criswell
> <jtcriswel at gmail.com> wrote:
>
>> DSA was built when LLVM's optimizations maintained the type
>> information on GEP and other instructions (DSA existed before LLVM
>> was open-source). As such, it uses LLVM's type information to aid
>> in its type-inference which, in turn, gives it field sensitivity
>> which, in turn, improves its accuracy. Over time, LLVM
>> optimizations have come to modify the type information so that it
>> is just simple byte-level indexing (as opposed to
>> array-of-structure indexing). DSA hasn't been updated to handle
>> that well. That is why its precision is better pre-optimization
>> than post-optimization.
>
> Ah! I don't suppose you could point to some examples of this? E.g. a
> simple test program such that one could eyeball the intermediate code
> before and after optimization?

Off the top of my head, no, I don't have an example, but I suspect any
program with an array indexing operation inside a for loop will do.

>> Just out of curiosity, what are you trying to do? I need call
>> graph analysis for C/C++ code with function pointers, and so I'm
>> writing an NSF proposal to seek funding to do that (among other
>> enhancements to my SVA infrastructure). If it's something that
>> would be useful to you (or other LLVM community members), it would
>> be useful for me to know that.
>
> SVA?

Sorry. SVA is Secure Virtual Architecture. It's my LLVM-based
infrastructure for controlling operating system kernel behavior via
compiler instrumentation and hardware configuration. I've used it to
build a system that protects applications from a compromised operating
system kernel as well as to enforce memory safety and control-flow
integrity on operating system kernel code.

I need DSA for doing things like:

1) Creating an accurate call graph for kernel code to enforce better
   control-flow integrity and to test our future infrastructure for
   measuring the efficacy of defenses against code reuse attacks.

2) Analyzing the memory accesses of kernel modules to see if they
   modify kernel data structures that they should not modify (e.g., to
   find rootkits that modify the process list).

3) Optimizing the run-time checks that protect kernel data structures
   from other kernel components (useful for a number of things).

In short, strong points-to and call graph analysis enable some
interesting research projects.

> I'm trying to write a superoptimizer that can optimize code based on a
> high-level understanding of what it's actually doing, so yes, call
> graph analysis that can deal with function pointers does seem likely
> to be one of the things that will be needed.

Nice.

One thing you might want to investigate is whether building a call
graph analysis off of the TBAA metadata would work. If TBAA works for
lots of programs (I hear some non-conformant programs cause it
problems), then using it as a springboard for analysis may be effective
(as TBAA is already well maintained in the LLVM source tree).

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
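A test program of the kind suggested above (array indexing inside a for loop) might look like the following. This is only a sketch, not something posted in the thread: the names Point and sum_y are hypothetical, and whether the optimized IR actually shows byte-level indexing in place of array-of-structure indexing depends on the LLVM version and the passes that run.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical test input; Point and sum_y are illustrative names.
struct Point {
    int x;
    int y;
};

// Sums the y field of each element. The access pts[i].y is an
// array-of-structure index: element i, then field y.
int sum_y(const Point *pts, std::size_t n) {
    assert(pts != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += pts[i].y;
    return total;
}
```

To eyeball the intermediate code, one could compile the file twice, for example with `clang++ -S -emit-llvm -O0` and again with `-O2`, and compare how the address of `pts[i].y` is computed in the getelementptr instructions of each output.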