John Criswell via llvm-dev
2015-Dec-22 10:55 UTC
[llvm-dev] Finding all pointers to functions
On 12/22/15 4:45 AM, Russell Wallace via llvm-dev wrote:
> Oh, I just came across Function::hasAddressTaken. Maybe I can just use
> that instead?

You could conservatively assume that any function that has its address
taken has a pointer to it that escapes into memory or external code.

To make things a little more accurate, you could scan the uses of any
function for which hasAddressTaken() returns true and see whether any of
those uses escapes its function or escapes into memory or external code.
I believe hasAddressTaken() returns true if the function is subjected to
a cast instruction, and functions are often cast if they are used in a
call that uses a different signature than the function's declared
signature.

To get anything more accurate, you'll need to use alias analysis or
points-to analysis. DSA tracks function pointers in the heap and can tell
you whether the function is called from external code. However, DSA's
accuracy currently suffers if it is run after LLVM's optimizations, and
the code needs some serious TLC.

Regards,

John Criswell

> On Tue, Dec 22, 2015 at 5:11 AM, Russell Wallace
> <russell.wallace at gmail.com> wrote:
>
>> I need to track down all pointers anywhere in a module that could
>> be pointing to functions (because some of the optimizations I want
>> to do, require either identifying every use of a function, or
>> conservatively identifying when such cannot be done).
>>
>> A starting point is to look at all the global variables:
>>
>>   for (auto &G : M.globals())
>>     for (auto &V : G.operands())
>>       if (auto F = dyn_cast<Function>(V))
>>
>> Of course, instructions can also refer to functions, both as
>> direct calls and otherwise:
>>
>>   for (auto &F : M) {
>>     for (auto &I : inst_range(F)) {
>>       for (auto &V : I.operands())
>>         if (auto F = dyn_cast<Function>(V))
>>
>> But there are other things as well, for example it seems there is
>> something called a personality function that can be a pointer to
>> another function, so need to add that
>>
>>   if (F.hasPersonalityFn())
>>
>> It seems there are other things called prefix data and prologue
>> data, which are pointers to constants, which could contain
>> pointers to functions, so would need to include those as well.
>>
>> Am I correct in thinking that prefix data and prologue data will
>> not be included in the global variables list, so do need special
>> handling?
>>
>> Could they be recursive? That is, could those constants contain
>> pointers to other constants... which end up containing pointers to
>> functions... such that none of the intermediate constant objects
>> are in the global variable list?
>>
>> Is there anything else I'm missing?

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
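On the question of nested constants: a constant initializer can hold other constants as operands, so one way to find every function it refers to is a recursive walk with a visited set. What follows is a minimal sketch in plain C++ with no LLVM dependency. `Node`, `Kind`, and `collectFunctions` are hypothetical stand-ins invented for illustration, not LLVM classes; a real pass would presumably walk the operands of the actual constants instead.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR constant. NOT an LLVM class; names are hypothetical.
struct Node {
    enum Kind { FunctionRef, Aggregate } kind;
    std::string name;                   // meaningful only for FunctionRef
    std::vector<const Node*> operands;  // meaningful only for Aggregate
};

// Appends to `out` the name of every function reachable from `root`
// through operand edges. `seen` ensures each node is visited once, so
// shared sub-constants are not rescanned and reference cycles terminate.
void collectFunctions(const Node* root,
                      std::unordered_set<const Node*>& seen,
                      std::vector<std::string>& out) {
    assert(root != nullptr && "operands must not be null");
    if (!seen.insert(root).second) {
        return;  // already visited
    }
    if (root->kind == Node::FunctionRef) {
        out.push_back(root->name);
        return;
    }
    for (const Node* operand : root->operands) {
        collectFunctions(operand, seen, out);
    }
}
```

Passing the same `seen` set to the walks started from global initializers, prefix data, and prologue data would let one traversal cover all of them without caring which intermediate constants appear in the global variable list.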
Russell Wallace via llvm-dev
2015-Dec-23 07:09 UTC
[llvm-dev] Finding all pointers to functions
On Tue, Dec 22, 2015 at 10:55 AM, John Criswell <jtcriswel at gmail.com> wrote:
> You could conservatively assume that any function that has its address
> taken has a pointer to it that escapes into memory or external code.

Right, that's what I'm doing to start with.

> To make things a little more accurate, you could scan the uses of any
> function for which hasAddressTaken() returns true and see if any of its
> uses escapes its function or escapes into memory or external code. I
> believe hasAddressTaken() returns true if the function is subjected to a
> cast instruction, and functions are often cast if they are used in a call
> that uses a different signature than the function's declared signature.

I'll look into that. It seems reasonable to guess that the major
confounding factor in many C++ programs will be references from virtual
function tables; there should be some way to optimize those specifically.

> To get anything more accurate, you'll need to use alias analysis or
> points-to analysis. DSA tracks function pointers in the heap and can tell
> you whether the function is called from external code. However, DSA's
> accuracy currently suffers if it is run after LLVM's optimizations, and the
> code needs some serious TLC.

DSA presumably stands for data structure analysis. TLC = tender loving
care? Why does DSA become less accurate if run after optimization?
John Criswell via llvm-dev
2015-Dec-23 17:35 UTC
[llvm-dev] Finding all pointers to functions
On 12/23/15 2:09 AM, Russell Wallace wrote:
> DSA presumably stands for data structure analysis. TLC = tender loving
> care? Why does DSA become less accurate if run after optimization?

DSA was built when LLVM's optimizations maintained the type information
on GEP and other instructions (DSA existed before LLVM was open-source).
As such, it uses LLVM's type information to aid its type inference,
which gives it field sensitivity, which in turn improves its accuracy.
Over time, LLVM optimizations have come to modify the type information
so that it is just simple byte-level indexing (as opposed to
array-of-structure indexing). DSA hasn't been updated to handle that
well. That is why its precision is better pre-optimization than
post-optimization.

Just out of curiosity, what are you trying to do? I need call graph
analysis for C/C++ code with function pointers, and so I'm writing an
NSF proposal to seek funding to do that (among other enhancements to my
SVA infrastructure). If it's something that would be useful to you (or
other LLVM community members), it would be useful for me to know that.

Regards,

John Criswell
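The difference between array-of-structure indexing and byte-level indexing can be pictured at the source level. The sketch below is only an analogy in plain C++ (it is not LLVM IR and does not claim to show what any particular pass emits); `Entry`, `Handler`, `typedLoad`, `byteLoad`, `onOpen`, and `onClose` are hypothetical names. Both loads read the same function pointer, but only the first names the field, which is the kind of information a field-sensitive analysis relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using Handler = void (*)();

struct Entry {
    int id;
    Handler handler;
};

// Typed access: the expression itself says "element i, field handler".
Handler typedLoad(const Entry* table, std::size_t i) {
    return table[i].handler;
}

// Byte-level access: the same address, reached through a raw byte offset.
// Nothing in the address computation mentions Entry's fields any more.
Handler byteLoad(const Entry* table, std::size_t i) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(table);
    const unsigned char* field =
        base + i * sizeof(Entry) + offsetof(Entry, handler);
    Handler h;
    std::memcpy(&h, field, sizeof h);
    return h;
}

void onOpen() {}
void onClose() {}

// Usage: both forms load the same pointer from the same table.
void example() {
    Entry table[2] = {{1, onOpen}, {2, onClose}};
    assert(typedLoad(table, 1) == onClose);
    assert(byteLoad(table, 1) == typedLoad(table, 1));
}
```

An analysis that sees only the second form has to recover that offset `offsetof(Entry, handler)` means "the handler field" before it can keep function pointers separate from the `id` integers stored next to them.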
Christian Convey via llvm-dev
2015-Dec-28 15:50 UTC
[llvm-dev] Finding all pointers to functions
> On 12/22/15 4:45 AM, Russell Wallace via llvm-dev wrote:
>> Oh, I just came across Function::hasAddressTaken. Maybe I can just use
>> that instead?
>
> You could conservatively assume that any function that has its address
> taken has a pointer to it that escapes into memory or external code.

I wonder if a conservative estimate of pointed-to functions would also
need to include all functions with externally visible linkage?

I could imagine a few routes by which that allows such a function to have
its address show up in the data being handled by the Module's own code.
John Criswell via llvm-dev
2015-Dec-28 16:32 UTC
[llvm-dev] Finding all pointers to functions
On 12/28/15 9:50 AM, Christian Convey wrote:
>> On 12/22/15 4:45 AM, Russell Wallace via llvm-dev wrote:
>>> Oh, I just came across Function::hasAddressTaken. Maybe I can
>>> just use that instead?
>>
>> You could conservatively assume that any function that has its
>> address taken has a pointer to it that escapes into memory or
>> external code.
>
> I wonder if a conservative estimate of pointed-to functions would also
> need to include all functions with externally visible linkage?

This is correct. Any function that is externally visible can be called
by external code that is outside the analysis's scope; you must
therefore treat it as a function that can be called by external code (in
LLVM CallGraph parlance, it can be called by the external calling node).

If you're going to do this analysis, you want to do it in libLTO after
the Internalize pass has been run. The Internalize pass finds functions
which are externally visible (because C linkage needs them to be in
order for linking to work) and makes them internal since we know that we
will not link with any more LLVM bitcode files. You should get much
better results that way.

As an FYI, DSA tracks this sort of information using the External (E)
flag on the DSNode. It can determine (provided it has sufficient
precision) which heap objects are accessible to external code and which
are not.

Regards,

John Criswell

> I could imagine a few routes by which that allows such a function to
> have its address show up in the data being handled by the Module's own
> code.
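The conservative rule discussed in this thread, and the effect of internalizing before applying it, can be sketched with a small model. This is plain C++ with no LLVM dependency; `FuncInfo`, `mayHaveUnknownCallers`, `internalize`, and `countUnknown` are hypothetical names, the two boolean fields merely stand in for what a pass would query (address taken, linkage), and the `preserved` set of names that must stay visible is an assumption of the model.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy per-function summary; fields are stand-ins, not LLVM API.
struct FuncInfo {
    std::string name;
    bool addressTaken;
    bool externallyVisible;
};

// Conservative rule: a function may have callers the analysis cannot
// see if its address is taken or if it is externally visible.
bool mayHaveUnknownCallers(const FuncInfo& f) {
    return f.addressTaken || f.externallyVisible;
}

// Models internalization: every function not named in `preserved`
// loses its external visibility.
void internalize(std::vector<FuncInfo>& funcs,
                 const std::set<std::string>& preserved) {
    for (FuncInfo& f : funcs) {
        if (preserved.count(f.name) == 0) {
            f.externallyVisible = false;
        }
    }
}

std::size_t countUnknown(const std::vector<FuncInfo>& funcs) {
    std::size_t n = 0;
    for (const FuncInfo& f : funcs) {
        if (mayHaveUnknownCallers(f)) {
            ++n;
        }
    }
    return n;
}

// Usage: three externally visible functions, one of them address-taken.
void example() {
    std::vector<FuncInfo> funcs = {
        {"main", false, true},
        {"helper", false, true},
        {"callback", true, true},
    };
    assert(countUnknown(funcs) == 3);  // before: all externally visible
    internalize(funcs, std::set<std::string>{"main"});
    assert(countUnknown(funcs) == 2);  // after: main (preserved), callback (address taken)
}
```

In this model, `helper` stops being a candidate once it is internal, while `callback` remains one because its address is still taken; narrowing that further is where the use scanning or points-to analysis mentioned earlier would come in.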