On Wed, Mar 5, 2008 at 3:44 AM, Xi Wang <xi.wang at gmail.com> wrote:
> On Mon, Mar 3, 2008 at 6:55 PM, Richard Warburton
> <richard.warburton at gmail.com> wrote:
> > > > No
> > > I've got a BDD based version of andersen's around already (uncommitted).
> > > It's slower in time (by a factor of 8 or so), but very slightly (1-2
> > > megabytes of memory) more memory efficient on some cases. Nowadays,
> > > the bitmap version is actually more memory efficient than the BDD
> > > version, even if you turn off BDD caching, etc.
> > >
> > > The thing you have to understand is that these context sensitive BDD
> > > algorithms work okay on Java, but fall down pretty badly on C/C++,
> > > because it is a much bigger problem. Whaley's is just brute force,
> > > and Zhu's is pretty darn complex.
> >
> > I was interested to see whether Whaley's algorithm really works,
> > particularly given its simplicity and the claims made in the papers.
>
> I have implemented a tool based on the Phoenix compiler framework at
> MSR, which can dump relations as BDDs from C/C++ code and feed them
> into John Whaley's bddbddb for further analysis. As Dan said,
> currently the performance is not good enough for C/C++ code (due to
> more complex relations, BDD variable orders, etc.). I am still trying
> several optimizations. John also told me that it took quite a lot of
> effort to get the Java version to scale.

My guess is that if you try doing variable-substitution-style
optimizations, you will get very, very serious speedups. Usually, out
of 80-100k constraint variables, you can unify 60k of them. In my
testing, it reduces the number of constraint variables by 80+% in
non-context-sensitive analysis. bddbddb is already going to be rather
good at solving the query relations, so to get faster, you have to
start reducing the size of the problem itself.
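To make the variable-substitution point concrete, here is a minimal,
hypothetical C++ sketch -- not Dan's implementation, and far simpler
than full offline variable substitution -- that merges constraint
variables which are trivially pointer-equivalent (a single incoming
copy edge and, by assumption, no address-taken, load, or store
constraints) with a union-find before any solver runs. Every name in
it is made up for illustration.

// Hypothetical sketch of copy-edge unification before constraint solving.
#include <cstdio>
#include <map>
#include <vector>

// A copy constraint "dst >= src": points-to(dst) must include points-to(src).
struct CopyConstraint { unsigned dst, src; };

// Union-find over constraint-variable ids, used to merge equivalent variables.
struct UnionFind {
  std::vector<unsigned> parent;
  explicit UnionFind(unsigned n) : parent(n) {
    for (unsigned i = 0; i < n; ++i) parent[i] = i;
  }
  unsigned find(unsigned x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]]; // path halving
    return x;
  }
  void merge(unsigned a, unsigned b) { parent[find(a)] = find(b); }
};

int main() {
  // Toy constraint graph: v1, v2, v3 each have exactly one incoming copy
  // edge and (by assumption) appear in no other constraints, so their
  // solutions must equal that of their single source.
  unsigned numVars = 4;
  std::vector<CopyConstraint> copies = {{1, 0}, {2, 1}, {3, 2}};

  std::vector<unsigned> inDegree(numVars, 0);
  for (const CopyConstraint &c : copies) ++inDegree[c.dst];

  UnionFind uf(numVars);
  for (const CopyConstraint &c : copies)
    if (inDegree[c.dst] == 1)          // single source: pointer-equivalent
      uf.merge(c.dst, c.src);

  // Report the surviving representatives; a real pass would now rewrite
  // every remaining constraint in terms of these representatives, so the
  // BDD (or bitmap) solver sees a much smaller problem.
  std::map<unsigned, std::vector<unsigned>> classes;
  for (unsigned v = 0; v < numVars; ++v) classes[uf.find(v)].push_back(v);
  for (const auto &kv : classes) {
    std::printf("representative v%u:", kv.first);
    for (unsigned v : kv.second) std::printf(" v%u", v);
    std::printf("\n");
  }
  return 0;
}

A real implementation would also have to collapse cycles of copy edges
and leave alone any variable involved in address-of, load, or store
constraints; the sketch only shows where the 80+% reduction Dan
describes would come from.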
FWIW, Ben Hardekopf asked me to post this for him on the subject:

Regarding Whaley and Lam's BDD-based approach to context-sensitive
analysis, there is a paper by the same research group (Avots et al in
ICSE'05) that adapts their technique for C. The results are not very
encouraging -- their system doesn't scale nearly as well for C as it
did for Java. The primary difference is that unlike Java, in C
top-level variables can have their address taken and be passed between
functions, which requires the analysis to maintain 2 contexts for each
points-to relation: one for the pointer and one for the pointee. The
other problem with their method is that only the top-level variables
are treated context-sensitively; everything in the heap (e.g. for Java,
all object fields) is context-insensitive.

I spoke with John Whaley about this and he said they've tried to use a
context-sensitive heap model, but they couldn't get it to scale at all.
My personal feeling (backed up by results from some other research
groups, e.g. Chen et al in ISPAN'02 and Nystrom et al in PASTE'04) is
that a context-sensitive heap model is critical for getting the maximum
benefit from a context-sensitive pointer analysis, and I don't know
that BDDs, at least as currently used, are the right approach for this.

I was impressed by a completely different approach, described by
Nystrom et al in SAS'04 and Nystrom's PhD thesis, for doing
context-sensitive pointer analysis for C with a context-sensitive heap
model. Like Whaley and Lam, and unlike Lattner and Adve's Data
Structure Analysis, their technique is inclusion-based (i.e. a
context-sensitive version of Andersen's), but unlike Whaley and Lam it
also uses a context-sensitive heap model, and it's targeted at C. They
managed to scale the analysis to a couple hundred thousand lines of
code. I actually have several ideas on how to improve their analysis
and make it more scalable, but I haven't had time to implement them --
mainly because I haven't wanted to re-implement their whole analysis in
LLVM. I would definitely be interested in any effort to do this, and I
think it would have a lot of benefit. In fact, I supervised another
student who implemented a simplified version of this analysis for a
class project; while the code itself probably isn't re-usable, the
student made a number of good observations on the analysis that I would
be happy to share with anyone who is interested.

Thanks,
Ben
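To make the heap-model distinction Ben describes concrete, here is a
small hypothetical C++ sketch -- not taken from any of the cited
papers -- of how abstract heap objects might be named under each
scheme: context-insensitively by allocation site alone, versus
context-sensitively ("heap cloning") by allocation site plus the
calling context that reached it. The function names and the call-string
context are purely illustrative.

// Hypothetical sketch of abstract heap-object naming.
#include <cstdio>
#include <set>
#include <string>

// Context-insensitive: one abstract object per allocation site.
std::string heapNameInsensitive(unsigned allocSite) {
  return "alloc" + std::to_string(allocSite);
}

// Context-sensitive: one abstract object per (allocation site, context);
// here the context is just the call string leading to the allocation.
std::string heapNameSensitive(unsigned allocSite, const std::string &ctx) {
  return "alloc" + std::to_string(allocSite) + "@" + ctx;
}

int main() {
  // Imagine a wrapper my_alloc() (allocation site 7) called from foo() and
  // bar(). Context-insensitively both calls map to the same abstract object,
  // so anything foo stores through its pointer shows up in bar's queries.
  std::set<std::string> insensitive = {
      heapNameInsensitive(7),        // reached via foo
      heapNameInsensitive(7)};       // reached via bar
  std::set<std::string> sensitive = {
      heapNameSensitive(7, "foo"),   // reached via foo
      heapNameSensitive(7, "bar")};  // reached via bar

  std::printf("context-insensitive abstract objects: %zu\n", insensitive.size());
  std::printf("context-sensitive abstract objects:   %zu\n", sensitive.size());
  return 0;
}

The scaling problem Ben mentions follows directly from the second
scheme: the number of abstract heap objects can grow with the number of
reachable contexts, which is what Whaley reportedly could not get to
scale.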
> Regarding Whaley and Lam's BDD-based approach to context-sensitive
> analysis ...
<snip>

Thanks for posting this. In which case it's probably a good idea if I
don't go down this route to context sensitivity.

> I was impressed by a completely different approach, described by Nystrom
> et al in SAS'04 and Nystrom's PhD thesis, for doing context-sensitive
> pointer analysis for C with a context-sensitive heap model. Like Whaley
> and Lam, and unlike Lattner and Adve's Data Structure Analysis, their
> technique is inclusion-based (i.e. a context-sensitive version of
> Andersen's), but unlike Whaley and Lam it also uses a context-sensitive
> heap model, and it's targeted at C. They managed to scale the analysis to
> a couple hundred thousand lines of code. I actually have several ideas on
> how to improve their analysis and make it more scalable, but I haven't had
> time to implement them -- mainly because I haven't wanted to re-implement
> their whole analysis in LLVM. I would definitely be interested in any
> effort to do this, and I think it would have a lot of benefit. In fact, I
> supervised another student who implemented a simplified version of this
> analysis for a class project; while the code itself probably isn't
> re-usable, the student made a number of good observations on the analysis
> that I would be happy to share with anyone who is interested.

I'll have a read through Nystrom's approach. Are there publicly
available implementations of this algorithm anywhere else? If his
analysis already scales to that region, then improvements to it might
make a context-sensitive analysis actually practical. Either way, if
the claims of the paper hold true, then it seems like a viable option
for implementation.

Would it be possible to get Ben Hardekopf's suggested improvements and
his student's ideas posted to the list?

Regards,

Richard