On Wed, Mar 5, 2008 at 3:44 AM, Xi Wang <xi.wang at gmail.com> wrote:
> On Mon, Mar 3, 2008 at 6:55 PM, Richard Warburton
> <richard.warburton at gmail.com> wrote:
> > > > No
> > > I've got a BDD based version of andersen's around already (uncommitted).
> > > It's slower in time (by a factor of 8 or so), but very slightly (1-2
> > > megabytes of memory) more memory efficient on some cases. Nowadays,
> > > the bitmap version is actually more memory efficient than the BDD
> > > version, even if you turn off BDD caching, etc.
> > >
> > > The thing you have to understand is that these context sensitive BDD
> > > algorithms work okay on Java, but fall down pretty badly on C/C++,
> > > because it is a much bigger problem. Whaley's is just brute force,
> > > and Zhu's is pretty darn complex.
> >
> > I was interested to see whether Whaley's algorithm really works,
> > particularly given its simplicity and the claims made in the papers.
>
> I have implemented a tool based on the Phoenix compiler framework at
> MSR, which can dump relations as BDDs from C/C++ code and feed them
> into John Whaley's bddbddb for further analysis. As Dan said,
> currently the performance is not good enough for C/C++ code (due to
> more complex relations, BDD variable orders, etc.). I am still trying
> several optimizations. John also told me that it took quite a lot of
> effort to get the Java version to scale.

My guess is that if you try doing variable-substitution-style
optimizations, you will get very, very serious speedups. Usually, out
of 80-100k constraint variables, you can unify 60k of them. In my
testing, it reduces the number of constraint variables by 80+% in
non-context-sensitive analysis. bddbddb is already going to be rather
good at solving the query relations, so to get faster, you have to
start reducing the size of the problem itself.
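To make the variable-substitution point concrete, here is a minimal,
hypothetical C++ sketch -- not Dan's implementation, and far simpler
than full offline variable substitution -- that merges constraint
variables which are trivially pointer-equivalent (a single incoming
copy edge and, by assumption, no address-taken, load, or store
constraints) with a union-find before any solver runs. Every name in
it is made up for illustration.

// Hypothetical sketch of copy-edge unification before constraint solving.
#include <cstdio>
#include <map>
#include <vector>

// A copy constraint "dst >= src": points-to(dst) must include points-to(src).
struct CopyConstraint { unsigned dst, src; };

// Union-find over constraint-variable ids, used to merge equivalent variables.
struct UnionFind {
  std::vector<unsigned> parent;
  explicit UnionFind(unsigned n) : parent(n) {
    for (unsigned i = 0; i < n; ++i) parent[i] = i;
  }
  unsigned find(unsigned x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]]; // path halving
    return x;
  }
  void merge(unsigned a, unsigned b) { parent[find(a)] = find(b); }
};

int main() {
  // Toy constraint graph: v1, v2, v3 each have exactly one incoming copy
  // edge and (by assumption) appear in no other constraints, so their
  // solutions must equal that of their single source.
  unsigned numVars = 4;
  std::vector<CopyConstraint> copies = {{1, 0}, {2, 1}, {3, 2}};

  std::vector<unsigned> inDegree(numVars, 0);
  for (const CopyConstraint &c : copies) ++inDegree[c.dst];

  UnionFind uf(numVars);
  for (const CopyConstraint &c : copies)
    if (inDegree[c.dst] == 1)          // single source: pointer-equivalent
      uf.merge(c.dst, c.src);

  // Report the surviving representatives; a real pass would now rewrite
  // every remaining constraint in terms of these representatives, so the
  // BDD (or bitmap) solver sees a much smaller problem.
  std::map<unsigned, std::vector<unsigned>> classes;
  for (unsigned v = 0; v < numVars; ++v) classes[uf.find(v)].push_back(v);
  for (const auto &kv : classes) {
    std::printf("representative v%u:", kv.first);
    for (unsigned v : kv.second) std::printf(" v%u", v);
    std::printf("\n");
  }
  return 0;
}

A real implementation would also have to collapse cycles of copy edges
and leave alone any variable involved in address-of, load, or store
constraints; the sketch only shows where the 80+% reduction Dan
describes would come from.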
FWIW, Ben Hardekopf asked me to post this for him on the subject:

Regarding Whaley and Lam's BDD-based approach to context-sensitive
analysis, there is a paper by the same research group (Avots et al in
ICSE'05) that adapts their technique for C. The results are not very
encouraging -- their system doesn't scale nearly as well for C as it
did for Java. The primary difference is that unlike Java, in C
top-level variables can have their address taken and be passed between
functions, which requires the analysis to maintain 2 contexts for each
points-to relation: one for the pointer and one for the pointee. The
other problem with their method is that only the top-level variables
are treated context-sensitively; everything in the heap (e.g. for Java,
all object fields) is context-insensitive.

I spoke with John Whaley about this and he said they've tried to use a
context-sensitive heap model, but they couldn't get it to scale at all.
My personal feeling (backed up by results from some other research
groups, e.g. Chen et al in ISPAN'02 and Nystrom et al in PASTE'04) is
that a context-sensitive heap model is critical for getting the maximum
benefit from a context-sensitive pointer analysis, and I don't know
that BDDs, at least as currently used, are the right approach for this.

I was impressed by a completely different approach, described by
Nystrom et al in SAS'04 and Nystrom's PhD thesis, for doing
context-sensitive pointer analysis for C with a context-sensitive heap
model. Like Whaley and Lam, and unlike Lattner and Adve's Data
Structure Analysis, their technique is inclusion-based (i.e. a
context-sensitive version of Andersen's), but unlike Whaley and Lam it
also uses a context-sensitive heap model, and it's targeted at C. They
managed to scale the analysis to a couple hundred thousand lines of
code. I actually have several ideas on how to improve their analysis
and make it more scalable, but I haven't had time to implement them --
mainly because I haven't wanted to re-implement their whole analysis in
LLVM. I would definitely be interested in any effort to do this, and I
think it would have a lot of benefit. In fact, I supervised another
student who implemented a simplified version of this analysis for a
class project; while the code itself probably isn't re-usable, the
student made a number of good observations on the analysis that I would
be happy to share with anyone who is interested.

Thanks,
Ben
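To make the heap-model distinction Ben describes concrete, here is a
small hypothetical C++ sketch -- not taken from any of the cited
papers -- of how abstract heap objects might be named under each
scheme: context-insensitively by allocation site alone, versus
context-sensitively ("heap cloning") by allocation site plus the
calling context that reached it. The function names and the call-string
context are purely illustrative.

// Hypothetical sketch of abstract heap-object naming.
#include <cstdio>
#include <set>
#include <string>

// Context-insensitive: one abstract object per allocation site.
std::string heapNameInsensitive(unsigned allocSite) {
  return "alloc" + std::to_string(allocSite);
}

// Context-sensitive: one abstract object per (allocation site, context);
// here the context is just the call string leading to the allocation.
std::string heapNameSensitive(unsigned allocSite, const std::string &ctx) {
  return "alloc" + std::to_string(allocSite) + "@" + ctx;
}

int main() {
  // Imagine a wrapper my_alloc() (allocation site 7) called from foo() and
  // bar(). Context-insensitively both calls map to the same abstract object,
  // so anything foo stores through its pointer shows up in bar's queries.
  std::set<std::string> insensitive = {
      heapNameInsensitive(7),        // reached via foo
      heapNameInsensitive(7)};       // reached via bar
  std::set<std::string> sensitive = {
      heapNameSensitive(7, "foo"),   // reached via foo
      heapNameSensitive(7, "bar")};  // reached via bar

  std::printf("context-insensitive abstract objects: %zu\n", insensitive.size());
  std::printf("context-sensitive abstract objects:   %zu\n", sensitive.size());
  return 0;
}

The scaling problem Ben mentions follows directly from the second
scheme: the number of abstract heap objects can grow with the number of
reachable contexts, which is what Whaley reportedly could not get to
scale.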
> Regarding Whaley and Lam's BDD-based approach to context-sensitive
> analysis ...
<snip>

Thanks for posting this. In which case it's probably a good idea if I
don't go down this route to context sensitivity.

> I was impressed by a completely different approach, described by Nystrom
> et al in SAS'04 and Nystrom's PhD thesis, for doing context-sensitive
> pointer analysis for C with a context-sensitive heap model. Like Whaley
> and Lam, and unlike Lattner and Adve's Data Structure Analysis, their
> technique is inclusion-based (i.e. a context-sensitive version of
> Andersen's), but unlike Whaley and Lam it also uses a context-sensitive
> heap model, and it's targeted at C. They managed to scale the analysis to
> a couple hundred thousand lines of code. I actually have several ideas on
> how to improve their analysis and make it more scalable, but I haven't had
> time to implement them -- mainly because I haven't wanted to re-implement
> their whole analysis in LLVM. I would definitely be interested in any
> effort to do this, and I think it would have a lot of benefit. In fact, I
> supervised another student who implemented a simplified version of this
> analysis for a class project; while the code itself probably isn't
> re-usable, the student made a number of good observations on the analysis
> that I would be happy to share with anyone who is interested.

I'll have a read through Nystrom's approach. Are there publicly
available implementations of this algorithm anywhere else? If his
analysis already scales to that region, then improvements to it might
make a context-sensitive analysis actually practical. Either way, if
the claims of the paper hold true, then it seems like a viable option
for implementation.

Would it be possible to get Ben Hardekopf's suggested improvements and
his student's ideas posted to the list?

Regards,

Richard