This email is written on the premise that LLVM will be involved in GSOC again this year. I noted that the wiki's open projects page [0] has several projects that seem suitable for a summer of code project. I am writing to this list in the hope of getting some feedback on specifics for a GSOC application, and also wondering whether any potential mentors are interested in these ideas.

The Idea:

I would be looking to implement a context-sensitive alias analysis. Whilst I accept that this isn't the most efficient category of alias analysis, I think an analysis that trades computational efficiency for a more precise computation of aliasing would be a beneficial addition to the LLVM project. In terms of which algorithm to actually implement, I am considering a BDD-based algorithm; possible candidates include:

- Whaley & Lam [2], which simply clones nodes of the call graph to generate different contexts and then performs a context-insensitive analysis. This uses BDDs to reduce memory consumption.
- Zhu [3], which utilises a BDD-based approach in a custom context-sensitive, flow-insensitive algorithm.

Were I to use a BDD-based approach, I would undoubtedly rely on an existing implementation, for example BuDDy [4], rather than implementing my own. The opinions of any experienced LLVM hackers would be helpful on algorithm choice; I am reasonably flexible about choosing a different algorithm, and at the moment I am personally undecided.

I am interested to hear any feedback on these ideas, specifically (in no particular order) with regard to the following points:

1. Is this too ambitious for a Google Summer of Code project?
2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?
3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?
4. Is anyone willing to mentor this project?
Me:

I am a PhD student at the University of Warwick, England, where my current research is focused on specifying compiler optimisations in a domain-specific programming language and formally verifying the soundness of those optimisations. As part of this project I am implementing a tool for optimising Java bytecode by compiling optimising phases from these specifications. During a final-year project I also co-wrote a compiler that parallelised Java source code. I am also well versed in the traditional compiler design literature, including lattice-based data flow analysis.

Any feedback on the aforementioned ideas would be welcome.

Regards,

Richard Warburton

[0] http://llvm.org/OpenProjects.html
[1] http://llvm.org/docs/AliasAnalysis.html
[2] Whaley & Lam, "Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams", Programming Language Design and Implementation (PLDI), 2004.
[3] Jianwen Zhu, "Symbolic Pointer Analysis", International Conference on Computer-Aided Design (ICCAD), 2002.
[4] http://sourceforge.net/projects/buddy
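(To make the cloning idea in [2] concrete: the scheme expands each function into one copy per acyclic call path, so that an ordinary context-insensitive analysis run on the expanded graph becomes context-sensitive. The toy Python sketch below shows only the cloning step over an invented call graph; the paper itself encodes the exponentially many clones in BDDs rather than enumerating them explicitly, so this is an illustration of the idea, not of the implementation.)

```python
# Toy sketch of Whaley & Lam-style cloning (not the BDD encoding the
# paper actually uses): expand each function into one clone per acyclic
# call path, then a context-insensitive analysis over the clones is
# effectively context-sensitive.
def clone_contexts(call_graph, root):
    """call_graph maps a function name to the functions it calls.
    Returns the set of (function, call-string) clones reachable
    from root along acyclic paths."""
    clones = set()

    def walk(func, path):
        clones.add((func, path))
        for callee in call_graph.get(func, []):
            if callee not in path and callee != func:  # skip cycles
                walk(callee, path + (func,))

    walk(root, ())
    return clones

# Invented example: main calls f and g, both of which call h.
graph = {"main": ["f", "g"], "f": ["h"], "g": ["h"]}
clones = clone_contexts(graph, "main")
# h gets one clone per calling context:
h_clones = {c for c in clones if c[0] == "h"}
```

The exponential blow-up is visible even here: every distinct acyclic path to a function yields another clone, which is why the explicit version does not scale and BDDs are used to share the redundancy.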
On Feb 28, 2008, at 10:43 AM, Richard Warburton wrote:

> This email is written on the premise that LLVM will be involved in GSOC again this year.

I hope so too!

> I would be looking to implement a context-sensitive alias analysis. Whilst I accept that this isn't the most efficient category of alias analysis, I think an analysis that trades computational efficiency for a more precise computation of aliasing would be a beneficial addition to the LLVM project. In terms of which algorithm to actually implement, I am considering a BDD-based algorithm; possible candidates include:

Ok. I think the most important thing to keep in mind, if you want this to be useful for LLVM, is for it to be sound in the presence of incomplete programs. I think it would be very interesting to have a BDD-based analysis in LLVM; it would be useful for performance comparisons and many other things, even if it isn't turned on by default. However, it must be sound.

Also, LLVM benefits quite a bit from mod/ref info for functions. I don't know if you've thought about it at all, but it is an important problem. If you're interested, my thesis describes these issues in detail.

> 1. Is this too ambitious for a Google Summer of Code project?

It depends on your familiarity with the domain. If you haven't worked in the area of alias analysis (and applications), it probably is. There are a lot of smaller subprojects that would be useful for LLVM, though.

> 2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?

No, I don't think so. LLVM has hosted a lot of AA work already.

> 3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?

They can benefit from context-sensitive mod/ref info.

> 4. Is anyone willing to mentor this project?

I don't know :) Maybe someone from Vikram's research group would be willing to mentor you. If you are interested in pursuing this, I'd suggest making a proposal and we can figure out mentorship when the time is closer,

-Chris
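(An aside on the mod/ref point above: the idea is to summarise, for each function, which memory locations a call to it may modify or reference, so that callers can tell whether a call clobbers a given pointer. The Python sketch below is a hypothetical bottom-up propagation over an invented representation; it is not LLVM's AliasAnalysis interface, just the shape of the computation.)

```python
# Hedged sketch of bottom-up mod/ref summary propagation (invented
# representation): each function has local sets of abstract locations
# it writes (mod) and reads (ref); a function's summary also includes
# everything its callees may mod/ref, computed to a fixed point so
# recursion is handled.
def modref_summaries(local_mod, local_ref, call_graph):
    """local_mod/local_ref: function -> set of abstract locations.
    call_graph: function -> list of callees.
    Returns (mod, ref) summary maps."""
    mod = {f: set(s) for f, s in local_mod.items()}
    ref = {f: set(s) for f, s in local_ref.items()}
    changed = True
    while changed:
        changed = False
        for f, callees in call_graph.items():
            for c in callees:
                if not mod[c] <= mod[f]:
                    mod[f] |= mod[c]
                    changed = True
                if not ref[c] <= ref[f]:
                    ref[f] |= ref[c]
                    changed = True
    return mod, ref

# Invented example: helper writes global G; main calls helper,
# so main's summary inherits the write to G.
mods, refs = modref_summaries(
    local_mod={"main": set(), "helper": {"G"}},
    local_ref={"main": {"x"}, "helper": set()},
    call_graph={"main": ["helper"], "helper": []},
)
```

A context-sensitive points-to analysis can sharpen these summaries considerably, which is one reason mod/ref is a natural consumer of the proposed work.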
> Ok. I think the most important thing to keep in mind, if you want this to be useful for LLVM, is for it to be sound in the presence of incomplete programs. I think it would be very interesting to have a BDD-based analysis in LLVM; it would be useful for performance comparisons and many other things, even if it isn't turned on by default. However, it must be sound.

Both of my suggested algorithms have been implemented for Java, which, by virtue of reflection, means one cannot statically determine which class certain objects refer to. In that case I believe they simply assume the most conservative answer (i.e. the may-use/may-def sets at that point could be anything). I suspect that this approach could equally be applied to an unknown library.

> Also, LLVM benefits quite a bit from mod/ref info for functions. I don't know if you've thought about it at all, but it is an important problem. If you're interested, my thesis describes these issues in detail.

I'll peruse this. Are there any other relevant, LLVM-specific texts that are appropriate here and not linked from the documentation page [0]?

> > 1. Is this too ambitious for a Google Summer of Code project?
>
> It depends on your familiarity with the domain. If you haven't worked in the area of alias analysis (and applications), it probably is. There are a lot of smaller subprojects that would be useful for LLVM, though.

So that I can gauge what the options are, could you suggest some examples of these subprojects?

regards,

Richard Warburton

[0] http://llvm.cs.uiuc.edu/docs/
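(To make the conservative fallback concrete: one simple scheme, sketched below with invented names, is to give every pointer that escapes into unknown code a special universal target, so that later alias queries must conservatively answer "may alias" for it.)

```python
# Hedged sketch of conservative handling of unknown code: any pointer
# passed to (or returned from) an external or reflective call is given
# a universal "anything" target. Queries involving it are then forced
# to the sound answer "may alias".
UNKNOWN = "<anything>"

def may_alias(pts, p, q):
    """pts: pointer -> set of abstract objects it may point to.
    Two pointers may alias if their points-to sets intersect, or if
    either may point to the universal unknown object."""
    a, b = pts.get(p, set()), pts.get(q, set())
    if UNKNOWN in a or UNKNOWN in b:
        return True
    return bool(a & b)

# Invented example: p escaped into an unknown library call, so it was
# given the universal target; q points only at y. The query must
# conservatively report that they may alias.
pts = {"p": {UNKNOWN}, "q": {"y"}}
```

This is the cheapest sound treatment; the price is that every escaping pointer loses all precision, which is exactly the soundness-for-incomplete-programs trade-off raised above.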
On Thu, Feb 28, 2008 at 1:43 PM, Richard Warburton <richard.warburton at gmail.com> wrote:

> This email is written on the premise that LLVM will be involved in GSOC again this year. I noted that the wiki's open projects page [0] has several projects that seem suitable for a summer of code project. I am writing to this list in the hope of getting some feedback on specifics for a GSOC application, and also wondering whether any potential mentors are interested in these ideas.
>
> The Idea:
>
> Were I to use a BDD-based approach, I would undoubtedly rely on an existing implementation, for example BuDDy [4], rather than implementing my own. The opinions of any experienced LLVM hackers would be helpful on algorithm choice; I am reasonably flexible about choosing a different algorithm, and at the moment I am personally undecided. I am interested to hear any feedback on these ideas, specifically (in no particular order) with regard to the following points:
>
> 1. Is this too ambitious for a Google Summer of Code project?
> 2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?

No. I've got a BDD-based version of Andersen's around already (uncommitted). It's slower in time (by a factor of 8 or so), but very slightly (1-2 megabytes) more memory efficient in some cases. Nowadays, the bitmap version is actually more memory efficient than the BDD version, even if you turn off BDD caching, etc.

The thing you have to understand is that these context-sensitive BDD algorithms work okay on Java, but fall down pretty badly on C/C++, because it is a much bigger problem. Whaley's is just brute force, and Zhu's is pretty darn complex. Both are amenable to the same style of offline variable optimizations that Hardekopf et al. describe (Ben and a friend have been helping out with the implementation of Andersen's in LLVM), and that may actually make them usable for real-world programs.

> 3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?

It should work okay.

> 4. Is anyone willing to mentor this project?

I would. I think you will be surprised how badly these algorithms work on even moderate-sized C++ programs :(
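(For readers following along: the "bitmap version of Andersen's" being compared against is inclusion-based points-to analysis. The Python sketch below handles only the two simplest constraint kinds, address-of and copy, over invented variable names; a real analysis for C also needs load/store constraints and cycle elimination, so this only shows the fixed-point propagation that the bitmap and BDD versions both implement.)

```python
# Minimal sketch of inclusion-based (Andersen-style) points-to
# analysis over two constraint kinds: "p = &x" (address-of) and
# "p = q" (copy, i.e. pts(q) must be a subset of pts(p)). The solver
# just propagates sets until nothing changes.
def andersen(address_of, copies):
    """address_of: list of (p, x) meaning p = &x.
    copies: list of (p, q) meaning pts(p) includes pts(q).
    Returns pts: variable -> set of abstract objects."""
    pts = {}
    for p, x in address_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for p, q in copies:
            src = pts.get(q, set())
            dst = pts.setdefault(p, set())
            if not src <= dst:
                dst |= src
                changed = True
    return pts

# Invented example: a = &x; b = &y; c = a; c = b
pts = andersen([("a", "x"), ("b", "y")], [("c", "a"), ("c", "b")])
# c may point to both x and y.
```

Whether the points-to sets live in bitmaps or BDDs is purely a representation choice for `pts`; the cost comparison in the message above is about exactly that choice.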
> No. I've got a BDD-based version of Andersen's around already (uncommitted). It's slower in time (by a factor of 8 or so), but very slightly (1-2 megabytes) more memory efficient in some cases. Nowadays, the bitmap version is actually more memory efficient than the BDD version, even if you turn off BDD caching, etc.
>
> The thing you have to understand is that these context-sensitive BDD algorithms work okay on Java, but fall down pretty badly on C/C++, because it is a much bigger problem. Whaley's is just brute force, and Zhu's is pretty darn complex.

I was interested to see whether Whaley's algorithm really works, particularly given its simplicity and the claims made in the papers.

> Both are amenable to the same style of offline variable optimizations that Hardekopf et al. describe (Ben and a friend have been helping out with the implementation of Andersen's in LLVM), and that may actually make them usable for real-world programs.

Are you referring to both the PLDI and the SAS papers?

> > 4. Is anyone willing to mentor this project?
>
> I would.

Thanks ;)

> I think you will be surprised how badly these algorithms work on even moderate-sized C++ programs :(

I'm slightly confused here as to whether you mean the precision of the analysis or the compute time taken by running the algorithm.

Richard