This email is written on the premise that LLVM will be involved in GSOC again this year. I noted that the wiki's open projects page [0] has several projects that seem suitable for a summer of code project. I am writing to this list in the hope of getting some feedback on specifics for a GSOC application, and also wondering whether any potential mentors are interested in these ideas.

The Idea:

I would be looking to implement a context-sensitive alias analysis. Whilst I accept that this isn't the most efficient category of alias analysis, I think an analysis that trades computational efficiency for a more precise computation of aliasing would be a beneficial addition to the LLVM project. In terms of which algorithm to actually implement, I am considering a BDD-based algorithm; possible candidates include:

- Whaley & Lam [2], which simply clones nodes of the call graph to generate different contexts and then performs a context-insensitive analysis. This uses BDDs to reduce memory consumption.
- Zhu [3], which utilises a BDD-based approach in a custom context-sensitive, flow-insensitive algorithm.

Were I to use a BDD-based approach, I would undoubtedly rely on an existing implementation, for example BuDDy [4], rather than implementing my own. The opinions of any experienced LLVM hackers would be helpful on algorithm choice; I am reasonably flexible about choosing a different algorithm, and at the moment I am personally undecided.

I am interested to hear any feedback on these ideas, specifically (in no particular order) with regard to the following points:

1. Is this too ambitious for a Google Summer of Code project?
2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?
3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?
4. Is anyone willing to mentor this project?
Me:

I am a PhD student at the University of Warwick, England, where my current research is focused on specifying compiler optimisations in a domain-specific programming language and formally verifying the soundness of those optimisations. As part of this project I am implementing a tool for optimising Java bytecode by compiling optimising phases from these specifications. During a final-year project I also co-wrote a compiler that parallelised Java source code. I am also well versed in the traditional compiler design literature, including lattice-based data flow analysis.

Any feedback on the aforementioned ideas would be welcome.

Regards,

Richard Warburton

[0] http://llvm.org/OpenProjects.html
[1] http://llvm.org/docs/AliasAnalysis.html
[2] Whaley & Lam, "Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams", Programming Language Design and Implementation (PLDI), 2004.
[3] Jianwen Zhu, "Symbolic Pointer Analysis", International Conference on Computer-Aided Design (ICCAD), 2002.
[4] http://sourceforge.net/projects/buddy
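(To make the cloning idea in [2] concrete: the scheme expands each function into one copy per acyclic call path, so that an ordinary context-insensitive analysis run on the expanded graph becomes context-sensitive. The toy Python sketch below shows only the cloning step over an invented call graph; the paper itself encodes the exponentially many clones in BDDs rather than enumerating them explicitly, so this is an illustration of the idea, not of the implementation.)

```python
# Toy sketch of Whaley & Lam-style cloning (not the BDD encoding the
# paper actually uses): expand each function into one clone per acyclic
# call path, then a context-insensitive analysis over the clones is
# effectively context-sensitive.
def clone_contexts(call_graph, root):
    """call_graph maps a function name to the functions it calls.
    Returns the set of (function, call-string) clones reachable
    from root along acyclic paths."""
    clones = set()

    def walk(func, path):
        clones.add((func, path))
        for callee in call_graph.get(func, []):
            if callee not in path and callee != func:  # skip cycles
                walk(callee, path + (func,))

    walk(root, ())
    return clones

# Invented example: main calls f and g, both of which call h.
graph = {"main": ["f", "g"], "f": ["h"], "g": ["h"]}
clones = clone_contexts(graph, "main")
# h gets one clone per calling context:
h_clones = {c for c in clones if c[0] == "h"}
```

The exponential blow-up is visible even here: every distinct acyclic path to a function yields another clone, which is why the explicit version does not scale and BDDs are used to share the redundancy.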
On Feb 28, 2008, at 10:43 AM, Richard Warburton wrote:

> This email is written on the premise that LLVM will be involved in GSOC again this year.

I hope so too!

> I would be looking to implement a context-sensitive alias analysis. Whilst I accept that this isn't the most efficient category of alias analysis, I think an analysis that trades computational efficiency for a more precise computation of aliasing would be a beneficial addition to the LLVM project. In terms of which algorithm to actually implement, I am considering a BDD-based algorithm; possible candidates include:

Ok. I think the most important thing to keep in mind, if you want this to be useful for LLVM, is for it to be sound in the presence of incomplete programs. I think it would be very interesting to have a BDD-based analysis in LLVM; it would be useful for performance comparisons and many other things, even if it isn't turned on by default. However, it must be sound.

Also, LLVM benefits quite a bit from mod/ref info for functions. I don't know if you've thought about it at all, but it is an important problem. If you're interested, my thesis describes these issues in detail.

> 1. Is this too ambitious for a Google Summer of Code project?

It depends on your familiarity with the domain. If you haven't worked in the area of alias analysis (and applications), it probably is. There are a lot of smaller subprojects that would be useful for LLVM, though.

> 2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?

No, I don't think so. LLVM has hosted a lot of AA work already.

> 3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?

They can benefit from context-sensitive mod/ref info.

> 4. Is anyone willing to mentor this project?

I don't know :) Maybe someone from Vikram's research group would be willing to mentor you. If you are interested in pursuing this, I'd suggest making a proposal and we can figure out mentorship when the time is closer,

-Chris
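(An aside on the mod/ref point above: the idea is to summarise, for each function, which memory locations a call to it may modify or reference, so that callers can tell whether a call clobbers a given pointer. The Python sketch below is a hypothetical bottom-up propagation over an invented representation; it is not LLVM's AliasAnalysis interface, just the shape of the computation.)

```python
# Hedged sketch of bottom-up mod/ref summary propagation (invented
# representation): each function has local sets of abstract locations
# it writes (mod) and reads (ref); a function's summary also includes
# everything its callees may mod/ref, computed to a fixed point so
# recursion is handled.
def modref_summaries(local_mod, local_ref, call_graph):
    """local_mod/local_ref: function -> set of abstract locations.
    call_graph: function -> list of callees.
    Returns (mod, ref) summary maps."""
    mod = {f: set(s) for f, s in local_mod.items()}
    ref = {f: set(s) for f, s in local_ref.items()}
    changed = True
    while changed:
        changed = False
        for f, callees in call_graph.items():
            for c in callees:
                if not mod[c] <= mod[f]:
                    mod[f] |= mod[c]
                    changed = True
                if not ref[c] <= ref[f]:
                    ref[f] |= ref[c]
                    changed = True
    return mod, ref

# Invented example: helper writes global G; main calls helper,
# so main's summary inherits the write to G.
mods, refs = modref_summaries(
    local_mod={"main": set(), "helper": {"G"}},
    local_ref={"main": {"x"}, "helper": set()},
    call_graph={"main": ["helper"], "helper": []},
)
```

A context-sensitive points-to analysis can sharpen these summaries considerably, which is one reason mod/ref is a natural consumer of the proposed work.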
> Ok. I think the most important thing to keep in mind, if you want this to be useful for LLVM, is for it to be sound in the presence of incomplete programs. I think it would be very interesting to have a BDD-based analysis in LLVM; it would be useful for performance comparisons and many other things, even if it isn't turned on by default. However, it must be sound.

Both of my suggested algorithms have been implemented for Java, which, by virtue of reflection, means one cannot statically determine which class certain objects refer to. In that case I believe they simply assume the most conservative answer (i.e. the may-use/may-def sets at that point could be anything). I suspect that this approach could equally be applied to an unknown library.

> Also, LLVM benefits quite a bit from mod/ref info for functions. I don't know if you've thought about it at all, but it is an important problem. If you're interested, my thesis describes these issues in detail.

I'll peruse this. Are there any other relevant, LLVM-specific texts that are appropriate here and not linked from the documentation page [0]?

> > 1. Is this too ambitious for a Google Summer of Code project?
>
> It depends on your familiarity with the domain. If you haven't worked in the area of alias analysis (and applications), it probably is. There are a lot of smaller subprojects that would be useful for LLVM, though.

So that I can gauge what the options are, could you suggest some examples of these subprojects?

regards,

Richard Warburton

[0] http://llvm.cs.uiuc.edu/docs/
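(To make the conservative fallback concrete: one simple scheme, sketched below with invented names, is to give every pointer that escapes into unknown code a special universal target, so that later alias queries must conservatively answer "may alias" for it.)

```python
# Hedged sketch of conservative handling of unknown code: any pointer
# passed to (or returned from) an external or reflective call is given
# a universal "anything" target. Queries involving it are then forced
# to the sound answer "may alias".
UNKNOWN = "<anything>"

def may_alias(pts, p, q):
    """pts: pointer -> set of abstract objects it may point to.
    Two pointers may alias if their points-to sets intersect, or if
    either may point to the universal unknown object."""
    a, b = pts.get(p, set()), pts.get(q, set())
    if UNKNOWN in a or UNKNOWN in b:
        return True
    return bool(a & b)

# Invented example: p escaped into an unknown library call, so it was
# given the universal target; q points only at y. The query must
# conservatively report that they may alias.
pts = {"p": {UNKNOWN}, "q": {"y"}}
```

This is the cheapest sound treatment; the price is that every escaping pointer loses all precision, which is exactly the soundness-for-incomplete-programs trade-off raised above.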
On Thu, Feb 28, 2008 at 1:43 PM, Richard Warburton <richard.warburton at gmail.com> wrote:

> This email is written on the premise that LLVM will be involved in GSOC again this year. I noted that the wiki's open projects page [0] has several projects that seem suitable for a summer of code project. I am writing to this list in the hope of getting some feedback on specifics for a GSOC application, and also wondering whether any potential mentors are interested in these ideas.
>
> The Idea:
>
> Were I to use a BDD-based approach, I would undoubtedly rely on an existing implementation, for example BuDDy [4], rather than implementing my own. The opinions of any experienced LLVM hackers would be helpful on algorithm choice; I am reasonably flexible about choosing a different algorithm, and at the moment I am personally undecided. I am interested to hear any feedback on these ideas, specifically (in no particular order) with regard to the following points:
>
> 1. Is this too ambitious for a Google Summer of Code project?
> 2. Would implementing any of these algorithms be tricky due to some peculiarity of LLVM?

No. I've got a BDD-based version of Andersen's around already (uncommitted). It's slower in time (by a factor of 8 or so), but very slightly (1-2 megabytes) more memory efficient in some cases. Nowadays, the bitmap version is actually more memory efficient than the BDD version, even if you turn off BDD caching, etc.

The thing you have to understand is that these context-sensitive BDD algorithms work okay on Java, but fall down pretty badly on C/C++, because it is a much bigger problem. Whaley's is just brute force, and Zhu's is pretty darn complex. Both are amenable to the same style of offline variable optimizations that Hardekopf et al. describe (Ben and a friend have been helping out with the implementation of Andersen's in LLVM), and that may actually make them usable for real-world programs.

> 3. Do existing optimisations and DFAs that depend on aliasing information interface with the alias analysis in a context-sensitive manner, or do they require rewriting?

It should work okay.

> 4. Is anyone willing to mentor this project?

I would. I think you will be surprised how badly these algorithms work on even moderate-sized C++ programs :(
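(For readers following along: the "bitmap version of Andersen's" being compared against is inclusion-based points-to analysis. The Python sketch below handles only the two simplest constraint kinds, address-of and copy, over invented variable names; a real analysis for C also needs load/store constraints and cycle elimination, so this only shows the fixed-point propagation that the bitmap and BDD versions both implement.)

```python
# Minimal sketch of inclusion-based (Andersen-style) points-to
# analysis over two constraint kinds: "p = &x" (address-of) and
# "p = q" (copy, i.e. pts(q) must be a subset of pts(p)). The solver
# just propagates sets until nothing changes.
def andersen(address_of, copies):
    """address_of: list of (p, x) meaning p = &x.
    copies: list of (p, q) meaning pts(p) includes pts(q).
    Returns pts: variable -> set of abstract objects."""
    pts = {}
    for p, x in address_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for p, q in copies:
            src = pts.get(q, set())
            dst = pts.setdefault(p, set())
            if not src <= dst:
                dst |= src
                changed = True
    return pts

# Invented example: a = &x; b = &y; c = a; c = b
pts = andersen([("a", "x"), ("b", "y")], [("c", "a"), ("c", "b")])
# c may point to both x and y.
```

Whether the points-to sets live in bitmaps or BDDs is purely a representation choice for `pts`; the cost comparison in the message above is about exactly that choice.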
> No. I've got a BDD-based version of Andersen's around already (uncommitted). It's slower in time (by a factor of 8 or so), but very slightly (1-2 megabytes) more memory efficient in some cases. Nowadays, the bitmap version is actually more memory efficient than the BDD version, even if you turn off BDD caching, etc.
>
> The thing you have to understand is that these context-sensitive BDD algorithms work okay on Java, but fall down pretty badly on C/C++, because it is a much bigger problem. Whaley's is just brute force, and Zhu's is pretty darn complex.

I was interested to see whether Whaley's algorithm really works, particularly given its simplicity and the claims made in the papers.

> Both are amenable to the same style of offline variable optimizations that Hardekopf et al. describe (Ben and a friend have been helping out with the implementation of Andersen's in LLVM), and that may actually make them usable for real-world programs.

Are you referring to both the PLDI and the SAS papers?

> > 4. Is anyone willing to mentor this project?
>
> I would.

Thanks ;)

> I think you will be surprised how badly these algorithms work on even moderate-sized C++ programs :(

I'm slightly confused here as to whether you mean the precision of the analysis or the compute time taken by running the algorithm.

Richard