Dear LLVMers, my name is Douglas, and I would like to participate in the Summer of Code this year. I am currently a Computer Science student at the Federal University of Minas Gerais, Brazil, and I work as a research assistant at the Programming Languages Lab, in that university. I work together with Andre Tavares and Andrei Rimsa, two summer of coders last year, and my advisor is Fernando Pereira, a Summer of Coder in 2006. I would like to work with range analysis. This is a super-set of bitwidth analysis. The objective is to find an approximation for the range of integer values that a variable may assume. Bitwidth analysis was initially part of Andre's Summer of Code, but it was not finished. I took over Andre's project, and I have already started working on range analysis. Currently I can find the range of variables used in simple loops, but there is still a lot to be done. Range analysis would be useful in a number of scenarios, for instance: 1) To eliminate overflow tests in dynamically compiled languages, such as java script, where numbers are represented as floating point values. A optimizing compiler may try to convert these numbers to integers, under the carpet, but it must keep the overflow tests, in case the values are larger than 32 bits. This is how the traceMonkey compiler does in JavaScript, for instance. 2) To reduce the size of variables, in order to keep more of them in registers. X86, as you know, has some aliased registers, but very few variables fit in there, as these registers require 8-bit values. Programmers generally declare variables with bigger types. It would be nice if an optimizing compiler could infer that the variables are small enough to fit into the eight-bit registers, so that less variables are sent to memory. 3) As a more general ABCD algorithm, that tries to remove array bound checks. Given that we have an approximation of the range of the variables, then it is possible to remove array bound checks of accesses that we prove that are always inside the limits of the arrays. 4) In security analyses. For instance, to detect buffer overflows. If we know that the range of a variable might be greater than the buffer that the variable ranges over, then a overflow might happen. Well, there are many other applications. In short, I would like to know if the LLVM community would be interested in such a project. I have been working on this for two months, and I am pretty confident I can implement a good range analysis. I am basing my work on a paper by Zhendong and Wagner: "A class of polynomially solvable range constraints for interval analysis without widenings". I have opted for this work, instead of Stephenson's more well known paper because I found Zhendong's easier to understand and implement. As I told you, I can already infer correct ranges on very simple loops, such as: for (int i = 0; i < 10; i++) { printf("%d\n", i); } In this case, I can find out that i is in [0, 0] before the loop, and is in [0, 9] inside, and in [10, +INF] outside. There is, of course, a lot of work to be done: handling loops limited by variables, nested loops, nested conditionals, etc, but I think that, with the help of a mentor, plus the help of the past Summer of Coders in my lab, I can do a good job. Thanks a lot, Douglas
douglas at dcc.ufmg.br wrote:> Dear LLVMers, > > [snip] > Well, there are many other applications. In short, I would like to know if > the LLVM community would be interested in such a project. I have been > working on this for two months, and I am pretty confident I can implement > a good range analysis. I am basing my work on a paper by Zhendong and > Wagner: "A class of polynomially solvable range constraints for interval > analysis without widenings". I have opted for this work, instead of > Stephenson's more well known paper because I found Zhendong's easier to > understand and implement. As I told you, I can already infer correct > ranges on very simple loops, such as: >I believe the SAFECode project (http://safecode.cs.illinois.edu) could benefit from a strong range analysis since we want to do better static array bounds checking. Range analysis could also benefit our points-to analysis algorithm. I have a few questions about your proposed analysis: 1) Would your range analysis be more accurate or work in more scenarios than the current SCEV pass? Is it primarily designed for finding value ranges within loops, or can it find accurate value ranges for values outside of loops, too? 2) Would your analysis be intra-procedural or inter-procedural? -- John T.> for (int i = 0; i < 10; i++) { > printf("%d\n", i); > } > > In this case, I can find out that i is in [0, 0] before the loop, and is > in [0, 9] inside, and in [10, +INF] outside. There is, of course, a lot of > work to be done: handling loops limited by variables, nested loops, nested > conditionals, etc, but I think that, with the help of a mentor, plus the > help of the past Summer of Coders in my lab, I can do a good job. > > Thanks a lot, > > Douglas > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >
>> Well, there are many other applications. In short, I would like to know >> if >> the LLVM community would be interested in such a project. I have been >> working on this for two months, and I am pretty confident I can implement >> a good range analysis. I am basing my work on a paper by Zhendong and >> Wagner: "A class of polynomially solvable range constraints for interval >> analysis without widenings". I have opted for this work, instead of >> Stephenson's more well known paper because I found Zhendong's easier to >> understand and implement. As I told you, I can already infer correct >> ranges on very simple loops, such as: >> > > I believe the SAFECode project (http://safecode.cs.illinois.edu) could > benefit from a strong range analysis since we want to do better static > array bounds checking. Range analysis could also benefit our points-to > analysis algorithm.I second that. I could also find several uses for such an analysis. I wrote a very simple one for llvm 2.6, but it doesn't work with trunk anymore. Integrating a range analysis in llvm comes very handy to implement several optimizations.> I have a few questions about your proposed analysis: > > 1) Would your range analysis be more accurate or work in more scenarios > than the current SCEV pass? Is it primarily designed for finding value > ranges within loops, or can it find accurate value ranges for values > outside of loops, too?As far I as understand, it would subsume SCEV (by maybe reusing its code). I think a good interface for this analsysis would follow alias analysis, like Value* -> range. Also, will it be path- or flow-sensitive?> 2) Would your analysis be intra-procedural or inter-procedural?If inter-procedural, it should be optional. Nuno
Hi Douglas,> I would like to work with range analysis. This is a super-set of > bitwidth analysis. The objective is to find an approximation for the > range of integer values that a variable may assume. Bitwidth analysis > was initially part of Andre's Summer of Code, but it was not finished. > I took over Andre's project, and I have already started working on > range analysis. Currently I can find the range of variables used in > simple loops, but there is still a lot to be done. Range analysis > would be useful in a number of scenarios, for instance:Chris started adding a "lazy value info" infrastructure which I think he prefers to SSI and friends (see lib/Analysis/LazyValueInfo.cpp). Ciao, Duncan.
Dear John and Nuno, well, the analysis that I am working right now is intra-procedural. It should be more general than SCEV, as it finds ranges outside the loops. Indeed, it is very similar to Stephenson's original bitwidth analysis (see Bitwidth Analysis with Application to Silicon Compilation), but it is only forward, it does not have the backward pass. Stephenson would assume, for instance, that when we read an array position, we are inside the boundaries of the array, and would use this to improve the precision of his analysis. Of course, he assumes that the program is correct, and you only read valid array positions. Zhendong does not make this assumption. The analysis is flow sensitive, because it relies on the SSI framework that Andre Tavares has implemented onto LLVM. And it is sparse, in the sense that it maps variables, e.g Values, to ranges. So, like Nuno had already suggested, the interface would give the client a map Value* -> Range. That is the good thing about using the SSI framework: we do not have to worry about program points, and can encode information directly into variables, as we discover them from assignments and the result of conditionals. Best,
Hi, LLVM'ers, I have written a proposal for a Summer of Code project, and I put it on the wiki of our lab, here: http://www2.dcc.ufmg.br/laboratorios/llp/wiki/doku.php?id=douglas_soc. If possible, I would like to receive some comments, as I have still some days to change it until the deadline, which is on April 9th. My adviser, and a colleague from the lab have also reviewed the proposal, so the English should be at least 'palatable'. One thing that I would like to receive suggestions is about the best way to test my analysis. I am proposing to implement a range analysis pass on LLVM. In order to test it, I am proposing to implement a pass that changes the types of variables used in the LLVM ir, in such a way that we can convert 32-bit integers (i32) to 16, 8 or even 1-bit integers. This pass would, for instance, improve register allocation in the x86, which allows to place two small variables into registers such as AX, BX, CX and DX. Warm regards, Douglas
Hello community, My proposal is available at: http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/douglasdocouto/t127016641282 I intend to change it until the deadline. All suggestions are welcome. Best, Douglas