Douglas do Couto Teixeira
2011-Mar-23 13:06 UTC
[LLVMdev] Range Analysis GSoC 2011 Proposal
Dear LLVM community, I would like to contribute to LLVM in the Google Summer of Code project. My proposal is listed below. Please let me know your comments. Adding Range Analysis to LLVM Abstract The objective of this work is patch our implementation of range analysis into LLVM. I have a running implementation of range analysis in LLVM, but it is not currently part of the main distribution. I propose to integrate our range analysis implementation into the Lazy Value Info (LVI) interface that LLVM provides. Range analysis finds the intervals of values that may be bound to the integer variables during the execution of programs and is useful in several scenarios: constant propagation, detection of potential buffer overflow attacks, dead branch elimination, array bound check elimination, elimination of overflow tests in scripting languages such as JavaScript and Lua, etc. Objective The objective of this project is to augment LLVM with range analysis. We will do this integration by patching the current implementation of range analysis that we have onto the Lazy Value Info (LVI) interface that LLVM already provides. In addition, we will develop new optimizations using LVI. In particular, we will provide a pass that performs conditional constant propagation [5], and elimination of dead-branches. Criteria of Success - To improve substantially the precision of the current implementation of LVI. Currently, LVI’s interface only allows a client to know if a variable contains a constant. We want to allow LVI to report that a variable either contains a constant, or a known-range. - To improve the current implementation of constant propagation that LLVM uses. We hope to obtain a small performance gain on the C benchmarks in the LLVM test suite, and a larger gain on Java programs that are compiled using VMKit. - To improve the implementation of JumpThreading, in such a way that more dead-branches will be eliminated. Again, we hope to achieve a small speed-up on the C benchmarks, and a larger speed-up on the Java benchmarks. Background Range Analysis is a technique that maps integer variables to the possible ranges of values that they may assume through out the execution of a program. Thus, for each integer variable, a range analysis determines its lower and upper limits. A very simple range analysis would, for instance, map each variable to the limits imposed by its type. That is, an 8-bit unsigned integer variable can be correctly mapped to the interval [0, 255], and an 16-bit signed integer can be mapped to [-32767, 32766]. However, the precision of this analysis can greatly be improved from information inferred from the program text. Ideally this range should be as constrained as possible, so that an optimizing compiler could learn more information about each variable. However, the range analysis must be conservative, that is, it will only constraint the range of a variable if it can prove that it is safe to do so. As an example, consider the program: i = read(); if (i < 10) { print (i + 1); else { print(i - 1); } In this program we know, from the conditional test, that the value of ‘i’ in the true side of the branch is in the range [-INF, 9], and in the false side is in the range [10, +INF]. During the Summer of Code 2010 I have designed and implemented, under the orientation of Duncan Sands, a non-iterative range analysis algorithm. Our implementation is currently fully functional, been able to analyze the whole LLVM test suite. For more details, see [4]. However, this implementation has never been integrated into the LLVM main trunc, for two reasons: 1. We use an intermediate representation called extended static assignment form [6], which the LLVM contributors were reluctant to use; 2. During the SoC 2010 we did not have time to completely finish our implementation, and runtime numbers were available only by the end of 2010. 3. There was not really an infra-structure already in place in LLVM to take benefit of our analysis. In order to address the first item, we propose to integrate the intermediate representation directly into our analysis, yet, as a module that can be used in separate by other clients, if necessary. We are splitting the live ranges of variables using single-arity phi-functions, which are automatically handled by the SSA elimination pass that LLVM already includes. This live range splitting is only necessary for greater precision. We can do our live range analysis without it, although the results are less precise. A previous Summer of Code, authored by Andre Tavares, has shown that the e-SSA form increases the number of phi-functions in the program code by less than 10%, and it is very fast to build [6]. The second item of our list of hindrances is no longer a problem. Our implementation is ready for use. We have been able to analyze the whole LLVM test suite, plus SPEC CPU 2006 - over 4 million LLVM bytecoes - in 44 seconds on a 2.4GHz machine. We obtain non-trivial bit size reductions for the small benchmarks, having results that match those found by previous, non-conservative works, such as Stephenson’s et al’s [2]. Moreover, our implementation is based on a very modern algorithm, by Su and Wagner [1], augmented with Gawlitza’s technique to handle cycles [3]. We believe that this is the fastest implementation of such an analysis. Finally, the third item is also no longer a problem. Presently LLVM offers the Lazy Value Info interface that reports when variables are constants. The current LVI implementation also provides infra-structure to deal with ranges of integer intervals. However, it does not contain a fully functional implementation yet, an omission that we hope to fix with this project. Timeline and Testing Methodology 1. Change the LVI interface, adding a new method to it: getRange(Value *V), so that we can, not only know if a variable is a constant, but also know its range of values whenever it is not a constant. 2. Change the implementation of the prototype range analysis that LVI already uses, so that it will use our implementation. 3. Implement the sparse conditional constant propagation to use LVI. This, in addition to JumpThreading will be another client that will use LVI. 4. Run experiments on the LLVM test suite to verify that our new optimization improves on the current dead branch elimination pass that LLVM already uses. 5. Run experiments using VMKit, to check the impact of dead branch elimination on Java programs. We hope to deliver better performance numbers in this case because Java, as a memory safe language, is notorious for using many checks to ensure that memory is used only in ways that obey the contract of the variable types. Biograph I am currently a third year Computer Science student at the Federal University of Minas Gerais <http://www.dcc.ufmg.br/dcc/>, Brazil, and I work as a research assistant at the Programming Languages Lab<http://www2.dcc.ufmg.br/laboratorios/llp/>(LLP), in that university. I believe I am a good candidate to work in the proposed project because I have already a good knowledge of LLVM. I have implemented range analysis, and currently it can find the ranges of integer variables used in the programs. In addition to this, I have been working as a programmer for three years, before joining the LLP as a research assistant, and I am experienced with C++, the language in which LLVM is implemented. Furthermore, I am a student at a very good university (UFMG). To ground this statement I would like to point that the Department of Computer Science of UFMG got the best mark <http://www.ufmg.br/online/arquivos/012977.shtml> in the Brazilian National Undergrad Exam (ENADE). Finally, I work on a lab in which three other students work with LLVM. We had three Summer of Codes on LLVM in the past, and a number of papers have been published out of these experiences. References 1. A Class of Polynomially Solvable Range Constraints for Interval Analysis without Widenings and Narrowings, Zhendong Su and David Wagner, In Theoretical Computer Science, Volume 345 , Issue 1, 2005. 2. Bitwidth Analysis with Application to Silicon Compilation, Mark Stephenson, Jonathan Babb, and Saman Amarasinghe, In Proceedings of the SIGPLAN conference on Programming Language Design and Implementation, 2000. 3. Polynomial Precise Interval Analysis Revisited. Thomas Gawlitza, Jérôme Leroux, Jan Reineke, Helmut Seidl, Grégoire Sutre, Reinhard Wilhelm: Efficient Algorithms 2009: 422-437 4. Linear Time Range Analysis with Affine Constraints. Douglas do Couto Teixeira and Fernando Magno Quintao Pereira ( http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf ) 5. Constant propagation with conditional branches. Mark N. Wegman and F. Kenneth Zadeck, In ACM Transactions on Programming Languages and Systems (TOPLAS), 181-210, 1991. 6. Efficient SSI Conversion. André Luiz C. Tavares, Fernando Magno Quintão Pereira, Mariza A. S. Bigonha and Roberto S. Bigonha. Simpósio Brasileiro de Linguagens de Programação. 2010. 7. ABCD: eliminating array bounds checks on demand, Rajkslav Bodik, Rajiv Gupta and Vivek Sarkar, In Proceedings of the SIGPLAN conference on Programming Language Design and Implementation, 2000. Contact Info Name: Douglas do Couto Teixeira e-mail 1: douglas at dcc dot ufmg dot br e-mail 2: douglasdocouto at gmail dot com -- Douglas do Couto Teixeira -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110323/e315130c/attachment.html>
Dear Douglas, Comments below. On 3/23/11 8:06 AM, Douglas do Couto Teixeira wrote:> Dear LLVM community, > > I would like to contribute to LLVM in the Google Summer of Code > project. My proposal is listed below. Please let me know your comments. > > > Adding Range Analysis to LLVM > > > > Abstract > > The objective of this work is patch our implementation of range > analysis into LLVM. I have a running implementation of rangeSay "working implementing" instead of "running implementation."> analysis in LLVM, but it is not currently part of the main > distribution. I propose to integrate our range analysis implementation > into the Lazy Value Info (LVI) interface that LLVM provides. Range > analysis finds the intervals of values that may be bound to the > integer variables during the execution of programs and is useful in > several scenarios: constant propagation, detection of potential buffer > overflow attacks, dead branch elimination, array bound check > elimination, elimination of overflow tests in scripting languages such > as JavaScript and Lua, etc. > > > Objective > > The objective of this project is to augment LLVM with range analysis. > We will do this integration by patching the current implementation of > range analysis that we have onto the Lazy Value Info (LVI) interface > that LLVM already provides. In addition, we will develop new > optimizations using LVI. In particular, we will provide a pass that > performs conditional constant propagation [5], and elimination of > dead-branches. > > Criteria of Success > > * To improve substantially the precision of the current > implementation of LVI. Currently, LVI’s interface only allows a > client to know if a variable contains a constant. We want to > allow LVI to report that a variable either contains a constant, > or a known-range. > * To improve the current implementation of constant propagation > that LLVM uses. We hope to obtain a small performance gain on > the C benchmarks in the LLVM test suite, and a larger gain on > Java programs that are compiled using VMKit. > * To improve the implementation of JumpThreading, in such a way > that more dead-branches will be eliminated. Again, we hope to > achieve a small speed-up on the C benchmarks, and a larger > speed-up on the Java benchmarks. > > > > Background > > Range Analysis is a technique that maps integer variables to the > possible ranges of values that they may assume through outthroughout is a single word in this instance.> the execution of a program. Thus, for each integer variable, a range > analysis determines its lower and upper limits. A very simple range > analysis would, for instance, map each variable to the limits imposed > by its type. That is, an 8-bit unsigned integer variable can be > correctly mapped to the interval [0, 255], and an 16-bit signed > integer can be mapped to [-32767, 32766].You probably want to be consistent in how numbers are interpreted. Use the unsigned or signed interpretations in both examples above.> However, the precision of this analysis can greatly be improved from > information inferred from the program text. > > Ideally this range should be as constrained as possible, so that an > optimizing compiler could learn more information about each variable. > However, the range analysis must be conservative, that is, it will > only constraint the range of a variable if it can"constrain the range"> prove that it is safe to do so. As an example, consider the program: > > i = read(); > if (i < 10) { > print (i + 1); > else { > print(i - 1); > } > > In this program we know, from the conditional test, that the value of > ‘i’ in the true side of the branch is in the range [-INF, 9], and in > the false side is in the range [10, +INF]. > > During the Summer of Code 2010 I have designed and implemented, under > the orientation of Duncan Sands, a non-iterativeRemove the word "have" in the sentence above.> range analysis algorithm. Our implementation is currently fully > functional, been able to analyze the whole LLVM test suite. For more > details, see [4]. However, this implementation has never been > integrated into the LLVM main trunc, for two reasons: > > 1. We use an intermediate representation called extended static > assignment form [6], which the LLVM contributors were reluctant > to use; > 2. During the SoC 2010 we did not have time to completely finish > our implementation, and runtime numbers were available only by > the end of 2010. > 3. There was not really an infra-structure already in place in LLVM > to take benefit of our analysis. >I think your text is a little unclear about point #3. From your text below, it sounds like LLVM lacks optimizations that utilize value range analysis. If that is the case, then you should state below (like you do in the beginning of your proposal) what optimizations you'll write. As an aside, we'd be interested in using trying out value-range analysis in SAFECode (http://safecode.cs.illinois.edu) for use in static array bounds checking. Creating a static array bounds checking pass for SAFECode using your analysis would probably be trivial. The only difficulty is that SAFECode is currently built using LLVM 2.7, and I'm guessing that you're using a newer version of LLVM.> > > In order to address the first item, we propose to integrate the > intermediate representation directly into our analysis, yet, as a > module that can be used in separate by other clients, if necessary.This part isn't completely clear to me. It sounds like what you're suggesting is to write one analysis pass that internally constructs e-SSA form and a second analysis that actually does the value-range analysis. It also sounds like you're writing the value-range analysis so that it can be used with and without e-SSA form. Is this correct, or have I misinterpreted what you're saying? Either way, I think the text above and the paragraph below about e-SSA form could be made a little more clear.> We are splitting the live ranges of variables using single-arity > phi-functions, which are automatically handled by the SSA elimination > pass that LLVM already includes. This live range splitting is only > necessary for greater precision. We can do our live range analysis > without it, although the results are less precise. A previous Summer > of Code, authored by Andre Tavares, has shown that the e-SSA form > increases the number of phi-functions in the program code by less than > 10%, and it is very fast to build [6]. > > The second item of our list of hindrances is no longer a problem. Our > implementation is ready for use. We have been able to analyze the > whole LLVM test suite, plus SPEC CPU 2006 - over 4 million LLVM > bytecoes - in 44 seconds on a 2.4GHz machine. We obtain non-trivial > bit size reductions for the small benchmarks, having results that > match those found by previous, non-conservative works, such as > Stephenson’s et al’s [2].What kind of improvements do you see in large benchmarks? If small benchmarks yield good results and large benchmarks yield poor results, then your analysis probably needs more work to be useful for real-world programs.> Moreover, our implementation is based on a very modern algorithm, by > Su and Wagner [1], augmented with Gawlitza’s technique to handle > cycles [3]. We believe that this is the fastest implementation of such > an analysis. > > Finally, the third item is also no longer a problem. Presently LLVM > offers the Lazy Value Info interface that reports when variables are > constants. The current LVI implementation also provides > infra-structure to deal with ranges of integer intervals. However, it > does not contain a fully functional implementation yet, an omission > that we hope to fix with this project.Again, you need to be clear about whether any optimizations use this for anything beyond seeing if a value is constant, and if not, describe what optimizations you plan to write to fix that.> > Timeline and Testing Methodology > > 1. Change the LVI interface, adding a new method to it: > getRange(Value *V), so that we can, not only know if a variable is >No comma after can.> 1. a constant, but also know its range of values whenever it is not > a constant. > 2. Change the implementation of the prototype range analysis that > LVI already uses, so that it will use our implementation. > 3. Implement the sparse conditional constant propagation to use > LVI. This, in addition to JumpThreading will be another client > that will use LVI. >Is there an advantage to using your analysis for conditional constant propagation? I see the value in range analysis, but I don't see what your algorithm adds above what is already there. Please specify.> 1. Run experiments on the LLVM test suite to verify that our new > optimization improves on the current dead branch elimination > pass that LLVM already uses. > 2. Run experiments using VMKit, to check the impact of dead branch > elimination on Java programs. We hope to deliver better > performance numbers in this case because Java, as a memory safe > language, is notorious for using many checks to ensure that > memory is used only in ways that obey the contract of the > variable types. >You may want to check that VMKit is working with the version of LLVM that you're using.> > > BiographBiography> > I am currently a third year Computer Science student at theFederal > University of Minas Gerais <http://www.dcc.ufmg.br/dcc/>, Brazil, and > I work as a research assistant at theProgramming Languages Lab > <http://www2.dcc.ufmg.br/laboratorios/llp/>(LLP), in that university.You might want to state more explicitly whether you're an undergraduate or graduate student. I'm guessing your an undergraduate but am not sure.> > I believe I am a good candidate to work in the proposed project > because I have already a good knowledge of LLVM. I have"on the proposed project" and "because I am already knowledgeable about LLVM"> implemented range analysis, and currently it can find the ranges of > integer variables used in the programs. In addition to this, I have > been working as a programmer for three years, before joining the LLP > as a research assistant, and I am experiencedI think you want to say that you worked as a programmer for three years before joining LLP.> with C++, the language in which LLVM is implemented. Furthermore, I am > a student at a very good university (UFMG). To ground this statement I > would like to point that the Department of Computer Science of UFMG > got thebest mark <http://www.ufmg.br/online/arquivos/012977.shtml>in the"To justify" instead of "to ground" and "point out" instead of just "point"> Brazilian National Undergrad Exam (ENADE). Finally, I work on a lab in > which three other students work with LLVM. We hadwork in a lab> three Summer of Codes on LLVM in the past, and a number of papers have > been published out of these experiences.I think you should just point out the SoC's that you've been involved in. The ones from your lab mates seems less relevant to me. All in all, I think your wrote a good proposal. You may want to send a revised version to the list; others may want to comment on the things that I think need to be clarified. BTW, are you looking for a mentor, or has Duncan volunteered for this year again? Good luck! -- John T.> > References > > 1. A Class of Polynomially Solvable Range Constraints for Interval > Analysis without Widenings and Narrowings, Zhendong Su and David > Wagner, In Theoretical Computer Science, Volume 345 , Issue 1, 2005. > > 2. Bitwidth Analysis with Application to Silicon Compilation, Mark > Stephenson, Jonathan Babb, and Saman Amarasinghe, In Proceedings of > the SIGPLAN conference on Programming Language Design and > Implementation, 2000. > > 3. Polynomial Precise Interval Analysis Revisited.Thomas Gawlitza, > Jérôme Leroux, Jan Reineke, Helmut Seidl, Grégoire Sutre, Reinhard > Wilhelm: Efficient Algorithms 2009: 422-437 > > 4. Linear Time Range Analysis with Affine Constraints.Douglas do Couto > Teixeira and Fernando Magno Quintao Pereira > (http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf > <http://homepages.dcc.ufmg.br/%7Edouglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf>) > > 5. Constant propagation with conditional branches. Mark N. Wegman and > F. Kenneth Zadeck, In ACM Transactions on Programming Languages and > Systems (TOPLAS), 181-210, 1991. > > 6. Efficient SSI Conversion.André Luiz C. Tavares, Fernando Magno > Quintão Pereira, Mariza A. S. Bigonha and Roberto S. Bigonha. Simpósio > Brasileiro de Linguagens de Programação. 2010. > > 7. ABCD: eliminating array bounds checks on demand, Rajkslav Bodik, > Rajiv Gupta and Vivek Sarkar, In Proceedings of the SIGPLAN conference > on Programming Language Design and Implementation, 2000. > > Contact Info > > Name:Douglas do Couto Teixeira > e-mail 1:douglas at dcc dot ufmg dot br > e-mail 2: douglasdocouto at gmail dot com > > > > -- > Douglas do Couto Teixeira-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110323/5bee98fc/attachment.html>
Douglas do Couto Teixeira
2011-Mar-24 11:39 UTC
[LLVMdev] Range Analysis GSoC 2011 Proposal
On Wed, Mar 23, 2011 at 7:24 PM, John Criswell <criswell at illinois.edu>wrote:> Dear Douglas, > > Comments below. > > >Dear John, I'm planning to send a revised version of the proposal to the list today, but before that let me try answer some of your questions.> On 3/23/11 8:06 AM, Douglas do Couto Teixeira wrote: > > Dear LLVM community, > > I would like to contribute to LLVM in the Google Summer of Code project. My > proposal is listed below. Please let me know your comments. > > > Adding Range Analysis to LLVM > > Abstract > > The objective of this work is patch our implementation of range analysis > into LLVM. I have a running implementation of range > > > Say "working implementing" instead of "running implementation." > > > analysis in LLVM, but it is not currently part of the main distribution. I > propose to integrate our range analysis implementation into the Lazy Value > Info (LVI) interface that LLVM provides. Range analysis finds the intervals > of values that may be bound to the integer variables during the execution of > programs and is useful in several scenarios: constant propagation, detection > of potential buffer overflow attacks, dead branch elimination, array bound > check elimination, elimination of overflow tests in scripting languages such > as JavaScript and Lua, etc. > > > Objective > > The objective of this project is to augment LLVM with range analysis. We > will do this integration by patching the current implementation of range > analysis that we have onto the Lazy Value Info (LVI) interface that LLVM > already provides. In addition, we will develop new optimizations using LVI. > In particular, we will provide a pass that performs conditional constant > propagation [5], and elimination of dead-branches. > > Criteria of Success > > - To improve substantially the precision of the current implementation > of LVI. Currently, LVI’s interface only allows a client to know if a > variable contains a constant. We want to allow LVI to report that a variable > either contains a constant, or a known-range. > - To improve the current implementation of constant propagation that > LLVM uses. We hope to obtain a small performance gain on the C benchmarks in > the LLVM test suite, and a larger gain on Java programs that are compiled > using VMKit. > - To improve the implementation of JumpThreading, in such a way that > more dead-branches will be eliminated. Again, we hope to achieve a small > speed-up on the C benchmarks, and a larger speed-up on the Java benchmarks. > > > > Background > > Range Analysis is a technique that maps integer variables to the possible > ranges of values that they may assume through out > > > throughout is a single word in this instance. > > > the execution of a program. Thus, for each integer variable, a range > analysis determines its lower and upper limits. A very simple range analysis > would, for instance, map each variable to the limits imposed by its type. > That is, an 8-bit unsigned integer variable can be correctly mapped to the > interval [0, 255], and an 16-bit signed integer can be mapped to [-32767, > 32766]. > > > You probably want to be consistent in how numbers are interpreted. Use the > unsigned or signed interpretations in both examples above. > > > However, the precision of this analysis can greatly be improved from > information inferred from the program text. > > Ideally this range should be as constrained as possible, so that an > optimizing compiler could learn more information about each variable. > However, the range analysis must be conservative, that is, it will only > constraint the range of a variable if it can > > > "constrain the range" > > > prove that it is safe to do so. As an example, consider the program: > > i = read(); > if (i < 10) { > print (i + 1); > else { > print(i - 1); > } > > In this program we know, from the conditional test, that the value of ‘i’ > in the true side of the branch is in the range [-INF, 9], and in the false > side is in the range [10, +INF]. > > During the Summer of Code 2010 I have designed and implemented, under the > orientation of Duncan Sands, a non-iterative > > > Remove the word "have" in the sentence above. > > > range analysis algorithm. Our implementation is currently fully functional, > been able to analyze the whole LLVM test suite. For more details, see [4]. > However, this implementation has never been integrated into the LLVM main > trunc, for two reasons: > > 1. We use an intermediate representation called extended static > assignment form [6], which the LLVM contributors were reluctant to use; > 2. During the SoC 2010 we did not have time to completely finish our > implementation, and runtime numbers were available only by the end of 2010. > 3. There was not really an infra-structure already in place in LLVM to > take benefit of our analysis. > > > I think your text is a little unclear about point #3. From your text > below, it sounds like LLVM lacks optimizations that utilize value range > analysis. If that is the case, then you should state below (like you do in > the beginning of your proposal) what optimizations you'll write. > >I meant: There was not really an infra-structure already in place in LLVM to take benefit of our analysis. There were not clients for this pass, and not a common interface that those clients could use. Now, LLVM provides the LVI interface, and there are some clients that use it: JumpThreading, etc.> As an aside, we'd be interested in using trying out value-range analysis in > SAFECode (http://safecode.cs.illinois.edu) for use in static array bounds > checking. Creating a static array bounds checking pass for SAFECode using > your analysis would probably be trivial. The only difficulty is that > SAFECode is currently built using LLVM 2.7, and I'm guessing that you're > using a newer version of LLVM. > > >No, I'm using LLVM 2.7 too. That's because the pass that produces e-SSA form was not ported to the newer LLVM versions.> > > > > > In order to address the first item, we propose to integrate the > intermediate representation directly into our analysis, yet, as a module > that can be used in separate by other clients, if necessary. > > > This part isn't completely clear to me. It sounds like what you're > suggesting is to write one analysis pass that internally constructs e-SSA > form and a second analysis that actually does the value-range analysis. It > also sounds like you're writing the value-range analysis so that it can be > used with and without e-SSA form. Is this correct, or have I misinterpreted > what you're saying? > > Either way, I think the text above and the paragraph below about e-SSA form > could be made a little more clear. > > >Yes, we use e-SSA form to gain precision, but it is not a requirement. Compare, for instance, Figures 1, 4 and 5 in our report ( http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf). E-SSA increases a lot the precision of the analysis, but we still can work without it> We are splitting the live ranges of variables using single-arity > phi-functions, which are automatically handled by the SSA elimination pass > that LLVM already includes. This live range splitting is only necessary for > greater precision. We can do our live range analysis without it, although > the results are less precise. A previous Summer of Code, authored by Andre > Tavares, has shown that the e-SSA form increases the number of phi-functions > in the program code by less than 10%, and it is very fast to build [6]. > > The second item of our list of hindrances is no longer a problem. Our > implementation is ready for use. We have been able to analyze the whole LLVM > test suite, plus SPEC CPU 2006 - over 4 million LLVM bytecoes - in 44 > seconds on a 2.4GHz machine. We obtain non-trivial bit size reductions for > the small benchmarks, having results that match those found by previous, > non-conservative works, such as Stephenson’s et al’s [2]. > > > What kind of improvements do you see in large benchmarks? If small > benchmarks yield good results and large benchmarks yield poor results, then > your analysis probably needs more work to be useful for real-world programs. > > > >This happens because the analysis is intra-procedural. I guess the other optimizations that LLVM uses suffer from this limitation too. For SPEC CPU 2006 we have been able to reduce the bitwidth of the variables by 8%. This would happen with any type of range analysis algorithm. Every time we have a function, like: foo(int n) {...} We must assume that [-inf, +inf] \subseteq n. One of my lab mates is working on an inter-procedural version of the analysis. His project is on a very initial stage though.> Moreover, our implementation is based on a very modern algorithm, by Su > and Wagner [1], augmented with Gawlitza’s technique to handle cycles [3]. We > believe that this is the fastest implementation of such an analysis. > > Finally, the third item is also no longer a problem. Presently LLVM offers > the Lazy Value Info interface that reports when variables are constants. The > current LVI implementation also provides infra-structure to deal with ranges > of integer intervals. However, it does not contain a fully functional > implementation yet, an omission that we hope to fix with this project. > > > Again, you need to be clear about whether any optimizations use this for > anything beyond seeing if a value is constant, and if not, describe what > optimizations you plan to write to fix that. > > >I think the big client would be dead-code elimination: if (x > 10) {...} If we know that x is [0, 10], for instance, then this branch is dead. This kind of optimization is stronger than Zadeck's conditional constant propagation, and I believe that it would be good for array-bounds check elimination in memory safe languages such as Java.> > Timeline and Testing Methodology > > 1. Change the LVI interface, adding a new method to it: getRange(Value > *V), so that we can, not only know if a variable is > > > No comma after can. > > > > 1. a constant, but also know its range of values whenever it is not a > constant. > 2. Change the implementation of the prototype range analysis that LVI > already uses, so that it will use our implementation. > 3. Implement the sparse conditional constant propagation to use LVI. > This, in addition to JumpThreading will be another client that will use LVI. > > > Is there an advantage to using your analysis for conditional constant > propagation? I see the value in range analysis, but I don't see what your > algorithm adds above what is already there. Please specify. > > >Yes! I think I was not clear here. The main contribution, again, would be able to implement Zadeck's optimization with ranges, instead of simple constants. So, instead of being able to eliminate: if (x != 10) {...} when we know that x is, say, 11, we could eliminate also branches that use other relational operators, e.g, <, <=, >, >=, !=.> > 1. Run experiments on the LLVM test suite to verify that our new > optimization improves on the current dead branch elimination pass that LLVM > already uses. > 2. Run experiments using VMKit, to check the impact of dead branch > elimination on Java programs. We hope to deliver better performance numbers > in this case because Java, as a memory safe language, is notorious for using > many checks to ensure that memory is used only in ways that obey the > contract of the variable types. > > > You may want to check that VMKit is working with the version of LLVM that > you're using. > > > > Biograph > > > Biography > > > > I am currently a third year Computer Science student at the Federal > University of Minas Gerais <http://www.dcc.ufmg.br/dcc/>, Brazil, and I > work as a research assistant at the Programming Languages Lab<http://www2.dcc.ufmg.br/laboratorios/llp/>(LLP), in that university. > > > You might want to state more explicitly whether you're an undergraduate or > graduate student. I'm guessing your an undergraduate but am not sure. > > >I'm an undergraduate student.> > I believe I am a good candidate to work in the proposed project because I > have already a good knowledge of LLVM. I have > > > "on the proposed project" and "because I am already knowledgeable about > LLVM" > > > > implemented range analysis, and currently it can find the ranges of integer > variables used in the programs. In addition to this, I have been working as > a programmer for three years, before joining the LLP as a research > assistant, and I am experienced > > > I think you want to say that you worked as a programmer for three years > before joining LLP. > > > with C++, the language in which LLVM is implemented. Furthermore, I am a > student at a very good university (UFMG). To ground this statement I would > like to point that the Department of Computer Science of UFMG got the best > mark <http://www.ufmg.br/online/arquivos/012977.shtml> in the > > > "To justify" instead of "to ground" and "point out" instead of just "point" > > > Brazilian National Undergrad Exam (ENADE). Finally, I work on a lab in > which three other students work with LLVM. We had > > > work in a lab > > > three Summer of Codes on LLVM in the past, and a number of papers have been > published out of these experiences. > > > I think you should just point out the SoC's that you've been involved in. > The ones from your lab mates seems less relevant to me. > > All in all, I think your wrote a good proposal. You may want to send a > revised version to the list; others may want to comment on the things that I > think need to be clarified. > > BTW, are you looking for a mentor, or has Duncan volunteered for this year > again? > >Well, I made contact with Duncan some weeks ago but he doesn't officially volunteered to be my mentor this year. So, yes, I'm looking for a mentor. With best wishes, Douglas Good luck!> > -- John T. > > > > References > > 1. A Class of Polynomially Solvable Range Constraints for Interval > Analysis without Widenings and Narrowings, Zhendong Su and David Wagner, > In Theoretical Computer Science, Volume 345 , Issue 1, 2005. > > 2. Bitwidth Analysis with Application to Silicon Compilation, Mark > Stephenson, Jonathan Babb, and Saman Amarasinghe, In Proceedings of the > SIGPLAN conference on Programming Language Design and Implementation, 2000. > > 3. Polynomial Precise Interval Analysis Revisited. Thomas Gawlitza, Jérôme > Leroux, Jan Reineke, Helmut Seidl, Grégoire Sutre, Reinhard Wilhelm: > Efficient Algorithms 2009: 422-437 > > 4. Linear Time Range Analysis with Affine Constraints. Douglas do Couto > Teixeira and Fernando Magno Quintao Pereira ( > http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf > ) > > 5. Constant propagation with conditional branches. Mark N. Wegman and F. > Kenneth Zadeck, In ACM Transactions on Programming Languages and Systems > (TOPLAS), 181-210, 1991. > > 6. Efficient SSI Conversion. André Luiz C. Tavares, Fernando Magno Quintão > Pereira, Mariza A. S. Bigonha and Roberto S. Bigonha. Simpósio Brasileiro de > Linguagens de Programação. 2010. > > 7. ABCD: eliminating array bounds checks on demand, Rajkslav Bodik, Rajiv > Gupta and Vivek Sarkar, In Proceedings of the SIGPLAN conference on > Programming Language Design and Implementation, 2000. > > Contact Info > > Name: Douglas do Couto Teixeira > e-mail 1: douglas at dcc dot ufmg dot br > e-mail 2: douglasdocouto at gmail dot com > > > > -- > Douglas do Couto Teixeira > > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110324/c71ba915/attachment.html>
On Wed, Mar 23, 2011 at 6:24 PM, John Criswell <criswell at illinois.edu> wrote:> > > On 3/23/11 8:06 AM, Douglas do Couto Teixeira wrote: > > > > the execution of a program. Thus, for each integer variable, a range > > analysis determines its lower and upper limits. A very simple range analysis > > > would, for instance, map each variable to the limits imposed by its type. > > That is, an 8-bit unsigned integer variable can be correctly mapped to the > > interval [0, 255], and an 16-bit signed integer can be mapped to [-32767, > > 32766]. > > You probably want to be consistent in how numbers are interpreted. Use the > unsigned or signed interpretations in both examples above.Also, if you keep the 16-bit signed example, it should be [-32768, 32767]. - Allan