Hello,

I want to present my project for GSoC 2011 for LLVM below. It would be very
nice to hear your suggestions and opinions, thanks!

Superoptimization for LLVM IR

Objective

This project focuses on implementing superoptimization algorithms targeted
at the LLVM IR. The project uses arbitrary LLVM bitcode as a training set to
discover new peephole optimizations that can later be integrated into the
LLVM instruction simplify phase or into a separate peephole optimization
pass.

Motivation

Code optimization is the task of finding a faster sequence of instructions
that computes the same result as an existing code sequence. The old sequence
may then be substituted by the new one, creating a faster program with the
same behavior. The problem of finding a code sequence that performs some
computation in optimal time is undecidable and, in practice, optimizations
are discovered by humans using analytical reasoning. Finding new
optimizations in an automated fashion, using a search algorithm, is a
different approach to the problem and became known in the literature as
superoptimization, a term coined by Massalin [2].

Using computer-aided optimization discovery, different kinds of
optimizations unnoticed by humans may be detected, especially those specific
to the target machine code [1]. This project will focus on implementing a
superoptimizer that exhaustively investigates new optimizations for the LLVM
IR. Using this approach, I hope to contribute substantially to the current
LLVM instruction simplify phase by introducing new substitutions identified
in this project.

Methodology

There is already a superoptimizer for LLVM in the initial stages of
development. The existing superoptimizer currently limits its search space
to a few transformations. This project will focus on improving it by
extending the number of transformations applied, and thus greatly increasing
the search space.
To test the superoptimizer's effectiveness, it will use LLVM bitcode
compiled from the entire LLVM test suite and SPEC to harvest possible code
sequence optimizations. Once identified, these optimizations could be
implemented directly as peephole optimizations for LLVM IR.

Timeline

This project is expected to last three months, as follows.

• 1st month, 1st week - Study the LLVM IR in order to establish a list of
axioms, that is, valid transformations that determine the search algorithm's
capability. There is an important trade-off to investigate here, since using
fewer transformations will lead to faster searches, but several good
optimizations may be missed.
• 1st month, 2nd week - Thorough study of the currently implemented code
base, and design of a new algorithm to support arbitrary transformations.
• 1st month, 3rd week until end of project - Implementation of the search
algorithm.
• 2nd month - Test phase begins. Test the algorithm with simple
transformations, searching for optimized code sequences.
• 3rd month - Harvest new optimizations and implement fixes to the search
algorithm as more performance feedback from the training data is received.

Biography

I am a Master's student at the University of Campinas (UNICAMP), Brazil
(timezone GMT-3). In my Master's thesis, I have already worked with
superoptimization theory and automatic code sequence searching. Instead of
using it for optimization purposes, however, I use it to automatically
generate an LLVM backend based on a machine description written in the ArchC
architecture description language. The backend patterns are fixed, and the
implementation of each pattern is inferred using a search algorithm similar
to the one used in superoptimizers. Thus, I have experience with both
superoptimizer theory and LLVM code.

References

[1] Sorav Bansal and Alex Aiken. Automatic generation of peephole
superoptimizers. SIGPLAN Not., 41:394–403, October 2006.

[2] Henry Massalin. Superoptimizer: a look at the smallest program.
In ASPLOS-II: Proceedings of the second international conference on
Architectural support for programming languages and operating systems, pages
122–126, Los Alamitos, CA, USA, 1987. IEEE Computer Society Press.
I am no llvm/compiler/optimization expert and am not able to mentor this
project - I just want to say that this sounds very interesting, and I'm
hoping that maybe someone else could mentor it, or at least leave some
useful comments, as it sounds like something that could improve llvm a lot -
doesn't it?

2011/4/6 Rafael Auler <rafaelauler at gmail.com>:
> [full proposal quoted above - snipped]
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
A bit of feedback on this proposal:

It worries me that you talk about "extending the number of transformations."
The point of a superoptimizer is that it has no specific model of
transformations. Rather, it is simply looking for cheap code fragments that
are equivalent to expensive ones.

You have not specified how you will perform equivalence checking between
code sequences. The best answer that I know of is to use the technique from
the Bansal and Aiken paper, which is to translate bitcode sequences into a
SAT or SMT instance and simply ask if they are equivalent functions.

You have not specified the cost model that you will use to assess the
desirability of each potential transformation. Simply replacing larger
sequences with smaller ones is not necessarily a good idea, since you may
end up replacing a series of shift operations with a division, or something
like that.

A linear sequence of LLVM instructions is perhaps not the best starting
point. A better primitive unit of code would be a subgraph of the PDG. This
will ensure that the superoptimizer is looking at instructions that are
semantically related. Also, you will need to consider dependencies in any
case to make this work, so you might as well do it from the outset.

John Regehr
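As an aside, the SAT/SMT equivalence query John describes can be mimicked
with a much cruder stand-in: exhaustively comparing two fragments over a
small bit width. The sketch below is purely illustrative (all names are
invented here, and it is only sound for the tested width, whereas a solver
proves equivalence for all inputs), but it shows the question a solver would
be asked: do two fragments denote the same function?

```python
# Brute-force equivalence checking over all 8-bit inputs. A toy
# stand-in for a SAT/SMT query: sound only for WIDTH-bit values.
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1

def equivalent(f, g, nargs=1):
    """Return True iff f and g agree on every WIDTH-bit input tuple."""
    for args in product(range(1 << WIDTH), repeat=nargs):
        if (f(*args) & MASK) != (g(*args) & MASK):
            return False
    return True

# A shift really is a valid replacement for these multiply/divide forms:
print(equivalent(lambda x: x * 2, lambda x: x << 1))   # True
print(equivalent(lambda x: x // 2, lambda x: x >> 1))  # True
# ...and a wrong rewrite is rejected:
print(equivalent(lambda x: x + 1, lambda x: x))        # False
```

A real superoptimizer would use such cheap testing only as a filter, and
discharge surviving candidate pairs to a solver for an actual proof.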
> It worries me that you talk about "extending the number of
> transformations." The point of a superoptimizer is that it has no
> specific model of transformations. Rather, it is simply looking for cheap
> code fragments that are equivalent to expensive ones.

I could stand to be more clear...

Could you describe explicitly the role of transformations in your
superoptimizer framework? And also give examples of what kinds of
transformations you are talking about?

Could you describe how it is established that two code fragments are
equivalent?

John
Hi Rafael,

> Methodology
>
> There is already a superoptimizer for LLVM in initial stages of
> development. The existing superoptimizer currently limits its search
> space to a few transformations. This project will focus on improving this
> current superoptimizer by extending the number of transformations applied
> and thus greatly increasing the search space.

it would be great if you could extend my superoptimizer to do fully general
superoptimization.

> In order to test the superoptimizer efficiency, it will use LLVM bitcodes
> compiled from the entire LLVM test suite and SPEC to harvest possible
> code sequence optimizations. When successfully identified, these
> optimizations could be implemented directly as peephole optimizations for
> LLVM IR.

Right, this is exactly what has been done with the existing superoptimizer,
which, in spite of its simplicity, discovered a lot of commonly occurring
simplifications in SPEC, the LLVM testsuite and the Ada ACATS testsuite. A
bunch of these were implemented in LLVM 2.9, mostly in InstructionSimplify.

> • 1st month, 1st week - Study the LLVM IR in order to establish a
> list of axioms, that is, valid transformations that determine the search
> algorithm capability. There is an important trade-off to investigate
> here, since using fewer transformations will lead to faster searches, but
> several good optimizations may be missed.

It sounds like you are planning to follow the approach of Joshi, Nelson and
Randall ("Denali: A Goal-directed Superoptimizer") in that you don't intend
to exhaustively enumerate all possible code sequences and see if they are
the same as the original, only better; but instead start from the original
code sequence and repeatedly apply transformations that change code to
equivalent code, looking to see if you can eventually get something better
than the original. Is that right?

Ciao,

Duncan.
Hello all, thanks for the feedback!

> It sounds like you are planning to follow the approach of Joshi, Nelson
> and Randall ("Denali: A Goal-directed Superoptimizer") in that you don't
> intend to exhaustively enumerate all possible code sequences, and see if
> they are the same as the original only better; but instead start from the
> original code sequence and repeatedly apply transformations that change
> code to equivalent code, looking to see if you can eventually get
> something better than the original. Is that right?

Exactly. If we use axioms/transformations/identities which don't modify the
code behavior, but just interpret the instruction semantics in a different
way, so as to find other (and hopefully simpler) instructions to do the same
job, the role of the SAT prover is no longer necessary, since the search
space is constrained to expressions which are semantically equivalent.

Example:

Transformation: (ADD Operand_A, Constant 0) <- equivalent to -> Operand_A

Using this transformation, the search algorithm can now simplify "add A with
0" to "A" or, in a step that may initially look not so useful, transform any
operand A into an add of A plus 0. Later, this is important to prove that we
can use an add instruction to load any operand. Well, I think you got the
idea of using a set of axioms... In practice, this is important to prove
that an arbitrary sequence A is equivalent to a sequence B and, if B is
shorter than A, to use B instead. Speaking of implementation, there will be
a big list of transformations and a "for" loop trying each transformation
and recursively applying more transformations, until we can show that a code
sequence is equivalent to another. This is a short overview of the algorithm
and of how I worked in this area. If you prefer another approach described
formally in another paper, I can study it and use another algorithm.
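For the record, the axiom-plus-"for"-loop search Rafael describes can be
sketched in a few lines. This is a hypothetical toy, not code from the
existing superoptimizer: expressions are nested tuples, two sample axioms
rewrite subtrees, and a bounded breadth-first search keeps the cheapest
equivalent form found.

```python
# Toy axiom-driven rewrite search. Expressions: ('add', x, y) etc.,
# with strings/ints as leaves. Every rewrite step preserves
# semantics by construction, so no separate equivalence proof is
# needed -- exactly the Denali-style point made above.

def rewrites(e):
    """Yield every expression reachable by one axiom application."""
    if isinstance(e, tuple):
        # Axiom: (add X, 0) <-> X
        if e[0] == 'add' and e[2] == 0:
            yield e[1]
        # Axiom: (mul X, 1) <-> X
        if e[0] == 'mul' and e[2] == 1:
            yield e[1]
        # Recurse, rewriting one operand at a time.
        for i in range(1, len(e)):
            for sub in rewrites(e[i]):
                yield e[:i] + (sub,) + e[i + 1:]

def cost(e):
    """Count instructions (leaves are free)."""
    return 1 + sum(cost(x) for x in e[1:]) if isinstance(e, tuple) else 0

def simplify(e, depth=4):
    """Bounded BFS over axiom applications; keep the cheapest form."""
    seen, frontier, best = {e}, [e], e
    for _ in range(depth):
        nxt = []
        for cur in frontier:
            for r in rewrites(cur):
                if r not in seen:
                    seen.add(r)
                    nxt.append(r)
                    if cost(r) < cost(best):
                        best = r
        frontier = nxt
    return best

print(simplify(('add', ('mul', 'x', 1), 0)))  # x
```

A real implementation would also need axioms applied in the "expanding"
direction (A -> add A, 0), as in the example above, which is why the search
must be bounded and why the axiom list directly shapes what can be found.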
Note that this system is a simple theorem prover, since it applies several
equivalence rules to an expression to try to transform it into the answer.
The difference is that we don't have an "answer" to prove: any faster
sequence will suffice as the "answer".

Regarding the performance of "shift vs. multiply", the instructions will
have weights. Since we are working at the IR level, we can't assume which is
better. That's why superoptimization is best applied as a peephole
optimization in a backend prior to code emission - when we know more about
the architecture, we can take more appropriate decisions and, with
superoptimization, use obscure target machine instructions to compute
apparently unrelated expressions. If we use it at the IR level, some
suggested optimizations may not necessarily lead to better code after the
backend pass (lowering/transformation to SelectionDAG). That's the downside
of using it at the IR level. On the other hand, many other simple
substitutions may be learned by the algorithm and improve
InstructionSimplify.

Thanks,
-Rafael
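The per-instruction weights Rafael mentions could look something like the
following toy cost model. All the numbers here are made up, which is exactly
his caveat about working at the IR level: real costs depend on the target.

```python
# Hypothetical per-opcode weights for an IR-level cost model.
# These values are illustrative guesses, not LLVM's actual costs.
WEIGHTS = {'add': 1, 'sub': 1, 'shl': 1, 'mul': 3, 'sdiv': 10}

def seq_cost(seq):
    """Sum the weight of each (opcode, *operands) tuple in a sequence."""
    return sum(WEIGHTS.get(op, 1) for op, *_ in seq)

# Under these weights the strength-reduced form wins:
print(seq_cost([('mul', '%x', 8)]))  # 3
print(seq_cost([('shl', '%x', 3)]))  # 1
```

With a target-specific weight table, the same comparison could flip, which
is why the decision is safer in a backend peephole pass than at the IR
level.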
IMO a superoptimizer would yield fewer benefits on LLVM compared to other
compilers. If you look at the patches to the instcombine pass, you'll find
that people keep pulling "correct" optimizations out, not because the
optimization violates the semantics of the LLVM IR, but because it generates
wrong code sequences when lowering to machine code. An example:

%3 = fcmp %1, %2
%6 = fcmp %4, %5
%7 = and %3, %6
%8 = and %7, %foo

Sometimes you'll be screwed if you try to reassociate %7 and %8. I don't see
an easy way of catching these cases in theorem provers.

Haohui

On Thu, Apr 7, 2011 at 5:03 PM, Rafael Auler <rafaelauler at gmail.com> wrote:
> [previous message quoted above - snipped]
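A self-contained cousin of Haohui's caution: rewrites that are algebraically
plausible can change results once real floating-point semantics enter. Shown
here with plain Python floats (IEEE-754 doubles) rather than LLVM IR:

```python
# Reassociating floating-point adds looks "correct" algebraically,
# but IEEE-754 rounding makes the two groupings differ.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed by rounding
```

Any equivalence checker for such a superoptimizer therefore has to model
bit-exact floating-point (and lowering) behavior, not just integer algebra.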
Hi Rafael,

don't forget to submit your proposal to GSOC (via the GSOC web-page) - the
deadline is today!

> [discussion quoted above - snipped]
> If you prefer another approach described formally in another paper, I can
> study it and use another algorithm.

I am fine with this approach, even though it can miss interesting
reductions, since it is way more efficient (from what I hear) than the fully
general approach, so it should be able to handle code sequences longer than
just a couple of instructions.

Ciao,

Duncan.