Here is the patch for the new Operator Strength Reduction optimization pass
that I have written. The bulk of the code is in

  lib/Transforms/Scalar/OperatorStrengthReduce.cpp

The optimization is based on the algorithm described in this paper:
http://portal.acm.org/citation.cfm?id=504709.504710

  Keith D. Cooper, L. Taylor Simpson, Christopher A. Vick, "Operator
  strength reduction," ACM Transactions on Programming Languages and
  Systems (TOPLAS), v.23 n.5, p.603-625, September 2001.

I have embedded the paper's pseudocode in comment blocks within the code.
The algorithm finds reduction opportunities in both array accesses and
explicit multiplications within loops.

Next, I plan on writing the regression tests.

thanks,
Brian

Brian West <bnwest <at> rice.edu> writes:

> I am currently writing a new optimization pass for LLVM, based on the
> paper Operator Strength Reduction (Taylor Simpson et al, 2001). I
> noticed from the Developer Policy page that my code will need to be
> reviewed before it is committed to the trunk.
>
> The Policy page also indicates that all "major" changes should be
> announced before coding and should be done as a series of incremental
> changes. I am not sure whether a new stand-alone optimization pass
> counts as a major change.
>
> I have run my pass against the test suite (as described here: "How to
> test my pass",
> http://comments.gmane.org/gmane.comp.compilers.llvm.devel/32875). I
> have fixed all of the compilation errors and all but one of the
> execution errors.
>
> I have made a pass over my code to conform to the LLVM coding standards.
>
> I have not written the regression tests yet.
>
> I also still need to work on the performance, but as is, it is faster
> than the existing -loop-reduce pass.
>
> My question is how to proceed. What steps do I need to take to get my
> code reviewed and committed to the trunk?
>
> thanks,
> Brian West

Duncan Sands <baldrick <at> free.fr> writes:

> Hi Brian,
>
> > My question is how to proceed. What steps do I need to take to get my
> > code reviewed and committed to the trunk?
>
> the first step is to send a patch adding your pass (and hopefully the
> associated documentation!) to LLVM to the mailing list, so that others
> can test and study your code.
>
> Ciao,
>
> Duncan.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: osr.patch
Type: application/octet-stream
Size: 81300 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20101027/faef5b6d/attachment.obj>
On Wed, Oct 27, 2010 at 1:29 PM, Brian West <bnwest at rice.edu> wrote:
> Here is the patch for the new Operator Strength Reduction optimization
> pass that I have written. The bulk of the code is in
>
>   lib/Transforms/Scalar/OperatorStrengthReduce.cpp
>
> The algorithm finds reduction opportunities in both array accesses and
> explicit multiplications within loops.
>
> Next, I plan on writing the regression tests.

Comments:
1. Please don't top-post.
2. Instruction::getOpcodeName instead of your OpcodeToString function.
3. LLVM already has a significant amount of infrastructure for loop
   passes; why does this pass have its own code for finding loops?
4. What's the use case for this pass? Can this pass improve performance
   of generated code beyond the existing -loop-reduce pass? If not,
   could it usefully act as a faster replacement for -loop-reduce for
   fast code generation? (Perhaps someone doing fast code generation
   with LLVM could comment on whether -loop-reduce is useful, and
   whether the performance is an issue.)

-Eli
On 10/27/10 8:34 PM, Eli Friedman wrote:
> On Wed, Oct 27, 2010 at 1:29 PM, Brian West <bnwest at rice.edu> wrote:
>> Here is the patch for the new Operator Strength Reduction optimization
>> pass that I have written. The bulk of the code is in
>>
>>   lib/Transforms/Scalar/OperatorStrengthReduce.cpp
>>
>> The algorithm finds reduction opportunities in both array accesses and
>> explicit multiplications within loops.
>>
>> Next, I plan on writing the regression tests.
>
> Comments:
> 1. Please don't top-post.
> 2. Instruction::getOpcodeName instead of your OpcodeToString function.

Can do. I did not see getOpcodeName() in lib/Target/CBackend/CBackend.cpp,
so I wrote my own. FWIW, I am only using OpcodeToString() for debug output.

> 3. LLVM already has a significant amount of infrastructure for loop
> passes; why does this pass have its own code for finding loops?

I saw the loop infrastructure for CFG loops. This algorithm finds loops
in the data flow (more precisely: strongly connected components in the
SSA graph), e.g.

bb5:
  %20 = add nsw i32 %i.0, 1
  br label %bb6

bb6:
  %i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]

There is a data-flow loop between %20 and %i.0. The OSR paper has a nice
figure showing data-flow loops. Here is a small excerpt from the OSR
paper:

  "OSR is driven by a simple depth-first search of the SSA-graph, using
  Tarjan's strongly-connected component finder. The SCC-finder lets OSR
  discover induction variables in topological order and process them as
  they are discovered. As the SCCs are discovered, they are processed by
  a set of mutually-recursive routines that test for induction variables
  and region constants, and then reduce the appropriate operations."

> 4. What's the use case for this pass? Can this pass improve
> performance of generated code beyond the existing -loop-reduce pass?
> If not, could it usefully act as a faster replacement for -loop-reduce
> for fast code generation?
> (Perhaps someone doing fast code generation with LLVM could comment on
> whether -loop-reduce is useful, and whether the performance is an
> issue.)

-loop-reduce (LSR) only looks at implicit multiplication in array
accesses. The following code has an explicit multiplication within a
loop that LSR will not reduce:

extern float loop4(float* a, int start, int stop) {
  int i;
  float sum;
  float *pEntry;
  char *p;
  unsigned long offset;
  const unsigned int float_size = sizeof(float);

  sum = 0.0;
  p = (char*)a;
  for (i = start-1; i < stop; ++i) {
    offset = i * float_size;
    pEntry = (float*) (p + offset);
    sum += *pEntry;
  }
  return sum;
}

OSR also finds reductions in some esoteric array accesses that LSR
misses. I found these LSR misses while writing less-than-real-world code
to break OSR. I can include these examples, if requested.

For array accesses, OSR and LSR produce similar LLVM IR, and I am
guessing that their execution times should be similar. Empirically, the
OSR optimization is compile-time faster than LSR.

I have also noticed that LSR has more "analysis" requirements: Induction
Variable User, Natural Loop Information, Canonicalize natural loops, and
Scalar Evolution Analysis. Both OSR and LSR require the Dominator Tree
Construction analysis pass.

I did not mention in the original email (and should have) that OSR needs
-instcombine to be run after it for cleanup. Also, -licm, -reassociate,
-gvn and -sccp can be enabling optimizations for OSR.

> -Eli

Eli,

Thanks for your comments.

Brian West
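[Editor's note: the SCC-based data-flow loop discovery Brian describes
above can be sketched in a few lines. This is a hypothetical
illustration, not code from the patch: the `SCCFinder` class and its
graph encoding are invented here, with integer nodes standing for SSA
definitions and edges running from each use to its defining instruction.
Tarjan's algorithm pops an SCC only after every SCC reachable from it
has been popped, so with use-to-def edges each operand's component is
emitted before its users' — the topological processing order the OSR
paper relies on.]

```cpp
#include <algorithm>
#include <cassert>
#include <stack>
#include <vector>

// Sketch of Tarjan's SCC finder as OSR-style passes use it. Nodes are
// SSA definitions; succ[v] lists the definitions that v's operands come
// from (use -> def edges).
struct SCCFinder {
  const std::vector<std::vector<int>> &succ; // adjacency list
  std::vector<int> num, low;                 // DFS numbers and low-links
  std::vector<bool> onStack;
  std::stack<int> stk;
  int nextNum = 0;
  std::vector<std::vector<int>> sccs;        // emitted defs-before-users

  explicit SCCFinder(const std::vector<std::vector<int>> &g)
      : succ(g), num(g.size(), -1), low(g.size()), onStack(g.size(), false) {}

  void dfs(int v) {
    num[v] = low[v] = nextNum++;
    stk.push(v);
    onStack[v] = true;
    for (int w : succ[v]) {
      if (num[w] < 0) {                      // tree edge: recurse
        dfs(w);
        low[v] = std::min(low[v], low[w]);
      } else if (onStack[w]) {               // back/cross edge into stack
        low[v] = std::min(low[v], num[w]);
      }
    }
    if (low[v] == num[v]) {                  // v is the root of an SCC
      std::vector<int> scc;
      int w;
      do {
        w = stk.top();
        stk.pop();
        onStack[w] = false;
        scc.push_back(w);
      } while (w != v);
      sccs.push_back(scc);                   // OSR would ProcessSCC here
    }
  }

  void run() {
    for (int v = 0; v < (int)succ.size(); ++v)
      if (num[v] < 0)
        dfs(v);
  }
};
```

For the bb5/bb6 example above, encoding %i.0 as node 0, %20 as node 1,
and the constant 0 as node 2 gives succ = {{2,1},{0},{}}; the finder
emits the trivial SCC {2} (a region constant) first, then the non-trivial
SCC {%20, %i.0} — an induction-variable candidate.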
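[Editor's note: to make the transformation concrete, here is a
hand-written sketch of what strength reduction does to loop4's body.
`loop4_reduced` is hypothetical — it is not OSR's actual output, and the
cleanup that -instcombine would perform is folded in by hand. The point
is the recurrence: the per-iteration multiply `i * float_size` becomes a
pointer that is initialized once and bumped by one element each trip
around the loop.]

```cpp
// Strength-reduced equivalent of loop4: the multiplication in the
// address computation is replaced by an additive induction variable.
extern "C" float loop4_reduced(const float *a, int start, int stop) {
  float sum = 0.0f;
  const float *pEntry = a + (start - 1); // offset computed once, outside the loop
  for (int i = start - 1; i < stop; ++i) {
    sum += *pEntry;
    ++pEntry;                            // add replaces i * sizeof(float)
  }
  return sum;
}
```

The loop now contains only additions, which is exactly the shape the
OSR paper's reduction produces before copy cleanup.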