Ariel Ben-Yehuda via llvm-dev
2017-Dec-21 20:24 UTC
[llvm-dev] Pass ordering - GVN vs. loop optimizations
Hi, This is Ariel from the Rust team again. I am having another pass ordering issue. Looking at the pass manager at https://github.com/llvm-mirror/llvm/blob/7034870f30320d6fbc74effff539d946018cd00a/lib/Transforms/IPO/PassManagerBuilder.cpp (the early SimplifyCfg now doesn't sink stores anymore! I can't wait until I can get to use that in rustc!) I find that the loop optimization group does not run after GVN: // Rotate Loop - disable header duplication at -Oz MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); MPM.add(createLICMPass()); // Hoist loop invariants if (EnableSimpleLoopUnswitch) MPM.add(createSimpleLoopUnswitchLegacyPass()); else MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget)); MPM.add(createCFGSimplificationPass()); addInstructionCombiningPass(MPM); MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars // <I probably want to add some SimplifyCfg pass here, but // that's a separate issue> MPM.add(createLoopIdiomPass()); // Recognize idioms like memset. addExtensionsToPM(EP_LateLoopOptimizations, MPM); MPM.add(createLoopDeletionPass()); // Delete dead loops if (EnableLoopInterchange) { MPM.add(createLoopInterchangePass()); // Interchange loops MPM.add(createCFGSimplificationPass()); } if (!DisableUnrollLoops) MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops addExtensionsToPM(EP_LoopOptimizerEnd, MPM); // <GVN is now immediately after loop optimizatons if (OptLevel > 1) { MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds MPM.add(NewGVN ? createNewGVNPass() : createGVNPass(DisableGVNLoadPRE)); // Remove redundancies } This causes a problem, because GVN appears to be the only pass that can merge loads across basic blocks. This means that if a loop index only appears behind a pointer, LLVM will not be able to optimize out bounds checks depending on it. The "canonical" example is this Rust function (or the similar C code): #![crate_type="rlib"] pub fn f(length: &usize) -> u64 { let mut sum = 0; let len_1 = *length; let mut i = 0; while i < len_1 { let len_2 = *length; assert!(i < len_2); i += 1; } sum } One would expect the assertion (which in a real example is a bounds check) to be optimized out. However, IndVarSimplify is the optimization that is supposed to do that, and it only runs *before* GVN, so it "sees" 2 separate loads of the length and can't do anything. Changing the pass ordering to put GVN before the loop optimizations (reversing the 2 blocks in my excerpt above) fixes this example, so I'm trying to figure out whether that destroys something important. I'll note that rustc has another GVN pass "downstream" (before the DeadStoreElimination pass) which might help alleviate some worries. I could not find any good documentation or tests for why the passes are in the order they are - the order seems to have been preserved from clang in 2009. If you have some good testing infrastructure, it would be nice if there was a way we could use it to check whether the pass reordering would hurt anything important. And in any case, I keep seeing weird optimization misses caused by pass ordering. It would be nice if there was some better knowledge of this situation. Regards, - Ariel -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171221/a0e78ab3/attachment.html>
Philip Reames via llvm-dev
2017-Dec-22 23:59 UTC
[llvm-dev] Pass ordering - GVN vs. loop optimizations
On 12/21/2017 12:24 PM, Ariel Ben-Yehuda via llvm-dev wrote:> Hi, > > This is Ariel from the Rust team again. > > I am having another pass ordering issue. Looking at the pass manager > at > https://github.com/llvm-mirror/llvm/blob/7034870f30320d6fbc74effff539d946018cd00a/lib/Transforms/IPO/PassManagerBuilder.cpp > (the early SimplifyCfg now doesn't sink stores anymore! I can't wait > until I can get to use that in rustc!) I find that the loop > optimization group does not run after GVN: > > // Rotate Loop - disable header duplication at -Oz > MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); > MPM.add(createLICMPass()); // Hoist loop invariants > if (EnableSimpleLoopUnswitch) > MPM.add(createSimpleLoopUnswitchLegacyPass()); > else > MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, > DivergentTarget)); > MPM.add(createCFGSimplificationPass()); > addInstructionCombiningPass(MPM); > MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars > // <I probably want to add some SimplifyCfg pass here, but > // that's a separate issue> > MPM.add(createLoopIdiomPass()); // Recognize idioms like > memset. > addExtensionsToPM(EP_LateLoopOptimizations, MPM); > MPM.add(createLoopDeletionPass()); // Delete dead loops > if (EnableLoopInterchange) { > MPM.add(createLoopInterchangePass()); // Interchange loops > MPM.add(createCFGSimplificationPass()); > } > if (!DisableUnrollLoops) > MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops > addExtensionsToPM(EP_LoopOptimizerEnd, MPM); > > // <GVN is now immediately after loop optimizatons > > if (OptLevel > 1) { > MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds > MPM.add(NewGVN ? createNewGVNPass() > : createGVNPass(DisableGVNLoadPRE)); // Remove > redundancies > } > > This causes a problem, because GVN appears to be the only pass that > can merge loads across basic blocks.Aside: EarlyCSE also does this in simple cases with no merges in the CFG between the two loads.> This means that if a loop index only appears behind a pointer, LLVM > will not be able to optimize out bounds checks depending on it. > > The "canonical" example is this Rust function (or the similar C code): > > #![crate_type="rlib"] > pub fn f(length: &usize) -> u64 { > let mut sum = 0; > let len_1 = *length; > let mut i = 0; > while i < len_1 { > let len_2 = *length; > assert!(i < len_2); > i += 1; > } > sum > } > > One would expect the assertion (which in a real example is a bounds > check) to be optimized out. However, IndVarSimplify is the > optimization that is supposed to do that, and it only runs *before* > GVN, so it "sees" 2 separate loads of the length and can't do anything. > > Changing the pass ordering to put GVN before the loop optimizations > (reversing the 2 blocks in my excerpt above) fixes this example, so > I'm trying to figure out whether that destroys something important. > I'll note that rustc has another GVN pass "downstream" (before the > DeadStoreElimination pass) which might help alleviate some worries.I think you've discovered the practical reality that your frontend should have it's own pass order. Our Java frontend specifically runs multiple iterations of these passes to avoid scenarios like this. Getting this changed in the standard pipeline will be slow going at best; you're probably better off versioning one which works for you. Also, since you care about range checks, I strongly suggest you take a loop at LoopPredication and InductiveRangeCheckElimination.> > I could not find any good documentation or tests for why the passes > are in the order they are - the order seems to have been preserved > from clang in 2009. If you have some good testing infrastructure, it > would be nice if there was a way we could use it to check whether the > pass reordering would hurt anything important.The reality is that pass ordering is a mostly experimental science. We try something, measure, and see what goes wrong. You can also look to improve frequently run passes to handle the simple cases. In your particular example, I'd look to see why LICM wasn't hoisting "let len_2 = *length;". Once that's done, EarlyCSE should be able to common the loads exposing the range check for indvarsimplify. IRCE would also handle this case.> > And in any case, I keep seeing weird optimization misses caused by > pass ordering. It would be nice if there was some better knowledge of > this situation. > > Regards, > - Ariel > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20171222/c2e48cde/attachment.html>
Ariel Ben-Yehuda via llvm-dev
2017-Dec-23 20:38 UTC
[llvm-dev] Pass ordering - GVN vs. loop optimizations
LICM does get move the load to the loop header, so putting an EarlyCSE between the "first" and "second" passes of loop optimizations would fix this problem. EarlyCSE is also not *that* expensive, so I might very well do that. On the other hand, this is another of these "global" slowdowns to optimize a specific case (e.g. unless I can figure out a smart way to do a LoopSimplifyCfg, I'm going to slam a SimplifyCfg between indvars and loopidiom, because bounds checks and loop idioms don't mix). I was just surprised that loop optimizations appear after GVN and wanted to understand the cause for this. I suppose one reason is that, while loads that are supposed to be merged should be pretty popular in both Rust and C++, merging loads is much easier in Rust than in C++, because we can very often emit !noalias on pointers. I suppose that this, along with Rust's second "late" GVN, mean that moving the first GVN to before loop optimizations might be a better idea in Rust than in C++. I am just trying to figure out what risks might appear to that. BTW, I could not get IRCE to touch the bounds check here in any way, despite aggressive use of much other passes. On 12/23/17, Philip Reames <listmail at philipreames.com> wrote:> > > On 12/21/2017 12:24 PM, Ariel Ben-Yehuda via llvm-dev wrote: >> Hi, >> >> This is Ariel from the Rust team again. >> >> I am having another pass ordering issue. Looking at the pass manager >> at >> https://github.com/llvm-mirror/llvm/blob/7034870f30320d6fbc74effff539d946018cd00a/lib/Transforms/IPO/PassManagerBuilder.cpp >> >> (the early SimplifyCfg now doesn't sink stores anymore! I can't wait >> until I can get to use that in rustc!) I find that the loop >> optimization group does not run after GVN: >> >> // Rotate Loop - disable header duplication at -Oz >> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >> MPM.add(createLICMPass()); // Hoist loop invariants >> if (EnableSimpleLoopUnswitch) >> MPM.add(createSimpleLoopUnswitchLegacyPass()); >> else >> MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, >> DivergentTarget)); >> MPM.add(createCFGSimplificationPass()); >> addInstructionCombiningPass(MPM); >> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >> // <I probably want to add some SimplifyCfg pass here, but >> // that's a separate issue> >> MPM.add(createLoopIdiomPass()); // Recognize idioms like >> memset. >> addExtensionsToPM(EP_LateLoopOptimizations, MPM); >> MPM.add(createLoopDeletionPass()); // Delete dead loops >> if (EnableLoopInterchange) { >> MPM.add(createLoopInterchangePass()); // Interchange loops >> MPM.add(createCFGSimplificationPass()); >> } >> if (!DisableUnrollLoops) >> MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops >> addExtensionsToPM(EP_LoopOptimizerEnd, MPM); >> >> // <GVN is now immediately after loop optimizatons >> >> if (OptLevel > 1) { >> MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in >> diamonds >> MPM.add(NewGVN ? createNewGVNPass() >> : createGVNPass(DisableGVNLoadPRE)); // Remove >> redundancies >> } >> >> This causes a problem, because GVN appears to be the only pass that >> can merge loads across basic blocks. > Aside: EarlyCSE also does this in simple cases with no merges in the CFG > between the two loads. >> This means that if a loop index only appears behind a pointer, LLVM >> will not be able to optimize out bounds checks depending on it. >> >> The "canonical" example is this Rust function (or the similar C code): >> >> #![crate_type="rlib"] >> pub fn f(length: &usize) -> u64 { >> let mut sum = 0; >> let len_1 = *length; >> let mut i = 0; >> while i < len_1 { >> let len_2 = *length; >> assert!(i < len_2); >> i += 1; >> } >> sum >> } >> >> One would expect the assertion (which in a real example is a bounds >> check) to be optimized out. However, IndVarSimplify is the >> optimization that is supposed to do that, and it only runs *before* >> GVN, so it "sees" 2 separate loads of the length and can't do anything. >> >> Changing the pass ordering to put GVN before the loop optimizations >> (reversing the 2 blocks in my excerpt above) fixes this example, so >> I'm trying to figure out whether that destroys something important. >> I'll note that rustc has another GVN pass "downstream" (before the >> DeadStoreElimination pass) which might help alleviate some worries. > I think you've discovered the practical reality that your frontend > should have it's own pass order. Our Java frontend specifically runs > multiple iterations of these passes to avoid scenarios like this. > Getting this changed in the standard pipeline will be slow going at > best; you're probably better off versioning one which works for you. > > Also, since you care about range checks, I strongly suggest you take a > loop at LoopPredication and InductiveRangeCheckElimination. >> >> I could not find any good documentation or tests for why the passes >> are in the order they are - the order seems to have been preserved >> from clang in 2009. If you have some good testing infrastructure, it >> would be nice if there was a way we could use it to check whether the >> pass reordering would hurt anything important. > The reality is that pass ordering is a mostly experimental science. We > try something, measure, and see what goes wrong. You can also look to > improve frequently run passes to handle the simple cases. > > In your particular example, I'd look to see why LICM wasn't hoisting > "let len_2 = *length;". Once that's done, EarlyCSE should be able to > common the loads exposing the range check for indvarsimplify. IRCE > would also handle this case. >> >> And in any case, I keep seeing weird optimization misses caused by >> pass ordering. It would be nice if there was some better knowledge of >> this situation. >> >> Regards, >> - Ariel >> >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >