escha via llvm-dev
2015-Sep-13 16:03 UTC
[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
LLVM has two similar bits of infrastructure: a simplifyInstructionsInBlock function and a SimplifyInstructions pass, both intended to be lightweight “fix up this code without doing serious optimizations” functions, as far as I can tell. I don’t think either is used in a performance-sensitive place in-tree; the former is mostly called in minor places when doing CFG twiddling, and the latter seems to be for testing purposes. However, in our out-of-tree code, I found we were calling simplifyInstructionsInBlock on the entire function to clean things up before another pass. This turns out to be quite costly: the function is not very efficient, tends to revisit instructions more than is necessary, and spends a lot of time constructing weak handles and handling its worklists. I wrote a new version of it that is about ~3x faster and reduced our total compilation time by about 2%. The new version is very very fast: it’s a full-function simplify-and-DCE that looks to be about twice as fast as comparatively “fast” passes like ADCE — fast enough that I can imagine plopping it down in a pipeline pretty much wherever is relevant. Would anyone be interested in having this in LLVM? It could be a replacement for simplifyInstructionsInBlock, a replacement for SimplifyInstructionsPass, or whatever else is reasonable; I’m mostly just not sure where to put it, and want to be sure it’d be useful to someone (given the current comparative lack of use of this code in-tree). It -should- be NFC compared to simplifyInstructionsInBlock other than running per-function instead of per-block (which can be easily changed, I just made it per-function since that’s what was most useful to us). —escha P.S. Here’s the code. The main optimizations here are: 1. Use a worklist, not a recursive approach, to avoid needless revisitation and being repeatedly forced to jump back to the start of the BB if a handle is invalidated. 2. Only insert operands to the worklist if they become unused after a dead instruction is removed, so we don’t have to visit them again in most cases. 3. Use a SmallSetVector to track the worklist. 4. Instead of pre-initting the SmallSetVector like in DeadCodeEliminationPass, only put things into the worklist if they have to be revisited after the first run-through. This minimizes how much the actual SmallSetVector gets used, which saves a lot of time. static bool simplifyAndDCEInstruction(Instruction *I, SmallSetVector<Instruction *, 16> &WorkList, const DataLayout &DL) { if (isInstructionTriviallyDead(I, nullptr)) { // Loop over all of the values that the instruction uses. If there are // instructions being used, add them to the worklist, because they might // go dead after this one is removed. SmallVector<Instruction *, 8> Operands; for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI != E; ++OI) if (Instruction *Used = dyn_cast<Instruction>(*OI)) Operands.push_back(Used); // Remove the instruction. I->eraseFromParent(); // Only add the operands to the worklist if their uses are now empty, // to avoid needlessly revisiting instructions. for (auto U : Operands) if (U->use_empty()) WorkList.insert(U); return true; } if (Value *SimpleV = SimplifyInstruction(I, DL)) { // Add the users to the worklist. for (User *U : I->users()) WorkList.insert(cast<Instruction>(U)); // Replace the instruction with its simplified value. I->replaceAllUsesWith(SimpleV); // Gracefully handle edge cases where the instruction is not wired into any // parent block. if (I->getParent()) I->eraseFromParent(); return true; } return false; } // A faster version of SimplifyInstructionsInBlock, designed for a whole // function. Modelled after DeadCodeEliminationPass. static bool simplifyAndDCEFunction(Function &F) { bool MadeChange = false; const DataLayout &DL = F.getParent()->getDataLayout(); SmallSetVector<Instruction *, 16> WorkList; // Iterate over the original function, only adding insts to the worklist // if they actually need to be revisited. This avoids having to pre-init // the worklist with the entire function's worth of instructions. for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) { Instruction *I = &*FI; ++FI; // We're visiting this instruction now, so make sure it's not in the // worklist from an earlier visit. WorkList.remove(I); MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); } while (!WorkList.empty()) { Instruction *I = WorkList.pop_back_val(); MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); } return MadeChange; } -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/9bde4a5d/attachment-0001.html>
Marcello Maggioni via llvm-dev
2015-Sep-13 18:06 UTC
[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
Hi! I like this of course! A comment about the code that I noticed.> On 13 Sep 2015, at 09:03, escha via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > LLVM has two similar bits of infrastructure: a simplifyInstructionsInBlock function and a SimplifyInstructions pass, both intended to be lightweight “fix up this code without doing serious optimizations” functions, as far as I can tell. I don’t think either is used in a performance-sensitive place in-tree; the former is mostly called in minor places when doing CFG twiddling, and the latter seems to be for testing purposes. > > However, in our out-of-tree code, I found we were calling simplifyInstructionsInBlock on the entire function to clean things up before another pass. This turns out to be quite costly: the function is not very efficient, tends to revisit instructions more than is necessary, and spends a lot of time constructing weak handles and handling its worklists. I wrote a new version of it that is about ~3x faster and reduced our total compilation time by about 2%. The new version is very very fast: it’s a full-function simplify-and-DCE that looks to be about twice as fast as comparatively “fast” passes like ADCE — fast enough that I can imagine plopping it down in a pipeline pretty much wherever is relevant. > > Would anyone be interested in having this in LLVM? It could be a replacement for simplifyInstructionsInBlock, a replacement for SimplifyInstructionsPass, or whatever else is reasonable; I’m mostly just not sure where to put it, and want to be sure it’d be useful to someone (given the current comparative lack of use of this code in-tree). It -should- be NFC compared to simplifyInstructionsInBlock other than running per-function instead of per-block (which can be easily changed, I just made it per-function since that’s what was most useful to us). > > —escha > > P.S. Here’s the code. The main optimizations here are: > 1. Use a worklist, not a recursive approach, to avoid needless revisitation and being repeatedly forced to jump back to the start of the BB if a handle is invalidated. > 2. Only insert operands to the worklist if they become unused after a dead instruction is removed, so we don’t have to visit them again in most cases. > 3. Use a SmallSetVector to track the worklist. > 4. Instead of pre-initting the SmallSetVector like in DeadCodeEliminationPass, only put things into the worklist if they have to be revisited after the first run-through. This minimizes how much the actual SmallSetVector gets used, which saves a lot of time. > > static bool simplifyAndDCEInstruction(Instruction *I, > SmallSetVector<Instruction *, 16> &WorkList, > const DataLayout &DL) { > if (isInstructionTriviallyDead(I, nullptr)) { > // Loop over all of the values that the instruction uses. If there are > // instructions being used, add them to the worklist, because they might > // go dead after this one is removed. > SmallVector<Instruction *, 8> Operands; > for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI != E; ++OI) > if (Instruction *Used = dyn_cast<Instruction>(*OI)) > Operands.push_back(Used); > > // Remove the instruction. > I->eraseFromParent(); > > // Only add the operands to the worklist if their uses are now empty, > // to avoid needlessly revisiting instructions. > for (auto U : Operands) > if (U->use_empty()) > WorkList.insert(U); >Instead of adding the operands to a list, erase the instruction and add them to the worklist wouldn’t be probably faster something like: if (Instruction *Used = dyn_cast<Instruction>(*OI)) if (Used->hasOneUse()) WorkList.insert(Used); If it has one use is going to be the instruction we are going to remove anyway, right? Cheers, Marcello> return true; > } > > if (Value *SimpleV = SimplifyInstruction(I, DL)) { > // Add the users to the worklist. > for (User *U : I->users()) > WorkList.insert(cast<Instruction>(U)); > > // Replace the instruction with its simplified value. > I->replaceAllUsesWith(SimpleV); > > // Gracefully handle edge cases where the instruction is not wired into any > // parent block. > if (I->getParent()) > I->eraseFromParent(); > return true; > } > return false; > } > > // A faster version of SimplifyInstructionsInBlock, designed for a whole > // function. Modelled after DeadCodeEliminationPass. > static bool simplifyAndDCEFunction(Function &F) { > bool MadeChange = false; > const DataLayout &DL = F.getParent()->getDataLayout(); > > SmallSetVector<Instruction *, 16> WorkList; > // Iterate over the original function, only adding insts to the worklist > // if they actually need to be revisited. This avoids having to pre-init > // the worklist with the entire function's worth of instructions. > for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) { > Instruction *I = &*FI; > ++FI; > > // We're visiting this instruction now, so make sure it's not in the > // worklist from an earlier visit. > WorkList.remove(I); > MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); > } > > while (!WorkList.empty()) { > Instruction *I = WorkList.pop_back_val(); > MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); > } > return MadeChange; > } > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/86caf934/attachment.html>
escha via llvm-dev
2015-Sep-13 18:11 UTC
[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
> > Instead of adding the operands to a list, erase the instruction and add them to the worklist wouldn’t be probably faster something like: > > if (Instruction *Used = dyn_cast<Instruction>(*OI)) > if (Used->hasOneUse()) > WorkList.insert(Used); > > If it has one use is going to be the instruction we are going to remove anyway, right?I don’t think this is strictly true, but someone correct me if I’m wrong: if you have %y = add i32 %x, %x %x will have two uses, but it will have zero if %y is deleted. This was the corner case I was worried about. —escha -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/5ace6ad5/attachment.html>
Philip Reames via llvm-dev
2015-Sep-14 18:44 UTC
[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
This seems like a useful change to me. Why don't you post it for review on llvm-commits? This seems rather similar to both RecursivelyDeleteTriviallyDeadInstructions and replaceAndRecursivelySimplifyImpl. I think at least your single user trick should be applied to those as well. Given how similar all three are, I'm be tempted to common them into a common implementation. Philip On 09/13/2015 09:03 AM, escha via llvm-dev wrote:> LLVM has two similar bits of infrastructure: a > simplifyInstructionsInBlock function and a SimplifyInstructions pass, > both intended to be lightweight “fix up this code without doing > serious optimizations” functions, as far as I can tell. I don’t think > either is used in a performance-sensitive place in-tree; the former is > mostly called in minor places when doing CFG twiddling, and the latter > seems to be for testing purposes. > > However, in our out-of-tree code, I found we were calling > simplifyInstructionsInBlock on the entire function to clean things up > before another pass. This turns out to be quite costly: the function > is not very efficient, tends to revisit instructions more than is > necessary, and spends a lot of time constructing weak handles and > handling its worklists. I wrote a new version of it that is about ~3x > faster and reduced our total compilation time by about 2%. The new > version is very very fast: it’s a full-function simplify-and-DCE that > looks to be about twice as fast as comparatively “fast” passes like > ADCE — fast enough that I can imagine plopping it down in a pipeline > pretty much wherever is relevant. > > Would anyone be interested in having this in LLVM? It could be a > replacement for simplifyInstructionsInBlock, a replacement for > SimplifyInstructionsPass, or whatever else is reasonable; I’m mostly > just not sure where to put it, and want to be sure it’d be useful to > someone (given the current comparative lack of use of this code > in-tree). It -should- be NFC compared to simplifyInstructionsInBlock > other than running per-function instead of per-block (which can be > easily changed, I just made it per-function since that’s what was most > useful to us). > > —escha > > P.S. Here’s the code. The main optimizations here are: > 1. Use a worklist, not a recursive approach, to avoid needless > revisitation and being repeatedly forced to jump back to the start of > the BB if a handle is invalidated. > 2. Only insert operands to the worklist if they become unused after a > dead instruction is removed, so we don’t have to visit them again in > most cases. > 3. Use a SmallSetVector to track the worklist. > 4. Instead of pre-initting the SmallSetVector like in > DeadCodeEliminationPass, only put things into the worklist if they > have to be revisited after the first run-through. This minimizes how > much the actual SmallSetVector gets used, which saves a lot of time. > > static bool simplifyAndDCEInstruction(Instruction *I, > SmallSetVector<Instruction *, 16> &WorkList, > const DataLayout &DL) { > if(isInstructionTriviallyDead(I, nullptr)) { > // Loop over all of the values that the instruction uses. If there are > // instructions being used, add them to the worklist, because they might > // go dead after this one is removed. > SmallVector<Instruction *, 8> Operands; > for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI != E; ++OI) > if (Instruction *Used = dyn_cast<Instruction>(*OI)) > Operands.push_back(Used); > > // Remove the instruction. > I->eraseFromParent(); > > // Only add the operands to the worklist if their uses are now empty, > // to avoid needlessly revisiting instructions. > for (auto U : Operands) > if (U->use_empty()) > WorkList.insert(U); > > returntrue; > } > > if (Value *SimpleV = SimplifyInstruction(I, DL)) { > // Add the users to the worklist. > for (User *U : I->users()) > WorkList.insert(cast<Instruction>(U)); > > // Replace the instruction with its simplified value. > I->replaceAllUsesWith(SimpleV); > > // Gracefully handle edge cases where the instruction is not wired > into any > // parent block. > if (I->getParent()) > I->eraseFromParent(); > returntrue; > } > returnfalse; > } > > // A faster version of SimplifyInstructionsInBlock, designed for a whole > // function. Modelled after DeadCodeEliminationPass. > static bool simplifyAndDCEFunction(Function &F) { > bool MadeChange = false; > constDataLayout&DL = F.getParent()->getDataLayout(); > > SmallSetVector<Instruction *, 16> WorkList; > // Iterate over the original function, only adding insts to the worklist > // if they actually need to be revisited. This avoids having to pre-init > // the worklist with the entire function's worth of instructions. > for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) { > Instruction *I = &*FI; > ++FI; > > // We're visiting this instruction now, so make sure it's not in the > // worklist from an earlier visit. > WorkList.remove(I); > MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); > } > > while (!WorkList.empty()) { > Instruction *I = WorkList.pop_back_val(); > MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL); > } > return MadeChange; > } > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150914/42599ba5/attachment-0001.html>
Reasonably Related Threads
- RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass
- RFC: speedups with instruction side-data (ADCE, perhaps others?)
- [LLVMdev] Why change "sub x, 5" to "add x, -5" ?
- RFC: speedups with instruction side-data (ADCE, perhaps others?)
- [RFC] design doc for straight-line scalar optimizations