thr3ads.net - llvm dev - [llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass [Sep 2015]

If this information is useful, please help other people find it:
Share via:

escha via llvm-dev

2015-Sep-13 16:03 UTC

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

LLVM has two similar bits of infrastructure: a simplifyInstructionsInBlock
function and a SimplifyInstructions pass, both intended to be lightweight “fix
up this code without doing serious optimizations” functions, as far as I can
tell. I don’t think either is used in a performance-sensitive place in-tree; the
former is mostly called in minor places when doing CFG twiddling, and the latter
seems to be for testing purposes.

However, in our out-of-tree code, I found we were calling
simplifyInstructionsInBlock on the entire function to clean things up before
another pass. This turns out to be quite costly: the function is not very
efficient, tends to revisit instructions more than is necessary, and spends a
lot of time constructing weak handles and handling its worklists. I wrote a new
version of it that is about ~3x faster and reduced our total compilation time by
about 2%. The new version is very very fast: it’s a full-function
simplify-and-DCE that looks to be about twice as fast as comparatively “fast”
passes like ADCE — fast enough that I can imagine plopping it down in a pipeline
pretty much wherever is relevant.

Would anyone be interested in having this in LLVM? It could be a replacement for
simplifyInstructionsInBlock, a replacement for SimplifyInstructionsPass, or
whatever else is reasonable; I’m mostly just not sure where to put it, and want
to be sure it’d be useful to someone (given the current comparative lack of use
of this code in-tree). It -should- be NFC compared to
simplifyInstructionsInBlock other than running per-function instead of per-block
(which can be easily changed, I just made it per-function since that’s what was
most useful to us).

—escha

P.S. Here’s the code. The main optimizations here are:
1. Use a worklist, not a recursive approach, to avoid needless revisitation and
being repeatedly forced to jump back to the start of the BB if a handle is
invalidated.
2. Only insert operands to the worklist if they become unused after a dead
instruction is removed, so we don’t have to visit them again in most cases.
3. Use a SmallSetVector to track the worklist.
4. Instead of pre-initting the SmallSetVector like in DeadCodeEliminationPass,
only put things into the worklist if they have to be revisited after the first
run-through. This minimizes how much the actual SmallSetVector gets used, which
saves a lot of time.

static bool simplifyAndDCEInstruction(Instruction *I,
                                      SmallSetVector<Instruction *, 16>
&WorkList,
                                      const DataLayout &DL) {
  if (isInstructionTriviallyDead(I, nullptr)) {
    // Loop over all of the values that the instruction uses. If there are
    // instructions being used, add them to the worklist, because they might
    // go dead after this one is removed.
    SmallVector<Instruction *, 8> Operands;
    for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI != E;
++OI)
      if (Instruction *Used = dyn_cast<Instruction>(*OI))
        Operands.push_back(Used);

    // Remove the instruction.
    I->eraseFromParent();

    // Only add the operands to the worklist if their uses are now empty,
    // to avoid needlessly revisiting instructions.
    for (auto U : Operands)
      if (U->use_empty())
        WorkList.insert(U);

    return true;
  }

  if (Value *SimpleV = SimplifyInstruction(I, DL)) {
    // Add the users to the worklist.
    for (User *U : I->users())
      WorkList.insert(cast<Instruction>(U));

    // Replace the instruction with its simplified value.
    I->replaceAllUsesWith(SimpleV);

    // Gracefully handle edge cases where the instruction is not wired into any
    // parent block.
    if (I->getParent())
      I->eraseFromParent();
    return true;
  }
  return false;
}

// A faster version of SimplifyInstructionsInBlock, designed for a whole
// function. Modelled after DeadCodeEliminationPass.
static bool simplifyAndDCEFunction(Function &F) {
  bool MadeChange = false;
  const DataLayout &DL = F.getParent()->getDataLayout();

  SmallSetVector<Instruction *, 16> WorkList;
  // Iterate over the original function, only adding insts to the worklist
  // if they actually need to be revisited. This avoids having to pre-init
  // the worklist with the entire function's worth of instructions.
  for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) {
    Instruction *I = &*FI;
    ++FI;

    // We're visiting this instruction now, so make sure it's not in the
    // worklist from an earlier visit.
    WorkList.remove(I);
    MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
  }

  while (!WorkList.empty()) {
    Instruction *I = WorkList.pop_back_val();
    MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
  }
  return MadeChange;
}
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/9bde4a5d/attachment-0001.html>

Marcello Maggioni via llvm-dev

2015-Sep-13 18:06 UTC

head link

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

Hi!
I like this of course!
A comment about the code that I noticed.> On 13 Sep 2015, at 09:03, escha via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> LLVM has two similar bits of infrastructure: a simplifyInstructionsInBlock
function and a SimplifyInstructions pass, both intended to be lightweight “fix
up this code without doing serious optimizations” functions, as far as I can
tell. I don’t think either is used in a performance-sensitive place in-tree; the
former is mostly called in minor places when doing CFG twiddling, and the latter
seems to be for testing purposes.
> 
> However, in our out-of-tree code, I found we were calling
simplifyInstructionsInBlock on the entire function to clean things up before
another pass. This turns out to be quite costly: the function is not very
efficient, tends to revisit instructions more than is necessary, and spends a
lot of time constructing weak handles and handling its worklists. I wrote a new
version of it that is about ~3x faster and reduced our total compilation time by
about 2%. The new version is very very fast: it’s a full-function
simplify-and-DCE that looks to be about twice as fast as comparatively “fast”
passes like ADCE — fast enough that I can imagine plopping it down in a pipeline
pretty much wherever is relevant.
> 
> Would anyone be interested in having this in LLVM? It could be a
replacement for simplifyInstructionsInBlock, a replacement for
SimplifyInstructionsPass, or whatever else is reasonable; I’m mostly just not
sure where to put it, and want to be sure it’d be useful to someone (given the
current comparative lack of use of this code in-tree). It -should- be NFC
compared to simplifyInstructionsInBlock other than running per-function instead
of per-block (which can be easily changed, I just made it per-function since
that’s what was most useful to us).
> 
> —escha
> 
> P.S. Here’s the code. The main optimizations here are:
> 1. Use a worklist, not a recursive approach, to avoid needless revisitation
and being repeatedly forced to jump back to the start of the BB if a handle is
invalidated.
> 2. Only insert operands to the worklist if they become unused after a dead
instruction is removed, so we don’t have to visit them again in most cases.
> 3. Use a SmallSetVector to track the worklist.
> 4. Instead of pre-initting the SmallSetVector like in
DeadCodeEliminationPass, only put things into the worklist if they have to be
revisited after the first run-through. This minimizes how much the actual
SmallSetVector gets used, which saves a lot of time.
> 
> static bool simplifyAndDCEInstruction(Instruction *I,
>                                       SmallSetVector<Instruction *,
16> &WorkList,
>                                       const DataLayout &DL) {
>   if (isInstructionTriviallyDead(I, nullptr)) {
>     // Loop over all of the values that the instruction uses. If there are
>     // instructions being used, add them to the worklist, because they
might
>     // go dead after this one is removed.
>     SmallVector<Instruction *, 8> Operands;
>     for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI !=
E; ++OI)
>       if (Instruction *Used = dyn_cast<Instruction>(*OI))
>         Operands.push_back(Used);
> 
>     // Remove the instruction.
>     I->eraseFromParent();
> 
>     // Only add the operands to the worklist if their uses are now empty,
>     // to avoid needlessly revisiting instructions.
>     for (auto U : Operands)
>       if (U->use_empty())
>         WorkList.insert(U);
> 
Instead of adding the operands to a list, erase the instruction and add them to
the worklist wouldn’t be probably faster something like:

if (Instruction *Used = dyn_cast<Instruction>(*OI))
  if (Used->hasOneUse())
    WorkList.insert(Used);

If it has one use is going to be the instruction we are going to remove anyway,
right?


Cheers,
Marcello>     return true;
>   }
> 
>   if (Value *SimpleV = SimplifyInstruction(I, DL)) {
>     // Add the users to the worklist.
>     for (User *U : I->users())
>       WorkList.insert(cast<Instruction>(U));
> 
>     // Replace the instruction with its simplified value.
>     I->replaceAllUsesWith(SimpleV);
> 
>     // Gracefully handle edge cases where the instruction is not wired into
any
>     // parent block.
>     if (I->getParent())
>       I->eraseFromParent();
>     return true;
>   }
>   return false;
> }
> 
> // A faster version of SimplifyInstructionsInBlock, designed for a whole
> // function. Modelled after DeadCodeEliminationPass.
> static bool simplifyAndDCEFunction(Function &F) {
>   bool MadeChange = false;
>   const DataLayout &DL = F.getParent()->getDataLayout();
> 
>   SmallSetVector<Instruction *, 16> WorkList;
>   // Iterate over the original function, only adding insts to the worklist
>   // if they actually need to be revisited. This avoids having to pre-init
>   // the worklist with the entire function's worth of instructions.
>   for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) {
>     Instruction *I = &*FI;
>     ++FI;
> 
>     // We're visiting this instruction now, so make sure it's not
in the
>     // worklist from an earlier visit.
>     WorkList.remove(I);
>     MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
>   }
> 
>   while (!WorkList.empty()) {
>     Instruction *I = WorkList.pop_back_val();
>     MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
>   }
>   return MadeChange;
> }
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/86caf934/attachment.html>

escha via llvm-dev

2015-Sep-13 18:11 UTC

head link

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

> 
> Instead of adding the operands to a list, erase the instruction and add
them to the worklist wouldn’t be probably faster something like:
> 
> if (Instruction *Used = dyn_cast<Instruction>(*OI))
>   if (Used->hasOneUse())
>     WorkList.insert(Used);
> 
> If it has one use is going to be the instruction we are going to remove
anyway, right?
I don’t think this is strictly true, but someone correct me if I’m wrong: if you
have

%y = add i32 %x, %x

%x will have two uses, but it will have zero if %y is deleted. This was the
corner case I was worried about.

—escha
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150913/5ace6ad5/attachment.html>

Philip Reames via llvm-dev

2015-Sep-14 18:44 UTC

head link

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

This seems like a useful change to me. Why don't you post it for review 
on llvm-commits?

This seems rather similar to both 
RecursivelyDeleteTriviallyDeadInstructions and 
replaceAndRecursivelySimplifyImpl.  I think at least your single user 
trick should be applied to those as well.  Given how similar all three 
are, I'm be tempted to common them into a common implementation.

Philip



On 09/13/2015 09:03 AM, escha via llvm-dev wrote:> LLVM has two similar bits of infrastructure: a 
> simplifyInstructionsInBlock function and a SimplifyInstructions pass, 
> both intended to be lightweight “fix up this code without doing 
> serious optimizations” functions, as far as I can tell. I don’t think 
> either is used in a performance-sensitive place in-tree; the former is 
> mostly called in minor places when doing CFG twiddling, and the latter 
> seems to be for testing purposes.
>
> However, in our out-of-tree code, I found we were calling 
> simplifyInstructionsInBlock on the entire function to clean things up 
> before another pass. This turns out to be quite costly: the function 
> is not very efficient, tends to revisit instructions more than is 
> necessary, and spends a lot of time constructing weak handles and 
> handling its worklists. I wrote a new version of it that is about ~3x 
> faster and reduced our total compilation time by about 2%. The new 
> version is very very fast: it’s a full-function simplify-and-DCE that 
> looks to be about twice as fast as comparatively “fast” passes like 
> ADCE — fast enough that I can imagine plopping it down in a pipeline 
> pretty much wherever is relevant.
>
> Would anyone be interested in having this in LLVM? It could be a 
> replacement for simplifyInstructionsInBlock, a replacement for 
> SimplifyInstructionsPass, or whatever else is reasonable; I’m mostly 
> just not sure where to put it, and want to be sure it’d be useful to 
> someone (given the current comparative lack of use of this code 
> in-tree). It -should- be NFC compared to simplifyInstructionsInBlock 
> other than running per-function instead of per-block (which can be 
> easily changed, I just made it per-function since that’s what was most 
> useful to us).
>
> —escha
>
> P.S. Here’s the code. The main optimizations here are:
> 1. Use a worklist, not a recursive approach, to avoid needless 
> revisitation and being repeatedly forced to jump back to the start of 
> the BB if a handle is invalidated.
> 2. Only insert operands to the worklist if they become unused after a 
> dead instruction is removed, so we don’t have to visit them again in 
> most cases.
> 3. Use a SmallSetVector to track the worklist.
> 4. Instead of pre-initting the SmallSetVector like in 
> DeadCodeEliminationPass, only put things into the worklist if they 
> have to be revisited after the first run-through. This minimizes how 
> much the actual SmallSetVector gets used, which saves a lot of time.
>
> static bool simplifyAndDCEInstruction(Instruction *I,
> SmallSetVector<Instruction *, 16> &WorkList,
> const DataLayout &DL) {
> if(isInstructionTriviallyDead(I, nullptr)) {
> // Loop over all of the values that the instruction uses. If there are
> // instructions being used, add them to the worklist, because they might
> // go dead after this one is removed.
> SmallVector<Instruction *, 8> Operands;
> for (User::op_iterator OI = I->op_begin(), E = I->op_end(); OI != E;
++OI)
> if (Instruction *Used = dyn_cast<Instruction>(*OI))
>         Operands.push_back(Used);
>
> // Remove the instruction.
>     I->eraseFromParent();
>
> // Only add the operands to the worklist if their uses are now empty,
> // to avoid needlessly revisiting instructions.
> for (auto U : Operands)
> if (U->use_empty())
>         WorkList.insert(U);
>
> returntrue;
>   }
>
> if (Value *SimpleV = SimplifyInstruction(I, DL)) {
> // Add the users to the worklist.
> for (User *U : I->users())
>       WorkList.insert(cast<Instruction>(U));
>
> // Replace the instruction with its simplified value.
>     I->replaceAllUsesWith(SimpleV);
>
> // Gracefully handle edge cases where the instruction is not wired 
> into any
> // parent block.
> if (I->getParent())
>       I->eraseFromParent();
> returntrue;
>   }
> returnfalse;
> }
>
> // A faster version of SimplifyInstructionsInBlock, designed for a whole
> // function. Modelled after DeadCodeEliminationPass.
> static bool simplifyAndDCEFunction(Function &F) {
> bool MadeChange = false;
> constDataLayout&DL = F.getParent()->getDataLayout();
>
> SmallSetVector<Instruction *, 16> WorkList;
> // Iterate over the original function, only adding insts to the worklist
> // if they actually need to be revisited. This avoids having to pre-init
> // the worklist with the entire function's worth of instructions.
> for (inst_iterator FI = inst_begin(F), FE = inst_end(F); FI != FE;) {
> Instruction *I = &*FI;
>     ++FI;
>
> // We're visiting this instruction now, so make sure it's not in
the
> // worklist from an earlier visit.
>     WorkList.remove(I);
>     MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
>   }
>
> while (!WorkList.empty()) {
> Instruction *I = WorkList.pop_back_val();
>     MadeChange |= simplifyAndDCEInstruction(I, WorkList, DL);
>   }
> return MadeChange;
> }
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150914/42599ba5/attachment-0001.html>

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - Sep 2015 - RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

[llvm-dev] RFC: faster simplifyInstructionsInBlock/SimplifyInstructions pass

Possibly Parallel Threads