thr3ads.net - llvm dev - [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass [Nov 2011]

If this information is useful, please help other people find it:
Share via:

Tobias Grosser

2011-Nov-17 12:57 UTC

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On 11/17/2011 12:38 AM, Hal Finkel wrote:> Tobias, et al.,
>
> Attached is the my autovectorization pass.
Very nice. Will you be at the developer summit? Maybe we could discuss 
the integration there?

Here a first review of the source code.

> diff --git a/docs/Passes.html b/docs/Passes.html
> index 5c42f3f..076effa 100644
> --- a/docs/Passes.html
> +++ b/docs/Passes.html
> @@ -126,6 +126,7 @@ perl -e '$/ = undef; for (split(/\n/,<>)) {
s:^ *///? ?::; print "<p>\n" if !
>   <tr><td><a
href="#adce">-adce</a></td><td>Aggressive Dead
Code Elimination</td></tr>
>   <tr><td><a
href="#always-inline">-always-inline</a></td><td>Inliner
for always_inline functions</td></tr>
>   <tr><td><a
href="#argpromotion">-argpromotion</a></td><td>Promote
'by reference' arguments to scalars</td></tr>
> +<tr><td><a
href="#bb-vectorize">-bb-vectorize</a></td><td>Combine
instructions to vectorize within basic blocks</td></tr>
Maybe 'Combine instructions to vector instructions within basic blocks'
> diff --git a/include/llvm-c/Transforms/Vectorize.h
b/include/llvm-c/Transforms/Vectorize.h
> new file mode 100644
> index 0000000..497518a
> --- /dev/null
> +++ b/include/llvm-c/Transforms/Vectorize.h
> @@ -0,0 +1,36 @@
> +/*===---------------------------Vectorize.h ------------------- -*- C++
-*-===*\
> +|*===----------- Vectorization Transformation Library C Interface
---------===*|
> +|*                                                                        
*|
> +|*                     The LLVM Compiler Infrastructure                   
*|
> +|*                                                                        
*|
> +|* This file is distributed under the University of Illinois Open Source  
*|
> +|* License. See LICENSE.TXT for details.                                  
*|
> +|*                                                                        
*|
>
+|*===----------------------------------------------------------------------===*|
> +|*                                                                        
*|
> +|* This header declares the C interface to libLLVMScalarOpts.a, which     
*|
> +|* implements various scalar transformations of the LLVM IR.              
*|
> +|*                                                                        
*|
> +|* Many exotic languages can interoperate with C code but have a harder
time  *|
> +|* with C++ due to name mangling. So in addition to C, this interface
enables *|
> +|* tools written in such languages.                                       
*|
> +|*                                                                        
*|
>
+\*===----------------------------------------------------------------------===*/
This comment does not match the content of the file.
>
> +static cl::opt<bool>
> +RunVectorization("vectorize", cl::desc("Run vectorization
passes"));
> +
>   PassManagerBuilder::PassManagerBuilder() {
>       OptLevel = 2;
>       SizeLevel = 0;
> @@ -38,6 +43,7 @@ PassManagerBuilder::PassManagerBuilder() {
>       DisableSimplifyLibCalls = false;
>       DisableUnitAtATime = false;
>       DisableUnrollLoops = false;
> +    Vectorize = RunVectorization;
Integrating vectorization like this seems to work for now. However, it 
does not seem to be 100% clean. I wonder if we could follow the other 
flags here and set Vectorize = false in the constructor and add the 
flags which enable this to the tools that use the PassManagerBuilder. 
You may even had this or something similar before, because I remember 
for your earlier patches -mllvm did not work for you in clang. If we 
require the tools to explicitly set Vectorizer, we would neeed to add a 
clang specific flag that could drive the vectorizer. I am not sure if we 
want to do this at this stage or even at all.

For now I would keep it like this. I don't have a better solution, but 
wanted to write down my thoughts.

>   }
>
>   PassManagerBuilder::~PassManagerBuilder() {
> @@ -170,6 +176,14 @@ void
PassManagerBuilder::populateModulePassManager(PassManagerBase&MPM) {
>
>     addExtensionsToPM(EP_ScalarOptimizerLate, MPM);
>
> +  if (Vectorize) {
> +    MPM.add(createBBVectorizePass());
> +    if (OptLevel>  1) {
> +      MPM.add(createInstructionCombiningPass());
> +      MPM.add(createGVNPass());                 // Remove redundancies
> +    }
> +  }
> +
What is the reason you will not enable this at -O1. LLVM runs even at 
-O1 a number of instcombine and GVN passes. I don't see a strong reason 
why we would not want to do this after vectorization (In case it is 
known to be useful in general).

> +++ b/lib/Transforms/Vectorize/BBVectorize.cpp
> @@ -0,0 +1,1338 @@
> +    typedef std::pair<Value *, Value *>  value_pair;
> +    typedef std::pair<value_pair, value_pair>  vp_pair;
> +    typedef std::pair<std::multimap<Value *, Value *>::iterator,
> +              std::multimap<Value *, Value *>::iterator> 
vp_iterator_pair;
> +    typedef std::pair<std::multimap<value_pair,
value_pair>::iterator,
> +              std::multimap<value_pair, value_pair>::iterator>
> +                vpp_iterator_pair;
http://llvm.org/docs/CodingStandards.html#ll_naming

"Type names (including classes, structs, enums, typedefs, etc) should be 
nouns and start with an upper-case letter (e.g. TextFileReader)."

> +    void getCandPairs(unsigned vBits, BasicBlock&BB,
> +                       std::multimap<Value *, Value *> 
&candPairs,
> +                       std::vector<Value *>  &pairableInsts);
"Variable names should be nouns (as they represent state). The name 
should be camel case, and start with an upper case letter (e.g. Leader 
or Boats)."

This happens at several places. Can you check your code for this.
> +    void replValues(BasicBlock&BB,
> +                     std::vector<Value *>  &pairableInsts,
> +                     DenseMap<Value *, Value *>&  chosenPairs);'repl' is no obvious abbreviation for me. Can you use the full word or 
equivalent shorter one?
> +
> +    virtual bool runOnBasicBlock(BasicBlock&BB) {
> +      // const TargetData *TD =
getAnalysisIfAvailable<TargetData>();
> +
> +      bool changed = false;
> +      // Iterate a sufficient number of times to merge types of
> +      // size 1, then 2, then 4, etc. up to half of the
> +      // target vector width of the target vector register.Any reason you don't use the 80 columns?

Also, why are you iterating here?

> +      for (unsigned vBits = 2, n = 1; vBits<= VectorBits&&
> +             (!MaxIter || n<= MaxIter); vBits *= 2, ++n) {
> +        DEBUG(dbgs()<<  "BBV: fusing loop #"<< 
n<<  "...\n");
> +        if (vectorizePairs(VectorBits, BB)) {
> +          changed = true;
> +        }
> +        else {
> +          break;
> +        }
In general you should use '} else {'. Here braces are not needed at all.
> +    static inline VectorType *getVecType(Type *iType) {
> +      if (iType->isVectorTy()) {
> +        unsigned numElem =
cast<VectorType>(iType)->getNumElements();
> +        return VectorType::get(iType->getScalarType(), numElem*2);
> +      }
> +      else {
> +        return VectorType::get(iType, 2);
> +      }
> +    }
> +
> +    // Note: when this function returns 0, the
> +    // resulting instructions are not actually fused.
Use all 80 columns. What is a depth factor? Can you explain in the
comment what this function calculates and not what happens when it is
used. If you want to keep the use case, just put is an example.
> +    static inline size_t depthFactor(Value *v) {
> +      if (isa<InsertElementInst>(v) ||
isa<ExtractElementInst>(v)) {
> +        return 0;
> +      }No '{}' needed.
> +
> +      return 1;
> +    }
> +
> +    // Returns 1 if J accesses the memory directly after I; -1 if I
accesses
> +    // the memory directly after J; and 0 otherwise.
> +    int getPairPtrInfo(Instruction *I, Instruction *J, const
TargetData&TD,
> +        Value *&Iptr, Value *&Jptr, unsigned&Ialign,
unsigned&Jalign) {
Also getPairPtrInfo() does not sound like a very descriptive name. What
about getOffset().

I am not sure about returning 0 for the remaining cases. I think
an offset of 0 is actually a very useful case, which can be implemented
as a scalar load plus a splat. What about returning the offset as a
reference value and return a boolean if it was successfully computed.
The other alternative I see is to use an enum that specifies the known
offsets and also includes an UNKNOWN value.
> +      if (isa<LoadInst>(I)) {
> +        Iptr = cast<LoadInst>(I)->getPointerOperand();
> +        Jptr = cast<LoadInst>(J)->getPointerOperand();
> +        Ialign = cast<LoadInst>(I)->getAlignment();
> +        Jalign = cast<LoadInst>(J)->getAlignment();
> +      }
> +      else {
> +        Iptr = cast<StoreInst>(I)->getPointerOperand();
> +        Jptr = cast<StoreInst>(J)->getPointerOperand();
> +        Ialign = cast<StoreInst>(I)->getAlignment();
> +        Jalign = cast<StoreInst>(J)->getAlignment();
> +      }
Use '} else {'. Also you are just checking just the type of one
instruction. You may want to check both and add a call to
llvm_unreachable in case the types do not match.
> +      ScalarEvolution&SE = getAnalysis<ScalarEvolution>();
> +      const SCEV *IptrSCEV = SE.getSCEV(Iptr);
> +      const SCEV *JptrSCEV = SE.getSCEV(Jptr);
> +
> +      // If this is a trivial offset, then we'll get something
> +      // like 1*sizeof(type). With target data, which we need
> +      // anyway, this will get constant folded into a number.Why don't you use the full 80 columns?

> +      const SCEV *RelOffSCEV = SE.getMinusSCEV(JptrSCEV, IptrSCEV);
What about just alling this 'Offset'. Those appreviations are hard to
read.
> +      if (const SCEVConstant *ConstOffSCEV > +           
dyn_cast<SCEVConstant>(RelOffSCEV)) {
> +        ConstantInt *IntOff = ConstOffSCEV->getValue();
> +        int64_t off = IntOff->getSExtValue();
> +
> +        Type *VTy =
cast<PointerType>(Iptr->getType())->getElementType();
You should assert that the types of both pointers are indentical.
Also you should document that this function only works for vector types.
I actually expected it to just work for scalar types. Vector types look
a little bit wired here and I am actually not even sure if it is correct
for vector types.
> +        int64_t VTy_tss = (int64_t) TD.getTypeStoreSize(VTy);
> +
> +        if (off == VTy_tss) {
> +          return 1;
> +        } else if (-off == VTy_tss) {
> +          return -1;
> +        }Braces not needed.
> +      }Did you think of using SE.getSizeOfExpr()?

const SCEV *ElementSize = SE.getSizeofExpr(Iprt->getAllocType())
const SCEV *ElementOffset = SE.getUDivExpr(RelOffSCEV, ElementSize);

if (const SCEVConstant *ConstOffset       
dyn_cast<SCEVConstant>(ElementOffset))
   return ConstOffset->getValue();

else
   return "Unknown offset"
> +  bool BBVectorize::vectorizePairs(unsigned vBits, BasicBlock&BB) {
> +    std::vector<Value *>  pairableInsts;
> +    std::multimap<Value *, Value *>  candPairs;
Variables should start with Uppercase letters.
> +    getCandPairs(vBits, BB, candPairs, pairableInsts);
> +    if (!pairableInsts.size()) return false;
> +
> +    // Now we have a map of all of the pairable instructions and we need
to
> +    // select the best possible pairing. A good pairing is one such that
the
> +    // users of the pair are also paired. This defines a (directed) forest
> +    // over the pairs such that two pairs are connected iff the second
pair
> +    // uses the first.
> +
> +    // Note that it only matters that both members of the second pair use
some
> +    // element of the first pair (to allow for splatting).
> +
> +    std::multimap<value_pair, value_pair>  connPairs;
> +    computeConnPairs(candPairs, pairableInsts, connPairs);
> +    if (!connPairs.size()) return false;
> +
> +    // Build the pairable-instruction dependency map
> +    DenseSet<value_pair>  pairableInstUsers;
> +    buildDepMap(BB, candPairs, pairableInsts, pairableInstUsers);
> +
> +    // There is now a graph of the connected pairs. For each variable,
pick the
> +    // pairing with the largest tree meeting the depth requirement on at
least
> +    // one branch. Then select all pairings that are part of that tree and
> +    // remove them from the list of available parings and pairable
variables.
> +
> +    DenseMap<Value *, Value *>  chosenPairs;
> +    choosePairs(candPairs, pairableInsts, connPairs,
> +      pairableInstUsers, chosenPairs);
> +
> +    if (!chosenPairs.size()) return false;
> +    NumFusedOps += chosenPairs.size();
> +
> +    // A set of chosen pairs has now been selected. It is now necessary to
> +    // replace the paired functions with vector functions. For this
procedure                              instructions with vector instructions.
> +    // each argument much be replaced with a vector argument. This vector                         must
> +    // is formed by using build_vector on the old arguments. The replaced
> +    // values are then replaced with a vector_extract on the result.
> +    // Subsequent optimization passes should coalesce the build/extract
> +    // combinations.
> +
> +    replValues(BB, pairableInsts, chosenPairs);
> +
> +    return true;
> +  }
> +
> +  void BBVectorize::getCandPairs(unsigned vBits, BasicBlock&BB,
> +                       std::multimap<Value *, Value *> 
&candPairs,
> +                       std::vector<Value *>  &pairableInsts) {This function is too big. It does not even fit on my large screen. Can
you extract sub functons. E.g. isValidInst()
> +    AliasAnalysis&AA = getAnalysis<AliasAnalysis>();
> +    BasicBlock::iterator E = BB.end();
> +    for (BasicBlock::iterator I = BB.getFirstInsertionPt(); I != E; ++I) {
The common pattern is:

for (BasicBlock::iterator I = BB.getFirstInsertionPt(), E.BB.end(); I != E;
      ++I) {

BTW, why don't you start at BB.begin()?
> +      bool isGoodIntr = false;
> +      if (isa<CallInst>(I)) {
> +        if (Function *F = cast<CallInst>(I)->getCalledFunction())
{
if (CallInst *CallInst = dyn_cast<CallInst>(I)) {
   if (Function *F = CallInst->getCalledFunction()) {
> +          if (unsigned IID = F->getIntrinsicID()) {
> +            switch(IID) {
> +            case Intrinsic::sqrt:
> +            case Intrinsic::powi:
> +            case Intrinsic::sin:
> +            case Intrinsic::cos:
> +            case Intrinsic::log:
> +            case Intrinsic::log2:
> +            case Intrinsic::log10:
> +            case Intrinsic::exp:
> +            case Intrinsic::exp2:
> +            case Intrinsic::pow:
> +              isGoodIntr = !NoMath;Is the fallthrough intended here?
> +            case Intrinsic::fma:
> +              isGoodIntr = !NoFMA;I would also put a break here.

What happends in the default case or do you cover all
intrinsics.?> +            }
> +          }
> +        }Most of these '{}' are not needed.
> +      }
> +
> +      // Vectorize simple loads and stores if possbile:
> +      bool isLdStr = false;
IsSimpleLoad?
> +      if (isa<LoadInst>(I)) {
> +        isLdStr = cast<LoadInst>(I)->isSimple();
> +      } else if (isa<StoreInst>(I)) {
> +        isLdStr = cast<StoreInst>(I)->isSimple();
> +      }
if (LoadInst *Load = dyn_cast<LoadInst>(I)) {
   isLdStr = Load->isSimple();
} else if (StoreInst *Store = dyn_cast<StoreInst>(I)) {
   isLdStr = Store->isSimple();
}
> +
> +      // We can vectorize casts, but not casts of pointer types, etc.
> +      bool isCast = false;
> +      if (I->isCast()) {
> +        isCast = true;
> +        if
(!cast<CastInst>(I)->getSrcTy()->isSingleValueType()) {
> +          isCast = false;
> +        } else if
(!cast<CastInst>(I)->getDestTy()->isSingleValueType()) {
> +          isCast = false;
> +        } else if
(cast<CastInst>(I)->getSrcTy()->isPointerTy()) {
> +          isCast = false;
> +        } else if
(cast<CastInst>(I)->getDestTy()->isPointerTy()) {
> +          isCast = false;
> +        }
> +      }
> +
> +      if (!(I->isBinaryOp() || isa<ShuffleVectorInst>(I) ||
> +              isa<ExtractElementInst>(I) ||
isa<InsertElementInst>(I) ||
> +              (!NoCasts&&  isCast) || isGoodIntr ||
> +              (!NoMemOps&&  isLdStr))) {
> +        continue;
> +      }
> +
> +      // We can't vectorize memory operations without target data
> +      if (AA.getTargetData() == 0&&  isLdStr) {
> +        continue;
> +      }
> +
> +      Type *T1, *T2;
> +      if (isa<StoreInst>(I)) {
> +        // For stores, it is the value type, not the pointer type that
matters
> +        // because the value is what will come from a vector register.
> +
> +        Value *Ival = cast<StoreInst>(I)->getValueOperand();
> +        T1 = Ival->getType();
> +      }
> +      else {
> +        T1 = I->getType();
> +      }
> +
> +      if (I->isCast()) {
> +        T2 = cast<CastInst>(I)->getSrcTy();
> +      }
> +      else {
> +        T2 = T1;
> +      }
> +
> +      // Not every type can be vectorized...
> +      if (!(VectorType::isValidElementType(T1) || T1->isVectorTy()) ||
> +          !(VectorType::isValidElementType(T2) || T2->isVectorTy())) {
> +        continue;
> +      }
> +
> +      if (NoInts&&  (T1->isIntOrIntVectorTy() ||
T2->isIntOrIntVectorTy())) {
> +        continue;
> +      }
> +
> +      if (NoFloats&&  (T1->isFPOrFPVectorTy() ||
T2->isFPOrFPVectorTy())) {
> +        continue;
> +      }
> +
> +      if (T1->getPrimitiveSizeInBits()>  vBits/2 ||
> +          T2->getPrimitiveSizeInBits()>  vBits/2) {
> +        continue;
> +      }
> +
> +      // Look for an instruction with which to pair instruction *I...This could be a separate function, no?
> +      DenseSet<Value *>  users;
> +      AliasSetTracker writes(AA);
> +      BasicBlock::iterator J = I; ++J;
> +      for (unsigned ss = 0; J != E&&  ss<= SearchLimit; ++J,
++ss) {
> +        // Determine if J uses I, if so, exit the loop.
This loop is way to big. You can extract here also a couple of
subfunctions. I will review this after this in etail more
readable.> +        bool usesI = false;
> +        for (User::op_iterator i = J->op_begin(), e = J->op_end();
> +               i != e; ++i) {
> +          Value *v = *i;
> +          if (I == v ||
> +               (!FastDep&&  users.count(v))) {
Why a newline here? It should not break the 80 col limit.
> +            usesI = true; break;One instruction per line, please.
> +          }
> +        }
> +        if (!usesI&&  J->mayReadFromMemory()) {
> +          for (AliasSetTracker::iterator i = writes.begin(), e =
writes.end();
> +                i != e; ++i) {
> +            for (AliasSet::iterator j = i->begin(), e2 = i->end();
> +                   j != e2; ++j) {
> +              AliasAnalysis::Location ptrLoc(j->getValue(),
j->getSize(),
> +                j->getTBAAInfo());
> +              if (AA.getModRefInfo(J, ptrLoc) != AliasAnalysis::NoModRef)
{
> +                usesI = true; break;One instruction per line.
> +              }
> +            }
> +            if (usesI) break;
> +          }
> +        }Many of these braces are not necessarily needed. In case this function
would be smaller they would also not be required for readability.
> +        if (FastDep) {
> +          // Note: For this heuristic to be effective, independent
operations
> +          // must tend to be intermixed. This is likely to be true from
some
> +          // kinds of loop unrolling (but not the generic LLVM pass),
> +          // but otherwise may require some kind of reordering pass.
What does it mean that independent operations must be intermixed? Can
you give an example?
> +
> +          // When using fast dependency analysis,
> +          // stop searching after first use:
> +          if (usesI) break;
> +        }
> +        else {
> +          if (usesI) {
} else if (usesI) {
> +            if (J->mayWriteToMemory()) writes.add(J);
> +            users.insert(J);
> +            continue;
> +          }
> +        }
> +
> +        // J does not use I, and comes before the first use of I, so it
can be
> +        // merged with I if the instructions are compatible.
> +        bool isCompat = J->isSameOperationAs(I);
> +        // FIXME: handle addsub-type operations!
> +
> +        // Only merge two shuffles if they're both constant
> +        // or both not constant.
> +        if (isCompat&&  isa<ShuffleVectorInst>(I)) {
> +          isCompat = isa<Constant>(I->getOperand(2))&&
> +                     isa<Constant>(J->getOperand(2));
> +          // FIXME: We may want to vectorize non-constant shuffles also.
> +        }
> +
> +        // Loads and stores can be merged if they have different
alignments,
> +        // but are otherwise the same.
> +        if (!isCompat&&  isa<LoadInst>(I)&& 
isa<LoadInst>(J)) {
> +          if (I->getType() == J->getType()) {
> +            if
(cast<LoadInst>(I)->getPointerOperand()->getType() => +          
cast<LoadInst>(J)->getPointerOperand()->getType()&&
> +                cast<LoadInst>(I)->isVolatile() => +          
cast<LoadInst>(J)->isVolatile()&&
> +                cast<LoadInst>(I)->getOrdering() => +         
cast<LoadInst>(J)->getOrdering()&&
> +                cast<LoadInst>(I)->getSynchScope() => +       
cast<LoadInst>(J)->getSynchScope()
> +               ) {
> +                 isCompat = true;
> +             }
> +          }
> +        } else if (!isCompat&&  isa<StoreInst>(I)&& 
isa<StoreInst>(J)) {
> +          if (cast<StoreInst>(I)->getValueOperand()->getType()
=> +               
cast<StoreInst>(J)->getValueOperand()->getType()&&
> +             
cast<StoreInst>(I)->getPointerOperand()->getType() => +          
cast<StoreInst>(J)->getPointerOperand()->getType()&&
> +              cast<StoreInst>(I)->isVolatile() => +           
cast<StoreInst>(J)->isVolatile()&&
> +              cast<StoreInst>(I)->getOrdering() => +          
cast<StoreInst>(J)->getOrdering()&&
> +              cast<StoreInst>(I)->getSynchScope() => +        
cast<StoreInst>(J)->getSynchScope()
> +             ) {
> +               isCompat = true;
> +           }
> +        }
> +
> +        if (isCompat&&  isLdStr) {
> +          const TargetData&TD = *AA.getTargetData();
> +
> +          Value *Iptr, *Jptr;
> +          unsigned Ialign, Jalign;
> +          int rel = getPairPtrInfo(I, J, TD, Iptr, Jptr, Ialign, Jalign);
Use uppercase letters. What does 'rel' mean? Can you use a more
descriptive name?
> +
> +          if (rel != 0) {
> +            if (AlignedOnly) {
> +              Type *aType = isa<StoreInst>(I) ?
> +               
cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> +              // An aligned load or store is possible only if the
instruction
> +              // with the lower offset has an alignment suitable for the
> +              // vector type.
> +
> +              unsigned balign = Ialign;
Uppercase letter.
> +              if (rel<  0) balign = Jalign;
> +
> +              Type *vType = getVecType(aType);
> +              unsigned vecalign = TD.getPrefTypeAlignment(vType);
Uppercase.
> +              if (balign<  vecalign) {
> +                isCompat = false;
This could just be a continue (or a return false if extracted to a
separate function. Also no braces needed.
> +              }
> +            }
> +          }
> +          else {
} else {
> +            isCompat = false;This could just be a continue (or a return false if extracted to a
separate function.

> +          }
> +        }
> +
> +        if (!isCompat) continue;
> +
> +        // J is a candidate for merging with I.
> +        if (!pairableInsts.size() ||
> +             pairableInsts[pairableInsts.size()-1] != I) {
> +          pairableInsts.push_back(I);
> +        }
No braces needed.
> +        candPairs.insert(value_pair(I, J));
> +        DEBUG(dbgs()<<  "BBV: candidate pair "<< 
*I<<
> +                     "<->  "<<  *J<< 
"\n");
> +      }
> +    }
> +
> +    DEBUG(dbgs()<<  "BBV: found "<< 
pairableInsts.size()
> +<<  " instructions with candidate pairs\n");
> +  }
> +
> +  void BBVectorize::computeConnPairs(
> +                      std::multimap<Value *, Value *> 
&candPairs,
> +                      std::vector<Value *>  &pairableInsts,
> +                      std::multimap<value_pair, value_pair> 
&connPairs) {Uppercase names for arguments. Also, the function is way too long. Can
you extract smaller helper functions. I will review this then.
> +
> +    for (std::vector<Value *>::iterator i = pairableInsts.begin(),
> +           e = pairableInsts.end(); i != e; ++i) {
> +      vp_iterator_pair choiceRange = candPairs.equal_range(*i);
> +
> +      for (std::multimap<Value *, Value *>::iterator j =
choiceRange.first;
> +           j != choiceRange.second; ++j) {
> +
> +        // For each possible pairing for this variable, look at the uses
of
> +        // the first value...
> +        for (Value::use_iterator I = j->first->use_begin(),
> +             E = j->first->use_end(); I != E; ++I) {
> +          vp_iterator_pair iPairRange = candPairs.equal_range(*I);
> +
> +          // For each use of the first variable, look for uses of the
second
> +          // variable...
> +          for (Value::use_iterator J = j->second->use_begin(),
> +               E2 = j->second->use_end(); J != E2; ++J) {
> +            vp_iterator_pair jPairRange = candPairs.equal_range(*J);
> +
> +            // Look for<I, J>:
> +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> +                 k != iPairRange.second; ++k) {
> +              if (k->second == *J) {
> +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> +                break;
> +              }
> +            }
> +            // Look for<J, I>:
> +            for (std::multimap<Value *, Value *>::iterator k =
jPairRange.first;
> +                 k != jPairRange.second; ++k) {
> +              if (k->second == *I) {
> +                connPairs.insert(vp_pair(*j, value_pair(*J, *I)));
> +                break;
> +              }
> +            }
> +          }
> +
> +          // Look for cases where just the first value in the pair is used
by
> +          // both members of another pair (splatting).
> +          Value::use_iterator J = j->first->use_begin();
> +          if (!SplatBreaksChain) for (; J != E; ++J) {
> +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> +                 k != iPairRange.second; ++k) {
> +              if (k->second == *J) {
> +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> +                break;
> +              }
> +            }
> +          }
> +        }
> +
> +        // Look for cases where just the second value in the pair is used
by
> +        // both members of another pair (splatting).
> +        if (!SplatBreaksChain)
> +          for (Value::use_iterator I = j->second->use_begin(),
> +             E = j->second->use_end(); I != E; ++I) {
> +          vp_iterator_pair iPairRange = candPairs.equal_range(*I);
> +
> +          Value::use_iterator J = j->second->use_begin();
> +          for (; J != E; ++J) {
> +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> +                 k != iPairRange.second; ++k) {
> +              if (k->second == *J) {
> +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> +                break;
> +              }
> +            }
> +          }
> +        }
> +      }
> +    }
> +
> +    DEBUG(dbgs()<<  "BBV: found "<< 
connPairs.size()
> +<<  " pair connections.\n");
> +  }
> +  void BBVectorize::buildDepMap(
> +                      BasicBlock&BB,
> +                      std::multimap<Value *, Value *> 
&candPairs,
> +                      std::vector<Value *>  &pairableInsts,
> +                      DenseSet<value_pair>  &pairableInstUsers)
{Uppercase variable names.
> +    DenseSet<Value *>  isInPair;
> +    for (std::multimap<Value *, Value *>::iterator i =
candPairs.begin(),
> +           e = candPairs.end(); i != e; ++i) {
Uppercase letters are used in LLVM for iterators.
> +      isInPair.insert(i->first); isInPair.insert(i->second);
> +    }
No braces needed.
> +
> +    // Iterate through the basic block, recording all users of each
> +    // pairable instruction.
> +
> +    AliasAnalysis&AA = getAnalysis<AliasAnalysis>();
> +    BasicBlock::iterator E = BB.end();
> +    for (BasicBlock::iterator I = BB.getFirstInsertionPt(); I != E; ++I) {
> +      if (isInPair.find(I) == isInPair.end()) continue;
> +
> +      DenseSet<Value *>  users;
> +      AliasSetTracker writes(AA);
> +      BasicBlock::iterator J = I; ++J;
> +      for (; J != E; ++J) {
> +        // Determine if J uses I, if so, exit the loop.Just extract this into a small helper function. No need to keep it here.
> +        bool usesI = false;
> +        for (User::op_iterator i = J->op_begin(), e = J->op_end();
> +               i != e; ++i) {
> +          Value *v = *i;
> +          if (I == v || users.count(v)) {
> +            usesI = true; break;
> +          }
> +        }
> +        if (!usesI&&  J->mayReadFromMemory()) {
And this one as well> +          for (AliasSetTracker::iterator i = writes.begin(), e =
writes.end();
> +                i != e; ++i) {
> +            for (AliasSet::iterator j = i->begin(), e2 = i->end();
> +                   j != e2; ++j) {
> +              AliasAnalysis::Location ptrLoc(j->getValue(),
j->getSize(),
> +                j->getTBAAInfo());
> +              if (AA.getModRefInfo(J, ptrLoc) != AliasAnalysis::NoModRef)
{
> +                usesI = true; break;
> +              }
> +            }
> +            if (usesI) break;
> +          }
> +        }
> +        if (usesI) {
> +          if (J->mayWriteToMemory()) writes.add(J);
> +          users.insert(J);
> +        }
> +      }The resulting loop should be very simple.
	> +
> +      for (DenseSet<Value *>::iterator i = users.begin(), e =
users.end();
> +             i != e; ++i) {
> +        pairableInstUsers.insert(value_pair(I, *i));
> +      }
> +    }
> +  }
> +
What does it mean pairs are in conflict? What about adding a comment
about what this actually means.
> +  bool BBVectorize::pairsConflict(value_pair i, value_pair j,
> +                      DenseSet<value_pair>  &pairableInstUsers)
{
> +    // Two pairs are in conflict if they are mutual users of eachother.
> +    bool jui = pairableInstUsers.count(value_pair(i.first, j.first)) ||
> +               pairableInstUsers.count(value_pair(i.first, j.second)) ||
> +               pairableInstUsers.count(value_pair(i.second, j.first)) ||
> +               pairableInstUsers.count(value_pair(i.second, j.second));
> +    bool iuj = pairableInstUsers.count(value_pair(j.first, i.first)) ||
> +               pairableInstUsers.count(value_pair(j.first, i.second)) ||
> +               pairableInstUsers.count(value_pair(j.second, i.first)) ||
> +               pairableInstUsers.count(value_pair(j.second, i.second));
> +    return (jui&&  iuj);
> +  }
> +
> +  void BBVectorize::choosePairs(
> +                      std::multimap<Value *, Value *> 
&candPairs,
> +                      std::vector<Value *>  &pairableInsts,
> +                      std::multimap<value_pair, value_pair> 
&connPairs,
> +                      DenseSet<value_pair>  &pairableInstUsers,
> +                      DenseMap<Value *, Value *>&  chosenPairs)
{
This one is again way to large. Can you please extract subfunctions.

I need to stop here, as I run out of time for today. I hope you already
get an impression what kind of improvements I suggest. If you agree,
please go ahead and integrate them. Especially the use of smaller
subfunctions will help me a lot when reviewing this again. Most probably
many of your inline comments, could become the descriptions of some of
those extracted functions.

Cheers
Tobi

Hal Finkel

2011-Nov-21 17:55 UTC

head link

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Tobias,

I've attached an updated patch. It contains a few bug fixes and many
(refactoring and coding-convention) changes inspired by your comments.

I'm currently trying to fix the bug responsible for causing a compile
failure when compiling
test-suite/MultiSource/Applications/obsequi/toggle_move.c; after the
pass begins to fuse instructions in a basic block in this file, the
aliasing analysis starts producing different (more pessimistic) query
answers. I've never before seen it do this, and I'm not sure why it is
happening. Also odd, at the same time, the numeric labels that are
assigned to otherwise unnamed instructions, change. I don't think I've
seen this happen either (they generally seem to stay fixed once
assigned). I don't know if these two things are related, but if anyone
can provide some insight, I'd appreciate it.

In any case, this version of the patch should be much more suitable for
your (or anyone else's) further review.

Thanks again,
Hal

On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:> On 11/17/2011 12:38 AM, Hal Finkel wrote:
> > Tobias, et al.,
> >
> > Attached is the my autovectorization pass.
> 
> Very nice. Will you be at the developer summit? Maybe we could discuss 
> the integration there?
> 
> Here a first review of the source code.
> 
> 
> > diff --git a/docs/Passes.html b/docs/Passes.html
> > index 5c42f3f..076effa 100644
> > --- a/docs/Passes.html
> > +++ b/docs/Passes.html
> > @@ -126,6 +126,7 @@ perl -e '$/ = undef; for
(split(/\n/,<>)) { s:^ *///? ?::; print "<p>\n" if !
> >   <tr><td><a
href="#adce">-adce</a></td><td>Aggressive Dead
Code Elimination</td></tr>
> >   <tr><td><a
href="#always-inline">-always-inline</a></td><td>Inliner
for always_inline functions</td></tr>
> >   <tr><td><a
href="#argpromotion">-argpromotion</a></td><td>Promote
'by reference' arguments to scalars</td></tr>
> > +<tr><td><a
href="#bb-vectorize">-bb-vectorize</a></td><td>Combine
instructions to vectorize within basic blocks</td></tr>
> 
> Maybe 'Combine instructions to vector instructions within basic
blocks'
> 
> > diff --git a/include/llvm-c/Transforms/Vectorize.h
b/include/llvm-c/Transforms/Vectorize.h
> > new file mode 100644
> > index 0000000..497518a
> > --- /dev/null
> > +++ b/include/llvm-c/Transforms/Vectorize.h
> > @@ -0,0 +1,36 @@
> > +/*===---------------------------Vectorize.h ------------------- -*-
C++ -*-===*\
> > +|*===----------- Vectorization Transformation Library C Interface
---------===*|
> > +|*                                                                   
*|
> > +|*                     The LLVM Compiler Infrastructure              
*|
> > +|*                                                                   
*|
> > +|* This file is distributed under the University of Illinois Open
Source      *|
> > +|* License. See LICENSE.TXT for details.                             
*|
> > +|*                                                                   
*|
> >
+|*===----------------------------------------------------------------------===*|
> > +|*                                                                   
*|
> > +|* This header declares the C interface to libLLVMScalarOpts.a, which
*|
> > +|* implements various scalar transformations of the LLVM IR.         
*|
> > +|*                                                                   
*|
> > +|* Many exotic languages can interoperate with C code but have a
harder time  *|
> > +|* with C++ due to name mangling. So in addition to C, this interface
enables *|
> > +|* tools written in such languages.                                  
*|
> > +|*                                                                   
*|
> >
+\*===----------------------------------------------------------------------===*/
> 
> This comment does not match the content of the file.
> 
> >
> > +static cl::opt<bool>
> > +RunVectorization("vectorize", cl::desc("Run
vectorization passes"));
> > +
> >   PassManagerBuilder::PassManagerBuilder() {
> >       OptLevel = 2;
> >       SizeLevel = 0;
> > @@ -38,6 +43,7 @@ PassManagerBuilder::PassManagerBuilder() {
> >       DisableSimplifyLibCalls = false;
> >       DisableUnitAtATime = false;
> >       DisableUnrollLoops = false;
> > +    Vectorize = RunVectorization;
> 
> Integrating vectorization like this seems to work for now. However, it 
> does not seem to be 100% clean. I wonder if we could follow the other 
> flags here and set Vectorize = false in the constructor and add the 
> flags which enable this to the tools that use the PassManagerBuilder. 
> You may even had this or something similar before, because I remember 
> for your earlier patches -mllvm did not work for you in clang. If we 
> require the tools to explicitly set Vectorizer, we would neeed to add a 
> clang specific flag that could drive the vectorizer. I am not sure if we 
> want to do this at this stage or even at all.
> 
> For now I would keep it like this. I don't have a better solution, but 
> wanted to write down my thoughts.
> 
> 
> >   }
> >
> >   PassManagerBuilder::~PassManagerBuilder() {
> > @@ -170,6 +176,14 @@ void
PassManagerBuilder::populateModulePassManager(PassManagerBase&MPM) {
> >
> >     addExtensionsToPM(EP_ScalarOptimizerLate, MPM);
> >
> > +  if (Vectorize) {
> > +    MPM.add(createBBVectorizePass());
> > +    if (OptLevel>  1) {
> > +      MPM.add(createInstructionCombiningPass());
> > +      MPM.add(createGVNPass());                 // Remove
redundancies
> > +    }
> > +  }
> > +
> 
> What is the reason you will not enable this at -O1. LLVM runs even at 
> -O1 a number of instcombine and GVN passes. I don't see a strong reason
> why we would not want to do this after vectorization (In case it is 
> known to be useful in general).
> 
> 
> > +++ b/lib/Transforms/Vectorize/BBVectorize.cpp
> > @@ -0,0 +1,1338 @@
> 
> > +    typedef std::pair<Value *, Value *>  value_pair;
> > +    typedef std::pair<value_pair, value_pair>  vp_pair;
> > +    typedef std::pair<std::multimap<Value *, Value
*>::iterator,
> > +              std::multimap<Value *, Value *>::iterator> 
vp_iterator_pair;
> > +    typedef std::pair<std::multimap<value_pair,
value_pair>::iterator,
> > +              std::multimap<value_pair,
value_pair>::iterator>
> > +                vpp_iterator_pair;
> 
> http://llvm.org/docs/CodingStandards.html#ll_naming
> 
> "Type names (including classes, structs, enums, typedefs, etc) should
be
> nouns and start with an upper-case letter (e.g. TextFileReader)."
> 
> 
> > +    void getCandPairs(unsigned vBits, BasicBlock&BB,
> > +                       std::multimap<Value *, Value *> 
&candPairs,
> > +                       std::vector<Value *> 
&pairableInsts);
> 
> "Variable names should be nouns (as they represent state). The name 
> should be camel case, and start with an upper case letter (e.g. Leader 
> or Boats)."
> 
> This happens at several places. Can you check your code for this.
> 
> > +    void replValues(BasicBlock&BB,
> > +                     std::vector<Value *>  &pairableInsts,
> > +                     DenseMap<Value *, Value *>& 
chosenPairs);
> 'repl' is no obvious abbreviation for me. Can you use the full word
or
> equivalent shorter one?
> 
> > +
> > +    virtual bool runOnBasicBlock(BasicBlock&BB) {
> > +      // const TargetData *TD =
getAnalysisIfAvailable<TargetData>();
> > +
> > +      bool changed = false;
> > +      // Iterate a sufficient number of times to merge types of
> > +      // size 1, then 2, then 4, etc. up to half of the
> > +      // target vector width of the target vector register.
> Any reason you don't use the 80 columns?
> 
> Also, why are you iterating here?
> 
> 
> > +      for (unsigned vBits = 2, n = 1; vBits<= VectorBits&&
> > +             (!MaxIter || n<= MaxIter); vBits *= 2, ++n) {
> > +        DEBUG(dbgs()<<  "BBV: fusing loop #"<< 
n<<  "...\n");
> > +        if (vectorizePairs(VectorBits, BB)) {
> > +          changed = true;
> > +        }
> > +        else {
> > +          break;
> > +        }
> 
> In general you should use '} else {'. Here braces are not needed at
all.
> 
> > +    static inline VectorType *getVecType(Type *iType) {
> > +      if (iType->isVectorTy()) {
> > +        unsigned numElem =
cast<VectorType>(iType)->getNumElements();
> > +        return VectorType::get(iType->getScalarType(), numElem*2);
> > +      }
> > +      else {
> > +        return VectorType::get(iType, 2);
> > +      }
> > +    }
> > +
> > +    // Note: when this function returns 0, the
> > +    // resulting instructions are not actually fused.
> 
> Use all 80 columns. What is a depth factor? Can you explain in the
> comment what this function calculates and not what happens when it is
> used. If you want to keep the use case, just put is an example.
> 
> > +    static inline size_t depthFactor(Value *v) {
> > +      if (isa<InsertElementInst>(v) ||
isa<ExtractElementInst>(v)) {
> > +        return 0;
> > +      }
> No '{}' needed.
> 
> > +
> > +      return 1;
> > +    }
> > +
> > +    // Returns 1 if J accesses the memory directly after I; -1 if I
accesses
> > +    // the memory directly after J; and 0 otherwise.
> > +    int getPairPtrInfo(Instruction *I, Instruction *J, const
TargetData&TD,
> > +        Value *&Iptr, Value *&Jptr, unsigned&Ialign,
unsigned&Jalign) {
> 
> Also getPairPtrInfo() does not sound like a very descriptive name. What
> about getOffset().
> 
> I am not sure about returning 0 for the remaining cases. I think
> an offset of 0 is actually a very useful case, which can be implemented
> as a scalar load plus a splat. What about returning the offset as a
> reference value and return a boolean if it was successfully computed.
> The other alternative I see is to use an enum that specifies the known
> offsets and also includes an UNKNOWN value.
> 
> > +      if (isa<LoadInst>(I)) {
> > +        Iptr = cast<LoadInst>(I)->getPointerOperand();
> > +        Jptr = cast<LoadInst>(J)->getPointerOperand();
> > +        Ialign = cast<LoadInst>(I)->getAlignment();
> > +        Jalign = cast<LoadInst>(J)->getAlignment();
> > +      }
> > +      else {
> > +        Iptr = cast<StoreInst>(I)->getPointerOperand();
> > +        Jptr = cast<StoreInst>(J)->getPointerOperand();
> > +        Ialign = cast<StoreInst>(I)->getAlignment();
> > +        Jalign = cast<StoreInst>(J)->getAlignment();
> > +      }
> 
> Use '} else {'. Also you are just checking just the type of one
> instruction. You may want to check both and add a call to
> llvm_unreachable in case the types do not match.
> 
> > +      ScalarEvolution&SE = getAnalysis<ScalarEvolution>();
> > +      const SCEV *IptrSCEV = SE.getSCEV(Iptr);
> > +      const SCEV *JptrSCEV = SE.getSCEV(Jptr);
> > +
> > +      // If this is a trivial offset, then we'll get something
> > +      // like 1*sizeof(type). With target data, which we need
> > +      // anyway, this will get constant folded into a number.
> Why don't you use the full 80 columns?
> 
> 
> > +      const SCEV *RelOffSCEV = SE.getMinusSCEV(JptrSCEV, IptrSCEV);
> 
> What about just alling this 'Offset'. Those appreviations are hard
to
> read.
> 
> > +      if (const SCEVConstant *ConstOffSCEV > > +           
dyn_cast<SCEVConstant>(RelOffSCEV)) {
> > +        ConstantInt *IntOff = ConstOffSCEV->getValue();
> > +        int64_t off = IntOff->getSExtValue();
> > +
> > +        Type *VTy =
cast<PointerType>(Iptr->getType())->getElementType();
> 
> You should assert that the types of both pointers are indentical.
> Also you should document that this function only works for vector types.
> I actually expected it to just work for scalar types. Vector types look
> a little bit wired here and I am actually not even sure if it is correct
> for vector types.
> 
> > +        int64_t VTy_tss = (int64_t) TD.getTypeStoreSize(VTy);
> > +
> > +        if (off == VTy_tss) {
> > +          return 1;
> > +        } else if (-off == VTy_tss) {
> > +          return -1;
> > +        }
> Braces not needed.
> 
> > +      }
> Did you think of using SE.getSizeOfExpr()?
> 
> const SCEV *ElementSize = SE.getSizeofExpr(Iprt->getAllocType())
> const SCEV *ElementOffset = SE.getUDivExpr(RelOffSCEV, ElementSize);
> 
> if (const SCEVConstant *ConstOffset >       
dyn_cast<SCEVConstant>(ElementOffset))
>    return ConstOffset->getValue();
> 
> else
>    return "Unknown offset"
> 
> > +  bool BBVectorize::vectorizePairs(unsigned vBits, BasicBlock&BB)
{
> > +    std::vector<Value *>  pairableInsts;
> > +    std::multimap<Value *, Value *>  candPairs;
> 
> Variables should start with Uppercase letters.
> 
> > +    getCandPairs(vBits, BB, candPairs, pairableInsts);
> > +    if (!pairableInsts.size()) return false;
> > +
> > +    // Now we have a map of all of the pairable instructions and we
need to
> > +    // select the best possible pairing. A good pairing is one such
that the
> > +    // users of the pair are also paired. This defines a (directed)
forest
> > +    // over the pairs such that two pairs are connected iff the
second pair
> > +    // uses the first.
> > +
> > +    // Note that it only matters that both members of the second pair
use some
> > +    // element of the first pair (to allow for splatting).
> > +
> > +    std::multimap<value_pair, value_pair>  connPairs;
> > +    computeConnPairs(candPairs, pairableInsts, connPairs);
> > +    if (!connPairs.size()) return false;
> > +
> > +    // Build the pairable-instruction dependency map
> > +    DenseSet<value_pair>  pairableInstUsers;
> > +    buildDepMap(BB, candPairs, pairableInsts, pairableInstUsers);
> > +
> > +    // There is now a graph of the connected pairs. For each
variable, pick the
> > +    // pairing with the largest tree meeting the depth requirement on
at least
> > +    // one branch. Then select all pairings that are part of that
tree and
> > +    // remove them from the list of available parings and pairable
variables.
> > +
> > +    DenseMap<Value *, Value *>  chosenPairs;
> > +    choosePairs(candPairs, pairableInsts, connPairs,
> > +      pairableInstUsers, chosenPairs);
> > +
> > +    if (!chosenPairs.size()) return false;
> > +    NumFusedOps += chosenPairs.size();
> > +
> > +    // A set of chosen pairs has now been selected. It is now
necessary to
> > +    // replace the paired functions with vector functions. For this
procedure
>                               instructions with vector instructions.
> 
> > +    // each argument much be replaced with a vector argument. This
vector
>                          must
> 
> > +    // is formed by using build_vector on the old arguments. The
replaced
> > +    // values are then replaced with a vector_extract on the result.
> > +    // Subsequent optimization passes should coalesce the
build/extract
> > +    // combinations.
> > +
> > +    replValues(BB, pairableInsts, chosenPairs);
> > +
> > +    return true;
> > +  }
> > +
> > +  void BBVectorize::getCandPairs(unsigned vBits, BasicBlock&BB,
> > +                       std::multimap<Value *, Value *> 
&candPairs,
> > +                       std::vector<Value *> 
&pairableInsts) {
> This function is too big. It does not even fit on my large screen. Can
> you extract sub functons. E.g. isValidInst()
> 
> > +    AliasAnalysis&AA = getAnalysis<AliasAnalysis>();
> > +    BasicBlock::iterator E = BB.end();
> > +    for (BasicBlock::iterator I = BB.getFirstInsertionPt(); I != E;
++I) {
> 
> The common pattern is:
> 
> for (BasicBlock::iterator I = BB.getFirstInsertionPt(), E.BB.end(); I != E;
>       ++I) {
> 
> BTW, why don't you start at BB.begin()?
> 
> > +      bool isGoodIntr = false;
> > +      if (isa<CallInst>(I)) {
> > +        if (Function *F =
cast<CallInst>(I)->getCalledFunction()) {
> 
> if (CallInst *CallInst = dyn_cast<CallInst>(I)) {
>    if (Function *F = CallInst->getCalledFunction()) {
> 
> > +          if (unsigned IID = F->getIntrinsicID()) {
> > +            switch(IID) {
> > +            case Intrinsic::sqrt:
> > +            case Intrinsic::powi:
> > +            case Intrinsic::sin:
> > +            case Intrinsic::cos:
> > +            case Intrinsic::log:
> > +            case Intrinsic::log2:
> > +            case Intrinsic::log10:
> > +            case Intrinsic::exp:
> > +            case Intrinsic::exp2:
> > +            case Intrinsic::pow:
> > +              isGoodIntr = !NoMath;
> Is the fallthrough intended here?
> 
> > +            case Intrinsic::fma:
> > +              isGoodIntr = !NoFMA;
> I would also put a break here.
> 
> What happends in the default case or do you cover all intrinsics.?
> > +            }
> > +          }
> > +        }
> Most of these '{}' are not needed.
> 
> > +      }
> > +
> > +      // Vectorize simple loads and stores if possbile:
> > +      bool isLdStr = false;
> 
> IsSimpleLoad?
> 
> > +      if (isa<LoadInst>(I)) {
> > +        isLdStr = cast<LoadInst>(I)->isSimple();
> > +      } else if (isa<StoreInst>(I)) {
> > +        isLdStr = cast<StoreInst>(I)->isSimple();
> > +      }
> 
> if (LoadInst *Load = dyn_cast<LoadInst>(I)) {
>    isLdStr = Load->isSimple();
> } else if (StoreInst *Store = dyn_cast<StoreInst>(I)) {
>    isLdStr = Store->isSimple();
> }
> 
> > +
> > +      // We can vectorize casts, but not casts of pointer types, etc.
> > +      bool isCast = false;
> > +      if (I->isCast()) {
> > +        isCast = true;
> > +        if
(!cast<CastInst>(I)->getSrcTy()->isSingleValueType()) {
> > +          isCast = false;
> > +        } else if
(!cast<CastInst>(I)->getDestTy()->isSingleValueType()) {
> > +          isCast = false;
> > +        } else if
(cast<CastInst>(I)->getSrcTy()->isPointerTy()) {
> > +          isCast = false;
> > +        } else if
(cast<CastInst>(I)->getDestTy()->isPointerTy()) {
> > +          isCast = false;
> > +        }
> > +      }
> > +
> > +      if (!(I->isBinaryOp() || isa<ShuffleVectorInst>(I) ||
> > +              isa<ExtractElementInst>(I) ||
isa<InsertElementInst>(I) ||
> > +              (!NoCasts&&  isCast) || isGoodIntr ||
> > +              (!NoMemOps&&  isLdStr))) {
> > +        continue;
> > +      }
> > +
> > +      // We can't vectorize memory operations without target data
> > +      if (AA.getTargetData() == 0&&  isLdStr) {
> > +        continue;
> > +      }
> > +
> > +      Type *T1, *T2;
> > +      if (isa<StoreInst>(I)) {
> > +        // For stores, it is the value type, not the pointer type
that matters
> > +        // because the value is what will come from a vector
register.
> > +
> > +        Value *Ival = cast<StoreInst>(I)->getValueOperand();
> > +        T1 = Ival->getType();
> > +      }
> > +      else {
> > +        T1 = I->getType();
> > +      }
> > +
> > +      if (I->isCast()) {
> > +        T2 = cast<CastInst>(I)->getSrcTy();
> > +      }
> > +      else {
> > +        T2 = T1;
> > +      }
> > +
> > +      // Not every type can be vectorized...
> > +      if (!(VectorType::isValidElementType(T1) ||
T1->isVectorTy()) ||
> > +          !(VectorType::isValidElementType(T2) ||
T2->isVectorTy())) {
> > +        continue;
> > +      }
> > +
> > +      if (NoInts&&  (T1->isIntOrIntVectorTy() ||
T2->isIntOrIntVectorTy())) {
> > +        continue;
> > +      }
> > +
> > +      if (NoFloats&&  (T1->isFPOrFPVectorTy() ||
T2->isFPOrFPVectorTy())) {
> > +        continue;
> > +      }
> > +
> > +      if (T1->getPrimitiveSizeInBits()>  vBits/2 ||
> > +          T2->getPrimitiveSizeInBits()>  vBits/2) {
> > +        continue;
> > +      }
> > +
> > +      // Look for an instruction with which to pair instruction *I...
> This could be a separate function, no?
> 
> > +      DenseSet<Value *>  users;
> > +      AliasSetTracker writes(AA);
> > +      BasicBlock::iterator J = I; ++J;
> > +      for (unsigned ss = 0; J != E&&  ss<= SearchLimit;
++J, ++ss) {
> > +        // Determine if J uses I, if so, exit the loop.
> 
> This loop is way to big. You can extract here also a couple of
> subfunctions. I will review this after this in etail more readable.
> > +        bool usesI = false;
> > +        for (User::op_iterator i = J->op_begin(), e =
J->op_end();
> > +               i != e; ++i) {
> > +          Value *v = *i;
> > +          if (I == v ||
> > +               (!FastDep&&  users.count(v))) {
> 
> Why a newline here? It should not break the 80 col limit.
> 
> > +            usesI = true; break;
> One instruction per line, please.
> 
> > +          }
> > +        }
> > +        if (!usesI&&  J->mayReadFromMemory()) {
> > +          for (AliasSetTracker::iterator i = writes.begin(), e =
writes.end();
> > +                i != e; ++i) {
> > +            for (AliasSet::iterator j = i->begin(), e2 =
i->end();
> > +                   j != e2; ++j) {
> > +              AliasAnalysis::Location ptrLoc(j->getValue(),
j->getSize(),
> > +                j->getTBAAInfo());
> > +              if (AA.getModRefInfo(J, ptrLoc) !=
AliasAnalysis::NoModRef) {
> > +                usesI = true; break;
> One instruction per line.
> 
> > +              }
> > +            }
> > +            if (usesI) break;
> > +          }
> > +        }
> Many of these braces are not necessarily needed. In case this function
> would be smaller they would also not be required for readability.
> 
> > +        if (FastDep) {
> > +          // Note: For this heuristic to be effective, independent
operations
> > +          // must tend to be intermixed. This is likely to be true
from some
> > +          // kinds of loop unrolling (but not the generic LLVM pass),
> > +          // but otherwise may require some kind of reordering pass.
> 
> What does it mean that independent operations must be intermixed? Can
> you give an example?
> 
> > +
> > +          // When using fast dependency analysis,
> > +          // stop searching after first use:
> > +          if (usesI) break;
> > +        }
> > +        else {
> > +          if (usesI) {
> 
> } else if (usesI) {
> 
> > +            if (J->mayWriteToMemory()) writes.add(J);
> > +            users.insert(J);
> > +            continue;
> > +          }
> > +        }
> > +
> > +        // J does not use I, and comes before the first use of I, so
it can be
> > +        // merged with I if the instructions are compatible.
> > +        bool isCompat = J->isSameOperationAs(I);
> > +        // FIXME: handle addsub-type operations!
> > +
> > +        // Only merge two shuffles if they're both constant
> > +        // or both not constant.
> > +        if (isCompat&&  isa<ShuffleVectorInst>(I)) {
> > +          isCompat =
isa<Constant>(I->getOperand(2))&&
> > +                     isa<Constant>(J->getOperand(2));
> > +          // FIXME: We may want to vectorize non-constant shuffles
also.
> > +        }
> > +
> > +        // Loads and stores can be merged if they have different
alignments,
> > +        // but are otherwise the same.
> > +        if (!isCompat&&  isa<LoadInst>(I)&& 
isa<LoadInst>(J)) {
> > +          if (I->getType() == J->getType()) {
> > +            if
(cast<LoadInst>(I)->getPointerOperand()->getType() => > +     
cast<LoadInst>(J)->getPointerOperand()->getType()&&
> > +                cast<LoadInst>(I)->isVolatile() => > +
cast<LoadInst>(J)->isVolatile()&&
> > +                cast<LoadInst>(I)->getOrdering() => >
+                  cast<LoadInst>(J)->getOrdering()&&
> > +                cast<LoadInst>(I)->getSynchScope() =>
> +                  cast<LoadInst>(J)->getSynchScope()
> > +               ) {
> > +                 isCompat = true;
> > +             }
> > +          }
> > +        } else if (!isCompat&& 
isa<StoreInst>(I)&&  isa<StoreInst>(J)) {
> > +          if
(cast<StoreInst>(I)->getValueOperand()->getType() => > +      
cast<StoreInst>(J)->getValueOperand()->getType()&&
> > +             
cast<StoreInst>(I)->getPointerOperand()->getType() => > +     
cast<StoreInst>(J)->getPointerOperand()->getType()&&
> > +              cast<StoreInst>(I)->isVolatile() => > + 
cast<StoreInst>(J)->isVolatile()&&
> > +              cast<StoreInst>(I)->getOrdering() => > +
cast<StoreInst>(J)->getOrdering()&&
> > +              cast<StoreInst>(I)->getSynchScope() => >
+                cast<StoreInst>(J)->getSynchScope()
> > +             ) {
> > +               isCompat = true;
> > +           }
> > +        }
> > +
> > +        if (isCompat&&  isLdStr) {
> > +          const TargetData&TD = *AA.getTargetData();
> > +
> > +          Value *Iptr, *Jptr;
> > +          unsigned Ialign, Jalign;
> > +          int rel = getPairPtrInfo(I, J, TD, Iptr, Jptr, Ialign,
Jalign);
> 
> Use uppercase letters. What does 'rel' mean? Can you use a more
> descriptive name?
> 
> > +
> > +          if (rel != 0) {
> > +            if (AlignedOnly) {
> > +              Type *aType = isa<StoreInst>(I) ?
> > +               
cast<StoreInst>(I)->getValueOperand()->getType() : I->getType();
> > +              // An aligned load or store is possible only if the
instruction
> > +              // with the lower offset has an alignment suitable for
the
> > +              // vector type.
> > +
> > +              unsigned balign = Ialign;
> 
> Uppercase letter.
> 
> > +              if (rel<  0) balign = Jalign;
> > +
> > +              Type *vType = getVecType(aType);
> > +              unsigned vecalign = TD.getPrefTypeAlignment(vType);
> 
> Uppercase.
> 
> > +              if (balign<  vecalign) {
> > +                isCompat = false;
> 
> This could just be a continue (or a return false if extracted to a
> separate function. Also no braces needed.
> 
> > +              }
> > +            }
> > +          }
> > +          else {
> 
> } else {
> 
> > +            isCompat = false;
> This could just be a continue (or a return false if extracted to a
> separate function.
> 
> 
> > +          }
> > +        }
> > +
> > +        if (!isCompat) continue;
> > +
> > +        // J is a candidate for merging with I.
> > +        if (!pairableInsts.size() ||
> > +             pairableInsts[pairableInsts.size()-1] != I) {
> > +          pairableInsts.push_back(I);
> > +        }
> 
> No braces needed.
> 
> > +        candPairs.insert(value_pair(I, J));
> > +        DEBUG(dbgs()<<  "BBV: candidate pair
"<<  *I<<
> > +                     "<->  "<<  *J<< 
"\n");
> > +      }
> > +    }
> > +
> > +    DEBUG(dbgs()<<  "BBV: found "<< 
pairableInsts.size()
> > +<<  " instructions with candidate pairs\n");
> > +  }
> > +
> > +  void BBVectorize::computeConnPairs(
> > +                      std::multimap<Value *, Value *> 
&candPairs,
> > +                      std::vector<Value *>  &pairableInsts,
> > +                      std::multimap<value_pair, value_pair> 
&connPairs) {
> Uppercase names for arguments. Also, the function is way too long. Can
> you extract smaller helper functions. I will review this then.
> 
> > +
> > +    for (std::vector<Value *>::iterator i =
pairableInsts.begin(),
> > +           e = pairableInsts.end(); i != e; ++i) {
> > +      vp_iterator_pair choiceRange = candPairs.equal_range(*i);
> > +
> > +      for (std::multimap<Value *, Value *>::iterator j =
choiceRange.first;
> > +           j != choiceRange.second; ++j) {
> > +
> > +        // For each possible pairing for this variable, look at the
uses of
> > +        // the first value...
> > +        for (Value::use_iterator I = j->first->use_begin(),
> > +             E = j->first->use_end(); I != E; ++I) {
> > +          vp_iterator_pair iPairRange = candPairs.equal_range(*I);
> > +
> > +          // For each use of the first variable, look for uses of the
second
> > +          // variable...
> > +          for (Value::use_iterator J = j->second->use_begin(),
> > +               E2 = j->second->use_end(); J != E2; ++J) {
> > +            vp_iterator_pair jPairRange = candPairs.equal_range(*J);
> > +
> > +            // Look for<I, J>:
> > +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> > +                 k != iPairRange.second; ++k) {
> > +              if (k->second == *J) {
> > +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> > +                break;
> > +              }
> > +            }
> > +            // Look for<J, I>:
> > +            for (std::multimap<Value *, Value *>::iterator k =
jPairRange.first;
> > +                 k != jPairRange.second; ++k) {
> > +              if (k->second == *I) {
> > +                connPairs.insert(vp_pair(*j, value_pair(*J, *I)));
> > +                break;
> > +              }
> > +            }
> > +          }
> > +
> > +          // Look for cases where just the first value in the pair is
used by
> > +          // both members of another pair (splatting).
> > +          Value::use_iterator J = j->first->use_begin();
> > +          if (!SplatBreaksChain) for (; J != E; ++J) {
> > +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> > +                 k != iPairRange.second; ++k) {
> > +              if (k->second == *J) {
> > +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> > +                break;
> > +              }
> > +            }
> > +          }
> > +        }
> > +
> > +        // Look for cases where just the second value in the pair is
used by
> > +        // both members of another pair (splatting).
> > +        if (!SplatBreaksChain)
> > +          for (Value::use_iterator I = j->second->use_begin(),
> > +             E = j->second->use_end(); I != E; ++I) {
> > +          vp_iterator_pair iPairRange = candPairs.equal_range(*I);
> > +
> > +          Value::use_iterator J = j->second->use_begin();
> > +          for (; J != E; ++J) {
> > +            for (std::multimap<Value *, Value *>::iterator k =
iPairRange.first;
> > +                 k != iPairRange.second; ++k) {
> > +              if (k->second == *J) {
> > +                connPairs.insert(vp_pair(*j, value_pair(*I, *J)));
> > +                break;
> > +              }
> > +            }
> > +          }
> > +        }
> > +      }
> > +    }
> > +
> > +    DEBUG(dbgs()<<  "BBV: found "<< 
connPairs.size()
> > +<<  " pair connections.\n");
> > +  }
> 
> 
> > +  void BBVectorize::buildDepMap(
> > +                      BasicBlock&BB,
> > +                      std::multimap<Value *, Value *> 
&candPairs,
> > +                      std::vector<Value *>  &pairableInsts,
> > +                      DenseSet<value_pair> 
&pairableInstUsers) {
> Uppercase variable names.
> 
> > +    DenseSet<Value *>  isInPair;
> > +    for (std::multimap<Value *, Value *>::iterator i =
candPairs.begin(),
> > +           e = candPairs.end(); i != e; ++i) {
> 
> Uppercase letters are used in LLVM for iterators.
> 
> > +      isInPair.insert(i->first); isInPair.insert(i->second);
> > +    }
> 
> No braces needed.
> 
> > +
> > +    // Iterate through the basic block, recording all users of each
> > +    // pairable instruction.
> > +
> > +    AliasAnalysis&AA = getAnalysis<AliasAnalysis>();
> > +    BasicBlock::iterator E = BB.end();
> > +    for (BasicBlock::iterator I = BB.getFirstInsertionPt(); I != E;
++I) {
> > +      if (isInPair.find(I) == isInPair.end()) continue;
> > +
> > +      DenseSet<Value *>  users;
> > +      AliasSetTracker writes(AA);
> > +      BasicBlock::iterator J = I; ++J;
> > +      for (; J != E; ++J) {
> > +        // Determine if J uses I, if so, exit the loop.
> Just extract this into a small helper function. No need to keep it here.
> 
> > +        bool usesI = false;
> > +        for (User::op_iterator i = J->op_begin(), e =
J->op_end();
> > +               i != e; ++i) {
> > +          Value *v = *i;
> > +          if (I == v || users.count(v)) {
> > +            usesI = true; break;
> > +          }
> > +        }
> > +        if (!usesI&&  J->mayReadFromMemory()) {
> 
> And this one as well
> > +          for (AliasSetTracker::iterator i = writes.begin(), e =
writes.end();
> > +                i != e; ++i) {
> > +            for (AliasSet::iterator j = i->begin(), e2 =
i->end();
> > +                   j != e2; ++j) {
> > +              AliasAnalysis::Location ptrLoc(j->getValue(),
j->getSize(),
> > +                j->getTBAAInfo());
> > +              if (AA.getModRefInfo(J, ptrLoc) !=
AliasAnalysis::NoModRef) {
> > +                usesI = true; break;
> > +              }
> > +            }
> > +            if (usesI) break;
> > +          }
> > +        }
> > +        if (usesI) {
> > +          if (J->mayWriteToMemory()) writes.add(J);
> > +          users.insert(J);
> > +        }
> 
> > +      }
> The resulting loop should be very simple.
> 	
> > +
> > +      for (DenseSet<Value *>::iterator i = users.begin(), e =
users.end();
> > +             i != e; ++i) {
> > +        pairableInstUsers.insert(value_pair(I, *i));
> > +      }
> > +    }
> > +  }
> > +
> 
> What does it mean pairs are in conflict? What about adding a comment
> about what this actually means.
> 
> > +  bool BBVectorize::pairsConflict(value_pair i, value_pair j,
> > +                      DenseSet<value_pair> 
&pairableInstUsers) {
> > +    // Two pairs are in conflict if they are mutual users of
eachother.
> > +    bool jui = pairableInstUsers.count(value_pair(i.first, j.first))
||
> > +               pairableInstUsers.count(value_pair(i.first, j.second))
||
> > +               pairableInstUsers.count(value_pair(i.second, j.first))
||
> > +               pairableInstUsers.count(value_pair(i.second,
j.second));
> > +    bool iuj = pairableInstUsers.count(value_pair(j.first, i.first))
||
> > +               pairableInstUsers.count(value_pair(j.first, i.second))
||
> > +               pairableInstUsers.count(value_pair(j.second, i.first))
||
> > +               pairableInstUsers.count(value_pair(j.second,
i.second));
> > +    return (jui&&  iuj);
> > +  }
> > +
> > +  void BBVectorize::choosePairs(
> > +                      std::multimap<Value *, Value *> 
&candPairs,
> > +                      std::vector<Value *>  &pairableInsts,
> > +                      std::multimap<value_pair, value_pair> 
&connPairs,
> > +                      DenseSet<value_pair> 
&pairableInstUsers,
> > +                      DenseMap<Value *, Value *>& 
chosenPairs) {
> 
> This one is again way to large. Can you please extract subfunctions.
> 
> I need to stop here, as I run out of time for today. I hope you already
> get an impression what kind of improvements I suggest. If you agree,
> please go ahead and integrate them. Especially the use of smaller
> subfunctions will help me a lot when reviewing this again. Most probably
> many of your inline comments, could become the descriptions of some of
> those extracted functions.
> 
> Cheers
> Tobi
> 
-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm_bb_vectorize-20111121.diff
Type: text/x-patch
Size: 94105 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20111121/15e8fc19/attachment.bin>

Hal Finkel

2011-Nov-22 03:22 UTC

head link

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

On Mon, 2011-11-21 at 11:55 -0600, Hal Finkel wrote:> Tobias,
> 
> I've attached an updated patch. It contains a few bug fixes and many
> (refactoring and coding-convention) changes inspired by your comments.
> 
> I'm currently trying to fix the bug responsible for causing a compile
> failure when compiling
> test-suite/MultiSource/Applications/obsequi/toggle_move.c; after the
> pass begins to fuse instructions in a basic block in this file, the
> aliasing analysis starts producing different (more pessimistic) query
> answers. I've never before seen it do this, and I'm not sure why it
is
> happening. Also odd, at the same time, the numeric labels that are
> assigned to otherwise unnamed instructions, change. I don't think
I've
> seen this happen either (they generally seem to stay fixed once
> assigned). I don't know if these two things are related, but if anyone
> can provide some insight, I'd appreciate it.
I think that I see what is happening in this case (please someone tell
me if this does not make sense). In the problematic basic block, there
are some loads and stores that are independent. The default aliasing
analysis can tell that these loads and stores don't reference the same
memory region. Then, some of the inputs to the getelementptr
instructions used for these loads and stores are fused by the
vectorization. After this happens, the aliasing analysis loses its
ability to tell that the loads and stores that make use of those
vector-calculated indices are independent.

 -Hal
> 
> In any case, this version of the patch should be much more suitable for
> your (or anyone else's) further review.
> 
> Thanks again,
> Hal
> 
> On Thu, 2011-11-17 at 13:57 +0100, Tobias Grosser wrote:
> > On 11/17/2011 12:38 AM, Hal Finkel wrote:
> > > Tobias, et al.,
> > >
> > > Attached is the my autovectorization pass.
> > 
> > Very nice. Will you be at the developer summit? Maybe we could discuss
> > the integration there?
> > 
> > Here a first review of the source code.
> > 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Apparently Analagous Threads

Search for more reasonably related threads

llvm dev - Nov 2011 - [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Apparently Analagous Threads