thr3ads.net - llvm dev - [LLVMdev] Profiling in LLVM Patch [Jun 2009]

If this information is useful, please help other people find it:
Share via:

Andreas Neustifter

2009-Jun-29 13:34 UTC

[LLVMdev] Profiling in LLVM Patch

Hi all,

as proposed in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/020396.html
I implemented the algorithm presented in [Ball94]. It only instruments
the minimal number of edges necessary for edge profiling.

The main changes introduced by this patch are:
*) a interface compatible rewrite of ProfileInfo
*) a cleanup of ProfileInfoLoader
    (some functionality in ProfileInfoLoader was duplicated in ProfileInfo or
    ProfileInfoLoaderPass; ProfileInfoLoader now really only performs the
    loading but not the post-processing)
*) a new instrumentation pass that performs the optimal edge profiling
    instrumentation
*) a helper module MaximumSpanningTree that selects the edges with have to
    be instrumented for optimal edge profiling
*) a ProfileEstimatorPass that does an offline estimation of a profile based
    on branching and loop depth (also proposed in [Ball94])
    (it is possible to use this ProfileEstimator stand-alone to have at least
    some profile estimation available in the frontend without doing profiling
    runs)
*) a ProfileVerifierPass that checks the current profiling information
    against the flow preservation rules

I did performance measurements on the C and C++ portions of the SPEC CPU2000
benchmark, the results are:
*) on average 46% of the edges are instrumented with a counter (in
    comparison to 100% with the current edge profiling)
*) the performance overhead (on linux-x68_64, other platforms to follow) was
    14.76% with the current profiling and 8.20% with the optimal profiling
    (there are strange effects with the mcf and equake benchmarks where the
    current edge profiling is as fast as the un-instrumented code whereas the
    optimally instrumented code has a small (~5%) performance overhead)
*) when mcf and equake are excluded the average performance overhead is
    51.31% with optimal profiling (in comparison to 100% with the current
    edge profiling)

Please tell me what you do not like and if this is interesting enough to be
added to the trunk, if so, I will also devise test cases for "make
check".

If you do not like the size of this patch it is possible to break it up a
little. I did not use the mkpatch utility since it does not include added
files, I hope the format is readable enough (it should be...)

I ran "make check" and there are not additional errors introduced
by the patch.

Grettings,

Andreas Neustifter

[Ball94] Thomas Ball, James R. Larus:
"Optimally profiling and tracing programs"
ACM TOPLAS, Volume 16, Issue 4, 1994
http://portal.acm.org/citation.cfm?id=183527
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-r74306.optimal_profiling.patch
Type: text/x-patch
Size: 85518 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090629/9d908488/attachment.bin>

Daniel Dunbar

2009-Jun-30 15:46 UTC

head link

[LLVMdev] Profiling in LLVM Patch

Hi Andreas,

Thanks for submitting this back, I intend to review this today. Since
its a big patch, I wanted to point this out on the list in case anyone
else started tackling it.

 - Daniel

On Mon, Jun 29, 2009 at 6:34 AM, Andreas
Neustifter<e0325716 at student.tuwien.ac.at>
wrote:> Hi all,
>
> as proposed in
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/020396.html
> I implemented the algorithm presented in [Ball94]. It only instruments
> the minimal number of edges necessary for edge profiling.
>
> The main changes introduced by this patch are:
> *) a interface compatible rewrite of ProfileInfo
> *) a cleanup of ProfileInfoLoader
>   (some functionality in ProfileInfoLoader was duplicated in ProfileInfo or
>   ProfileInfoLoaderPass; ProfileInfoLoader now really only performs the
>   loading but not the post-processing)
> *) a new instrumentation pass that performs the optimal edge profiling
>   instrumentation
> *) a helper module MaximumSpanningTree that selects the edges with have to
>   be instrumented for optimal edge profiling
> *) a ProfileEstimatorPass that does an offline estimation of a profile
based
>   on branching and loop depth (also proposed in [Ball94])
>   (it is possible to use this ProfileEstimator stand-alone to have at least
>   some profile estimation available in the frontend without doing profiling
>   runs)
> *) a ProfileVerifierPass that checks the current profiling information
>   against the flow preservation rules
>
> I did performance measurements on the C and C++ portions of the SPEC
CPU2000
> benchmark, the results are:
> *) on average 46% of the edges are instrumented with a counter (in
>   comparison to 100% with the current edge profiling)
> *) the performance overhead (on linux-x68_64, other platforms to follow)
was
>   14.76% with the current profiling and 8.20% with the optimal profiling
>   (there are strange effects with the mcf and equake benchmarks where the
>   current edge profiling is as fast as the un-instrumented code whereas the
>   optimally instrumented code has a small (~5%) performance overhead)
> *) when mcf and equake are excluded the average performance overhead is
>   51.31% with optimal profiling (in comparison to 100% with the current
>   edge profiling)
>
> Please tell me what you do not like and if this is interesting enough to be
> added to the trunk, if so, I will also devise test cases for "make
check".
>
> If you do not like the size of this patch it is possible to break it up a
> little. I did not use the mkpatch utility since it does not include added
> files, I hope the format is readable enough (it should be...)
>
> I ran "make check" and there are not additional errors introduced
> by the patch.
>
> Grettings,
>
> Andreas Neustifter
>
> [Ball94] Thomas Ball, James R. Larus:
> "Optimally profiling and tracing programs"
> ACM TOPLAS, Volume 16, Issue 4, 1994
> http://portal.acm.org/citation.cfm?id=183527
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Daniel Dunbar

2009-Jul-01 04:36 UTC

head link

[LLVMdev] Profiling in LLVM Patch

Hi Andreas,

First, thanks again for undertaking this work and submitting it back. There is a
lot of good stuff here and it would be great to see it get back into the tree.

I have a few major high-level comments on the patch. First off, the patch is
quite large and should be broken down into separate incremental changes which
are easier to review and apply. I think the patches should more or less
correspond to the these high-level issues:

1. The intent of ProfileInfo is that it is the public interface used by the rest
of LLVM for all "profiling" related analyses. As such, we want it to
have a
thin, well-defined interface which can be implemented by multiple analysis
providers, just like we have with the current alias analysis interface. Your
current patch has a large number of changes to ProfileInfo which should instead
be inside a subclass of ProfileInfo, where most of the implementation details
are private.

I think the first task should be to separate out the changes which are
appropriate for the ProfileInfo class itself and apply those to the tree. I
believe that the important changes which are worth reflecting in the API are
changed the weights to return a double instead of an int, and defining -1 as a
sentinel, "don't know", value.

This is the most important thing to separate out, and to do early, because it is
the main interface that is shared with LLVM. It is also important for my GSoC
student, who was going to work on static profiling for LLVM and will depend on
any changes to this interface.

The current ProfileInfo functions should also become virtual, and all other
implementation details should be left to the subclass.

Finally, I don't think the ignoreMissing parameter to getExecutionCount
makes
sense, instead the provider should always return -1 if it doesn't have the
value, and clients should "do the right thing.

2. We shouldn't worry too much about preserving the functionality of the
current
llvm-prof tool, which uses the ProfileInfoLoader directly. llvm-prof should be
rewritten to just use the public API provided by ProfileInfo. This should be
done early so there are less dependencies on the ProfileInfoLoader interface.

In fact, it might even make sense to just get rid of llvm-prof and instead
implement a module level pass which consumes profiling information and prints it
out as part of the pass. This would allow us to debug the preservation of
profiling information once we start modifying optimization passes to update it.

3. The static profiling estimator should be implemented as just another
ProfileInfo implementation. I believe this can be a separate patch. Similarly
the profile verifier is just another consumer of ProfileInfo and can be a
separate patch.

4. Finally, the new optimal edge instrumentation & ProfileInfo
implementation
can be brought in.

Does this sound like a reasonable plan? Although it is a bit of work, I
don't
think it will be that difficult and I am happy to help out with splitting up the
patch / refactoring the existing code.

In addition, I have a number of "nitpicks" mostly related to LLVM
style:
 - Please make comments complete sentences, including capitalizing the first
  word and trailing punctuation.

 - Please format function prototypes to have the arguments following the
'(',
  and wrapped onto new lines as appropriate to fit in 80 columns. That is,
--
   static MaxSpanTree getMaxSpanTree(Function *F, ProfileInfo *PI,
                                     bool invert);
--
not
--
   static MaxSpanTree getMaxSpanTree(
       Function *F, ProfileInfo *PI, bool invert);
--

 - Please no spacing after the function in a call ("assert(" not
"assert (") and
  no spacing inside if ("if (foo)" not "if ( foo )").

 - In general, get...() functions should be marked 'const' when
possible.

Finally, I have some other technical comments inline with the diff below.

Thanks again for sharing your work!
 - Daniel
> +  /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
> +  /// (Modelled after getExitingBlocks().)
> +  typedef std::pair<BlockT*,BlockT*> Edge;
> +  void getExitEdges(SmallVectorImpl<Edge> &ExitEdges) const {
> +    // Sort the blocks vector so that we can use binary search to do quick
> +    // lookups.
> +    SmallVector<BlockT*, 128> LoopBBs(block_begin(), block_end());
> +    std::sort(LoopBBs.begin(), LoopBBs.end());
Would it be better to use a DenseSet for this lookup?
> --- llvm-van/include/llvm/Analysis/MaximumSpanningTree.h      1970-01-01
01:00:00.000000000 +0100
> +++ llvm-c7/include/llvm/Analysis/MaximumSpanningTree.h       2009-06-26
16:47:01.000000000 +0200
...> +#include "llvm/Support/Streams.h"
Please don't use Support/Streams.h in a header, it pollutes any translation
unit
which imports this header.
> +    static void print(MaxSpanTree MST) {
> +      cerr<<"{";
> +      for ( MaxSpanTree::iterator ei = MST.begin(), ee = MST.end();
> +            ei!=ee; ++ei ) {
> +       
cerr<<"("<<((*ei).first?(*ei).first->getName():"0")<<",";
> +        cerr<<(*ei).second->getName()<<")";
> +      }
> +      cerr<<"}\n";
> +    }
> +  };
To be more in line with LLVM I would recommend making this a nullary const
method on the MST, and rename it to dump. Also, please use Support/raw_ostream.h
for output instead of C++ iostreams.
> --- llvm-van/include/llvm/Analysis/Passes.h   2009-06-29 13:49:13.000000000
+0200
> +++ llvm-c7/include/llvm/Analysis/Passes.h    2009-06-26 16:48:02.000000000
+0200
>    // createProfileLoaderPass - This pass loads information from a profile
dump
> -  // file.
> +  // file. Since its possible to use the ProfileInfoLoaderPass not only
from
> +  // opt but also e.g. from llvm-prof and with different profile info
filenames
> +  // a creator is provided where the toolname and profile info filename
can be
> +  // provided. (The toolname is used to create proper error messages.)
>    //
>    ModulePass *createProfileLoaderPass();
> +  ModulePass *createProfileLoaderPass(const std::string &, const
std::string &);
Please name the arguments so it is clear which is the toolname and which is the
profile info name (bonus points for switching to doxygen comments which
reference the parameters).
> --- llvm-van/include/llvm/Analysis/ProfileInfo.h      2009-05-12
10:37:52.000000000 +0200
> +++ llvm-c7/include/llvm/Analysis/ProfileInfo.h       2009-06-26
16:47:01.000000000 +0200
Keep in mind that most comments here are assuming this functionality is moved to
a subclass specific to optimal edge profiling.
>    protected:
> -    // EdgeCounts - Count the number of times a transition between two
blocks is
> -    // executed.  As a special case, we also hold an edge from the null
> -    // BasicBlock to the entry block to indicate how many times the
function was
> -    // entered.
> -    std::map<std::pair<BasicBlock*, BasicBlock*>, unsigned>
EdgeCounts;
> +    // EdgeInformation - Count the number of times a transition between
two
> +    // blocks is executed.  As a special case, we also hold an edge from
the
> +    // null BasicBlock to the entry block (henceforth (0,entry) to
indicate
> +    // how many times the function was entered. (This is especially
necessary
> +    // if there is only the entry block.)
> +    std::map<Function*, EdgeCounts> EdgeInformation;
> +
> +    // BlockInformation - Count the number of times a block is executed
> +    std::map<Function*, BlockCounts> BlockInformation;
> +
> +    // FunctionInformation - Count the number of times a function is
executed
> +    std::map<Function*, double> FunctionInformation;
Instead of multiple maps from Function* ->, would it make sense to collect
these
three (or four) maps inside a per-function structure?
> +    // getFunctionForEdge() - This returns the Function for a given Edge.
This
> +    // requires special handling for (0,entry) so its centralised here.
> +    static inline Function* getFunctionForEdge(Edge E) {
> +      BasicBlock *BB = E.first;
> +      if ( BB == 0 ) { BB = E.second; };
> +      assert ( BB != 0 && "both BasicBlocks of Edge are NULL,
can not "
> +                          "determine function");
> +      return BB->getParent();
> +    }
The edge destination should always be non-null, correct? Why not just
--
   static inline Function* getFunctionForEdge(Edge E) {
     assert(E.second && "Invalid profile edge!");
     return E.second->getParent();
   }
--
> +    // getFunctionForBlock() - Same as for Edges without special handling,
for
> +    // the sake of cleaner code.
> +    static inline Function* getFunctionForBlock(BasicBlock* BB) {
> +      return BB->getParent();
> +    }
I would prefer this wasn't used, its "common knowledge" in the
LLVM code that a
basic blocks parent is its Function.
> +    EdgeCounts getEdgeCounts(Function *F) {
> +      std::map<Function*, EdgeCounts>::const_iterator J > +      
EdgeInformation.find(F);
> +      if ( J != EdgeInformation.end() ) {
> +        return J->second;
> +      }
> +      return EdgeCounts(); // Return empty EdgeCounts in case none found.
> +    }
This is way too expensive, returning a deep copy of the edge counts map
isn't
necessary. The simplest alternative is to return a const reference and insert
into the map for missing functions. Something like:
--> +    const EdgeCounts& getEdgeCounts(Function *F) {
> +      return EdgeInformation[F];
> +    }--
Unless we really care about not inserting empty entries this seems reasonable to
me.
> +    void forceSynthesis(void) {
> +      for (std::map<Function*, EdgeCounts>::iterator
> +           FI = EdgeInformation.begin(), FIE = EdgeInformation.end();
> +           FI != FIE; ++FI) {
> +        Function *F = FI->first;
> +        for (Function::iterator BB = F->begin(), BBE = F->end();
> +             BB != BBE; ++BB) {
> +          getExecutionCount(BB,false);
> +        }
> +        getExecutionCount(F,false);
> +      }
> +    }
It doesn't make sense to have this function here, clients should just do the
right thing if the count isn't present. It may be worth providing a
ProfileInfo
level API to query whether any information is present for a particular function
so that clients don't perform needless queries.
> --- llvm-van/lib/Analysis/MaximumSpanningTree.cpp     1970-01-01
01:00:00.000000000 +0100
> +++ llvm-c7/lib/Analysis/MaximumSpanningTree.cpp      2009-06-26
16:47:01.000000000 +0200
...> +#define WOOD_REP(M,V1,V2) \
Please use "forest" instead of "wood".
> +  // compare two weighted edges
> +  struct VISIBILITY_HIDDEN EdgeWeightCompare {
> +    bool operator()(const ProfileInfo::EdgeWeight X,
> +                    const ProfileInfo::EdgeWeight Y) const {
> +      if ( X.second > Y.second ) return true;
> +      if ( X.second < Y.second ) return false;
> +      if ( X.first.first != 0 && Y.first.first == 0 ) return true;
> +      if ( X.first.first == 0 && Y.first.first != 0 ) return
false;
> +      if ( X.first.first == 0 && Y.first.first == 0 ) return
false;
> +      int c1 = 
X.first.first->getName().compare(Y.first.first->getName());
> +      if ( c1 > 0 ) return true;
> +      if ( c1 < 0 ) return false;
> +      int c2 = 
X.first.second->getName().compare(Y.first.second->getName());
> +      if ( c2 > 0 ) return true;
> +      return false;
> +    }
> +  };
Comparing edges based on the basic block name is not a good idea, in some common
cases these will all be empty. If the ordering really needs to be deterministic
then you will need to number the basic blocks, but is there a compelling reason
to not order based on the basic block address?

Regardless, I think this can be simplified to:> +  struct VISIBILITY_HIDDEN EdgeWeightCompare {
> +    bool operator()(const ProfileInfo::EdgeWeight X,
> +                    const ProfileInfo::EdgeWeight Y) const {
> +      if ( X.second != Y.second ) return (X.second > Y.second);
> +      if ( X.first.first != Y.first.first ) return (X.first.first >
Y.first.first);
> +      return (X.first.second > Y.first.second);
> +    }
> +  };using a helper routine for comparing the basic blocks if necessary (to handle
the null case).
> +// countSuccessors() - This gives an estimate of the number of successors
of a
> +// basic block. Returns 0, 1 or 2 depending on the block having no, one or
more
> +// than one successors respectively.
> +static int countSuccessors(BasicBlock* BB) {
> +  succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
> +  if ( SI != SE ) {       // at least one successor
> +    ++SI;
> +    if ( SI == SE ) {     // exactly on successor
> +      return 1;
> +    } else {              // more than one successor
> +      return 2;
> +    }
> +  } else {
> +    return 0;
> +  }
> +}
This routine is only used in one place, and its result is compared against 0. I
think it should be removed and the code should just check (succ_begin(BB)
=succ_end(BB)), or for better readability add a succ_empty to CFG.h and use
that.
> --- llvm-van/lib/Analysis/ProfileEstimatorPass.cpp    1970-01-01
01:00:00.000000000 +0100
> +++ llvm-c7/lib/Analysis/ProfileEstimatorPass.cpp     2009-06-26
16:47:01.000000000 +0200
...> +bool OptimalEdgeProfiler::runOnModule(Module &M) {
> +  Function *Main = M.getFunction("main");
> +  if (Main == 0) {
> +    cerr << "WARNING: cannot insert edge profiling into a
module"
> +         << " with no main function!\n";
> +    return false;  // No main, no instrumentation!
> +  }
> +
> +  std::set<BasicBlock*> BlocksToInstrument;
> +  unsigned NumEdges = 0;
> +  for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F) {
> +    if (F->isDeclaration()) continue;
> +    // use one counter for the edge (0,entry) for each function
> +    ++NumEdges;
> +    for (Function::iterator BB = F->begin(), E = F->end(); BB != E;
++BB) {
> +      // Keep track of which blocks need to be instrumented.  We don't
want to
> +      // instrument blocks that are added as the result of breaking
critical
> +      // edges!
> +      BlocksToInstrument.insert(BB);
> +      NumEdges += BB->getTerminator()->getNumSuccessors();
> +    }
> +  }
> +
> +  const ArrayType *ATy = ArrayType::get(Type::Int32Ty, NumEdges);
> +  GlobalVariable *Counters > +    new GlobalVariable(ATy, false,
GlobalValue::InternalLinkage,
> +                       0, "OptimalEdgeProfCounters", &M);
> +
> +  std::vector<Constant*> Initializer =
std::vector<Constant*>(NumEdges);
This unnecessarily copies the vector, please use:
 std::vector<Constant*> Initializer(NumEdges);
> +  APInt zero(32,0);      Constant* zeroc = ConstantInt::get(zero);
> +  APInt minusone(32,-1); Constant* minusonec = ConstantInt::get(minusone);
Please use:> +  Constant* zeroc = ConstantInt::get(llvm::Type::Int32Ty, 0);
> +  Constant* onec = ConstantInt::getSigned(llvm::Type::Int32Ty, -1);
On Mon, Jun 29, 2009 at 6:34 AM, Andreas
Neustifter<e0325716 at student.tuwien.ac.at>
wrote:> Hi all,
>
> as proposed in
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-February/020396.html
> I implemented the algorithm presented in [Ball94]. It only instruments
> the minimal number of edges necessary for edge profiling.
>
> The main changes introduced by this patch are:
> *) a interface compatible rewrite of ProfileInfo
> *) a cleanup of ProfileInfoLoader
>   (some functionality in ProfileInfoLoader was duplicated in ProfileInfo or
>   ProfileInfoLoaderPass; ProfileInfoLoader now really only performs the
>   loading but not the post-processing)
> *) a new instrumentation pass that performs the optimal edge profiling
>   instrumentation
> *) a helper module MaximumSpanningTree that selects the edges with have to
>   be instrumented for optimal edge profiling
> *) a ProfileEstimatorPass that does an offline estimation of a profile
based
>   on branching and loop depth (also proposed in [Ball94])
>   (it is possible to use this ProfileEstimator stand-alone to have at least
>   some profile estimation available in the frontend without doing profiling
>   runs)
> *) a ProfileVerifierPass that checks the current profiling information
>   against the flow preservation rules
>
> I did performance measurements on the C and C++ portions of the SPEC
CPU2000
> benchmark, the results are:
> *) on average 46% of the edges are instrumented with a counter (in
>   comparison to 100% with the current edge profiling)
> *) the performance overhead (on linux-x68_64, other platforms to follow)
was
>   14.76% with the current profiling and 8.20% with the optimal profiling
>   (there are strange effects with the mcf and equake benchmarks where the
>   current edge profiling is as fast as the un-instrumented code whereas the
>   optimally instrumented code has a small (~5%) performance overhead)
> *) when mcf and equake are excluded the average performance overhead is
>   51.31% with optimal profiling (in comparison to 100% with the current
>   edge profiling)
>
> Please tell me what you do not like and if this is interesting enough to be
> added to the trunk, if so, I will also devise test cases for "make
check".
>
> If you do not like the size of this patch it is possible to break it up a
> little. I did not use the mkpatch utility since it does not include added
> files, I hope the format is readable enough (it should be...)
>
> I ran "make check" and there are not additional errors introduced
> by the patch.
>
> Grettings,
>
> Andreas Neustifter
>
> [Ball94] Thomas Ball, James R. Larus:
> "Optimally profiling and tracing programs"
> ACM TOPLAS, Volume 16, Issue 4, 1994
> http://portal.acm.org/citation.cfm?id=183527
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

Andreas Neustifter

2009-Jul-01 09:36 UTC

head link

[LLVMdev] Profiling in LLVM Patch

Hi Daniel,

Daniel Dunbar wrote:> Hi Andreas,
> 
> First, thanks again for undertaking this work and submitting it back. There
is a
> lot of good stuff here and it would be great to see it get back into the
tree.Thanks for taking the time to review this, I know its a huge patch. I still have
a few questions on how you would like this patch to be re-factored and split up.
> [...]
> 
> 1. The intent of ProfileInfo is that it is the public interface used by the
rest
> of LLVM for all "profiling" related analyses. As such, we want it
to have a
> thin, well-defined interface which can be implemented by multiple analysis
> providers, just like we have with the current alias analysis interface.
Your
> current patch has a large number of changes to ProfileInfo which should
instead
> be inside a subclass of ProfileInfo, where most of the implementation
details
> are private.
> 
> I think the first task should be to separate out the changes which are
> appropriate for the ProfileInfo class itself and apply those to the tree. I
> believe that the important changes which are worth reflecting in the API
are
> changed the weights to return a double instead of an int, and defining -1
as a
> sentinel, "don't know", value.So you want the ProfileInfo to be untouched except for the double weights and
the MissingValue? Then mark all methods virtual and do a subclass that has this
sort of heavy implementation that I am using? Do you have any name suggestions
for this subclass?
One thing I would like to patch in the current ProfileInfo implementation is the
consistent use of the (0,entry) edge which is necessary to profile functions
with only one block (namely the entry block).
> This is the most important thing to separate out, and to do early, because
it is
> the main interface that is shared with LLVM. It is also important for my
GSoC
> student, who was going to work on static profiling for LLVM and will depend
on
> any changes to this interface.Yes, I thought so, I will try to do this quick.
> The current ProfileInfo functions should also become virtual, and all other
> implementation details should be left to the subclass.
> 
> Finally, I don't think the ignoreMissing parameter to getExecutionCount
makes
> sense, instead the provider should always return -1 if it doesn't have
the
> value, and clients should "do the right thing.Maybe, its of course always possibile to have custom wrappers that perform this
sort of task if necessary...
> 2. We shouldn't worry too much about preserving the functionality of
the current
> llvm-prof tool, which uses the ProfileInfoLoader directly. llvm-prof should
be
> rewritten to just use the public API provided by ProfileInfo. This should
be
> done early so there are less dependencies on the ProfileInfoLoader
interface.Okay, thats no problem. I guess it makes sense to rewrite llvm-prof but I was
not sure if this sort of changes would be appreciated.
> In fact, it might even make sense to just get rid of llvm-prof and instead
> implement a module level pass which consumes profiling information and
prints it
> out as part of the pass. This would allow us to debug the preservation of
> profiling information once we start modifying optimization passes to update
it.Good point.
> 3. The static profiling estimator should be implemented as just another
> ProfileInfo implementation. I believe this can be a separate patch.
Similarly
> the profile verifier is just another consumer of ProfileInfo and can be a
> separate patch.The ProfileEstimator is already a subclass of ProfileInfo and implements that
interface. Yes, these two are easily split into separate patches.
> 4. Finally, the new optimal edge instrumentation & ProfileInfo
implementation
> can be brought in.
> 
> Does this sound like a reasonable plan? Although it is a bit of work, I
don't
> think it will be that difficult and I am happy to help out with splitting
up the
> patch / refactoring the existing code.Yes, sounds good. I will try to split it up myself first and ask when there are
decisions to be made I'm not sure about.
> In addition, I have a number of "nitpicks" mostly related to LLVM
style:Well, they were to be expected :-) One never gets those right the first time
contributing to a new project...
>  [...]
>> +  /// getExitEdges - Return all pairs of
(_inside_block_,_outside_block_).
>> +  /// (Modelled after getExitingBlocks().)
>> +  typedef std::pair<BlockT*,BlockT*> Edge;
>> +  void getExitEdges(SmallVectorImpl<Edge> &ExitEdges) const
{
>> +    // Sort the blocks vector so that we can use binary search to do
quick
>> +    // lookups.
>> +    SmallVector<BlockT*, 128> LoopBBs(block_begin(),
block_end());
>> +    std::sort(LoopBBs.begin(), LoopBBs.end());
> 
> Would it be better to use a DenseSet for this lookup?Well, its the same that is used in getExitBlocks() where I got the
implementation from. I really don't know but I think the current
implementation is okay from what I gather form the LLVM Programmer's Manual.
>> --- llvm-van/include/llvm/Analysis/MaximumSpanningTree.h     
1970-01-01 01:00:00.000000000 +0100
>> +++ llvm-c7/include/llvm/Analysis/MaximumSpanningTree.h      
2009-06-26 16:47:01.000000000 +0200
> ...
>> +#include "llvm/Support/Streams.h"
> 
> Please don't use Support/Streams.h in a header, it pollutes any
translation unit
> which imports this header.Okay.
>> +    static void print(MaxSpanTree MST) {
>> +      cerr<<"{";
>> +      for ( MaxSpanTree::iterator ei = MST.begin(), ee = MST.end();
>> +            ei!=ee; ++ei ) {
>> +       
cerr<<"("<<((*ei).first?(*ei).first->getName():"0")<<",";
>> +        cerr<<(*ei).second->getName()<<")";
>> +      }
>> +      cerr<<"}\n";
>> +    }
>> +  };
> 
> To be more in line with LLVM I would recommend making this a nullary const
> method on the MST, and rename it to dump. Also, please use
Support/raw_ostream.h
> for output instead of C++ iostreams.Okay.
> 
>> --- llvm-van/include/llvm/Analysis/Passes.h   2009-06-29
13:49:13.000000000 +0200
>> +++ llvm-c7/include/llvm/Analysis/Passes.h    2009-06-26
16:48:02.000000000 +0200
>>    // createProfileLoaderPass - This pass loads information from a
profile dump
>> -  // file.
>> +  // file. Since its possible to use the ProfileInfoLoaderPass not
only from
>> +  // opt but also e.g. from llvm-prof and with different profile info
filenames
>> +  // a creator is provided where the toolname and profile info
filename can be
>> +  // provided. (The toolname is used to create proper error messages.)
>>    //
>>    ModulePass *createProfileLoaderPass();
>> +  ModulePass *createProfileLoaderPass(const std::string &, const
std::string &);
> 
> Please name the arguments so it is clear which is the toolname and which is
the
> profile info name (bonus points for switching to doxygen comments which
> reference the parameters).I guess I will go for the bonus points then.
>> --- llvm-van/include/llvm/Analysis/ProfileInfo.h      2009-05-12
10:37:52.000000000 +0200
>> +++ llvm-c7/include/llvm/Analysis/ProfileInfo.h       2009-06-26
16:47:01.000000000 +0200
> 
> Keep in mind that most comments here are assuming this functionality is
moved to
> a subclass specific to optimal edge profiling.
> 
>>    protected:
>> -    // EdgeCounts - Count the number of times a transition between two
blocks is
>> -    // executed.  As a special case, we also hold an edge from the
null
>> -    // BasicBlock to the entry block to indicate how many times the
function was
>> -    // entered.
>> -    std::map<std::pair<BasicBlock*, BasicBlock*>,
unsigned> EdgeCounts;
>> +    // EdgeInformation - Count the number of times a transition
between two
>> +    // blocks is executed.  As a special case, we also hold an edge
from the
>> +    // null BasicBlock to the entry block (henceforth (0,entry) to
indicate
>> +    // how many times the function was entered. (This is especially
necessary
>> +    // if there is only the entry block.)
>> +    std::map<Function*, EdgeCounts> EdgeInformation;
>> +
>> +    // BlockInformation - Count the number of times a block is
executed
>> +    std::map<Function*, BlockCounts> BlockInformation;
>> +
>> +    // FunctionInformation - Count the number of times a function is
executed
>> +    std::map<Function*, double> FunctionInformation;
> 
> Instead of multiple maps from Function* ->, would it make sense to
collect these
> three (or four) maps inside a per-function structure?Yes, makes sense, its then easier to factor out the retrieval and handling of
this structure.
>> +    // getFunctionForEdge() - This returns the Function for a given
Edge. This
>> +    // requires special handling for (0,entry) so its centralised
here.
>> +    static inline Function* getFunctionForEdge(Edge E) {
>> +      BasicBlock *BB = E.first;
>> +      if ( BB == 0 ) { BB = E.second; };
>> +      assert ( BB != 0 && "both BasicBlocks of Edge are
NULL, can not "
>> +                          "determine function");
>> +      return BB->getParent();
>> +    }
> 
> The edge destination should always be non-null, correct? Why not just
> --
>    static inline Function* getFunctionForEdge(Edge E) {
>      assert(E.second && "Invalid profile edge!");
>      return E.second->getParent();
>    }
> --Nice one. Thanks!
>> +    // getFunctionForBlock() - Same as for Edges without special
handling, for
>> +    // the sake of cleaner code.
>> +    static inline Function* getFunctionForBlock(BasicBlock* BB) {
>> +      return BB->getParent();
>> +    }
> 
> I would prefer this wasn't used, its "common knowledge" in
the LLVM code that a
> basic blocks parent is its Function.I was not sure about this one anyways, so yes, it will go.
>> +    EdgeCounts getEdgeCounts(Function *F) {
>> +      std::map<Function*, EdgeCounts>::const_iterator J >>
+        EdgeInformation.find(F);
>> +      if ( J != EdgeInformation.end() ) {
>> +        return J->second;
>> +      }
>> +      return EdgeCounts(); // Return empty EdgeCounts in case none
found.
>> +    }
> 
> This is way too expensive, returning a deep copy of the edge counts map
isn't
> necessary. The simplest alternative is to return a const reference and
insert
> into the map for missing functions. Something like:
> --
>> +    const EdgeCounts& getEdgeCounts(Function *F) {
>> +      return EdgeInformation[F];
>> +    }
> --
> Unless we really care about not inserting empty entries this seems
reasonable to
> me.Here my somewhat laking C++ skills failed me. I didn't see that this is
essentially doing a copy, of course this is not
necessary!> 
>> +    void forceSynthesis(void) {
>> +      for (std::map<Function*, EdgeCounts>::iterator
>> +           FI = EdgeInformation.begin(), FIE = EdgeInformation.end();
>> +           FI != FIE; ++FI) {
>> +        Function *F = FI->first;
>> +        for (Function::iterator BB = F->begin(), BBE = F->end();
>> +             BB != BBE; ++BB) {
>> +          getExecutionCount(BB,false);
>> +        }
>> +        getExecutionCount(F,false);
>> +      }
>> +    }
> 
> It doesn't make sense to have this function here, clients should just
do the
> right thing if the count isn't present. It may be worth providing a
ProfileInfo
> level API to query whether any information is present for a particular
function
> so that clients don't perform needless queries.I have to look into this one, I guess its possible to do this kind of check on
demand and calculate BlockCounts from EdgeCounts and FunctionCounts form
BlockCounts when necessary.
>> --- llvm-van/lib/Analysis/MaximumSpanningTree.cpp     1970-01-01
01:00:00.000000000 +0100
>> +++ llvm-c7/lib/Analysis/MaximumSpanningTree.cpp      2009-06-26
16:47:01.000000000 +0200
> ...
>> +#define WOOD_REP(M,V1,V2) \
> 
> Please use "forest" instead of "wood".Oh, sorry, thats a false friend, in German "forest" is
"Wald".
>> +  // compare two weighted edges
>> +  struct VISIBILITY_HIDDEN EdgeWeightCompare {
>> +    bool operator()(const ProfileInfo::EdgeWeight X,
>> +                    const ProfileInfo::EdgeWeight Y) const {
>> +      if ( X.second > Y.second ) return true;
>> +      if ( X.second < Y.second ) return false;
>> +      if ( X.first.first != 0 && Y.first.first == 0 ) return
true;
>> +      if ( X.first.first == 0 && Y.first.first != 0 ) return
false;
>> +      if ( X.first.first == 0 && Y.first.first == 0 ) return
false;
>> +      int c1 = 
X.first.first->getName().compare(Y.first.first->getName());
>> +      if ( c1 > 0 ) return true;
>> +      if ( c1 < 0 ) return false;
>> +      int c2 = 
X.first.second->getName().compare(Y.first.second->getName());
>> +      if ( c2 > 0 ) return true;
>> +      return false;
>> +    }
>> +  };
> 
> Comparing edges based on the basic block name is not a good idea, in some
common
> cases these will all be empty. If the ordering really needs to be
deterministic
> then you will need to number the basic blocks, but is there a compelling
reason
> to not order based on the basic block address?This slipped my notice and shouldn't have made it into production code. Its
sufficient to order based on weights.
(This implementation was necessary when the MST was also needed for reading the
profiling information, but since now I handle this with the Constant-Initializer
that provides this information directly in the profile, its only necessary to
compare the weight.)
> [...]
>> +// countSuccessors() - This gives an estimate of the number of
successors of a
>> +// basic block. Returns 0, 1 or 2 depending on the block having no,
one or more
>> +// than one successors respectively.
>> +static int countSuccessors(BasicBlock* BB) {
>> +  succ_iterator SI = succ_begin(BB), SE = succ_end(BB);
>> +  if ( SI != SE ) {       // at least one successor
>> +    ++SI;
>> +    if ( SI == SE ) {     // exactly on successor
>> +      return 1;
>> +    } else {              // more than one successor
>> +      return 2;
>> +    }
>> +  } else {
>> +    return 0;
>> +  }
>> +}
> 
> This routine is only used in one place, and its result is compared against
0. I
> think it should be removed and the code should just check (succ_begin(BB)
=> succ_end(BB)), or for better readability add a succ_empty to CFG.h and use
that.Okay.
>> --- llvm-van/lib/Analysis/ProfileEstimatorPass.cpp    1970-01-01
01:00:00.000000000 +0100
>> +++ llvm-c7/lib/Analysis/ProfileEstimatorPass.cpp     2009-06-26
16:47:01.000000000 +0200
> ...
>> +bool OptimalEdgeProfiler::runOnModule(Module &M) {
>> +  Function *Main = M.getFunction("main");
>> +  if (Main == 0) {
>> +    cerr << "WARNING: cannot insert edge profiling into a
module"
>> +         << " with no main function!\n";
>> +    return false;  // No main, no instrumentation!
>> +  }
>> +
>> +  std::set<BasicBlock*> BlocksToInstrument;
>> +  unsigned NumEdges = 0;
>> +  for (Module::iterator F = M.begin(), E = M.end(); F != E; ++F) {
>> +    if (F->isDeclaration()) continue;
>> +    // use one counter for the edge (0,entry) for each function
>> +    ++NumEdges;
>> +    for (Function::iterator BB = F->begin(), E = F->end(); BB !=
E; ++BB) {
>> +      // Keep track of which blocks need to be instrumented.  We
don't want to
>> +      // instrument blocks that are added as the result of breaking
critical
>> +      // edges!
>> +      BlocksToInstrument.insert(BB);
>> +      NumEdges += BB->getTerminator()->getNumSuccessors();
>> +    }
>> +  }
>> +
>> +  const ArrayType *ATy = ArrayType::get(Type::Int32Ty, NumEdges);
>> +  GlobalVariable *Counters >> +    new GlobalVariable(ATy,
false, GlobalValue::InternalLinkage,
>> +                       0, "OptimalEdgeProfCounters",
&M);
>> +
>> +  std::vector<Constant*> Initializer =
std::vector<Constant*>(NumEdges);
> 
> This unnecessarily copies the vector, please use:
>  std::vector<Constant*> Initializer(NumEdges);Okay.
>> +  APInt zero(32,0);      Constant* zeroc = ConstantInt::get(zero);
>> +  APInt minusone(32,-1); Constant* minusonec =
ConstantInt::get(minusone);
> 
> Please use:
>> +  Constant* zeroc = ConstantInt::get(llvm::Type::Int32Ty, 0);
>> +  Constant* onec = ConstantInt::getSigned(llvm::Type::Int32Ty, -1);Okay.

Thank you so much for taking the time and reviewing this giving me very valuable
comments.

Andi

Andreas Neustifter

2009-Jul-07 09:42 UTC

head link

[LLVMdev] Profiling in LLVM Patch Followup 6

Hi,

this is the sixth in a series of patches to cleanup and improve the
LLVM Profiling Infrastructure. It depends on the previous patches from
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023569.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023602.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023642.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023643.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023649.html

This patch adds optimal edge profiling along the lines of [Ball94] throughout
the LLVM Profiling Infrastructure.
(It also contains some small bugfixes and comment-fixes in the ProfileInfo.)

Andi

[Ball94] Thomas Ball, James R. Larus:
"Optimally profiling and tracing programs"
ACM TOPLAS, Volume 16, Issue 4, 1994
http://portal.acm.org/citation.cfm?id=183527
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-r74898.optimal.edge.profiling.patch
Type: text/x-patch
Size: 32704 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090707/4db5c90d/attachment.bin>

Andreas Neustifter

2009-Jul-08 07:50 UTC

head link

[LLVMdev] Profiling in LLVM Patch

Hi,

this is the seventh in a series of patches to cleanup and improve the
LLVM Profiling Infrastructure. It depends on the previous patches from
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023569.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023602.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023642.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023643.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023649.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023683.html

It corrects an error which occured when loading the optimal edge profile. (The
MissingValue marker was not handled correctly.)

Andi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-r74898.missingvalue.bugfix.patch
Type: text/x-patch
Size: 4574 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090708/5b29b0c1/attachment.bin>

Andreas Neustifter

2009-Jul-08 10:02 UTC

head link

[LLVMdev] Profiling in LLVM Patch Followup 8

Hi,

this is the eighth in a series of patches to cleanup and improve the
LLVM Profiling Infrastructure. It depends on the previous patches from
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023569.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023602.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023642.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023643.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023649.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023683.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023713.html

It improves the initialisation of the MaximumSpanningTree data structure.

Andi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-r74898.maxspantree.init.patch
Type: text/x-patch
Size: 831 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090708/d79f9abb/attachment.bin>

Andreas Neustifter

2009-Jul-08 10:59 UTC

head link

[LLVMdev] Profiling in LLVM Patch Followup 9

Hi,

this is the ninth (and last, for now) in a series of patches to cleanup and
improve the LLVM Profiling Infrastructure. It depends on the previous patches
from
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023569.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023602.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023642.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023643.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023649.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023683.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023713.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-July/023715.html

This patch introduces the ProfileVerifier, a pass that calculates for each block
if the sum of the incoming weights is equal to the weight of the block and equal
to the sum of the outgoing weights. It is not strictly necessary for profiling
but it makes development of profiling related passes easier, thus its published
also.

Andi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-r74993.profileverifier.patch
Type: text/x-patch
Size: 9060 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20090708/f719980e/attachment.bin>

Seemingly Similar Threads

Search for more seemingly similar threads

llvm dev - Jun 2009 - [LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch Followup 6

[LLVMdev] Profiling in LLVM Patch

[LLVMdev] Profiling in LLVM Patch Followup 8

[LLVMdev] Profiling in LLVM Patch Followup 9

Seemingly Similar Threads