thr3ads.net - llvm dev - [LLVMdev] GSoC Proposal: Profiling Enhancements [Apr 2012]

If this information is useful, please help other people find it:
Share via:

Alastair Murray

2012-Apr-05 18:18 UTC

[LLVMdev] GSoC Proposal: Profiling Enhancements

Hello Everyone,

Before I get started I just want to sincerely apologise for not getting 
feedback on this earlier.  I've had an extremely busy week as I was 
presenting a paper at the CGO conference.  If anyone is able to provide 
feedback in such a short time-frame then it will be gratefully received. 
  If not, then I just hope the work described sounds useful.  I have 
already submitted this proposal due to the short time until the 
deadline, but I can make changes.  (By the way, no need to worry about 
me being so busy during the coding period, see below.)



Proposal
--------

=== Overview ==
LLVM already contains a profiling framework, but only a handful of 
transforms make use of the metadata.  Further, it even contains a path 
profiling framework, but no transforms make use of it.  This "Google 
Summer of Code" proposal lays out an achievable plan to enhance 
profiling in LLVM and to use profiling metadata in key transformations 
where it can have a strong positive effect.



=== Why this is Useful for LLVM ==
Many profiling-data friendly transformations in LLVM do not currently 
use that data.  Specific examples are loop-unrolling, loop-unswitching 
and inlining that can all find good performance improvements, though at 
the cost of increased code-size.  Using profiling data with these three 
transformations would allow faster code in hot areas by applying these 
more aggressively and smaller code cold areas where minimising 
instruction cache cold-hits is the key performance concern.

Additionally, the path profiling framework was added to LLVM in January 
2011 but hasn't been touched since (except for wide-spread API updates). 
  As no transforms use the the produced metadata the code never gets 
tested (though the mailing lists suggest a small number of people may 
still be using it for external purposes).  This code is in serious 
danger of suffering from "bit-rot".  This proposal suggests enhancing
a
transformation to use this information to improve code-quality, and as a 
side-effect this stale code will be testable. Superblock formation 
already has a basic implementation in LLVM (it is called 
Tail-Duplication) but it does not use profiling information.  It can be 
improved by using either basic profiling information (for small gains) 
or path profiling information (for larger gains).

The aim of this proposal is that it should result in code which really 
will be integrated into LLVM.  Scanning a small subset of previous 
"Google Summer of Code" efforts (for many projects, not just LLVM) it 
appeared that few projects were actually able to use the contributed 
code.  Often students would do a good job, but would not quite truly 
complete the work.  Once summer was over no-one had time to polish the 
code, so it would be left unused.  With this in mind I am proposing an 
achievable but useful project which is decomposable into related 
sub-goals.  Time has been left in the proposal plan to produce code of 
high enough quality to be committed to the LLVM repository, and to have 
time to find sensible default parameters for heuristics that are 
changed.  Also, risk is reduced as the project is decomposable into 
seven largely independent sub-goals, even if one aspect does not work, 
the other six should still be usable by the LLVM project.



=== About Me ==
I am a PhD student at the University of Edinburgh, in the UK.  I am very 
close to the end of my studies and have a few months of time available 
that line-up perfectly with "Google Summer of Code" (I won't
describe
why I have the time here, but feel free to ask me via private email or 
IM if you're curious).  During my PhD I produced seven papers (five as 
first author) based on research built on the GCC and SUIF compilers.  I 
have over four years of full-time experience working with compilers, and 
another two years during my undergraduate degree where compilers were 
part of my studies.  My research has focussed on both code-generation 
and middle-end transformations (my publications: 
http://homepages.inf.ed.ac.uk/s0233454/pubs.html).  In the past I have 
also undertaken internships which involved hand-coding assembly, so I've 
no shortage of experience working on optimisation.

I have not worked with LLVM before, but I am genuinely very keen to do 
so.  Looking at job advertisements and having just attended CGO 2012 I 
noticed at both that LLVM is now used more than GCC (required for more 
job postings, used in more papers).  That is one reason why this 
proposal touches multiple transformations within LLVM, I'd like to learn 
as much as possible about LLVM while using my existing skills to produce 
something useful for the project.  The opportunity to have an expert 
help mentor me through LLVM is an excellent opportunity.  Hopefully my 
general compiler experience can offset my lack of LLVM experience as it 
always fun to get to use new technologies.  Additionally I have worked 
on multiple projects written in C++ during my studies so the programming 
aspects of this proposal will not present any issues.

Contact information:
	* CUT -- This is in the official submission *



=== Further Proposal Details ==
All of the below used http://llvm.org/OpenProjects.html and 
http://nondot.org/sabre/LLVMNotes/ as a starting point.

The plan is to become familiar with working within the LLVM 
infrastructure during the "Bonding Period".  The best way to do that,
of
course, is to get stuck in, so I plan to add a few small benchmark items 
to "test-suite".  One of the above links says that DSPStone and UTDSP 
should be added.  As I have worked with both of these before I will add 
these to "test-suite".  The process of doing so is probably the best
way
to learn how the LLVM testing infrastructure works.  UTDSP, however, is 
non-free for commercial use, so I will only be able to add that as an 
external benchmark (like SPEC).

Further to this, the path-profiling pass currently in LLVM does not seem 
to have a test-case in "test", so I will create one to aid me in 
learning how verification works in LLVM.

Once the coding period begins I will first want to start working with 
the profiling infrastructure directly.  I will modify the profiler so 
that run-time instrumentation is not inserted for loops with iterations 
counts that are known at compile time.  This will be added to 
OptimalEdgeProfiling.cpp, and assuming it has a very-low compile-time 
overhead it will also be added to EdgeProfiling.cpp.  Test-cases will be 
created for "test", and compile-time overhead and run-time benefits
will
be measured using "test-suite".

After this three transforms will be modified to exploit profiling data: 
loop unrolling (LoopUnrollPass.cpp), unswitching (LoopUnswitch.cpp) and 
inlining (Inliner.cpp).  This will be done by modifying the cost 
heuristics (e.g. in LoopUnroll::runOnLoop or Inliner::shouldInline). 
Initially this will be done by raising the threshold which limits the 
transformation on hot code, and lowering it on cold code (changing both 
so perhaps the overall produced code size will not change much).  If 
required (or if there is time) more sophisticated heuristics will be 
evaluated, e.g. using the caller/callee site execution ratio in the 
inliner.  Attempts will also be made to try and keep profiling data 
correct after transformations, so e.g. a hot loop with a heavily biased 
conditional branch inside it can be unswitched and have only one side 
unrolled.  Test-cases will be added to "test" designed so that
identical
loops and functions exist but with different execution counts, so 
differing behaviour on the hot/cold samples should be observed. 
Performance will be evaluated using "test-suite" again, though with
one
addition.  To ensure that the produced code is not over-specialised to 
the training profiling data an additional set of data will be used. 
MiDatasets (http://ctuning.org/wiki/index.php/CTools:CBench:Downloads) 
provides alternatives inputs for the MiBench suite, so that will be used 
to test this.

The next large chunk of work will be to make use of the exisiting path 
profiling code (PathProfiling.cpp).  Superblock Formation will be done 
via Tail Duplication (already implemented in TailDuplication.cpp). 
Basic profiling data should provide small gains for this, but the true 
expected gains come from using path-profiling to create the superblock 
such that there is a "trace" through it.  For the hot case should
result
in good instruction locality, good branch-prediction behaviour and good 
scheduling for the "trace".  Path-profiling may need to be fixed or 
enhanced for this, as it is currently unused it is difficult to know how 
well it will work. (Note: I'm stating that it is unused based on the 
fact that nothing includes PathProfileInfo.h except the path profiling 
itself — it is, however, possible that tools outside the core llvm use 
it though I couldn't find anything by means of an internet search). This 
will be tested and evaluated in the same way as the previously modified 
transformations.

In all the above cases the modifications will be made with the aim that 
when profiling data is not available then compiler behaviour will remain 
unchanged from its current state.  This will be verified by ensuring 
identical binaries are produced for "llvm-suite" (LLVM-head vs my
local
LLVM without profiling).

The final changes to be made will be to "llvm-prof". It has been noted
on the mailing list that it is not currently compatible with path-based 
profiling, so support for this will be added.  Finally, overall 
performance evaluations and code-cleanups will be performed with the aim 
of having the code integrated into LLVM.  This will hopefully be done 
with the help of the llvm-commits mailing list.



=== Deliverables ==
* Additions to test-suite.
* Reduced profile instrumentation run-time overhead.
* Profile enhanced loop-unrolling.
* Profile enhanced loop-unswitching.
* Profile enhanced inlining.
* Tested (and fixed if required) path profiling.
* Profile and path profile enhanced tail-duplication
   (superblock formation).
* llvm-prof fixed to work with path profiling.



=== Timeline ==
During "Bonding Period":
	* Install/Setup LLVM (and Clang etc.) and run "test".
	* Install "test-suite" and run with profiling.
	* Re-read textbooks/papers concerning profiling, path-profiling and 
relevant transformations.
	* Read any relevant LLVM documentation.
	* Add DSPStone to test-suite. (I have worked with it before.)
	* Add UTDSP to test-suite as an "External" benchmark suite
           (like SPEC). (I have also worked with this before.)
	* Write test for path-profiling (there is currently not one
           in "test").

Week 1:  Eliminate profiling instrumentation for loops where
          the iteration count is known at compile time.
Weeks 2-5:  Have the heuristics for loop unrolling, unswitching
             and inlining use profiling information. Add MiDatasets
             to test-suite (this may not be suitable for the
             repository though).
Weeks 6-9: Enhance Tail-Duplication to use first basic profiling
            and then path profiling.  Test and fix path profiling.
Week 10: Fix llvm-prof to work with path-profiling.
Week 11: Performance analysis and code-cleanup.
Week 12: Separate code into sets of independent patches for review
          on llvm-commits mailing list. Write final reports.

Some aspects of weeks 12 may well be ready for review earlier than week 
12, but I don't want to over-complicate the timeline.

Thank you for taking the time to read this,
Alatair Murray.

Evan Cheng

2012-May-02 00:56 UTC

head link

[LLVMdev] GSoC Proposal: Profiling Enhancements

Hi Alastair,

I am interested in seeing this through. Do you have a GSoC mentor yet?

Evan

On Apr 5, 2012, at 11:18 AM, Alastair Murray wrote:
> Hello Everyone,
> 
> Before I get started I just want to sincerely apologise for not getting 
> feedback on this earlier.  I've had an extremely busy week as I was 
> presenting a paper at the CGO conference.  If anyone is able to provide 
> feedback in such a short time-frame then it will be gratefully received. 
>  If not, then I just hope the work described sounds useful.  I have 
> already submitted this proposal due to the short time until the 
> deadline, but I can make changes.  (By the way, no need to worry about 
> me being so busy during the coding period, see below.)
> 
> 
> 
> Proposal
> --------
> 
> === Overview ==> 
> LLVM already contains a profiling framework, but only a handful of 
> transforms make use of the metadata.  Further, it even contains a path 
> profiling framework, but no transforms make use of it.  This "Google 
> Summer of Code" proposal lays out an achievable plan to enhance 
> profiling in LLVM and to use profiling metadata in key transformations 
> where it can have a strong positive effect.
> 
> 
> 
> === Why this is Useful for LLVM ==> 
> Many profiling-data friendly transformations in LLVM do not currently 
> use that data.  Specific examples are loop-unrolling, loop-unswitching 
> and inlining that can all find good performance improvements, though at 
> the cost of increased code-size.  Using profiling data with these three 
> transformations would allow faster code in hot areas by applying these 
> more aggressively and smaller code cold areas where minimising 
> instruction cache cold-hits is the key performance concern.
> 
> Additionally, the path profiling framework was added to LLVM in January 
> 2011 but hasn't been touched since (except for wide-spread API
updates).
>  As no transforms use the the produced metadata the code never gets 
> tested (though the mailing lists suggest a small number of people may 
> still be using it for external purposes).  This code is in serious 
> danger of suffering from "bit-rot".  This proposal suggests
enhancing a
> transformation to use this information to improve code-quality, and as a 
> side-effect this stale code will be testable. Superblock formation 
> already has a basic implementation in LLVM (it is called 
> Tail-Duplication) but it does not use profiling information.  It can be 
> improved by using either basic profiling information (for small gains) 
> or path profiling information (for larger gains).
> 
> The aim of this proposal is that it should result in code which really 
> will be integrated into LLVM.  Scanning a small subset of previous 
> "Google Summer of Code" efforts (for many projects, not just
LLVM) it
> appeared that few projects were actually able to use the contributed 
> code.  Often students would do a good job, but would not quite truly 
> complete the work.  Once summer was over no-one had time to polish the 
> code, so it would be left unused.  With this in mind I am proposing an 
> achievable but useful project which is decomposable into related 
> sub-goals.  Time has been left in the proposal plan to produce code of 
> high enough quality to be committed to the LLVM repository, and to have 
> time to find sensible default parameters for heuristics that are 
> changed.  Also, risk is reduced as the project is decomposable into 
> seven largely independent sub-goals, even if one aspect does not work, 
> the other six should still be usable by the LLVM project.
> 
> 
> 
> === About Me ==> 
> I am a PhD student at the University of Edinburgh, in the UK.  I am very 
> close to the end of my studies and have a few months of time available 
> that line-up perfectly with "Google Summer of Code" (I won't
describe
> why I have the time here, but feel free to ask me via private email or 
> IM if you're curious).  During my PhD I produced seven papers (five as 
> first author) based on research built on the GCC and SUIF compilers.  I 
> have over four years of full-time experience working with compilers, and 
> another two years during my undergraduate degree where compilers were 
> part of my studies.  My research has focussed on both code-generation 
> and middle-end transformations (my publications: 
> http://homepages.inf.ed.ac.uk/s0233454/pubs.html).  In the past I have 
> also undertaken internships which involved hand-coding assembly, so
I've
> no shortage of experience working on optimisation.
> 
> I have not worked with LLVM before, but I am genuinely very keen to do 
> so.  Looking at job advertisements and having just attended CGO 2012 I 
> noticed at both that LLVM is now used more than GCC (required for more 
> job postings, used in more papers).  That is one reason why this 
> proposal touches multiple transformations within LLVM, I'd like to
learn
> as much as possible about LLVM while using my existing skills to produce 
> something useful for the project.  The opportunity to have an expert 
> help mentor me through LLVM is an excellent opportunity.  Hopefully my 
> general compiler experience can offset my lack of LLVM experience as it 
> always fun to get to use new technologies.  Additionally I have worked 
> on multiple projects written in C++ during my studies so the programming 
> aspects of this proposal will not present any issues.
> 
> Contact information:
> 	* CUT -- This is in the official submission *
> 
> 
> 
> === Further Proposal Details ==> 
> All of the below used http://llvm.org/OpenProjects.html and 
> http://nondot.org/sabre/LLVMNotes/ as a starting point.
> 
> The plan is to become familiar with working within the LLVM 
> infrastructure during the "Bonding Period".  The best way to do
that, of
> course, is to get stuck in, so I plan to add a few small benchmark items 
> to "test-suite".  One of the above links says that DSPStone and
UTDSP
> should be added.  As I have worked with both of these before I will add 
> these to "test-suite".  The process of doing so is probably the
best way
> to learn how the LLVM testing infrastructure works.  UTDSP, however, is 
> non-free for commercial use, so I will only be able to add that as an 
> external benchmark (like SPEC).
> 
> Further to this, the path-profiling pass currently in LLVM does not seem 
> to have a test-case in "test", so I will create one to aid me in 
> learning how verification works in LLVM.
> 
> Once the coding period begins I will first want to start working with 
> the profiling infrastructure directly.  I will modify the profiler so 
> that run-time instrumentation is not inserted for loops with iterations 
> counts that are known at compile time.  This will be added to 
> OptimalEdgeProfiling.cpp, and assuming it has a very-low compile-time 
> overhead it will also be added to EdgeProfiling.cpp.  Test-cases will be 
> created for "test", and compile-time overhead and run-time
benefits will
> be measured using "test-suite".
> 
> After this three transforms will be modified to exploit profiling data: 
> loop unrolling (LoopUnrollPass.cpp), unswitching (LoopUnswitch.cpp) and 
> inlining (Inliner.cpp).  This will be done by modifying the cost 
> heuristics (e.g. in LoopUnroll::runOnLoop or Inliner::shouldInline). 
> Initially this will be done by raising the threshold which limits the 
> transformation on hot code, and lowering it on cold code (changing both 
> so perhaps the overall produced code size will not change much).  If 
> required (or if there is time) more sophisticated heuristics will be 
> evaluated, e.g. using the caller/callee site execution ratio in the 
> inliner.  Attempts will also be made to try and keep profiling data 
> correct after transformations, so e.g. a hot loop with a heavily biased 
> conditional branch inside it can be unswitched and have only one side 
> unrolled.  Test-cases will be added to "test" designed so that
identical
> loops and functions exist but with different execution counts, so 
> differing behaviour on the hot/cold samples should be observed. 
> Performance will be evaluated using "test-suite" again, though
with one
> addition.  To ensure that the produced code is not over-specialised to 
> the training profiling data an additional set of data will be used. 
> MiDatasets (http://ctuning.org/wiki/index.php/CTools:CBench:Downloads) 
> provides alternatives inputs for the MiBench suite, so that will be used 
> to test this.
> 
> The next large chunk of work will be to make use of the exisiting path 
> profiling code (PathProfiling.cpp).  Superblock Formation will be done 
> via Tail Duplication (already implemented in TailDuplication.cpp). 
> Basic profiling data should provide small gains for this, but the true 
> expected gains come from using path-profiling to create the superblock 
> such that there is a "trace" through it.  For the hot case should
result
> in good instruction locality, good branch-prediction behaviour and good 
> scheduling for the "trace".  Path-profiling may need to be fixed
or
> enhanced for this, as it is currently unused it is difficult to know how 
> well it will work. (Note: I'm stating that it is unused based on the 
> fact that nothing includes PathProfileInfo.h except the path profiling 
> itself — it is, however, possible that tools outside the core llvm use 
> it though I couldn't find anything by means of an internet search).
This
> will be tested and evaluated in the same way as the previously modified 
> transformations.
> 
> In all the above cases the modifications will be made with the aim that 
> when profiling data is not available then compiler behaviour will remain 
> unchanged from its current state.  This will be verified by ensuring 
> identical binaries are produced for "llvm-suite" (LLVM-head vs my
local
> LLVM without profiling).
> 
> The final changes to be made will be to "llvm-prof". It has been
noted
> on the mailing list that it is not currently compatible with path-based 
> profiling, so support for this will be added.  Finally, overall 
> performance evaluations and code-cleanups will be performed with the aim 
> of having the code integrated into LLVM.  This will hopefully be done 
> with the help of the llvm-commits mailing list.
> 
> 
> 
> === Deliverables ==> 
> * Additions to test-suite.
> * Reduced profile instrumentation run-time overhead.
> * Profile enhanced loop-unrolling.
> * Profile enhanced loop-unswitching.
> * Profile enhanced inlining.
> * Tested (and fixed if required) path profiling.
> * Profile and path profile enhanced tail-duplication
>   (superblock formation).
> * llvm-prof fixed to work with path profiling.
> 
> 
> 
> === Timeline ==> 
> During "Bonding Period":
> 	* Install/Setup LLVM (and Clang etc.) and run "test".
> 	* Install "test-suite" and run with profiling.
> 	* Re-read textbooks/papers concerning profiling, path-profiling and 
> relevant transformations.
> 	* Read any relevant LLVM documentation.
> 	* Add DSPStone to test-suite. (I have worked with it before.)
> 	* Add UTDSP to test-suite as an "External" benchmark suite
>           (like SPEC). (I have also worked with this before.)
> 	* Write test for path-profiling (there is currently not one
>           in "test").
> 
> Week 1:  Eliminate profiling instrumentation for loops where
>          the iteration count is known at compile time.
> Weeks 2-5:  Have the heuristics for loop unrolling, unswitching
>             and inlining use profiling information. Add MiDatasets
>             to test-suite (this may not be suitable for the
>             repository though).
> Weeks 6-9: Enhance Tail-Duplication to use first basic profiling
>            and then path profiling.  Test and fix path profiling.
> Week 10: Fix llvm-prof to work with path-profiling.
> Week 11: Performance analysis and code-cleanup.
> Week 12: Separate code into sets of independent patches for review
>          on llvm-commits mailing list. Write final reports.
> 
> Some aspects of weeks 12 may well be ready for review earlier than week 
> 12, but I don't want to over-complicate the timeline.
> 
> Thank you for taking the time to read this,
> Alatair Murray.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Maybe Matching Threads

Search for more apparently analagous threads

llvm dev - Apr 2012 - [LLVMdev] GSoC Proposal: Profiling Enhancements

[LLVMdev] GSoC Proposal: Profiling Enhancements

[LLVMdev] GSoC Proposal: Profiling Enhancements

Maybe Matching Threads