thr3ads.net - llvm dev - [LLVMdev] IR Passes and TargetTransformInfo: Straw Man [Jul 2013]

Andrew Trick

2013-Jul-29 23:07 UTC

[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

On Jul 27, 2013, at 5:47 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Sean:
>
> I'm sorry I lied. I didn't mean to. I did try to avoid making a *BIG*
> change to the IPO pass ordering for now. However, when I made a minor
> change to populateLTOPassManager() by separating module passes from
> non-module passes, I saw quite a few performance differences, most of
> them degradations. Attacking these degradations one by one in a piecemeal
> manner is a waste of time. We might as well define the pass ordering for
> the Pre-IPO, IPO and Post-IPO phases now, and hopefully once and for all.
>
> To repair my image as a liar, I am posting some preliminary results on
> this cozy Saturday afternoon, which I normally devote to daydreaming :-)
>
> So far I have only measured the MultiSource benchmarks on my iMac (late
> 2012 model), and the command to run the benchmarks is
>   "make TEST=simple report OPTFLAGS='-O3 -flto'".
>
> In terms of execution time, some benchmarks degrade but more improve, and
> a few of the changes are quite substantial. User time is used for
> comparison. I measured twice, and the results are very stable. As far as
> I can tell from the results, the proposed pass ordering is a change in
> the right direction.
>
> Interestingly, if I combine populatePreIPOPassMgr() as the pre-IPO phase
> (see the patch) with the original populateLTOPassManager() for both IPO
> and post-IPO, I see a significant improvement in
> "Benchmarks/Trimaran/netbench-crc/netbench-crc"
> (about 94%, 0.5665s (was) vs 0.0295s). As I write this mail, I have not
> yet had a chance to figure out why this combination improves this
> benchmark so much.
>
> In terms of compile time, the results report that my change improves
> compile time by about 2x, which is nonsense. I guess the test script
> doesn't count link time.
>
> The new pass orderings for Pre-IPO, IPO, and Post-IPO are defined by
> populate{PreIPO|IPO|PostIPO}PassMgr().
>
> I will discuss this with Andy next Monday to stay consistent with the
> pass-ordering design he is envisioning, measure more benchmarks, and then
> post the patch and results to the community for discussion and approval.
>
> Thanks
> Shuxin

I don't have any objection to this as long as your compile times are
comparable.

The major differences that I could spot are:

You've moved the second iteration of some scalar opts into post-IPO:
- JumpThreading
- CorrelatedValueProp

You no longer run InstCombine after the first round of scalar opts (in preIPO)
and before the second round (in PostIPO).

You now have an extra (3rd) SROA in PostIPO.

I don't see a problem, but I'd like to understand the rationale. I think
it would be valuable to capture some of the motivation behind the standard pass
ordering and any changes we make to it. Sometimes part of the design becomes
obsolete but no one can be sure. Shall we start a new doc under LLVM subsystems?

-Andy
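
One way to record differences like the ones listed above is to write each phase down as an ordered list of pass names and count how often each pass runs before and after a change. The sketch below does only that. The two pipelines in it are hypothetical placeholders, not the contents of populateLTOPassManager() or of the patch, and the names `pass_counts` and `diff_pipelines` are invented for this illustration.

```python
from collections import Counter
from typing import Dict, List

# A pipeline maps a phase name to the ordered list of passes run in that phase.
Pipeline = Dict[str, List[str]]


def pass_counts(pipeline: Pipeline) -> Counter:
    """Count how many times each pass runs across all phases."""
    return Counter(p for passes in pipeline.values() for p in passes)


def diff_pipelines(old: Pipeline, new: Pipeline) -> Dict[str, int]:
    """Return {pass: change in run count} for passes whose count changed."""
    old_c, new_c = pass_counts(old), pass_counts(new)
    names = sorted(set(old_c) | set(new_c))
    return {n: new_c[n] - old_c[n] for n in names if new_c[n] != old_c[n]}


# Hypothetical pipelines, made up for this example only.
old_pipeline: Pipeline = {
    "LTO": ["sroa", "instcombine", "jump-threading", "sroa", "instcombine"],
}
new_pipeline: Pipeline = {
    "PreIPO": ["sroa", "jump-threading"],
    "IPO": ["inline"],
    "PostIPO": ["sroa", "jump-threading", "sroa"],
}

if __name__ == "__main__":
    for name, delta in diff_pipelines(old_pipeline, new_pipeline).items():
        print(f"{name}: {delta:+d}")
```

A count-based diff deliberately ignores position within a phase, so it can show that a pass was added or dropped but not that it moved. Capturing the rationale for a move still needs prose.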

Sean Silva

2013-Jul-29 23:18 UTC

On Mon, Jul 29, 2013 at 4:07 PM, Andrew Trick <atrick at apple.com> wrote:
>
> I don't see a problem, but I'd like to understand the rationale. I think
> it would be valuable to capture some of the motivation behind the standard
> pass ordering and any changes we make to it. Sometimes part of the design
> becomes obsolete but no one can be sure. Shall we start a new doc under
> LLVM subsystems?
>
Starting a new doc sounds like a good idea to me. If you aren't familiar
with adding to the Sphinx docs, the sphinx quickstart template will get you
up and running <http://llvm.org/docs/SphinxQuickstartTemplate.html>.

-- Sean Silva

Hal Finkel

2013-Jul-29 23:24 UTC

----- Original Message -----
>
> I don't see a problem, but I'd like to understand the rationale. I
> think it would be valuable to capture some of the motivation behind
> the standard pass ordering and any changes we make to it. Sometimes
> part of the design becomes obsolete but no one can be sure.
Out of curiosity, has anyone tried to optimize the pass ordering in some
(quasi-)automated way? Naively, a genetic algorithm seems like a perfect fit for
this.

 -Hal
> Shall we
> start a new doc under LLVM subsystems?
> 
> -Andy
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Sean Silva

2013-Jul-29 23:38 UTC

On Mon, Jul 29, 2013 at 4:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> Out of curiosity, has anyone tried to optimize the pass ordering in some
> (quasi-)automated way? Naively, a genetic algorithm seems like a perfect
> fit for this.
>
This is the closest I've seen:
http://donsbot.wordpress.com/2010/03/01/evolving-faster-haskell-programs-now-with-llvm/


However, it deals with a "toy" example. Doing something similar over an
entire benchmark suite would be interesting (and it may find non-obvious,
highly profitable interactions between passes that we aren't currently
exploiting).
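
As a rough illustration of the genetic-algorithm idea raised above, here is a minimal sketch that evolves a fixed-length sequence of pass names. Everything in it is an assumption made for the example. The pass names are used only as labels, `toy_cost` is a made-up stand-in for "measured benchmark run time" (it just counts mismatches against an arbitrary reference sequence), and the population size, mutation rate and generation count are picked arbitrarily. A real experiment would replace `toy_cost` with a function that builds and times a benchmark suite.

```python
import random
from typing import Callable, List, Sequence

# Pass names are labels only; nothing here invokes LLVM.
PASSES = ["sroa", "instcombine", "jump-threading", "correlated-propagation"]


def toy_cost(pipeline: Sequence[str]) -> int:
    """Hypothetical cost: slots that differ from an arbitrary reference order."""
    reference = ["sroa", "instcombine", "jump-threading",
                 "correlated-propagation", "instcombine", "sroa"]
    return sum(a != b for a, b in zip(pipeline, reference))


def evolve(cost: Callable[[Sequence[str]], float], length: int = 6,
           pop_size: int = 20, generations: int = 40,
           seed: int = 0) -> List[str]:
    """Evolve a pass sequence of the given length that minimizes cost."""
    rng = random.Random(seed)
    population = [[rng.choice(PASSES) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection with elitism: the better half survives unchanged.
        population.sort(key=cost)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:              # occasional point mutation
                child[rng.randrange(length)] = rng.choice(PASSES)
            children.append(child)
        population = survivors + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = evolve(toy_cost)
    print(best, toy_cost(best))
```

Because the better half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next. With real timings the cost function is noisy and expensive, so the number of evaluations, rather than the algorithm itself, is likely to be the limiting factor.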

-- Sean Silva

Shuxin Yang

2013-Jul-29 23:39 UTC

On 7/29/13 4:07 PM, Andrew Trick wrote:
> I don't have any objection to this as long as your compile times are
> comparable.
>
> The major differences that I could spot are:
>
> You've moved the second iteration of some scalar opts into post-IPO:
> - JumpThreading
> - CorrelatedValueProp

I don't see why we need so many iterations, so I got rid of it.

> You no longer run InstCombine after the first round of scalar opts (in
> preIPO) and before the second round (in PostIPO).
>
> You now have an extra (3rd) SROA in PostIPO.

I call SROA for dead code elimination, seriously!

The dead-whatever-elimination passes (even the ones called aggressive) do
not eliminate the last store to a local variable. Shame! Shame! Shame!

It seems we don't have a better way, since we don't like mem-ssa. We have
to call SROA, an all-in-one algorithm, to perform such stuff.
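
To make "the last store to a local variable" concrete, here is a toy model of the situation. It is not LLVM code and says nothing about how SROA or any LLVM pass is implemented. It treats a function body as a straight-line list of loads and stores and drops every store to a local that is never read afterwards. The operation encoding and the function name `remove_dead_local_stores` are invented for this sketch.

```python
from typing import List, Set, Tuple

# An operation is ("store", var) or ("load", var); control flow is not modeled.
Op = Tuple[str, str]


def remove_dead_local_stores(ops: List[Op], local_vars: Set[str]) -> List[Op]:
    """Drop stores to locals whose value is never loaded later."""
    live: Set[str] = set()        # variables that are read later on
    kept_reversed: List[Op] = []
    for kind, var in reversed(ops):   # walk backwards from the function exit
        if kind == "load":
            live.add(var)
            kept_reversed.append((kind, var))
        elif kind == "store":
            if var in local_vars and var not in live:
                continue              # nothing reads this value: dead store
            live.discard(var)         # earlier values of var are overwritten
            kept_reversed.append((kind, var))
        else:
            raise ValueError(f"unknown operation: {kind}")
    return list(reversed(kept_reversed))


if __name__ == "__main__":
    body = [("store", "x"), ("load", "x"), ("store", "x"), ("store", "g")]
    # "x" is a local; "g" stands for memory visible outside the function.
    print(remove_dead_local_stores(body, {"x"}))
```

In this model the final store to `x` is removed because the local dies at the function exit, while the store to `g` is kept because something outside the function might still read it.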

> I don't see a problem, but I'd like to understand the rationale. I think
> it would be valuable to capture some of the motivation behind the standard
> pass ordering and any changes we make to it. Sometimes part of the design
> becomes obsolete but no one can be sure. Shall we start a new doc under
> LLVM subsystems?
>
> -Andy

Shuxin Yang

2013-Jul-29 23:47 UTC

I personally strongly abhor this kind of thing :-) I guess I should be more
open-minded.

For the pre-IPO phase, some passes should not be invoked at all: for
example, loop-nest optimizations, loop versioning, aggressive loop
unrolling, vectorization, and aggressive inlining.

The reason is that they hinder the downstream optimizers if they kick in
early.
> Out of curiosity, has anyone tried to optimize the pass ordering in some
> (quasi-)automated way? Naively, a genetic algorithm seems like a perfect
> fit for this.
>
>   -Hal
