thr3ads.net - llvm dev - [LLVMdev] Additional Optimization I'm Missing? [Mar 2008]

If this information is useful, please help other people find it:
Share via:

Bobby Powers

2008-Mar-31 23:46 UTC

[LLVMdev] Additional Optimization I'm Missing?

Hello, I'm working on using the LLVM JIT for a little project of mine,
amazing work first off!  I have a question about optimization passes.  I
initially have this function I've created, in python it looks like this:

OS_end = 50OS_start = 0OS_timestep = 1birth_rate = .3population 30.0for time in
range(OS_start, OS_end, OS_timestep):
    births = birth_rate * population
    deaths = 0.1 * population
    population_NEXT = population + births - deaths

    #generally put print statements here

    #updating stocks
    population = population_NEXT


Then I can turn it into LLVM IR:


define void @simulate() {
entry:
	%time = alloca double		; <double*> [#uses=4]
	%population_next = alloca double		; <double*> [#uses=2]
	%population = alloca double		; <double*> [#uses=5]
	%net_change = alloca double		; <double*> [#uses=2]
	%deaths = alloca double		; <double*> [#uses=2]
	%births = alloca double		; <double*> [#uses=2]
	%birth_rate = alloca double		; <double*> [#uses=2]
	%OS_timestep = alloca double		; <double*> [#uses=2]
	%OS_start = alloca double		; <double*> [#uses=2]
	%OS_end = alloca double		; <double*> [#uses=2]
	store double 5.000000e+01, double* %OS_end
	store double 0.000000e+00, double* %OS_start
	store double 1.000000e+00, double* %OS_timestep
	store double 3.000000e-01, double* %birth_rate
	store double 3.000000e+01, double* %population
	%OS_start1 = load double* %OS_start		; <double> [#uses=1]
	store double %OS_start1, double* %time
	br label %forcond

forcond:		; preds = %forinc, %entry
	%time2 = load double* %time		; <double> [#uses=1]
	%OS_end3 = load double* %OS_end		; <double> [#uses=1]
	%forcond4 = fcmp olt double %time2, %OS_end3		; <i1> [#uses=1]
	br i1 %forcond4, label %forbody, label %forafter

forbody:		; preds = %forcond
	%birth_rate5 = load double* %birth_rate		; <double> [#uses=1]
	%population6 = load double* %population		; <double> [#uses=1]
	%multmp = mul double %birth_rate5, %population6		; <double> [#uses=1]
	store double %multmp, double* %births
	%population7 = load double* %population		; <double> [#uses=1]
	%multmp8 = mul double 1.000000e-01, %population7		; <double> [#uses=1]
	store double %multmp8, double* %deaths
	%births9 = load double* %births		; <double> [#uses=1]
	%deaths10 = load double* %deaths		; <double> [#uses=1]
	%subtmp = sub double %births9, %deaths10		; <double> [#uses=1]
	store double %subtmp, double* %net_change
	%population11 = load double* %population		; <double> [#uses=1]
	%net_change12 = load double* %net_change		; <double> [#uses=1]
	%addtmp = add double %population11, %net_change12		; <double> [#uses=1]
	store double %addtmp, double* %population_next
	br label %updatestocks

updatestocks:		; preds = %forbody
	%population_next13 = load double* %population_next		; <double> [#uses=1]
	store double %population_next13, double* %population
	br label %forinc

forinc:		; preds = %updatestocks
	%time14 = load double* %time		; <double> [#uses=1]
	%OS_timestep15 = load double* %OS_timestep		; <double> [#uses=1]
	%new_time = add double %time14, %OS_timestep15		; <double> [#uses=1]
	store double %new_time, double* %time
	br label %forcond

forafter:		; preds = %forcond
	ret void
}



And then I run the optimizations from the Kaleidoscope tutorial
(mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg)
and get the following:


define void @simulate() {
entry:
	br label %forcond

forcond:		; preds = %forbody, %entry
	%population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp,
%forbody ]		; <double> [#uses=3]
	%time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, %forbody
]		; <double> [#uses=2]
	%forcond4 = fcmp olt double %time.0, 5.000000e+01		; <i1> [#uses=1]
	br i1 %forcond4, label %forbody, label %forafter

forbody:		; preds = %forcond
	%multmp = mul double %population.0, 3.000000e-01		; <double> [#uses=1]
	%multmp8 = mul double %population.0, 1.000000e-01		; <double> [#uses=1]
	%subtmp = sub double %multmp, %multmp8		; <double> [#uses=1]
	%addtmp = add double %population.0, %subtmp		; <double> [#uses=1]
	%new_time = add double %time.0, 1.000000e+00		; <double> [#uses=1]
	br label %forcond

forafter:		; preds = %forcond
	ret void
}


The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)

ideally I would love to see it reduced to:
subtmp = 1.2 * population


I've played around with adding a bunch of optimizations that sound good from
[http://www.llvm.org/docs/Passes.html], but I must admit I'm a bit over my
head here. Does anyone have any suggestions? Am I missing passes, do I need
to restructure the IR, or am I missing some concept?

thanks! (keep up the amazing work)
yours,
Bobby Powers
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/5df6b850/attachment.html>

me22

2008-Apr-01 00:08 UTC

head link

[LLVMdev] Additional Optimization I'm Missing?

On Mon, Mar 31, 2008 at 7:46 PM, Bobby Powers <bobbypowers at gmail.com>
wrote:>
> The thing is, in forbody it is still doing:
> subtmp = population + (population * .3) - (population * .1)
>
>
> ideally I would love to see it reduced to:
> subtmp = 1.2 * population
>
I expect that such a transformation is not safe in general thanks to
rounding errors, and as such isn't applied.  AFAIK, compilers usually
don't reorganize floating point operations.

Have you tried it on other compilers?  I expect it'll never happen
without some imprecise math option (like gcc's -ffast-math).

Dale Johannesen

2008-Apr-01 00:10 UTC

head link

[LLVMdev] Additional Optimization I'm Missing?

On Mar 31, 2008, at 4:46 PM, Bobby Powers wrote:
> Hello, I'm working on using the LLVM JIT for a little project of  
> mine, amazing work first off!  I have a question about optimization  
> passes.  I initially have this function I've created, in python it  
> looks like this:
>
> OS_end = 50
> OS_start = 0
> OS_timestep = 1
> birth_rate = .3
> population = 30.0
>
> for time in range(OS_start, OS_end, OS_timestep):
>     births = birth_rate * population
>     deaths = 0.1 * population
>     population_NEXT = population + births - deaths
>
>
> The thing is, in forbody it is still doing:
> subtmp = population + (population * .3) - (population * .1)
>
>
> ideally I would love to see it reduced to:
> subtmp = 1.2 * population
This transformation changes the result due to roundoff errors, and  
would be incorrect.
In C using %lf the printed values of the 2 expressions diverge on the  
84th iteration.
> I've played around with adding a bunch of optimizations that sound  
> good from [http://www.llvm.org/docs/Passes.html], but I must admit  
> I'm a bit over my head here.  Does anyone have any suggestions?  Am  
> I missing passes, do I need to restructure the IR, or am I missing  
> some concept?
>
> thanks! (keep up the amazing work)
> yours,
> Bobby Powers
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080331/d85f26a1/attachment.html>

Owen Anderson

2008-Apr-01 06:18 UTC

head link

[LLVMdev] Additional Optimization I'm Missing?

Other people have already addressed the specific case of this example,  
but, in general, the passes use by the opt tool in -std-compile-opts  
are considered to be a good set for unit-at-a-time optimization, and  
similarly the ones used in llvm-ld are considered good (in addition to  
the single-unit ones) for whole-program optimization.

Also note that predsimplify is incomplete and has some known bugs.  I  
would recommend against using it.

--Owen

On Mar 31, 2008, at 6:46 PM, Bobby Powers wrote:
> Hello, I'm working on using the LLVM JIT for a little project of  
> mine, amazing work first off!  I have a question about optimization  
> passes.  I initially have this function I've created, in python it  
> looks like this:
>
> OS_end = 50
> OS_start = 0
> OS_timestep = 1
> birth_rate = .3
> population = 30.0
>
> for time in range(OS_start, OS_end, OS_timestep):
>     births = birth_rate * population
>     deaths = 0.1 * population
>     population_NEXT = population + births - deaths
>
>     #generally put print statements here
>
>     #updating stocks
>     population = population_NEXT
>
> Then I can turn it into LLVM IR:
>
> define void @simulate() {
> entry:
> 	%time = alloca double		; <double*> [#uses=4]
> 	%population_next = alloca double		; <double*> [#uses=2]
> 	%population = alloca double		; <double*> [#uses=5]
> 	%net_change = alloca double		; <double*> [#uses=2]
> 	%deaths = alloca double		; <double*> [#uses=2]
> 	%births = alloca double		; <double*> [#uses=2]
> 	%birth_rate = alloca double		; <double*> [#uses=2]
> 	%OS_timestep = alloca double		; <double*> [#uses=2]
> 	%OS_start = alloca double		; <double*> [#uses=2]
> 	%OS_end = alloca double		; <double*> [#uses=2]
> 	store double 5.000000e+01, double* %OS_end
> 	store double 0.000000e+00, double* %OS_start
> 	store double 1.000000e+00, double* %OS_timestep
> 	store double 3.000000e-01, double* %birth_rate
> 	store double 3.000000e+01, double* %population
> 	%OS_start1 = load double* %OS_start		; <double> [#uses=1]
> 	store double %OS_start1, double* %time
> 	br label %forcond
>
> forcond:		; preds = %forinc, %entry
> 	%time2 = load double* %time		; <double> [#uses=1]
> 	%OS_end3 = load double* %OS_end		; <double> [#uses=1]
> 	%forcond4 = fcmp olt double %time2, %OS_end3		; <i1> [#uses=1]
> 	br i1 %forcond4, label %forbody, label %forafter
>
> forbody:		; preds = %forcond
> 	%birth_rate5 = load double* %birth_rate		; <double> [#uses=1]
> 	%population6 = load double* %population		; <double> [#uses=1]
> 	%multmp = mul double %birth_rate5, %population6		; <double>
[#uses=1]
> 	store double %multmp, double* %births
> 	%population7 = load double* %population		; <double> [#uses=1]
> 	%multmp8 = mul double 1.000000e-01, %population7		; <double>  
> [#uses=1]
> 	store double %multmp8, double* %deaths
> 	%births9 = load double* %births		; <double> [#uses=1]
> 	%deaths10 = load double* %deaths		; <double> [#uses=1]
> 	%subtmp = sub double %births9, %deaths10		; <double> [#uses=1]
> 	store double %subtmp, double* %net_change
> 	%population11 = load double* %population		; <double> [#uses=1]
> 	%net_change12 = load double* %net_change		; <double> [#uses=1]
> 	%addtmp = add double %population11, %net_change12		; <double>  
> [#uses=1]
> 	store double %addtmp, double* %population_next
> 	br label %updatestocks
>
> updatestocks:		; preds = %forbody
> 	%population_next13 = load double* %population_next		; <double>  
> [#uses=1]
> 	store double %population_next13, double* %population
> 	br label %forinc
>
> forinc:		; preds = %updatestocks
> 	%time14 = load double* %time		; <double> [#uses=1]
> 	%OS_timestep15 = load double* %OS_timestep		; <double> [#uses=1]
> 	%new_time = add double %time14, %OS_timestep15		; <double> [#uses=1]
> 	store double %new_time, double* %time
> 	br label %forcond
>
> forafter:		; preds = %forcond
> 	ret void
> }
>
>
> And then I run the optimizations from the Kaleidoscope tutorial  
> (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg)  
> and get the following:
>
> define void @simulate() {
> entry:
> 	br label %forcond
>
> forcond:		; preds = %forbody, %entry
> 	%population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp,  
> %forbody ]		; <double> [#uses=3]
> 	%time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time,  
> %forbody ]		; <double> [#uses=2]
> 	%forcond4 = fcmp olt double %time.0, 5.000000e+01		; <i1> [#uses=1]
> 	br i1 %forcond4, label %forbody, label %forafter
>
> forbody:		; preds = %forcond
> 	%multmp = mul double %population.0, 3.000000e-01		; <double>  
> [#uses=1]
> 	%multmp8 = mul double %population.0, 1.000000e-01		; <double>  
> [#uses=1]
> 	%subtmp = sub double %multmp, %multmp8		; <double> [#uses=1]
> 	%addtmp = add double %population.0, %subtmp		; <double> [#uses=1]
> 	%new_time = add double %time.0, 1.000000e+00		; <double> [#uses=1]
> 	br label %forcond
>
> forafter:		; preds = %forcond
> 	ret void
> }
>
>
> The thing is, in forbody it is still doing:
> subtmp = population + (population * .3) - (population * .1)
>
>
> ideally I would love to see it reduced to:
> subtmp = 1.2 * population
>
>
>
> I've played around with adding a bunch of optimizations that sound  
> good from [http://www.llvm.org/docs/Passes.html], but I must admit  
> I'm a bit over my head here.  Does anyone have any suggestions?  Am  
> I missing passes, do I need to restructure the IR, or am I missing  
> some concept?
>
> thanks! (keep up the amazing work)
> yours,
> Bobby Powers
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2555 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.bin>

Bobby Powers

2008-Apr-02 08:56 UTC

head link

[LLVMdev] Additional Optimization I'm Missing?

thanks for the help, I suppose I'm yet not use to thinking of rounding
errors 84 iterations out :)
yours,
Bobby Powers
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080402/7385f3ed/attachment.html>

Duncan Sands

2008-Apr-02 09:39 UTC

head link

[LLVMdev] Additional Optimization I'm Missing?

> thanks for the help, I suppose I'm yet not use to thinking of rounding
> errors 84 iterations out :)
llc has an -enable-unsafe-fp-math option, while gcc has -fast-math, for
people who don't care about such details.

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Mar 2008 - [LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

[LLVMdev] Additional Optimization I'm Missing?

Reasonably Related Threads