Hello, I'm working on using the LLVM JIT for a little project of mine,
amazing work first off! I have a question about optimization passes. I
initially have this function I've created, in python it looks like this:
OS_end = 50OS_start = 0OS_timestep = 1birth_rate = .3population 30.0for time in
range(OS_start, OS_end, OS_timestep):
births = birth_rate * population
deaths = 0.1 * population
population_NEXT = population + births - deaths
#generally put print statements here
#updating stocks
population = population_NEXT
Then I can turn it into LLVM IR:
define void @simulate() {
entry:
%time = alloca double ; <double*> [#uses=4]
%population_next = alloca double ; <double*> [#uses=2]
%population = alloca double ; <double*> [#uses=5]
%net_change = alloca double ; <double*> [#uses=2]
%deaths = alloca double ; <double*> [#uses=2]
%births = alloca double ; <double*> [#uses=2]
%birth_rate = alloca double ; <double*> [#uses=2]
%OS_timestep = alloca double ; <double*> [#uses=2]
%OS_start = alloca double ; <double*> [#uses=2]
%OS_end = alloca double ; <double*> [#uses=2]
store double 5.000000e+01, double* %OS_end
store double 0.000000e+00, double* %OS_start
store double 1.000000e+00, double* %OS_timestep
store double 3.000000e-01, double* %birth_rate
store double 3.000000e+01, double* %population
%OS_start1 = load double* %OS_start ; <double> [#uses=1]
store double %OS_start1, double* %time
br label %forcond
forcond: ; preds = %forinc, %entry
%time2 = load double* %time ; <double> [#uses=1]
%OS_end3 = load double* %OS_end ; <double> [#uses=1]
%forcond4 = fcmp olt double %time2, %OS_end3 ; <i1> [#uses=1]
br i1 %forcond4, label %forbody, label %forafter
forbody: ; preds = %forcond
%birth_rate5 = load double* %birth_rate ; <double> [#uses=1]
%population6 = load double* %population ; <double> [#uses=1]
%multmp = mul double %birth_rate5, %population6 ; <double> [#uses=1]
store double %multmp, double* %births
%population7 = load double* %population ; <double> [#uses=1]
%multmp8 = mul double 1.000000e-01, %population7 ; <double> [#uses=1]
store double %multmp8, double* %deaths
%births9 = load double* %births ; <double> [#uses=1]
%deaths10 = load double* %deaths ; <double> [#uses=1]
%subtmp = sub double %births9, %deaths10 ; <double> [#uses=1]
store double %subtmp, double* %net_change
%population11 = load double* %population ; <double> [#uses=1]
%net_change12 = load double* %net_change ; <double> [#uses=1]
%addtmp = add double %population11, %net_change12 ; <double> [#uses=1]
store double %addtmp, double* %population_next
br label %updatestocks
updatestocks: ; preds = %forbody
%population_next13 = load double* %population_next ; <double> [#uses=1]
store double %population_next13, double* %population
br label %forinc
forinc: ; preds = %updatestocks
%time14 = load double* %time ; <double> [#uses=1]
%OS_timestep15 = load double* %OS_timestep ; <double> [#uses=1]
%new_time = add double %time14, %OS_timestep15 ; <double> [#uses=1]
store double %new_time, double* %time
br label %forcond
forafter: ; preds = %forcond
ret void
}
And then I run the optimizations from the Kaleidoscope tutorial
(mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg)
and get the following:
define void @simulate() {
entry:
br label %forcond
forcond: ; preds = %forbody, %entry
%population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp,
%forbody ] ; <double> [#uses=3]
%time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, %forbody
] ; <double> [#uses=2]
%forcond4 = fcmp olt double %time.0, 5.000000e+01 ; <i1> [#uses=1]
br i1 %forcond4, label %forbody, label %forafter
forbody: ; preds = %forcond
%multmp = mul double %population.0, 3.000000e-01 ; <double> [#uses=1]
%multmp8 = mul double %population.0, 1.000000e-01 ; <double> [#uses=1]
%subtmp = sub double %multmp, %multmp8 ; <double> [#uses=1]
%addtmp = add double %population.0, %subtmp ; <double> [#uses=1]
%new_time = add double %time.0, 1.000000e+00 ; <double> [#uses=1]
br label %forcond
forafter: ; preds = %forcond
ret void
}
The thing is, in forbody it is still doing:
subtmp = population + (population * .3) - (population * .1)
ideally I would love to see it reduced to:
subtmp = 1.2 * population
I've played around with adding a bunch of optimizations that sound good from
[http://www.llvm.org/docs/Passes.html], but I must admit I'm a bit over my
head here. Does anyone have any suggestions? Am I missing passes, do I need
to restructure the IR, or am I missing some concept?
thanks! (keep up the amazing work)
yours,
Bobby Powers
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/5df6b850/attachment.html>
On Mon, Mar 31, 2008 at 7:46 PM, Bobby Powers <bobbypowers at gmail.com> wrote:> > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * population >I expect that such a transformation is not safe in general thanks to rounding errors, and as such isn't applied. AFAIK, compilers usually don't reorganize floating point operations. Have you tried it on other compilers? I expect it'll never happen without some imprecise math option (like gcc's -ffast-math).
On Mar 31, 2008, at 4:46 PM, Bobby Powers wrote:> Hello, I'm working on using the LLVM JIT for a little project of > mine, amazing work first off! I have a question about optimization > passes. I initially have this function I've created, in python it > looks like this: > > OS_end = 50 > OS_start = 0 > OS_timestep = 1 > birth_rate = .3 > population = 30.0 > > for time in range(OS_start, OS_end, OS_timestep): > births = birth_rate * population > deaths = 0.1 * population > population_NEXT = population + births - deaths > > > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * populationThis transformation changes the result due to roundoff errors, and would be incorrect. In C using %lf the printed values of the 2 expressions diverge on the 84th iteration.> I've played around with adding a bunch of optimizations that sound > good from [http://www.llvm.org/docs/Passes.html], but I must admit > I'm a bit over my head here. Does anyone have any suggestions? Am > I missing passes, do I need to restructure the IR, or am I missing > some concept? > > thanks! (keep up the amazing work) > yours, > Bobby Powers > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080331/d85f26a1/attachment.html>
Other people have already addressed the specific case of this example, but, in general, the passes use by the opt tool in -std-compile-opts are considered to be a good set for unit-at-a-time optimization, and similarly the ones used in llvm-ld are considered good (in addition to the single-unit ones) for whole-program optimization. Also note that predsimplify is incomplete and has some known bugs. I would recommend against using it. --Owen On Mar 31, 2008, at 6:46 PM, Bobby Powers wrote:> Hello, I'm working on using the LLVM JIT for a little project of > mine, amazing work first off! I have a question about optimization > passes. I initially have this function I've created, in python it > looks like this: > > OS_end = 50 > OS_start = 0 > OS_timestep = 1 > birth_rate = .3 > population = 30.0 > > for time in range(OS_start, OS_end, OS_timestep): > births = birth_rate * population > deaths = 0.1 * population > population_NEXT = population + births - deaths > > #generally put print statements here > > #updating stocks > population = population_NEXT > > Then I can turn it into LLVM IR: > > define void @simulate() { > entry: > %time = alloca double ; <double*> [#uses=4] > %population_next = alloca double ; <double*> [#uses=2] > %population = alloca double ; <double*> [#uses=5] > %net_change = alloca double ; <double*> [#uses=2] > %deaths = alloca double ; <double*> [#uses=2] > %births = alloca double ; <double*> [#uses=2] > %birth_rate = alloca double ; <double*> [#uses=2] > %OS_timestep = alloca double ; <double*> [#uses=2] > %OS_start = alloca double ; <double*> [#uses=2] > %OS_end = alloca double ; <double*> [#uses=2] > store double 5.000000e+01, double* %OS_end > store double 0.000000e+00, double* %OS_start > store double 1.000000e+00, double* %OS_timestep > store double 3.000000e-01, double* %birth_rate > store double 3.000000e+01, double* %population > %OS_start1 = load double* %OS_start ; <double> [#uses=1] > store double %OS_start1, double* %time > br label %forcond > > forcond: ; preds = %forinc, %entry > %time2 = load double* %time ; <double> [#uses=1] > %OS_end3 = load double* %OS_end ; <double> [#uses=1] > %forcond4 = fcmp olt double %time2, %OS_end3 ; <i1> [#uses=1] > br i1 %forcond4, label %forbody, label %forafter > > forbody: ; preds = %forcond > %birth_rate5 = load double* %birth_rate ; <double> [#uses=1] > %population6 = load double* %population ; <double> [#uses=1] > %multmp = mul double %birth_rate5, %population6 ; <double> [#uses=1] > store double %multmp, double* %births > %population7 = load double* %population ; <double> [#uses=1] > %multmp8 = mul double 1.000000e-01, %population7 ; <double> > [#uses=1] > store double %multmp8, double* %deaths > %births9 = load double* %births ; <double> [#uses=1] > %deaths10 = load double* %deaths ; <double> [#uses=1] > %subtmp = sub double %births9, %deaths10 ; <double> [#uses=1] > store double %subtmp, double* %net_change > %population11 = load double* %population ; <double> [#uses=1] > %net_change12 = load double* %net_change ; <double> [#uses=1] > %addtmp = add double %population11, %net_change12 ; <double> > [#uses=1] > store double %addtmp, double* %population_next > br label %updatestocks > > updatestocks: ; preds = %forbody > %population_next13 = load double* %population_next ; <double> > [#uses=1] > store double %population_next13, double* %population > br label %forinc > > forinc: ; preds = %updatestocks > %time14 = load double* %time ; <double> [#uses=1] > %OS_timestep15 = load double* %OS_timestep ; <double> [#uses=1] > %new_time = add double %time14, %OS_timestep15 ; <double> [#uses=1] > store double %new_time, double* %time > br label %forcond > > forafter: ; preds = %forcond > ret void > } > > > And then I run the optimizations from the Kaleidoscope tutorial > (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg) > and get the following: > > define void @simulate() { > entry: > br label %forcond > > forcond: ; preds = %forbody, %entry > %population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp, > %forbody ] ; <double> [#uses=3] > %time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, > %forbody ] ; <double> [#uses=2] > %forcond4 = fcmp olt double %time.0, 5.000000e+01 ; <i1> [#uses=1] > br i1 %forcond4, label %forbody, label %forafter > > forbody: ; preds = %forcond > %multmp = mul double %population.0, 3.000000e-01 ; <double> > [#uses=1] > %multmp8 = mul double %population.0, 1.000000e-01 ; <double> > [#uses=1] > %subtmp = sub double %multmp, %multmp8 ; <double> [#uses=1] > %addtmp = add double %population.0, %subtmp ; <double> [#uses=1] > %new_time = add double %time.0, 1.000000e+00 ; <double> [#uses=1] > br label %forcond > > forafter: ; preds = %forcond > ret void > } > > > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * population > > > > I've played around with adding a bunch of optimizations that sound > good from [http://www.llvm.org/docs/Passes.html], but I must admit > I'm a bit over my head here. Does anyone have any suggestions? Am > I missing passes, do I need to restructure the IR, or am I missing > some concept? > > thanks! (keep up the amazing work) > yours, > Bobby Powers > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2555 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.bin>
thanks for the help, I suppose I'm yet not use to thinking of rounding errors 84 iterations out :) yours, Bobby Powers -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080402/7385f3ed/attachment.html>
> thanks for the help, I suppose I'm yet not use to thinking of rounding > errors 84 iterations out :)llc has an -enable-unsafe-fp-math option, while gcc has -fast-math, for people who don't care about such details.