Hello, I'm working on using the LLVM JIT for a little project of mine, amazing work first off! I have a question about optimization passes. I initially have this function I've created, in python it looks like this: OS_end = 50OS_start = 0OS_timestep = 1birth_rate = .3population 30.0for time in range(OS_start, OS_end, OS_timestep): births = birth_rate * population deaths = 0.1 * population population_NEXT = population + births - deaths #generally put print statements here #updating stocks population = population_NEXT Then I can turn it into LLVM IR: define void @simulate() { entry: %time = alloca double ; <double*> [#uses=4] %population_next = alloca double ; <double*> [#uses=2] %population = alloca double ; <double*> [#uses=5] %net_change = alloca double ; <double*> [#uses=2] %deaths = alloca double ; <double*> [#uses=2] %births = alloca double ; <double*> [#uses=2] %birth_rate = alloca double ; <double*> [#uses=2] %OS_timestep = alloca double ; <double*> [#uses=2] %OS_start = alloca double ; <double*> [#uses=2] %OS_end = alloca double ; <double*> [#uses=2] store double 5.000000e+01, double* %OS_end store double 0.000000e+00, double* %OS_start store double 1.000000e+00, double* %OS_timestep store double 3.000000e-01, double* %birth_rate store double 3.000000e+01, double* %population %OS_start1 = load double* %OS_start ; <double> [#uses=1] store double %OS_start1, double* %time br label %forcond forcond: ; preds = %forinc, %entry %time2 = load double* %time ; <double> [#uses=1] %OS_end3 = load double* %OS_end ; <double> [#uses=1] %forcond4 = fcmp olt double %time2, %OS_end3 ; <i1> [#uses=1] br i1 %forcond4, label %forbody, label %forafter forbody: ; preds = %forcond %birth_rate5 = load double* %birth_rate ; <double> [#uses=1] %population6 = load double* %population ; <double> [#uses=1] %multmp = mul double %birth_rate5, %population6 ; <double> [#uses=1] store double %multmp, double* %births %population7 = load double* %population ; <double> [#uses=1] %multmp8 = mul double 1.000000e-01, %population7 ; <double> [#uses=1] store double %multmp8, double* %deaths %births9 = load double* %births ; <double> [#uses=1] %deaths10 = load double* %deaths ; <double> [#uses=1] %subtmp = sub double %births9, %deaths10 ; <double> [#uses=1] store double %subtmp, double* %net_change %population11 = load double* %population ; <double> [#uses=1] %net_change12 = load double* %net_change ; <double> [#uses=1] %addtmp = add double %population11, %net_change12 ; <double> [#uses=1] store double %addtmp, double* %population_next br label %updatestocks updatestocks: ; preds = %forbody %population_next13 = load double* %population_next ; <double> [#uses=1] store double %population_next13, double* %population br label %forinc forinc: ; preds = %updatestocks %time14 = load double* %time ; <double> [#uses=1] %OS_timestep15 = load double* %OS_timestep ; <double> [#uses=1] %new_time = add double %time14, %OS_timestep15 ; <double> [#uses=1] store double %new_time, double* %time br label %forcond forafter: ; preds = %forcond ret void } And then I run the optimizations from the Kaleidoscope tutorial (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg) and get the following: define void @simulate() { entry: br label %forcond forcond: ; preds = %forbody, %entry %population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp, %forbody ] ; <double> [#uses=3] %time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, %forbody ] ; <double> [#uses=2] %forcond4 = fcmp olt double %time.0, 5.000000e+01 ; <i1> [#uses=1] br i1 %forcond4, label %forbody, label %forafter forbody: ; preds = %forcond %multmp = mul double %population.0, 3.000000e-01 ; <double> [#uses=1] %multmp8 = mul double %population.0, 1.000000e-01 ; <double> [#uses=1] %subtmp = sub double %multmp, %multmp8 ; <double> [#uses=1] %addtmp = add double %population.0, %subtmp ; <double> [#uses=1] %new_time = add double %time.0, 1.000000e+00 ; <double> [#uses=1] br label %forcond forafter: ; preds = %forcond ret void } The thing is, in forbody it is still doing: subtmp = population + (population * .3) - (population * .1) ideally I would love to see it reduced to: subtmp = 1.2 * population I've played around with adding a bunch of optimizations that sound good from [http://www.llvm.org/docs/Passes.html], but I must admit I'm a bit over my head here. Does anyone have any suggestions? Am I missing passes, do I need to restructure the IR, or am I missing some concept? thanks! (keep up the amazing work) yours, Bobby Powers -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/5df6b850/attachment.html>
On Mon, Mar 31, 2008 at 7:46 PM, Bobby Powers <bobbypowers at gmail.com> wrote:> > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * population >I expect that such a transformation is not safe in general thanks to rounding errors, and as such isn't applied. AFAIK, compilers usually don't reorganize floating point operations. Have you tried it on other compilers? I expect it'll never happen without some imprecise math option (like gcc's -ffast-math).
On Mar 31, 2008, at 4:46 PM, Bobby Powers wrote:> Hello, I'm working on using the LLVM JIT for a little project of > mine, amazing work first off! I have a question about optimization > passes. I initially have this function I've created, in python it > looks like this: > > OS_end = 50 > OS_start = 0 > OS_timestep = 1 > birth_rate = .3 > population = 30.0 > > for time in range(OS_start, OS_end, OS_timestep): > births = birth_rate * population > deaths = 0.1 * population > population_NEXT = population + births - deaths > > > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * populationThis transformation changes the result due to roundoff errors, and would be incorrect. In C using %lf the printed values of the 2 expressions diverge on the 84th iteration.> I've played around with adding a bunch of optimizations that sound > good from [http://www.llvm.org/docs/Passes.html], but I must admit > I'm a bit over my head here. Does anyone have any suggestions? Am > I missing passes, do I need to restructure the IR, or am I missing > some concept? > > thanks! (keep up the amazing work) > yours, > Bobby Powers > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080331/d85f26a1/attachment.html>
Other people have already addressed the specific case of this example, but, in general, the passes use by the opt tool in -std-compile-opts are considered to be a good set for unit-at-a-time optimization, and similarly the ones used in llvm-ld are considered good (in addition to the single-unit ones) for whole-program optimization. Also note that predsimplify is incomplete and has some known bugs. I would recommend against using it. --Owen On Mar 31, 2008, at 6:46 PM, Bobby Powers wrote:> Hello, I'm working on using the LLVM JIT for a little project of > mine, amazing work first off! I have a question about optimization > passes. I initially have this function I've created, in python it > looks like this: > > OS_end = 50 > OS_start = 0 > OS_timestep = 1 > birth_rate = .3 > population = 30.0 > > for time in range(OS_start, OS_end, OS_timestep): > births = birth_rate * population > deaths = 0.1 * population > population_NEXT = population + births - deaths > > #generally put print statements here > > #updating stocks > population = population_NEXT > > Then I can turn it into LLVM IR: > > define void @simulate() { > entry: > %time = alloca double ; <double*> [#uses=4] > %population_next = alloca double ; <double*> [#uses=2] > %population = alloca double ; <double*> [#uses=5] > %net_change = alloca double ; <double*> [#uses=2] > %deaths = alloca double ; <double*> [#uses=2] > %births = alloca double ; <double*> [#uses=2] > %birth_rate = alloca double ; <double*> [#uses=2] > %OS_timestep = alloca double ; <double*> [#uses=2] > %OS_start = alloca double ; <double*> [#uses=2] > %OS_end = alloca double ; <double*> [#uses=2] > store double 5.000000e+01, double* %OS_end > store double 0.000000e+00, double* %OS_start > store double 1.000000e+00, double* %OS_timestep > store double 3.000000e-01, double* %birth_rate > store double 3.000000e+01, double* %population > %OS_start1 = load double* %OS_start ; <double> [#uses=1] > store double %OS_start1, double* %time > br label %forcond > > forcond: ; preds = %forinc, %entry > %time2 = load double* %time ; <double> [#uses=1] > %OS_end3 = load double* %OS_end ; <double> [#uses=1] > %forcond4 = fcmp olt double %time2, %OS_end3 ; <i1> [#uses=1] > br i1 %forcond4, label %forbody, label %forafter > > forbody: ; preds = %forcond > %birth_rate5 = load double* %birth_rate ; <double> [#uses=1] > %population6 = load double* %population ; <double> [#uses=1] > %multmp = mul double %birth_rate5, %population6 ; <double> [#uses=1] > store double %multmp, double* %births > %population7 = load double* %population ; <double> [#uses=1] > %multmp8 = mul double 1.000000e-01, %population7 ; <double> > [#uses=1] > store double %multmp8, double* %deaths > %births9 = load double* %births ; <double> [#uses=1] > %deaths10 = load double* %deaths ; <double> [#uses=1] > %subtmp = sub double %births9, %deaths10 ; <double> [#uses=1] > store double %subtmp, double* %net_change > %population11 = load double* %population ; <double> [#uses=1] > %net_change12 = load double* %net_change ; <double> [#uses=1] > %addtmp = add double %population11, %net_change12 ; <double> > [#uses=1] > store double %addtmp, double* %population_next > br label %updatestocks > > updatestocks: ; preds = %forbody > %population_next13 = load double* %population_next ; <double> > [#uses=1] > store double %population_next13, double* %population > br label %forinc > > forinc: ; preds = %updatestocks > %time14 = load double* %time ; <double> [#uses=1] > %OS_timestep15 = load double* %OS_timestep ; <double> [#uses=1] > %new_time = add double %time14, %OS_timestep15 ; <double> [#uses=1] > store double %new_time, double* %time > br label %forcond > > forafter: ; preds = %forcond > ret void > } > > > And then I run the optimizations from the Kaleidoscope tutorial > (mem2reg, predsimplify, instcombine, reassociate, gvn, simplifycfg) > and get the following: > > define void @simulate() { > entry: > br label %forcond > > forcond: ; preds = %forbody, %entry > %population.0 = phi double [ 3.000000e+01, %entry ], [ %addtmp, > %forbody ] ; <double> [#uses=3] > %time.0 = phi double [ 0.000000e+00, %entry ], [ %new_time, > %forbody ] ; <double> [#uses=2] > %forcond4 = fcmp olt double %time.0, 5.000000e+01 ; <i1> [#uses=1] > br i1 %forcond4, label %forbody, label %forafter > > forbody: ; preds = %forcond > %multmp = mul double %population.0, 3.000000e-01 ; <double> > [#uses=1] > %multmp8 = mul double %population.0, 1.000000e-01 ; <double> > [#uses=1] > %subtmp = sub double %multmp, %multmp8 ; <double> [#uses=1] > %addtmp = add double %population.0, %subtmp ; <double> [#uses=1] > %new_time = add double %time.0, 1.000000e+00 ; <double> [#uses=1] > br label %forcond > > forafter: ; preds = %forcond > ret void > } > > > The thing is, in forbody it is still doing: > subtmp = population + (population * .3) - (population * .1) > > > ideally I would love to see it reduced to: > subtmp = 1.2 * population > > > > I've played around with adding a bunch of optimizations that sound > good from [http://www.llvm.org/docs/Passes.html], but I must admit > I'm a bit over my head here. Does anyone have any suggestions? Am > I missing passes, do I need to restructure the IR, or am I missing > some concept? > > thanks! (keep up the amazing work) > yours, > Bobby Powers > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2555 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080401/acb1e845/attachment.bin>
thanks for the help, I suppose I'm yet not use to thinking of rounding errors 84 iterations out :) yours, Bobby Powers -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20080402/7385f3ed/attachment.html>
> thanks for the help, I suppose I'm yet not use to thinking of rounding > errors 84 iterations out :)llc has an -enable-unsafe-fp-math option, while gcc has -fast-math, for people who don't care about such details.