In the creation of dynamic languages we often have to box values together. For instance, take the following expression: IntObj c = sqrt((a*a)+(b*b)); Here, most likely, a bytecode interpreter would execute this as "mul_ints", "add_ints", "sqrt", etc. Inside these primitive functions we would have to unwrap our IntObj types, add the values, allocate a new object and return that to the function. In the above example, we could probably expect around 4 allocations, and 7 unboxing operations. Now granted if my lanugage is running as a bytecode interpreter, I can speed it up simply by having LLVM call my functions in order, and perhaps even in-lining all the bytecode operations into a single function. But even then, I'm still left with the 4 allocations and 7 unboxings (is that even a word?). I know other compiler projects, such as PyPy have allocation removal where the optimization passes see that we only use the result of an allocation a single time. Thinking that LLVM may do this as well, I tried this simple test on in-browser LLVM compiler: -------- #include <stdio.h> #include <stdlib.h> typedef struct Foo { int *x; int x2; }Foo; int main(int argc, char **argv) { Foo *f = (Foo *)malloc(sizeof(Foo)); f->x = (int *)malloc(sizeof(int)); *f->x = 10; return *f->x; } ---- Output: ; ModuleID = '/tmp/webcompile/_28006_0.bc' target datalayout "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-linux-gnu" define i32 @main(i32 %argc, i8** nocapture %argv) nounwind { entry: %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1] %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1] %2 = bitcast i8* %1 to i32* ; <i32*> [#uses=2] %3 = bitcast i8* %0 to i32** ; <i32**> [#uses=1] store i32* %2, i32** %3, align 8 store i32 10, i32* %2, align 4 ret i32 10 } ------ As you can see, the allocations are still being performed. Now if we could get rid of these allocations, then this entire function could be reduced to a single line that just returns 10. Is any optimization like this planned or even feasible in LLVM? Thank you for your time. Timothy Baldridge -- “One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.” (Robert Firth)
Hi Timothy, LLVM cannot remove the malloc calls, as malloc() has a sideeffect and that would be changing the behaviour of the program. Apart from that, the problem with unboxing in dynamic languages is knowing beforehand which function to dispatch to. mul_ints or mul_floats, for example? What if a particular type has overridden the + operator, etc etc. So your code normally ends up bouncing through several functions making analysis difficult. James> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Timothy Baldridge > Sent: 28 June 2011 19:25 > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Box removal > > In the creation of dynamic languages we often have to box values > together. > > For instance, take the following expression: > > IntObj c = sqrt((a*a)+(b*b)); > > Here, most likely, a bytecode interpreter would execute this as > "mul_ints", "add_ints", "sqrt", etc. Inside these primitive functions > we would have to unwrap our IntObj types, add the values, allocate a > new object and return that to the function. In the above example, we > could probably expect around 4 allocations, and 7 unboxing operations. > Now granted if my lanugage is running as a bytecode interpreter, I can > speed it up simply by having LLVM call my functions in order, and > perhaps even in-lining all the bytecode operations into a single > function. But even then, I'm still left with the 4 allocations and 7 > unboxings (is that even a word?). > > I know other compiler projects, such as PyPy have allocation removal > where the optimization passes see that we only use the result of an > allocation a single time. Thinking that LLVM may do this as well, I > tried this simple test on in-browser LLVM compiler: > -------- > #include <stdio.h> > #include <stdlib.h> > > typedef struct Foo > { > int *x; > int x2; > }Foo; > > int main(int argc, char **argv) { > Foo *f = (Foo *)malloc(sizeof(Foo)); > f->x = (int *)malloc(sizeof(int)); > *f->x = 10; > return *f->x; > } > > ---- > > Output: > > ; ModuleID = '/tmp/webcompile/_28006_0.bc' > target datalayout > "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32- > f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128- > n8:16:32:64" > target triple = "x86_64-linux-gnu" > > define i32 @main(i32 %argc, i8** nocapture %argv) nounwind { > entry: > %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1] > %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1] > %2 = bitcast i8* %1 to i32* ; <i32*> [#uses=2] > %3 = bitcast i8* %0 to i32** ; <i32**> [#uses=1] > store i32* %2, i32** %3, align 8 > store i32 10, i32* %2, align 4 > ret i32 10 > } > > ------ > > As you can see, the allocations are still being performed. Now if we > could get rid of these allocations, then this entire function could be > reduced to a single line that just returns 10. > > Is any optimization like this planned or even feasible in LLVM? > > > Thank you for your time. > > Timothy Baldridge > > > > > -- > "One of the main causes of the fall of the Roman Empire was > that-lacking zero-they had no way to indicate successful termination > of their C programs." > (Robert Firth) > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Tue, Jun 28, 2011 at 11:25, Timothy Baldridge <tbaldridge at gmail.com> wrote:> > I know other compiler projects, such as PyPy have allocation removal > where the optimization passes see that we only use the result of an > allocation a single time. Thinking that LLVM may do this as well, I > tried this simple test on in-browser LLVM compiler: >There's no malloc remover, but there is a malloc/free remover. If you fix the C code: #include <stdio.h> #include <stdlib.h> typedef struct Foo { int *x; int x2; }Foo; int main(int argc, char **argv) { Foo *f = (Foo *)malloc(sizeof(Foo)); f->x = (int *)malloc(sizeof(int)); *f->x = 10; int i = *f->x; free(f->x); free(f); return i; } Then it compiles exactly the way you want it to ; ModuleID = '/tmp/webcompile/_26915_0.bc' target datalayout "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-linux-gnu" define i32 @main(i32 %argc, i8** nocapture %argv) nounwind readnone { entry: ret i32 10 } ~ Scott