In the creation of dynamic languages we often have to box values together.
For instance, take the following expression:
IntObj c = sqrt((a*a)+(b*b));
Here, most likely, a bytecode interpreter would execute this as
"mul_ints", "add_ints", "sqrt", etc. Inside these
primitive functions
we would have to unwrap our IntObj types, add the values, allocate a
new object and return that to the function. In the above example, we
could probably expect around 4 allocations, and 7 unboxing operations.
Now granted if my lanugage is running as a bytecode interpreter, I can
speed it up simply by having LLVM call my functions in order, and
perhaps even in-lining all the bytecode operations into a single
function. But even then, I'm still left with the 4 allocations and 7
unboxings (is that even a word?).
I know other compiler projects, such as PyPy have allocation removal
where the optimization passes see that we only use the result of an
allocation a single time. Thinking that LLVM may do this as well, I
tried this simple test on in-browser LLVM compiler:
--------
#include <stdio.h>
#include <stdlib.h>
typedef struct Foo
{
int *x;
int x2;
}Foo;
int main(int argc, char **argv) {
Foo *f = (Foo *)malloc(sizeof(Foo));
f->x = (int *)malloc(sizeof(int));
*f->x = 10;
return *f->x;
}
----
Output:
; ModuleID = '/tmp/webcompile/_28006_0.bc'
target datalayout
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"
define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1]
  %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1]
  %2 = bitcast i8* %1 to i32*                     ; <i32*> [#uses=2]
  %3 = bitcast i8* %0 to i32**                    ; <i32**> [#uses=1]
  store i32* %2, i32** %3, align 8
  store i32 10, i32* %2, align 4
  ret i32 10
}
------
As you can see, the allocations are still being performed. Now if we
could get rid of these allocations, then this entire function could be
reduced to a single line that just returns 10.
Is any optimization like this planned or even feasible in LLVM?
Thank you for your time.
Timothy Baldridge
-- 
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)
Hi Timothy, LLVM cannot remove the malloc calls, as malloc() has a sideeffect and that would be changing the behaviour of the program. Apart from that, the problem with unboxing in dynamic languages is knowing beforehand which function to dispatch to. mul_ints or mul_floats, for example? What if a particular type has overridden the + operator, etc etc. So your code normally ends up bouncing through several functions making analysis difficult. James> -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Timothy Baldridge > Sent: 28 June 2011 19:25 > To: llvmdev at cs.uiuc.edu > Subject: [LLVMdev] Box removal > > In the creation of dynamic languages we often have to box values > together. > > For instance, take the following expression: > > IntObj c = sqrt((a*a)+(b*b)); > > Here, most likely, a bytecode interpreter would execute this as > "mul_ints", "add_ints", "sqrt", etc. Inside these primitive functions > we would have to unwrap our IntObj types, add the values, allocate a > new object and return that to the function. In the above example, we > could probably expect around 4 allocations, and 7 unboxing operations. > Now granted if my lanugage is running as a bytecode interpreter, I can > speed it up simply by having LLVM call my functions in order, and > perhaps even in-lining all the bytecode operations into a single > function. But even then, I'm still left with the 4 allocations and 7 > unboxings (is that even a word?). > > I know other compiler projects, such as PyPy have allocation removal > where the optimization passes see that we only use the result of an > allocation a single time. Thinking that LLVM may do this as well, I > tried this simple test on in-browser LLVM compiler: > -------- > #include <stdio.h> > #include <stdlib.h> > > typedef struct Foo > { > int *x; > int x2; > }Foo; > > int main(int argc, char **argv) { > Foo *f = (Foo *)malloc(sizeof(Foo)); > f->x = (int *)malloc(sizeof(int)); > *f->x = 10; > return *f->x; > } > > ---- > > Output: > > ; ModuleID = '/tmp/webcompile/_28006_0.bc' > target datalayout > "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32- > f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128- > n8:16:32:64" > target triple = "x86_64-linux-gnu" > > define i32 @main(i32 %argc, i8** nocapture %argv) nounwind { > entry: > %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1] > %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1] > %2 = bitcast i8* %1 to i32* ; <i32*> [#uses=2] > %3 = bitcast i8* %0 to i32** ; <i32**> [#uses=1] > store i32* %2, i32** %3, align 8 > store i32 10, i32* %2, align 4 > ret i32 10 > } > > ------ > > As you can see, the allocations are still being performed. Now if we > could get rid of these allocations, then this entire function could be > reduced to a single line that just returns 10. > > Is any optimization like this planned or even feasible in LLVM? > > > Thank you for your time. > > Timothy Baldridge > > > > > -- > "One of the main causes of the fall of the Roman Empire was > that-lacking zero-they had no way to indicate successful termination > of their C programs." > (Robert Firth) > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Tue, Jun 28, 2011 at 11:25, Timothy Baldridge <tbaldridge at gmail.com> wrote:> > I know other compiler projects, such as PyPy have allocation removal > where the optimization passes see that we only use the result of an > allocation a single time. Thinking that LLVM may do this as well, I > tried this simple test on in-browser LLVM compiler: >There's no malloc remover, but there is a malloc/free remover. If you fix the C code: #include <stdio.h> #include <stdlib.h> typedef struct Foo { int *x; int x2; }Foo; int main(int argc, char **argv) { Foo *f = (Foo *)malloc(sizeof(Foo)); f->x = (int *)malloc(sizeof(int)); *f->x = 10; int i = *f->x; free(f->x); free(f); return i; } Then it compiles exactly the way you want it to ; ModuleID = '/tmp/webcompile/_26915_0.bc' target datalayout "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64" target triple = "x86_64-linux-gnu" define i32 @main(i32 %argc, i8** nocapture %argv) nounwind readnone { entry: ret i32 10 } ~ Scott