thr3ads.net - llvm dev - [LLVMdev] Box removal [Jun 2011]


Timothy Baldridge

2011-Jun-28 18:25 UTC

[LLVMdev] Box removal

When implementing dynamic languages, we often have to box values.

For instance, take the following expression:

IntObj c = sqrt((a*a)+(b*b));

Here, most likely, a bytecode interpreter would execute this as
"mul_ints", "add_ints", "sqrt", etc. Inside these primitive functions
we would have to unwrap our IntObj types, compute the result, allocate
a new object and return it to the caller. In the above example, we
could expect around 4 allocations (one per primitive call) and 7
unboxing operations (two for each binary operation, one for sqrt).
Now granted, if my language is running as a bytecode interpreter, I can
speed it up simply by having LLVM call my functions in order, and
perhaps even inline all the bytecode operations into a single
function. But even then, I'm still left with the 4 allocations and 7
unboxings (is that even a word?).

I know other compiler projects, such as PyPy, have allocation removal,
where the optimization passes see that the result of an allocation is
only used a single time. Thinking that LLVM might do this as well, I
tried this simple test on the in-browser LLVM compiler:
--------
#include <stdio.h>
#include <stdlib.h>

typedef struct Foo
{
    int *x;
    int x2;
} Foo;

int main(int argc, char **argv) {
    Foo *f = (Foo *)malloc(sizeof(Foo));
    f->x = (int *)malloc(sizeof(int));
    *f->x = 10;
    return *f->x;
}

----

Output:

; ModuleID = '/tmp/webcompile/_28006_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %0 = tail call noalias i8* @malloc(i64 16) nounwind ; <i8*> [#uses=1]
  %1 = tail call noalias i8* @malloc(i64 4) nounwind ; <i8*> [#uses=1]
  %2 = bitcast i8* %1 to i32*                     ; <i32*> [#uses=2]
  %3 = bitcast i8* %0 to i32**                    ; <i32**> [#uses=1]
  store i32* %2, i32** %3, align 8
  store i32 10, i32* %2, align 4
  ret i32 10
}

------

As you can see, the allocations are still being performed. Now if we
could get rid of these allocations, then this entire function could be
reduced to a single line that just returns 10.

Is any optimization like this planned or even feasible in LLVM?


Thank you for your time.

Timothy Baldridge




-- 
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)

James Molloy

2011-Jun-29 07:56 UTC

[LLVMdev] Box removal

Hi Timothy,

LLVM cannot remove the malloc calls, as malloc() has a side effect and
removing it would change the behaviour of the program.

Apart from that, the problem with unboxing in dynamic languages is knowing
beforehand which function to dispatch to. Should a multiplication call
mul_ints or mul_floats, for example? What if a particular type has
overridden the + operator? So your code normally ends up bouncing through
several functions, which makes analysis difficult.

James

me22

2011-Jun-29 21:12 UTC

[LLVMdev] Box removal

On Tue, Jun 28, 2011 at 11:25, Timothy Baldridge <tbaldridge at gmail.com> wrote:
> I know other compiler projects, such as PyPy have allocation removal
> where the optimization passes see that we only use the result of an
> allocation a single time. Thinking that LLVM may do this as well, I
> tried this simple test on in-browser LLVM compiler:
>
There's no malloc remover, but there is a malloc/free pair remover. If you
fix the C code so that it frees what it allocates:

#include <stdio.h>
#include <stdlib.h>

typedef struct Foo
{
    int *x;
    int x2;
} Foo;

int main(int argc, char **argv) {
    Foo *f = (Foo *)malloc(sizeof(Foo));
    f->x = (int *)malloc(sizeof(int));
    *f->x = 10;
    int i = *f->x;
    free(f->x);
    free(f);
    return i;
}

Then it compiles exactly the way you want it to:

; ModuleID = '/tmp/webcompile/_26915_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind readnone {
entry:
  ret i32 10
}

~ Scott
