Ramkumar Ramachandra
2015-Jan-15 20:58 UTC
[LLVMdev] Transform: Eliminating boxing-unboxing in untyped languages
Hi,

I'm playing with a compiler for an untyped language that generates a lot of LLVM IR even for simple expressions. We can build up to the full complexity gradually, but let's start with a simple example: the + operator, which explicitly expects two integers, applied to two integer literals in the same function and the same basic block.

(+ 3 4) generates the following. What actually happens is that 3 and 4 are each boxed into a heap-allocated value_t struct and then immediately unboxed for the add, only for the result to be boxed again. Shouldn't InstCombine be able to fix this?

; ModuleID = 'My JIT'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

%value_t = type { i32, i64, i1, i8*, %value_t**, i64, double, %value_t* (i32, %value_t**, ...)*, i8, i1, %value_t* }

declare i8* @gc_malloc(i64)

declare i64 @strlen(i8*)

; Function Attrs: nounwind
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1) #0

; Function Attrs: nounwind readnone
declare double @llvm.pow.f64(double, double) #1

; Function Attrs: nounwind
declare void @llvm.va_start(i8*) #0

; Function Attrs: nounwind
declare void @llvm.va_end(i8*) #0

declare %value_t* @println(i32, %value_t**, ...)

declare %value_t* @print(i32, %value_t**, ...)

declare %value_t* @cequ(i32, %value_t**, ...)

declare %value_t* @cstrjoin(i32, %value_t**, ...)
define %value_t* @anon0() gc "rgc" {
entry:
  %value = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value = bitcast i8* %value to %value_t*
  %boxptr = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 0
  %boxptr1 = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 1
  store i32 1, i32* %boxptr
  store i64 3, i64* %boxptr1
  %value2 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value3 = bitcast i8* %value2 to %value_t*
  %boxptr4 = getelementptr inbounds %value_t* %malloc_value3, i32 0, i32 0
  %boxptr5 = getelementptr inbounds %value_t* %malloc_value3, i32 0, i32 1
  store i32 1, i32* %boxptr4
  store i64 4, i64* %boxptr5
  %load = load i32* %boxptr
  %is_dbl = icmp eq i32 %load, 6
  %value7 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value8 = bitcast i8* %value7 to %value_t*
  %boxptr9 = getelementptr inbounds %value_t* %malloc_value8, i32 0, i32 0
  %boxptr10 = getelementptr inbounds %value_t* %malloc_value8, i32 0, i32 2
  store i32 2, i32* %boxptr9
  store i1 %is_dbl, i1* %boxptr10
  %load12 = load i32* %boxptr4
  %is_dbl13 = icmp eq i32 %load12, 6
  %value14 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value15 = bitcast i8* %value14 to %value_t*
  %boxptr16 = getelementptr inbounds %value_t* %malloc_value15, i32 0, i32 0
  %boxptr17 = getelementptr inbounds %value_t* %malloc_value15, i32 0, i32 2
  store i32 2, i32* %boxptr16
  store i1 %is_dbl13, i1* %boxptr17
  %load19 = load i64* %boxptr5
  %load21 = load i64* %boxptr1
  %add = add i64 %load21, %load19
  %value22 = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value23 = bitcast i8* %value22 to %value_t*
  %boxptr24 = getelementptr inbounds %value_t* %malloc_value23, i32 0, i32 0
  %boxptr25 = getelementptr inbounds %value_t* %malloc_value23, i32 0, i32 1
  store i32 1, i32* %boxptr24
  store i64 %add, i64* %boxptr25
  ret %value_t* %malloc_value23
}

attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }
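For comparison, here is a hand-written sketch (not output from the compiler above) of what eliminating the boxing and unboxing would ideally leave for (+ 3 4): the add operates on raw i64 values, the unused type-check boxes disappear, and only the result is boxed. The function name @anon0_unboxed is hypothetical, and treating tag 1 as "integer" is an inference from the dump, where 3 and 4 are stored alongside tag 1.

```llvm
; Hypothetical hand-written target form, in the same pre-3.7 typed-pointer
; syntax as the dump above. Not produced by any pass.
%value_t = type { i32, i64, i1, i8*, %value_t**, i64, double, %value_t* (i32, %value_t**, ...)*, i8, i1, %value_t* }

declare i8* @gc_malloc(i64)

define %value_t* @anon0_unboxed() gc "rgc" {
entry:
  ; Both operands are known integer literals, so no tag checks are needed
  ; and the add works directly on unboxed i64 values.
  %add = add i64 3, 4
  ; Box only the result: tag 1 (assumed to mean integer) in field 0,
  ; the i64 payload in field 1.
  %value = call i8* @gc_malloc(i64 ptrtoint (%value_t* getelementptr (%value_t* null, i32 1) to i64))
  %malloc_value = bitcast i8* %value to %value_t*
  %boxptr = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 0
  %boxptr1 = getelementptr inbounds %value_t* %malloc_value, i32 0, i32 1
  store i32 1, i32* %boxptr
  store i64 %add, i64* %boxptr1
  ret %value_t* %malloc_value
}
```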
Philip Reames
2015-Jan-15 21:58 UTC
This is a memory analysis problem and is probably best solved by GVN (or possibly EarlyCSE). If you look at the debug output from GVN and memory dependence analysis, I suspect you'll find that the second gc_malloc call is blocking load forwarding from the first allocation.

I'd suggest a few things:
- Try a *trivial* example with a single boxed integer. Does that get "unboxed"? (The allocation itself most likely won't be removed.)
- If so, does adding a *single* call to gc_malloc between the store and the load break it? (I suspect it will.)
- Look into using appropriate attributes (noalias!) to convey the aliasing properties you need. This is best done by tracing through where a simple example fails and looking at the surrounding code. (See also the LangRef.)

Philip

On 01/15/2015 12:58 PM, Ramkumar Ramachandra wrote:
> Hi,
>
> I'm playing with an untyped language compiler that generates tons of
> LLVM code on simple expressions. We can slowly build up to the full
> complexity, but let's look at a simple example first:
>
> The + operator that explicitly expects two integers, and those two
> integers are provided literally in the same function in the same basic
> block. So, (+ 3 4) generates the following. What's actually happening
> is that 3 and 4 are getting boxed into a value_t struct before getting
> unboxed immediately for the add operation. An InstCombine should be
> able to fix this, no?
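A minimal reduced test case along the lines suggested above might look like this. It is a sketch under stated assumptions: the two-field %box type (just the tag and integer payload of %value_t) and the function @one_box are hypothetical, and the syntax is the same pre-3.7 typed-pointer form as the original dump. The `noalias` on the @gc_malloc declaration is the attribute-based fix: it tells alias analysis that each call returns fresh memory, so as long as the first box does not escape, the second gc_malloc call cannot clobber it and GVN can forward the stored 3 to the load. Deleting `noalias` should reproduce the blocked forwarding.

```llvm
; RUN: opt -basicaa -gvn -S < %s | FileCheck %s

; Hypothetical reduced box: only the tag and the integer payload of %value_t.
%box = type { i32, i64 }

; noalias on the return value: each call yields memory that no other pointer
; visible to the caller refers to. Remove it to see the forwarding blocked.
declare noalias i8* @gc_malloc(i64)

; CHECK-LABEL: @one_box(
; CHECK: ret i64 3
define i64 @one_box() {
entry:
  ; Box the integer 3 (tag 1 in field 0, payload in field 1).
  %raw = call i8* @gc_malloc(i64 ptrtoint (%box* getelementptr (%box* null, i32 1) to i64))
  %b = bitcast i8* %raw to %box*
  %tag = getelementptr inbounds %box* %b, i32 0, i32 0
  %payload = getelementptr inbounds %box* %b, i32 0, i32 1
  store i32 1, i32* %tag
  store i64 3, i64* %payload
  ; A single unrelated allocation between the store and the load.
  %raw2 = call i8* @gc_malloc(i64 ptrtoint (%box* getelementptr (%box* null, i32 1) to i64))
  ; Unbox: with noalias on @gc_malloc, GVN replaces this load with 3.
  %v = load i64* %payload
  ret i64 %v
}
```

The gc_malloc calls themselves remain after GVN, consistent with the expectation above that the allocation won't be removed. Marking gc_malloc `noalias` is only valid if it really behaves like malloc in this respect; the LangRef's description of the `noalias` return attribute is the place to check that.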