Alex Nordwood
2015-Aug-06 01:09 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems
Hello,

We are trying to port a virtual machine runtime written in x86 assembly language to other platforms. We are considering LLVM as our 'portable assembly' so that the VM runtime can be built for our new target platforms.

It is a stack VM, designed to exploit every advantage of an assembly language implementation. Our attempt to port it to C resulted in performance problems, and our goal is to match (or beat) the performance of the original VM. This is where LLVM looks very promising for us.

However, we have noticed several problems that we think will require extensions to LLVM. The official documentation referred us here for questions, so we have some :) Thank you in advance to everyone who reviews this and gives us feedback; it is appreciated.

Questions:

1. The VM executes a high-level method by running a special low-level function, which is either an optimized low-level implementation of that method or a run of the bytecode interpreter. After the low-level function has executed the high-level method, a continuation follows. The continuation is passed on the VM stack, and doing this in C (by C function calls, CPS) led to a significant performance loss.
The bad news for us is that we have to follow the existing VM design strictly, for many reasons (e.g., backward compatibility).
The current x86 assembly implementation uses a 'jump prototype' way of invoking a low-level function. This can be interpreted as a function that never returns and expects all of its arguments in registers. Arguments are limited to pointer and integer types. The function has no prologue/epilogue (so the EBP register is free to use), and it 'returns' by jumping to a continuation function through a pointer popped from the VM stack.
We are considering extending LLVM with a special calling convention that forces any function using it to take all of its arguments in registers and to always be tail-call optimized.
We can see the 'hipe' calling convention in LLVM, which does almost what we need. The difference is that we need all arguments passed in registers.
Will extending the calling convention in the way we describe work? Does this sound reasonable? Or is there a simpler way to do it?

2. Because the existing VM runtime is written in x86 assembly and makes no function calls, it uses the ESP register as the VM stack pointer (again, ESP is not needed for low-level calls). We want to do the same.
In our C version the VM stack (and its pointer) lives in heap memory, which is very bad for performance. E.g., a stack push is *--vmCtx->sp = obj.
LLVM allows the machine stack pointer to be set to an arbitrary value with the 'stacksave/stackrestore' intrinsics, but there is no way to, for example, allocate or deallocate a VM stack frame. Note that this is a _VM_ stack frame: it may be allocated by some of the low-level functions (as needed), but never in a prologue/epilogue, so implementing it as another calling convention won't help here.
We think this could be implemented as intrinsics as well? Or perhaps we should create intrinsics for arbitrary machine stack access?
We tried, for example, a stacksave-sub-store-stackrestore sequence, but it never folds into a single push instruction.

3. Since the machine stack is the VM stack, we are not allowed to use alloca. That is not a problem in itself, but the machine register allocator/spiller can still use the machine stack for register spills.
How could this be solved? Should we provide our own register allocator? Or could it be solved with a machine function pass that runs on functions marked with our calling convention and replaces the machine instructions responsible for spilling to the stack with load/store instructions that target our heap memory?

Hopefully this makes some sense? We know that we will have to extend LLVM for every target platform, but that is still better than rewriting the VM in the assembly language of every target.

Thank you for your time.
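For readers following along, here is a minimal C sketch of the pattern described in questions 1 and 2: a heap-resident VM stack with the `*--vmCtx->sp = obj` style push, and low-level functions that 'return' by popping a continuation from that stack and calling it in tail position. Every name in it (`VmCtx`, `VmSlot`, `Cont`, `vm_return`, the `op_*` functions, `run_demo`) is hypothetical and chosen only for illustration; it is not the runtime discussed in this thread. It is also an assumption that the arguments fit in two parameters.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical illustrations, not the poster's VM. */
typedef struct VmCtx VmCtx;

/* A low-level function: takes the VM context and an accumulator. */
typedef intptr_t (*Cont)(VmCtx *ctx, intptr_t acc);

/* A VM stack slot holds either a value or a continuation pointer. */
typedef union {
    intptr_t value;
    Cont cont;
} VmSlot;

enum { VM_STACK_SLOTS = 64 };

struct VmCtx {
    VmSlot *sp;                    /* VM stack pointer, grows downward */
    VmSlot stack[VM_STACK_SLOTS];  /* lives wherever the context lives */
};

static void vm_init(VmCtx *ctx) { ctx->sp = ctx->stack + VM_STACK_SLOTS; }

/* Push/pop go through ctx->sp in memory on every access. */
static void vm_push_value(VmCtx *ctx, intptr_t v) {
    assert(ctx->sp > ctx->stack);
    (--ctx->sp)->value = v;
}

static void vm_push_cont(VmCtx *ctx, Cont k) {
    assert(ctx->sp > ctx->stack);
    (--ctx->sp)->cont = k;
}

static VmSlot vm_pop(VmCtx *ctx) {
    assert(ctx->sp < ctx->stack + VM_STACK_SLOTS);
    return *ctx->sp++;
}

/* 'Return': pop the continuation and call it in tail position.
 * ISO C does not guarantee that this call becomes a jump. */
static intptr_t vm_return(VmCtx *ctx, intptr_t acc) {
    Cont k = vm_pop(ctx).cont;
    return k(ctx, acc);
}

/* Terminal continuation: hands the accumulator back to the caller. */
static intptr_t op_halt(VmCtx *ctx, intptr_t acc) {
    (void)ctx;
    return acc;
}

/* Adds an operand popped from the VM stack, then continues. */
static intptr_t op_add_operand(VmCtx *ctx, intptr_t acc) {
    intptr_t operand = vm_pop(ctx).value;
    return vm_return(ctx, acc + operand);
}

/* Doubles the accumulator, then continues. */
static intptr_t op_double(VmCtx *ctx, intptr_t acc) {
    return vm_return(ctx, acc * 2);
}

/* Computes (1 + 5) * 2 through the continuation chain. */
intptr_t run_demo(void) {
    VmCtx ctx;
    vm_init(&ctx);
    vm_push_cont(&ctx, op_halt);    /* runs last */
    vm_push_cont(&ctx, op_double);  /* runs second */
    vm_push_value(&ctx, 5);         /* operand for op_add_operand */
    return op_add_operand(&ctx, 1);
}
```

Calling `run_demo()` yields 12. The sketch shows both costs raised above: each push and pop loads and stores `ctx->sp` through memory instead of keeping it in a dedicated register, and the continuation call only becomes a plain jump if the compiler happens to apply tail-call optimization, which is the guarantee a dedicated calling convention would be meant to provide.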
Micah Villmow via llvm-dev
2015-Aug-06 15:45 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems
Have you thought about writing specific LLVM passes to target your specific performance bottlenecks in order to speed up the C code?

Micah