thr3ads.net - llvm dev - [llvm-dev] Creating a virtual machine: stack, regs alloc & other problems [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Alex Nordwood via llvm-dev

2015-Aug-06 11:01 UTC

[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

Hello,

We are trying to port a virtual machine runtime written in x86 assembly language
to other platforms.
We considered using LLVM for our 'portable assembly' so that the VM
runtime could be built for our new target platforms.

It is stack VM, and one designed to utilize all the advantages of the assembly
language implementation.
Our attempt to port it to C has resulted in performance issues and our goal is
to achieve the same
(or better) performance as it is for our source VM. And this is where LLVM looks
very promising for us.

But there are several problems we noticed which we think require us to make some
extensions to LLVM.
The official doc referred us here for questions...and so we have some :) We want
to thank everyone ahead of
time who reviews this and provides us feedback, it is appreciated.

Questions:
1. The VM was designed to execute a high-level methods by running a special
low-level function, which is either a special optimized
low-level implementation of the high-level method or bytecode-interpreter run.
After a high-level method executed by such a low-level function, there is a
continuation that follows. The continuation is passed by VM stack and
doing this using using C (by C function calls, CPS) led to significant
performance loss.
The bad news for us is that we have to strictly follow the existing VM design
for many reasons (ex., backward compatibility).
The current VM x86 assembly implementation uses a 'jump prototype' way
of invoking a low-level function.
This could be interpreted as a function which never returns and expects all
arguments to be passed in registers.
Arguments are limited to pointer and integer types.
The function has no prologue/epilogue (thus EBP reg is free to use) and it
'returns' by jumping to a continuation function by pointer popped from
VM stack.
We are considering extending LLVM by creating a special calling convention which
forces a function (using this convention) to pass args in registers and to
be force tail-call optimized.
We can see 'hipe' calling conv in LLVM, which does almost what we
need... the difference is that we need all args passed in regs.
Will extending the calling convention in the way we describe work? Does this
sound reasonable? Or is there another simpler way to do so?

2. Because the existing VM runtime is written in x86 assembly, and doesn't
do function calls, it uses ESP register for VM stack purposes
(again, it is not in use for low-level calls). We want to do the same.
In our C version we use the VM stack (and pointer) in a heap memory, which is
very bad for performance. Ex., *--vmCtx->sp = obj for stack push.
LLVM allows the initialization of the machine stack pointer to an arbitrary
value by using 'savestack/restorestack' intrinsics,
but there is no way to, for example, allocate/deallocate the VM stack frame.
Note that it is _VM_ stack frame and it could be allocated
by some of that low-level executing functions (as needed), but never in a
prologue/epilogue, so implementing it as another calling conv won't help
here.
We think that it could be implemented as intrinsics as well? Or perhaps we
should create intrinsics for arbitrary machine stack access?
We tried, for example, stacksave-sub-store-stackrestore sequence, but it never
folds into a single push operation.

3. Since the machine stack is a VM stack, we are not allowed to use alloca.
It's not a problem, but the machine register allocator/spiller
can still use the machine stack for register spilling purposes.
How could this be solved? Should we provide our own register allocator? Or could
it be solved by providing a machine function pass,
which will run on a function marked with our calling conv and substitute machine
instructions responsible for spilling to stack
with load/store instructions to our heap memory?

Hopefully this makes some sense? We know that we have to extend LLVM for every
target platform, but it is still better than to rewrite VM in every target
assembly code.
Thank you for your time.

_____________________________________________________________
Are you a Techie? Get Your Free Tech Email Address Now! Visit
http://www.TechEmail.com

Philip Reames via llvm-dev

2015-Aug-06 18:45 UTC

head link

[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

On 08/06/2015 04:01 AM, Alex Nordwood via llvm-dev
wrote:> Hello,
>
> We are trying to port a virtual machine runtime written in x86 assembly
language to other platforms.
> We considered using LLVM for our 'portable assembly' so that the VM
runtime could be built for our new target platforms.
>
> It is stack VM, and one designed to utilize all the advantages of the
assembly language implementation.
> Our attempt to port it to C has resulted in performance issues and our goal
is to achieve the same
> (or better) performance as it is for our source VM. And this is where LLVM
looks very promising for us.This sounds very, very familiar.  Are you willing to share which 
VM/language you're working on?>
> But there are several problems we noticed which we think require us to make
some extensions to LLVM.
> The official doc referred us here for questions...and so we have some :) We
want to thank everyone ahead of
> time who reviews this and provides us feedback, it is appreciated.
>
> Questions:
> 1. The VM was designed to execute a high-level methods by running a special
low-level function, which is either a special optimized
> low-level implementation of the high-level method or bytecode-interpreter
run.
> After a high-level method executed by such a low-level function, there is a
continuation that follows. The continuation is passed by VM stack and
> doing this using using C (by C function calls, CPS) led to significant
performance loss.I assume the continuation is a tail call?  If so, have you examined 
where the performance loss originated?  An obvious tail call in C code 
being compiled by modern Clang should be code generated as a tail call.  
You might be stumbling across an implementation limit and adjusting your 
input slightly might bypass that.> The bad news for us is that we have to strictly follow the existing VM
design for many reasons (ex., backward compatibility).
> The current VM x86 assembly implementation uses a 'jump prototype'
way of invoking a low-level function.
> This could be interpreted as a function which never returns and expects all
arguments to be passed in registers.
> Arguments are limited to pointer and integer types.
> The function has no prologue/epilogue (thus EBP reg is free to use) and it
'returns' by jumping to a continuation function by pointer popped from
VM stack.
> We are considering extending LLVM by creating a special calling convention
which forces a function (using this convention) to pass args in registers and to
> be force tail-call optimized.You absolutely will need a custom calling convention for the register 
assignments and such.  If your source IR uses musttail, in principal you 
shouldn't need to do anything special for the tail calls provided you're
running on one of the architectures where musttail has been
implemented.> We can see 'hipe' calling conv in LLVM, which does almost what we
need... the difference is that we need all args passed in regs.
> Will extending the calling convention in the way we describe work? Does
this sound reasonable? Or is there another simpler way to do so?Creating a new calling convention is easy.  It will require a custom 
LLVM build to get started since you have to change td and cpp files in 
the target.  For examples, see the existing ones in 
Target/X86/X86CallingConv.td>
> 2. Because the existing VM runtime is written in x86 assembly, and
doesn't do function calls, it uses ESP register for VM stack purposes
> (again, it is not in use for low-level calls). We want to do the same.This will be tricky... Do you absolutely absolutely need
this?> In our C version we use the VM stack (and pointer) in a heap memory, which
is very bad for performance. Ex., *--vmCtx->sp = obj for stack push.
> LLVM allows the initialization of the machine stack pointer to an arbitrary
value by using 'savestack/restorestack' intrinsics,
> but there is no way to, for example, allocate/deallocate the VM stack
frame. Note that it is _VM_ stack frame and it could be allocated
> by some of that low-level executing functions (as needed), but never in a
prologue/epilogue, so implementing it as another calling conv won't help
here.
> We think that it could be implemented as intrinsics as well? Or perhaps we
should create intrinsics for arbitrary machine stack access?
> We tried, for example, stacksave-sub-store-stackrestore sequence, but it
never folds into a single push operation.I would suggest just implementing your virtual machine stack as a normal 
bit of memory.  There's no reason that the compiler needs to know that 
this is the VM stack versus some other buffer.  You will need to provide 
aliasing facts, but that's a much smaller extension*.  Trying to much 
with the frames at runtime using the intrinsics is going to end very badly.

* Don't under estimate this point.  Providing aliasing metadata will be 
*really* important for this scheme to work reasonably well.  You may 
need to add custom extensions locally or propose extensions upstream to 
encode the information you need.

Fair warning: LLVM's memory optimization passes run fairly late in the 
standard pass order.  You are going to want to convert memory access to 
your stack to SSA for effective optimization.  You will need to use a 
non-standard pass order (or possibly a custom pass) to accomplish this.  
As a starting point, try running your code through -O2 twice.  It's a 
horrible hack, but will give you a sense of what's possible with an 
adjusted pass order.

Another option: If you know the size of your vm frame statically, you 
can emit loads from the vm stack locations into SSA values (or allocas, 
which will become SSA values) and spill as needed to ensure the VM stack 
is up to date as required by your language requirements.  This will 
likely a) decrease your dependence of the pass ordering above, and b) 
give slightly better results since LLVM is going to have to be 
conservative about calls into your runtime and your custom lowering gets 
to use language specific knowledge.>
> 3. Since the machine stack is a VM stack, we are not allowed to use alloca.
It's not a problem, but the machine register allocator/spiller
> can still use the machine stack for register spilling purposes.
> How could this be solved? Should we provide our own register allocator? Or
could it be solved by providing a machine function pass,
> which will run on a function marked with our calling conv and substitute
machine instructions responsible for spilling to stack
> with load/store instructions to our heap memory?I don't understand what you're trying to ask here.  If you can spill to 
the machine frame (instead of the VM stack frame), what's the
problem?>
> Hopefully this makes some sense? We know that we have to extend LLVM for
every target platform, but it is still better than to rewrite VM in every target
> assembly code.
> Thank you for your time.At a meta level, let me give my standard warning: implementing a 
functional compiler for a language of your choice on LLVM is relatively 
easy; your looking at around 1-3 man years of effort depending on the 
language.  implementing a *performant* compiler is far, far harder.  
Unless you're willing to budget for upwards of 10 man years of skilled 
compiler engineering time, you may need to adjust your expectations.  
How hard the problem will be also depends on how good your current 
implementation is of course.  :)

To give a flavour for the tuning involved, you might find this document 
helpful:
http://llvm.org/docs/Frontend/PerformanceTips.html

If you're serious about the project, I highly recommend that you make an 
effort to attend the developers conference in Oct.  You'll want to have 
a bunch of high bandwidth conversations with people who've been down 
this road before and email just doesn't quite work for that.

Philip

llvm dev - Aug 2015 - Creating a virtual machine: stack, regs alloc & other problems

[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems