thr3ads.net - llvm dev - [llvm-dev] Creating a virtual machine: stack, regs alloc & other problems [Aug 2015]

Alex Nordwood via llvm-dev
2015-Aug-07 13:03 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

Hmm... that looks like an interesting idea.
We will look into it, thank you.

--- micah.villmow at softmachines.com wrote:

From: Micah Villmow <micah.villmow at softmachines.com>
To: "anordwood at techemail.com" <anordwood at techemail.com>
CC: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] Creating a virtual machine: stack, regs alloc &
other problems
Date: Thu, 6 Aug 2015 17:43:14 +0000

Yes, writing in LLVM IR is not considered a portable solution.
For example, you state that you are having this problem:
" After a high-level method is executed by such a low-level function, there is
a continuation that follows. The continuation is passed by VM stack and doing
this using C (by C function calls, CPS) led to significant performance
loss."

What I was alluding to is writing annotations in the source code and then
writing LLVM passes that specifically target your performance problem and
produce high-performance code. The annotations could be attributes on
functions, calling conventions, or a set of target-independent intrinsic
functions. All of these can be handled by custom passes that expand them to
target-specific code. This can give you the portability of C and solve the
performance bottlenecks caused by it. You can even have a bitcode library of
hand-written inline assembly that these functions expand to, linked in
depending on the architecture.

This, however, assumes that you control the compiler. If that is the case,
there are lots of changes you can make. The downside is maintenance across
LLVM version changes, but if your LLVM version is relatively static, that
doesn’t factor in too much.
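
As a rough illustration of the annotation approach described above, the C
sketch below marks a function with Clang's `annotate` attribute. Clang records
such annotations in the `llvm.global.annotations` array of the emitted IR,
where a custom pass could look them up. The annotation string "vm.lowlevel",
the macro VM_LOWLEVEL, and the function vm_add_top2 are hypothetical names
chosen for this sketch, and the pass itself is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: on Clang it attaches an IR-visible annotation,
 * on other compilers it expands to nothing so the code stays portable C. */
#if defined(__clang__)
#define VM_LOWLEVEL __attribute__((annotate("vm.lowlevel")))
#else
#define VM_LOWLEVEL
#endif

/* A stand-in "low-level function": adds the two topmost VM stack slots.
 * A custom pass could find it through the annotation and rewrite it. */
VM_LOWLEVEL
long vm_add_top2(const long *sp) {
    assert(sp != NULL);
    return sp[0] + sp[1];
}
```

Compiling this with `clang -S -emit-llvm` should show the annotation string in
the IR next to a reference to the function, which is the hook a pass would
key on.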

Micah


-----Original Message-----
From: Alex Nordwood [mailto:anordwood at techemail.com] 
Sent: Thursday, August 06, 2015 10:33 AM
To: Micah Villmow
Cc: llvm-dev at lists.llvm.org
Subject: RE: [llvm-dev] Creating a virtual machine: stack, regs alloc &
other problems

Not sure I understand. Are you talking about writing extensions to Clang?
Our general idea is to code the VM in LLVM IR, then run llc to produce object
files, and so on.

--- micah.villmow at softmachines.com wrote:

From: Micah Villmow <micah.villmow at softmachines.com>
To: "anordwood at techemail.com" <anordwood at techemail.com>,
"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] Creating a virtual machine: stack, regs alloc &
other problems
Date: Thu, 6 Aug 2015 15:45:15 +0000

Have you thought about writing specific LLVM passes to target your specific
performance bottlenecks in order to speed up the C code?

Micah

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Alex
Nordwood
Sent: Wednesday, August 05, 2015 6:09 PM
To: llvm-dev at lists.llvm.org
Subject: [llvm-dev] Creating a virtual machine: stack, regs alloc & other
problems

Hello,

We are trying to port a virtual machine runtime written in x86 assembly language
to other platforms.
We considered using LLVM for our 'portable assembly' so that the VM
runtime could be built for our new target platforms.

It is a stack VM, designed to exploit all the advantages of an assembly
language implementation.
Our attempt to port it to C resulted in performance issues, and our goal is
to achieve the same (or better) performance as our source VM. This is where
LLVM looks very promising for us.

But there are several problems we noticed which we think require us to make some
extensions to LLVM.
The official doc referred us here for questions...and so we have some :)  We
want to thank everyone ahead of time who reviews this and provides us feedback,
it is appreciated.

Questions:
1. The VM was designed to execute a high-level method by running a special
low-level function, which is either a special optimized low-level implementation
of the high-level method or a bytecode-interpreter run.
After a high-level method is executed by such a low-level function, there is a
continuation that follows. The continuation is passed via the VM stack, and
doing this using C (by C function calls, CPS) led to significant performance
loss.
The bad news for us is that we have to strictly follow the existing VM design
for many reasons (e.g., backward compatibility).
The current VM x86 assembly implementation uses a 'jump prototype' way
of invoking a low-level function.
This could be interpreted as a function which never returns and expects all
arguments to be passed in registers.
Arguments are limited to pointer and integer types.
The function has no prologue/epilogue (thus the EBP register is free to use) and
it 'returns' by jumping to a continuation function through a pointer popped
from the VM stack.
We are considering extending LLVM by creating a special calling convention which
forces a function (using this convention) to pass args in registers and to be
force tail-call optimized.
We can see 'hipe' calling conv in LLVM, which does almost what we
need... the difference is that we need all args passed in regs.
Will extending the calling convention in the way we describe work? Does this
sound reasonable? Or is there another simpler way to do so?
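
For readers unfamiliar with the pattern in question 1, here is a minimal C
sketch of it, assuming a heap-resident VM stack: each low-level function
"returns" by popping a continuation pointer from the VM stack and calling it
in tail position. All names (VmCtx, Slot, ll_add, ll_double, ll_halt,
run_demo) are invented for illustration and do not come from the VM being
discussed. Standard C does not guarantee that these calls are compiled as
jumps, which is one plausible source of the overhead described above.

```c
#include <assert.h>
#include <stdint.h>

enum { VM_STACK_SLOTS = 64 };

typedef struct VmCtx VmCtx;
typedef void (*LowLevelFn)(VmCtx *ctx);

/* A VM stack slot holds either an integer/pointer value or a continuation. */
typedef union { intptr_t i; LowLevelFn fn; } Slot;

struct VmCtx {
    Slot    *sp;                    /* VM stack pointer, grows downward */
    intptr_t acc;                   /* result of the last operation */
    Slot     stack[VM_STACK_SLOTS];
};

static void vm_push(VmCtx *ctx, Slot s) {
    assert(ctx->sp > ctx->stack);
    *--ctx->sp = s;
}

static Slot vm_pop(VmCtx *ctx) {
    assert(ctx->sp < ctx->stack + VM_STACK_SLOTS);
    return *ctx->sp++;
}

/* "Return": pop the continuation and transfer control to it. */
static void vm_continue(VmCtx *ctx) {
    LowLevelFn k = vm_pop(ctx).fn;
    k(ctx);                         /* tail position, but a jump is not guaranteed */
}

static void ll_halt(VmCtx *ctx) { (void)ctx; }  /* end of the chain */

static void ll_double(VmCtx *ctx) {
    ctx->acc *= 2;
    vm_continue(ctx);
}

static void ll_add(VmCtx *ctx) {
    intptr_t b = vm_pop(ctx).i;
    intptr_t a = vm_pop(ctx).i;
    ctx->acc = a + b;
    vm_continue(ctx);
}

/* Computes (20 + 1) * 2 by chaining ll_add -> ll_double -> ll_halt. */
intptr_t run_demo(void) {
    VmCtx ctx;
    ctx.sp = ctx.stack + VM_STACK_SLOTS;
    ctx.acc = 0;
    vm_push(&ctx, (Slot){ .fn = ll_halt });
    vm_push(&ctx, (Slot){ .fn = ll_double });
    vm_push(&ctx, (Slot){ .i = 20 });
    vm_push(&ctx, (Slot){ .i = 1 });
    ll_add(&ctx);
    return ctx.acc;
}
```

In the assembly implementation each `vm_continue` would be a single indirect
jump with arguments already in registers; the proposed calling convention is
an attempt to get the same code out of LLVM.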

2. Because the existing VM runtime is written in x86 assembly and doesn't
do function calls, it uses the ESP register for VM stack purposes (again, it is
not in use for low-level calls).  We want to do the same.
In our C version we keep the VM stack (and its pointer) in heap memory, which is
very bad for performance. E.g., *--vmCtx->sp = obj for a stack push.
LLVM allows the initialization of the machine stack pointer to an arbitrary
value by using the 'stacksave/stackrestore' intrinsics, but there is no way
to, for example, allocate/deallocate the VM stack frame. Note that it is a _VM_
stack frame and it could be allocated by some of those low-level executing
functions (as needed), but never in a prologue/epilogue, so implementing it as
another calling conv won't help here.
We think that it could be implemented as intrinsics as well?  Or perhaps we
should create intrinsics for arbitrary machine stack access?
We tried, for example, stacksave-sub-store-stackrestore sequence, but it never
folds into a single push operation.
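
To make the heap-based variant from question 2 concrete, the following C
sketch shows the push idiom quoted above together with explicit VM frame
allocation and deallocation. It is only an assumption about what such code
might look like; the names VmCtx, vm_frame_alloc, vm_frame_free, and
frame_demo are hypothetical. Every operation goes through `ctx->sp` in memory,
which is the cost the poster hopes to avoid by keeping the VM stack pointer in
the machine stack register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VM_STACK_SLOTS = 256 };

typedef struct {
    intptr_t *sp;                       /* VM stack pointer, grows downward */
    intptr_t  stack[VM_STACK_SLOTS];
} VmCtx;

static void vm_init(VmCtx *ctx) { ctx->sp = ctx->stack + VM_STACK_SLOTS; }

/* Number of slots currently in use. */
static size_t vm_depth(const VmCtx *ctx) {
    return (size_t)(ctx->stack + VM_STACK_SLOTS - ctx->sp);
}

static void vm_push(VmCtx *ctx, intptr_t obj) {
    assert(ctx->sp > ctx->stack);
    *--ctx->sp = obj;                   /* load sp, decrement, store value, store sp */
}

static intptr_t vm_pop(VmCtx *ctx) {
    assert(vm_depth(ctx) > 0);
    return *ctx->sp++;
}

/* Reserve n slots for a VM frame anywhere in a function body,
 * not in a prologue; returns a pointer to the lowest slot. */
static intptr_t *vm_frame_alloc(VmCtx *ctx, size_t n) {
    assert((size_t)(ctx->sp - ctx->stack) >= n);
    ctx->sp -= n;
    return ctx->sp;
}

static void vm_frame_free(VmCtx *ctx, size_t n) {
    assert(vm_depth(ctx) >= n);
    ctx->sp += n;
}

/* Pushes a value, allocates a 3-slot frame above it, sums the frame,
 * frees it, and pops the original value again. */
int frame_demo(void) {
    VmCtx ctx;
    vm_init(&ctx);
    vm_push(&ctx, 7);
    intptr_t *frame = vm_frame_alloc(&ctx, 3);
    frame[0] = 1; frame[1] = 2; frame[2] = 3;
    assert(vm_depth(&ctx) == 4);
    intptr_t sum = frame[0] + frame[1] + frame[2];
    vm_frame_free(&ctx, 3);
    intptr_t below = vm_pop(&ctx);
    assert(vm_depth(&ctx) == 0);
    return (int)(sum + below);
}
```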

3. Since the machine stack is a VM stack, we are not allowed to use alloca.
It's not a problem, but the machine register allocator/spiller can still use
the machine stack for register spilling purposes.
How could this be solved? Should we provide our own register allocator? Or could
it be solved by providing a machine function pass, which will run on a function
marked with our calling conv and substitute machine instructions responsible for
spilling to stack with load/store instructions to our heap memory?

Hopefully this makes some sense. We know that we have to extend LLVM for every
target platform, but that is still better than rewriting the VM in assembly for
every target.
Thank you for your time.

