[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems [Aug 2015]

Alex Nordwood via llvm-dev

2015-Aug-07 21:35 UTC

[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems

Hello.
>> It is a stack VM, and one designed to utilize all the advantages of the
>> assembly language implementation.
> This sounds very, very familiar.  Are you willing to share which
> VM/language you're working on?

I would like to, because I think the additional context would be helpful... let me ask for permission first.
>> doing this using C (by C function calls, CPS) led to significant
>> performance loss.
> I assume the continuation is a tail call?  If so, have you examined
> where the performance loss originated?  An obvious tail call in C code
> being compiled by modern Clang should be code generated as a tail call.
> You might be stumbling across an implementation limit and adjusting your
> input slightly might bypass that.

Yes, both the executor and its continuation are tail calls. We tried gcc 4.8, 4.9 and 5.1, and clang 3.6. All of them infer a tail call when the call target is explicit. But when the continuation function pointer is popped from the VM stack, none of them produced a jump, which led to a machine stack overflow. I'm not sure why clang couldn't do this, because the same pattern works when we write the LLVM IR directly (in test code, of course).
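If forcing the tail call turns out not to be possible, one portable fallback (a standard technique, not something proposed in this thread) is a trampoline: each handler returns its continuation instead of calling it, and a small driver loop performs the calls, so machine stack depth stays constant however long the chain of continuations runs. The sketch below assumes a hypothetical `struct vm` whose stack holds continuations; all names are illustrative.

```c
#include <stddef.h>

struct vm;

/* C cannot name a function type that returns itself, so each
   continuation is wrapped in a struct. A NULL fn means "halt". */
struct step {
    struct step (*fn)(struct vm *vm);
};

/* Hypothetical VM state: a stack of continuations plus an accumulator. */
struct vm {
    struct step stack[1024];
    size_t sp; /* number of live entries */
    long acc;
};

/* Pop the next continuation, or return the halt marker if the stack is empty. */
static struct step pop_step(struct vm *vm) {
    if (vm->sp == 0)
        return (struct step){ NULL };
    return vm->stack[--vm->sp];
}

/* A handler does its work and hands the continuation back to the driver
   instead of calling it, so it never grows the machine stack. */
static struct step op_inc(struct vm *vm) {
    vm->acc += 1;
    return pop_step(vm);
}

/* Driver loop: pushes up to `count` increments (capped at the stack size)
   and runs them to completion at constant machine stack depth. */
long run_trampoline(size_t count) {
    struct vm vm;
    vm.sp = 0;
    vm.acc = 0;
    for (size_t i = 0; i < count && vm.sp < 1024; i++)
        vm.stack[vm.sp++] = (struct step){ op_inc };
    struct step s = pop_step(&vm);
    while (s.fn != NULL)
        s = s.fn(&vm);
    return vm.acc;
}
```

The cost is an indirect call plus a return per handler instead of a single jump, which may give back some of the performance the assembly version had.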
>> We are considering extending LLVM by creating a special calling
>> convention which forces a function (using this convention) to pass
>> args in registers and to be forcibly tail-call optimized.
> You absolutely will need a custom calling convention for the register
> assignments and such.  If your source IR uses musttail, in principle you
> shouldn't need to do anything special for the tail calls provided you're
> running on one of the architectures where musttail has been implemented.

musttail has some limitations (e.g., the caller and callee prototypes must match), but tail with -tailcallopt works just fine. Looks like we were on the right track with that question. Thanks!
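For reference, C source later gained a direct counterpart to IR-level `musttail`: Clang 13, released long after this thread, added a `musttail` statement attribute. The sketch below assumes a compiler that supports it; where `__has_attribute` does not report it, the hypothetical `MUSTTAIL` macro expands to nothing and the call is an ordinary tail call. It also shows the prototype restriction mentioned above: every handler shares one signature, as musttail requires, so the indirect call through a pointer popped from the VM stack is eligible to be forced into a jump. Handler names are illustrative.

```c
#include <stddef.h>

/* MUSTTAIL expands to the musttail statement attribute when the compiler
   reports support for it (Clang 13+), and to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct vm;

/* musttail requires caller and callee to have matching prototypes, so
   every handler uses this one signature. */
typedef long (*handler_fn)(struct vm *vm, long acc);

/* Hypothetical VM state: a stack of handler pointers. */
struct vm {
    handler_fn stack[64];
    size_t sp; /* number of live entries */
};

static long h_halt(struct vm *vm, long acc) {
    (void)vm;
    return acc;
}

static long h_double(struct vm *vm, long acc) {
    handler_fn next = vm->stack[--vm->sp]; /* pop the continuation */
    /* With the attribute, the call must be emitted as a tail call
       (a jump); otherwise compilation fails. */
    MUSTTAIL return next(vm, acc * 2);
}

/* Pushes up to `count` doubling handlers above a halt and runs the chain. */
long run_doubles(long seed, size_t count) {
    struct vm vm = { .sp = 0 };
    vm.stack[vm.sp++] = h_halt;
    for (size_t i = 0; i < count && vm.sp < 64; i++)
        vm.stack[vm.sp++] = h_double;
    handler_fn first = vm.stack[--vm.sp];
    return first(&vm, seed);
}
```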
> Creating a new calling convention is easy.  It will require a custom
> LLVM build to get started since you have to change td and cpp files in
> the target.  For examples, see the existing ones in
> Target/X86/X86CallingConv.td

Yes, we already have a custom build. Thanks!
>> 2. Because the existing VM runtime is written in x86 assembly, and
>> doesn't do function calls, it uses the ESP register for VM stack purposes
>> (again, it is not in use for low-level calls). We want to do the same.
> This will be tricky... Do you absolutely absolutely need this?

We still have to support 32-bit x86, and that architecture is short on general-purpose registers, so:
a) otherwise the esp register would be almost unused;
b) we would have to do stack operations through other registers, which may lead to more spilling.
So it's good to have esp doing what it is meant to do... but for the VM stack. We don't want to keep the original x86 assembly version alongside our new LLVM-based one (hopefully we will make it).
>> We think that it could be implemented as intrinsics as well? Or perhaps
>> we should create intrinsics for arbitrary machine stack access?
>> We tried, for example, a stacksave-sub-store-stackrestore sequence, but
>> it never folds into a single push operation.
> I would suggest just implementing your virtual machine stack as a normal
> bit of memory.  There's no reason that the compiler needs to know that
> this is the VM stack versus some other buffer.  You will need to provide
> aliasing facts, but that's a much smaller extension*.  Trying to muck
> with the frames at runtime using the intrinsics is going to end very badly.
> * Don't underestimate this point.  Providing aliasing metadata will be
> *really* important for this scheme to work reasonably well.  You may
> need to add custom extensions locally or propose extensions upstream to
> encode the information you need.

Could you please be more specific? Pointing to some docs or examples will work just fine. :)
> Another option: If you know the size of your vm frame statically, you
> can emit loads from the vm stack locations into SSA values (or allocas,
> which will become SSA values) and spill as needed to ensure the VM stack
> is up to date as required by your language requirements.  This will
> likely a) decrease your dependence on the pass ordering above, and b)
> give slightly better results since LLVM is going to have to be
> conservative about calls into your runtime and your custom lowering gets
> to use language specific knowledge.

Hmm... this sounds interesting, thanks. So, if I understand correctly, when we need to allocate a VM stack frame, the idea is to create enough allocas and then store the values that need to be in the VM frame there? But how can that survive the optimization passes? (I assume that we did stackload before and esp points to the VM stack.) Could you please explain more?
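The "another option" above is about the LLVM IR a frontend emits, but the same idea can be sketched at the C level, assuming a hypothetical fused opcode sequence `ADD; DUP; MUL` followed by a call into the runtime: read the VM stack slots the sequence touches into locals (which Clang promotes to SSA values when optimizing), compute entirely in those locals, and write the result back only before the runtime call, the point where the VM stack must be up to date. `struct vm`, `runtime_hook` and the opcode names are illustrative, not from the thread.

```c
#include <stddef.h>

/* Hypothetical VM state: the VM stack is ordinary memory. */
struct vm {
    long stack[256];
    size_t sp;     /* index of the next free slot */
    long observed; /* top of stack as last seen by the runtime */
};

/* Stand-in for a call into the runtime that may read the VM stack,
   so the stack must be up to date before it runs. */
void runtime_hook(struct vm *vm) {
    vm->observed = vm->stack[vm->sp - 1];
}

/* Fused handler for the fixed sequence ADD; DUP; MUL; CALL runtime_hook,
   whose frame shape is known statically. */
void add_square_then_call(struct vm *vm) {
    /* Load the two slots the sequence reads into locals. */
    long b = vm->stack[vm->sp - 1];
    long a = vm->stack[vm->sp - 2];

    /* Compute in locals only; the intermediate stack states produced by
       ADD and DUP are never written to memory. */
    long sum = a + b;
    long square = sum * sum;

    /* Materialize the VM stack just before the runtime may observe it. */
    vm->sp -= 1;
    vm->stack[vm->sp - 1] = square;
    runtime_hook(vm);
}
```

Because the runtime can observe the VM stack through `vm`, the optimizer cannot delete that final store, while the intermediate values never touch memory.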
>> 3. Since the machine stack is a VM stack, we are not allowed to use
>> alloca. It's not a problem, but the machine register allocator/spiller
>> can still use the machine stack for register spilling purposes.
>> How could this be solved? Should we provide our own register allocator?
>> Or could it be solved by providing a machine function pass,
>> which will run on a function marked with our calling conv and
>> substitute machine instructions responsible for spilling to stack
>> with load/store instructions to our heap memory?
> I don't understand what you're trying to ask here.  If you can spill to
> the machine frame (instead of the VM stack frame), what's the problem?

I mean that if we do stackload and the machine stack pointer points to the VM stack (and we have somehow solved the problem above), LLVM still wants to spill registers to the stack. It would be good to have the spill slots in the VM context rather than on a stack (neither the machine stack nor the VM stack).
>> Thank you for your time.
> At a meta level, let me give my standard warning: implementing a
> functional compiler for a language of your choice on LLVM is relatively
> easy; you're looking at around 1-3 man years of effort depending on the
> language.  Implementing a *performant* compiler is far, far harder.
> Unless you're willing to budget for upwards of 10 man years of skilled
> compiler engineering time, you may need to adjust your expectations.
> How hard the problem will be also depends on how good your current
> implementation is of course.  :)

I appreciate the warning. Strictly speaking, we aren't implementing the compiler itself. It's only a runtime for interpreting previously compiled bytecode. A JIT is coming, but not yet. It also doesn't include a memory manager; that part works well enough written in C. Our task is really not easy, but not as hard as you think. :) We think. :)
> To give a flavour for the tuning involved, you might find this document
> helpful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
> If you're serious about the project, I highly recommend that you make an
> effort to attend the developers conference in Oct.  You'll want to have
> a bunch of high bandwidth conversations with people who've been down
> this road before and email just doesn't quite work for that.

Thanks! We will keep this in mind.


_____________________________________________________________
Are you a Techie? Get Your Free Tech Email Address Now! Visit
http://www.TechEmail.com

Philip Reames via llvm-dev

2015-Aug-07 22:27 UTC

On 08/07/2015 02:35 PM, Alex Nordwood wrote:
> Hello.
>
>>> It is a stack VM, and one designed to utilize all the advantages of
>>> the assembly language implementation.
>> This sounds very, very familiar.  Are you willing to share which
>> VM/language you're working on?
> I would like to, because I think the additional context would be
> helpful... let me ask for permission first.
>
>>> doing this using C (by C function calls, CPS) led to
>>> significant performance loss.
>> I assume the continuation is a tail call?  If so, have you examined
>> where the performance loss originated?  An obvious tail call in C code
>> being compiled by modern Clang should be code generated as a tail call.
>> You might be stumbling across an implementation limit and adjusting
>> your input slightly might bypass that.
> Yes, both the executor and its continuation are tail calls. We tried
> gcc 4.8, 4.9 and 5.1, and clang 3.6. All of them infer a tail call when
> the call target is explicit. But when a continuation function pointer is
> popped from the VM stack, none of them produced a jump, which led to a
> machine stack overflow. I'm not sure why clang couldn't do this, because
> the same pattern works when we write the LLVM IR directly (in test code,
> of course).

I would suggest looking into this.  The smallest C reproducer which
doesn't get a tail call would be interesting to see and might get fixed.  :)

One thing you might be running into if your VM is in C++ rather than C is
that C++ pointers-to-member functions aren't just function pointers.

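A reproducer of the shape asked for here might look like the following minimal sketch (all names are hypothetical): each handler pops its continuation from a VM stack held in ordinary memory and calls it in tail position. Compiling with `-O2 -S` and checking whether `op_inc` ends in a `jmp` or a `call` shows whether the tail call was emitted; the chain is kept short so the example runs correctly either way.

```c
#include <stddef.h>

struct vm;

/* Every continuation has the same signature. */
typedef long (*cont_fn)(struct vm *vm, long acc);

/* Hypothetical VM state: the VM stack is ordinary memory holding
   continuation pointers. */
struct vm {
    cont_fn stack[64];
    size_t sp; /* number of live entries */
};

static long op_halt(struct vm *vm, long acc) {
    (void)vm;
    return acc;
}

static long op_inc(struct vm *vm, long acc) {
    cont_fn next = vm->stack[--vm->sp]; /* pop the continuation */
    return next(vm, acc + 1);           /* indirect call in tail position */
}

/* Pushes a halt and up to `count` increments, then runs the chain. */
long run_incs(size_t count) {
    struct vm vm = { .sp = 0 };
    vm.stack[vm.sp++] = op_halt; /* bottom of the stack: runs last */
    for (size_t i = 0; i < count && vm.sp < 64; i++)
        vm.stack[vm.sp++] = op_inc;
    cont_fn first = vm.stack[--vm.sp];
    return first(&vm, 0);
}
```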
>>> We are considering extending LLVM by creating a special calling
>>> convention which forces a function (using this convention) to pass
>>> args in registers and to be forcibly tail-call optimized.
>> You absolutely will need a custom calling convention for the register
>> assignments and such.  If your source IR uses musttail, in principle
>> you shouldn't need to do anything special for the tail calls provided
>> you're running on one of the architectures where musttail has been
>> implemented.
> musttail has some limitations (e.g., the caller and callee prototypes
> must match), but tail with -tailcallopt works just fine.
> Looks like we were on the right track with that question. Thanks!
>
>> Creating a new calling convention is easy.  It will require a custom
>> LLVM build to get started since you have to change td and cpp files in
>> the target.  For examples, see the existing ones in
>> Target/X86/X86CallingConv.td
> Yes, we already have a custom build. Thanks!
>
>>> 2. Because the existing VM runtime is written in x86 assembly, and
>>> doesn't do function calls, it uses the ESP register for VM stack
>>> purposes (again, it is not in use for low-level calls). We want to do
>>> the same.
>> This will be tricky... Do you absolutely absolutely need this?
> We still have to support 32-bit x86, and that architecture is short on
> general-purpose registers, so:
> a) otherwise the esp register would be almost unused;
> b) we would have to do stack operations through other registers, which
> may lead to more spilling.
> So it's good to have esp doing what it is meant to do... but for the VM
> stack. We don't want to keep the original x86 assembly version alongside
> our new LLVM-based one (hopefully we will make it).

How does your current runtime track spills inserted by the compiler?  Is
that integrated with the VM stack?  Or is that a distinct stack?  If you
didn't mind spills being interwoven with VM frames, you could potentially
model the VM stack operations as dynamic allocas.

>
>>> We think that it could be implemented as intrinsics as well? Or
>>> perhaps we should create intrinsics for arbitrary machine stack access?
>>> We tried, for example, a stacksave-sub-store-stackrestore sequence,
>>> but it never folds into a single push operation.
>> I would suggest just implementing your virtual machine stack as a
>> normal bit of memory.  There's no reason that the compiler needs to
>> know that this is the VM stack versus some other buffer.  You will need
>> to provide aliasing facts, but that's a much smaller extension*.  Trying
>> to muck with the frames at runtime using the intrinsics is going to end
>> very badly.
>> * Don't underestimate this point.  Providing aliasing metadata will be
>> *really* important for this scheme to work reasonably well.  You may
>> need to add custom extensions locally or propose extensions upstream to
>> encode the information you need.
> Could you please be more specific? Pointing to some docs or examples
> will work just fine. :)

See LangRef.  Search for noalias, tbaa, invariant.load, inbounds,
readonly, readnone, argmemonly, nonnull...

See the alias analysis docs and the TBAA pass as an example of how to
write and integrate a custom AA pass.

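From C source, one way to hand LLVM some of these aliasing facts without a custom pass is `restrict`: Clang marks restrict-qualified pointer parameters `noalias` in the IR it emits. A minimal sketch with a hypothetical `push_constants` helper that copies entries from a constant pool onto a VM stack kept in ordinary memory. Because the VM stack and the constant pool are both `long`, type-based alias analysis cannot tell them apart, and an `unsigned char` pointer may alias any object, so here `restrict` is what tells the optimizer the buffers do not overlap.

```c
#include <stddef.h>

/* Copies consts[idx[i]] onto the VM stack for i in [0, n) and returns the
   new stack pointer. `restrict` promises that, inside this function, the
   three buffers never overlap. */
size_t push_constants(long *restrict vm_stack, size_t sp,
                      const long *restrict consts,
                      const unsigned char *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Without restrict, a store to vm_stack could in principle change
           consts or idx, which limits how the compiler may reorder or
           vectorize these loads and stores. */
        vm_stack[sp++] = consts[idx[i]];
    }
    return sp;
}
```

Whether a given optimizer version actually exploits the `noalias` facts for a particular loop is worth checking in the emitted IR or assembly.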
>> Another option: If you know the size of your vm frame statically, you
>> can emit loads from the vm stack locations into SSA values (or allocas,
>> which will become SSA values) and spill as needed to ensure the VM
>> stack is up to date as required by your language requirements.  This
>> will likely a) decrease your dependence on the pass ordering above, and
>> b) give slightly better results since LLVM is going to have to be
>> conservative about calls into your runtime and your custom lowering
>> gets to use language specific knowledge.
> Hmm... this sounds interesting, thanks.
> So, if I understand correctly, when we need to allocate a VM stack
> frame, the idea is to create enough allocas and then store the values
> that need to be in the VM frame there?
> But how can that survive the optimization passes?

The allocas specifically shouldn't survive optimization.  That's the
point.  :)  If the VM stack has been escaped at all the relevant points,
the spills to the VM stack memory can't be eliminated.  As a result,
you'd get the effect of having a materialized VM stack when you need it,
and everything in SSA values or on the execution stack the rest of the
time.

> (I assume that we did stackload before and esp points to the VM stack.)
> Could you please explain more?

I was assuming you still had a separate execution stack and VM stack.
Mixing the two without letting LLVM spill things between VM stack
sections would be "interesting".

>>> 3. Since the machine stack is a VM stack, we are not allowed to use
>>> alloca. It's not a problem, but the machine register
>>> allocator/spiller can still use the machine stack for register
>>> spilling purposes.
>>> How could this be solved? Should we provide our own register
>>> allocator? Or could it be solved by providing a machine function pass,
>>> which will run on a function marked with our calling conv and
>>> substitute machine instructions responsible for spilling to stack
>>> with load/store instructions to our heap memory?
>> I don't understand what you're trying to ask here.  If you can spill
>> to the machine frame (instead of the VM stack frame), what's the
>> problem?
> I mean that if we do stackload and the machine stack pointer points to
> the VM stack (and we have somehow solved the problem above), LLVM still
> wants to spill registers to the stack.
> It would be good to have the spill slots in the VM context rather than
> on a stack (neither the machine stack nor the VM stack).

This is going to be a really problematic design point.  The entire LLVM
backend assumes it owns the execution stack.  That's a really, really
built-in assumption.  Trying to change that would be extremely
challenging.  (See comment below)

>>> Thank you for your time.
>> At a meta level, let me give my standard warning: implementing a
>> functional compiler for a language of your choice on LLVM is relatively
>> easy; you're looking at around 1-3 man years of effort depending on the
>> language.  Implementing a *performant* compiler is far, far harder.
>> Unless you're willing to budget for upwards of 10 man years of skilled
>> compiler engineering time, you may need to adjust your expectations.
>> How hard the problem will be also depends on how good your current
>> implementation is of course.  :)
> I appreciate the warning.
> Strictly speaking, we aren't implementing the compiler itself. It's
> only a runtime for interpreting previously compiled bytecode. A JIT is
> coming, but not yet.
> It also doesn't include a memory manager; that part works well enough
> written in C.
> Our task is really not easy, but not as hard as you think. :) We think. :)

Wait, what?  I think I got something confused at some point.  All of my
answers above were with a JIT in mind.  :)  Doing an interpreter is a
slightly different beast.

For the record, trying to not have an execution stack for spilling just
started seeming a lot more sane.  :)  I would still start with a design
that uses an extra (ESP) register for the execution stack, but tuning
the IR for an interpreter to not spill, or changing the base pointer to
be something special with a fixed scratch pad, seems approachable.
Dealing with a restricted bit of code is much more approachable than
whatever a JIT might emit.  :)  Still challenging though.

>> To give a flavour for the tuning involved, you might find this document
>> helpful:
>> http://llvm.org/docs/Frontend/PerformanceTips.html
>> If you're serious about the project, I highly recommend that you make
>> an effort to attend the developers conference in Oct.  You'll want to
>> have a bunch of high bandwidth conversations with people who've been
>> down this road before and email just doesn't quite work for that.
> Thanks! We will keep this in mind.

Jonathan S. Shapiro via llvm-dev

2015-Aug-07 23:43 UTC

Alex:

I'm not sure you're taking the right approach with this. You can either
have portability or you can play games with the calling convention assumed
by the back end, or you can modify the compiler to suit your desired
calling convention, but you probably can't get all three.

I'm the guy behind HDTrans (dynamic binary translation for x86), and we
used direct x86 instruction emission as well, and we cheated like crazy on
calling conventions, stacks, you name it. So I understand where you are
coming from. I've also done some bytecode VM work. You just aren't going
to get a portable result that way, and as others have said already, using
llvm-il isn't going to get you there. I think you are better off stepping
back and looking at this as a new engineering problem rather than trying to
translate your existing solution piece by piece. The bad news is that this
infrastructure may not let you get quite as far down toward the bare metal.
The good news is that it can be exploited to do more in the way of dynamic
optimization than is typically feasible with directly hand-generated
machine code.

If you like, get in touch with me off-line. I don't want to go spouting off
useless ideas here, because I don't understand what you are trying to do
yet. But I'd be happy to talk with you to get a slightly better sense and
see if I can offer some practical help.


Jonathan
shap (at) eros-osdogorg
Dog should be dot, of course. :-)