Alex Nordwood via llvm-dev
2015-Aug-07 21:35 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems
Hello.

>> It is stack VM, and one designed to utilize all the advantages of the assembly language implementation.
> This sounds very, very familiar. Are you willing to share which
> VM/language you're working on?

I would like to, because I think the additional context would be helpful... let me ask for permission first.

>> doing this using C (by C function calls, CPS) led to significant performance loss.
> I assume the continuation is a tail call? If so, have you examined
> where the performance loss originated? An obvious tail call in C code
> being compiled by modern Clang should be code generated as a tail call.
> You might be stumbling across an implementation limit, and adjusting your
> input slightly might bypass that.

Yes, both the executor and its continuation are tail calls. And we tried the gcc 4.8, 4.9, 5.1 and clang 3.6 C compilers. All of them infer a tail call when the function call is explicit. But when the continuation function pointer is popped off the VM stack, none of them produced a jump, which leads to machine stack overflow. Not sure why clang wasn't able to do this, because at the LLVM IR level it works (using test code, of course).

>> We are considering extending LLVM by creating a special calling convention which
>> forces a function (using this convention) to pass args in registers and to
>> be force tail-call optimized.
> You absolutely will need a custom calling convention for the register
> assignments and such. If your source IR uses musttail, in principle you
> shouldn't need to do anything special for the tail calls, provided you're
> running on one of the architectures where musttail has been implemented.

musttail has some limitations (e.g., the caller and callee prototypes must match), but tail with -tailcallopt works just fine. Looks like we were on the right track with that question. Thanks!
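For reference, this is roughly the reduced pattern we tested at the IR level (the handler name and the VM stack layout here are made up, not our real ones); with musttail, or tail plus -tailcallopt, it does lower to a plain jump:

  ; %vmsp points into the VM stack; the continuation sits in the top slot
  define void @op_handler(i8* %vmsp) {
  entry:
    ; ...do the work of this opcode...
    ; pop the continuation function pointer off the VM stack
    %slot = bitcast i8* %vmsp to void (i8*)**
    %next = load void (i8*)*, void (i8*)** %slot
    %vmsp.popped = getelementptr i8, i8* %vmsp, i32 4
    ; musttail requires matching prototypes/conventions and must feed the ret
    musttail call void %next(i8* %vmsp.popped)
    ret void
  }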
> Creating a new calling convention is easy. It will require a custom
> LLVM build to get started since you have to change td and cpp files in
> the target. For examples, see the existing ones in
> Target/X86/X86CallingConv.td

Yes, we already have a custom build. Thanks!

>> 2. Because the existing VM runtime is written in x86 assembly, and doesn't do function calls, it uses the ESP register for VM stack purposes
>> (again, it is not in use for low-level calls). We want to do the same.
> This will be tricky... Do you absolutely absolutely need this?

We still have to support 32-bit x86, and that arch has a lack of GP registers, so
a) the esp register becomes almost unused, and
b) we would have to do stack operations using other regs, which may lead to more spilling.
So it's good to have the esp register doing what it has to... but for the VM stack. We don't want to keep the original x86 assembly version alongside our new LLVM-based one (hopefully we will make it).

>> We think that it could be implemented as intrinsics as well? Or perhaps we should create intrinsics for arbitrary machine stack access?
>> We tried, for example, a stacksave-sub-store-stackrestore sequence, but it never folds into a single push operation.
> I would suggest just implementing your virtual machine stack as a normal
> bit of memory. There's no reason that the compiler needs to know that
> this is the VM stack versus some other buffer. You will need to provide
> aliasing facts, but that's a much smaller extension*. Trying to muck
> with the frames at runtime using the intrinsics is going to end very badly.
> * Don't underestimate this point. Providing aliasing metadata will be
> *really* important for this scheme to work reasonably well. You may
> need to add custom extensions locally or propose extensions upstream to
> encode the information you need.

Could you please be more specific? Pointing to some docs or examples would work just fine. :)

> Another option: If you know the size of your vm frame statically, you
> can emit loads from the vm stack locations into SSA values (or allocas,
> which will become SSA values) and spill as needed to ensure the VM stack
> is up to date as required by your language requirements. This will
> likely a) decrease your dependence on the pass ordering above, and b)
> give slightly better results, since LLVM is going to have to be
> conservative about calls into your runtime and your custom lowering gets
> to use language-specific knowledge.

Hmm... this sounds interesting, thanks. So, if I understand correctly, if we need to allocate a VM stack frame, the idea is to create enough allocas, then store there the values which need to be in the VM frame? But how can it survive the optimizing passes? (I assume that we did stackload before and esp points to the VM stack.) Could you please explain more?
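Just to check my understanding, is the shape something like this? (A made-up two-slot frame; the runtime call is only a stand-in.)

  declare void @runtime_call(i32*)

  define i32 @bytecode_body(i32* %vmframe) {
  entry:
    %slot0.p = getelementptr inbounds i32, i32* %vmframe, i32 0
    %slot1.p = getelementptr inbounds i32, i32* %vmframe, i32 1
    %a = load i32, i32* %slot0.p
    %b = load i32, i32* %slot1.p
    %sum = add i32 %a, %b                  ; work happens on SSA values
    store i32 %sum, i32* %slot0.p          ; spill so the runtime sees an up-to-date frame
    call void @runtime_call(i32* %vmframe)
    %res = load i32, i32* %slot0.p         ; reload: the call may have changed the slot
    ret i32 %res
  }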
>> 3. Since the machine stack is a VM stack, we are not allowed to use alloca. It's not a problem, but the machine register allocator/spiller
>> can still use the machine stack for register spilling purposes.
>> How could this be solved? Should we provide our own register allocator? Or could it be solved by providing a machine function pass,
>> which will run on a function marked with our calling conv and substitute the machine instructions responsible for spilling to the stack
>> with load/store instructions to our heap memory?
> I don't understand what you're trying to ask here. If you can spill to
> the machine frame (instead of the VM stack frame), what's the problem?

I mean that if we do stackload and the machine stack points to the VM stack (and we somehow solved the problem above), LLVM still wants to spill regs to the stack. It would be good to have the spill slots in the VM context, but not on a stack (neither machine nor VM).

>> Thank you for your time.
> At a meta level, let me give my standard warning: implementing a
> functional compiler for a language of your choice on LLVM is relatively
> easy; you're looking at around 1-3 man-years of effort depending on the
> language. Implementing a *performant* compiler is far, far harder.
> Unless you're willing to budget for upwards of 10 man-years of skilled
> compiler engineering time, you may need to adjust your expectations.
> How hard the problem will be also depends on how good your current
> implementation is, of course. :)

I appreciate the warning. Strictly speaking, we are not implementing the compiler itself. It's only a runtime for interpreting bytecodes compiled earlier. A JIT is upcoming, but not for now. It also doesn't include a memory manager - that works well enough written in C. Our task is really not easy, but not as hard as you think. :) We think. :)

> To give a flavour for the tuning involved, you might find this document
> helpful:
> http://llvm.org/docs/Frontend/PerformanceTips.html
> If you're serious about the project, I highly recommend that you make an
> effort to attend the developers conference in Oct. You'll want to have
> a bunch of high-bandwidth conversations with people who've been down
> this road before, and email just doesn't quite work for that.

Thanks! We will keep this in mind.
Philip Reames via llvm-dev
2015-Aug-07 22:27 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems
On 08/07/2015 02:35 PM, Alex Nordwood wrote:
> Hello.
>
>>> It is stack VM, and one designed to utilize all the advantages of the assembly language implementation.
>> This sounds very, very familiar. Are you willing to share which
>> VM/language you're working on?
> I would like to, because I think the additional context would be helpful... let me ask for permission first.
>
>>> doing this using C (by C function calls, CPS) led to significant performance loss.
>> I assume the continuation is a tail call? If so, have you examined
>> where the performance loss originated? An obvious tail call in C code
>> being compiled by modern Clang should be code generated as a tail call.
>> You might be stumbling across an implementation limit, and adjusting your
>> input slightly might bypass that.
> Yes, both the executor and its continuation are tail calls. And we tried the gcc 4.8, 4.9, 5.1
> and clang 3.6 C compilers. All of them infer a tail call when the function call is explicit. But when
> the continuation function pointer is popped off the VM stack, none of them produced a jump,
> which leads to machine stack overflow. Not sure why clang wasn't able to do this, because at the
> LLVM IR level it works (using test code, of course).

I would suggest looking into this. The smallest C reproducer which doesn't get a tail call would be interesting to see and might get fixed. :) One thing you might be running into, if your VM is in C++ rather than C, is that C++ pointers-to-member functions aren't just function pointers.

>>> We are considering extending LLVM by creating a special calling convention which
>>> forces a function (using this convention) to pass args in registers and to
>>> be force tail-call optimized.
>> You absolutely will need a custom calling convention for the register
>> assignments and such. If your source IR uses musttail, in principle you
>> shouldn't need to do anything special for the tail calls, provided you're
>> running on one of the architectures where musttail has been implemented.
> musttail has some limitations (e.g., the caller and callee prototypes must match), but tail
> with -tailcallopt works just fine.
> Looks like we were on the right track with that question. Thanks!
>
>> Creating a new calling convention is easy. It will require a custom
>> LLVM build to get started since you have to change td and cpp files in
>> the target. For examples, see the existing ones in
>> Target/X86/X86CallingConv.td
> Yes, we already have a custom build. Thanks!
>
>>> 2. Because the existing VM runtime is written in x86 assembly, and doesn't do function calls, it uses the ESP register for VM stack purposes
>>> (again, it is not in use for low-level calls). We want to do the same.
>> This will be tricky... Do you absolutely absolutely need this?
> We still have to support 32-bit x86, and that arch has a lack of GP registers, so
> a) the esp register becomes almost unused, and
> b) we would have to do stack operations using other regs, which may lead to more spilling.
> So it's good to have the esp register doing what it has to... but for the VM stack.
> We don't want to keep the original x86 assembly version alongside our new LLVM-based one (hopefully we will make it).

How does your current runtime track spills inserted by the compiler? Is that integrated with the VM stack? Or is that a distinct stack? If you didn't mind spills being interwoven with vm frames, you could model the vm stack operations as dynamic allocas, potentially.
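Very roughly, and purely as a sketch (the slot count and the helper are invented), the dynamic-alloca version would look something like:

  declare void @vm_enter_frame(i32*)

  define void @push_frame(i32 %nslots) {
  entry:
    ; one dynamic alloca per VM frame; any compiler spills end up
    ; interleaved with the frames on the same execution stack
    %frame = alloca i32, i32 %nslots, align 4
    call void @vm_enter_frame(i32* %frame)
    ret void
  }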
>>> We think that it could be implemented as intrinsics as well? Or perhaps we should create intrinsics for arbitrary machine stack access?
>>> We tried, for example, a stacksave-sub-store-stackrestore sequence, but it never folds into a single push operation.
>> I would suggest just implementing your virtual machine stack as a normal
>> bit of memory. There's no reason that the compiler needs to know that
>> this is the VM stack versus some other buffer. You will need to provide
>> aliasing facts, but that's a much smaller extension*. Trying to muck
>> with the frames at runtime using the intrinsics is going to end very badly.
>> * Don't underestimate this point. Providing aliasing metadata will be
>> *really* important for this scheme to work reasonably well. You may
>> need to add custom extensions locally or propose extensions upstream to
>> encode the information you need.
> Could you please be more specific? Pointing to some docs or examples would work just fine. :)

See LangRef. Search for noalias, tbaa, invariant.load, inbounds, readonly, readnone, argmemonly, nonnull... See the alias analysis docs and the TBAA pass as an example of how to write and integrate a custom AA pass.
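To give a flavour of what those buy you (the function and the attribute placement here are purely illustrative):

  ; the bytecode is immutable and never aliases the VM stack, so mark the
  ; arguments noalias/readonly and tag the opcode fetch as !invariant.load;
  ; that lets LLVM hoist and CSE such loads instead of treating every
  ; runtime call as a potential clobber
  define void @fetch_op(i8* noalias nonnull readonly %bytecode, i32* noalias nonnull %vmstack) {
  entry:
    %op = load i8, i8* %bytecode, !invariant.load !0
    %op.ext = zext i8 %op to i32
    store i32 %op.ext, i32* %vmstack
    ret void
  }

  !0 = !{}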
>> Another option: If you know the size of your vm frame statically, you
>> can emit loads from the vm stack locations into SSA values (or allocas,
>> which will become SSA values) and spill as needed to ensure the VM stack
>> is up to date as required by your language requirements. This will
>> likely a) decrease your dependence on the pass ordering above, and b)
>> give slightly better results, since LLVM is going to have to be
>> conservative about calls into your runtime and your custom lowering gets
>> to use language-specific knowledge.
> Hmm... this sounds interesting, thanks.
> So, if I understand correctly, if we need to allocate a VM stack frame, the idea is to create
> enough allocas, then store there the values which need to be in the VM frame?
> But how can it survive the optimizing passes?

The allocas specifically shouldn't survive optimization. That's the point. :) If the vm stack has been escaped at all the relevant points, the spills to the vm stack memory can't be eliminated. As a result, you'd get the effect of having a materialized vm stack when you need it, and everything in SSA/the execution stack the rest of the time.

> (I assume that we did stackload before and esp points to the VM stack.)
> Could you please explain more?

I was assuming you still had a separate execution stack and vm stack. Mixing the two without letting LLVM spill things between VM stack sections would be "interesting".

>>> 3. Since the machine stack is a VM stack, we are not allowed to use alloca. It's not a problem, but the machine register allocator/spiller
>>> can still use the machine stack for register spilling purposes.
>>> How could this be solved? Should we provide our own register allocator? Or could it be solved by providing a machine function pass,
>>> which will run on a function marked with our calling conv and substitute the machine instructions responsible for spilling to the stack
>>> with load/store instructions to our heap memory?
>> I don't understand what you're trying to ask here. If you can spill to
>> the machine frame (instead of the VM stack frame), what's the problem?
> I mean that if we do stackload and the machine stack points to the VM stack (and we
> somehow solved the problem above), LLVM still wants to spill regs to the stack.
> It would be good to have the spill slots in the VM context, but not on a stack
> (neither machine nor VM).

This is going to be a really problematic design point. The entire LLVM backend assumes it owns the execution stack. That's a really, really built-in assumption. Trying to change that would be extremely challenging. (See comment below.)

>>> Thank you for your time.
>> At a meta level, let me give my standard warning: implementing a
>> functional compiler for a language of your choice on LLVM is relatively
>> easy; you're looking at around 1-3 man-years of effort depending on the
>> language. Implementing a *performant* compiler is far, far harder.
>> Unless you're willing to budget for upwards of 10 man-years of skilled
>> compiler engineering time, you may need to adjust your expectations.
>> How hard the problem will be also depends on how good your current
>> implementation is, of course. :)
> I appreciate the warning.
> Strictly speaking, we are not implementing the compiler itself. It's only a runtime
> for interpreting bytecodes compiled earlier. A JIT is upcoming, but not for now.
> It also doesn't include a memory manager - that works well enough written in C.
> Our task is really not easy, but not as hard as you think. :) We think. :)

Wait, what? I think I got something confused at some point. All of my answers above were with a JIT in mind. :) Doing an interpreter is a slightly different beast.

For the record, trying not to have an execution stack for spilling just started seeming a lot more sane. :) I would still start with a design that uses an extra (ESP) register for the execution stack, but tuning the IR for an interpreter to not spill, or changing the base pointer to be something special with a fixed scratch pad, seems approachable. Dealing with a restricted bit of code is much more approachable than whatever a JIT might emit. :) Still challenging though.

>> To give a flavour for the tuning involved, you might find this document
>> helpful:
>> http://llvm.org/docs/Frontend/PerformanceTips.html
>> If you're serious about the project, I highly recommend that you make an
>> effort to attend the developers conference in Oct. You'll want to have
>> a bunch of high-bandwidth conversations with people who've been down
>> this road before, and email just doesn't quite work for that.
> Thanks! We will keep this in mind.
Jonathan S. Shapiro via llvm-dev
2015-Aug-07 23:43 UTC
[llvm-dev] Creating a virtual machine: stack, regs alloc & other problems
Alex: I'm not sure you're taking the right approach with this. You can have portability, you can play games with the calling convention assumed by the back end, or you can modify the compiler to suit your desired calling convention, but you probably can't get all three.

I'm the guy behind HDTrans (dynamic binary translation for x86), and we used direct x86 instruction emission as well, and we cheated like crazy on calling conventions, stacks, you name it. So I understand where you are coming from. I've also done some bytecode VM work. You just aren't going to get a portable result that way, and as others have said already, using llvm-il isn't going to get you there.

I think you are better off stepping back and looking at this as a new engineering problem rather than trying to translate your existing solution piece by piece. The bad news is that this infrastructure may not let you get quite as far down toward the bare metal. The good news is that it can be exploited to do more in the way of dynamic optimization than is typically feasible with directly hand-generated machine code.

If you like, get in touch with me off-line. I don't want to go spouting off useless ideas here, because I don't understand what you are trying to do yet. But I'd be happy to talk with you to get a slightly better sense and see if I can offer some practical help.

Jonathan

shap (at) eros-osdogorg

Dog should be dot, of course. :-)