Hi llvmdev! I've been working on adding coroutines to LLVM. Mentioned below is the implementation plan I'm following, for suggestions, flames and other input. Using segmented stacks is a prerequisite. The idea is to associate every coroutine with a coroutine descriptor. A coroutine descriptor consists of four words: w0, w1, w2 and w3. w0 always contains the _launcher_, and invoking a coroutine always involves setting a predefined register (%eax or %rax on x86) to point to the descriptor and calling into w0. The contents of w1, w2 and w3 are decided by the state the coroutine is in. As of now, there are two possibilities: A. The coroutine has just been created: w0 is set to __launch_coroutine, which starts the coroutine by mallocing some initial stack space, calling into __generic_morestack_set_initial_sp (in libgcc) and then finally calling into the "main" function for that coroutine. Once the "main" function returns, a cleanup routine frees the stack. Before calling the "main" function, it also sets w0 to __continue_coroutine. In this state: w1 contains a pointer to the "main" function w2 contains a pointer that is passed as an argument to the "main" function w3 contains the initial stack size. B. The coroutine has already been invoked once: w0 is set to __continue_coroutine. __continue_coroutine sets the stack pointer correctly, and jumps to where the last coroutine switch left off. In this state: w1 contains the IP to jump to. w2 contains the stack pointer. w3 is undefined. Directly based on previous discussions on the list, I think it makes sense to add the following intrinsics to LLVM: // Returns the size of a coroutine descriptor. i32 llvm.coroutine_descriptor_size() // Creates a new coroutine, which starts off with the function // `function' being called with the argument `baton'. For this to // work correctly, function must be of the type `void () (i8 *)'. // `stack_size' will be the initial stack size. `descriptor' is a // block of memory with at least as many bytes as returned by // llvm.coroutine_descriptor_size(). void llvm.coroutine_new(i8 *descriptor, i8 *function, i8 *baton, i32 stack_size) // Store the state of the current coroutine in `this' and switch to // the one described by `new'. void llvm.coroutine_switch(i8 *this, i8 *new) It should be possible to have the LLVM register allocator correctly handle persisting the registers by replacing llvm.coroutine_switch with a pseudo instruction which has every register in its Def list. This will also have the advantage of not saving unneeded registers. One issue is deciding a home for __launch_coroutine and __continue_coroutine. Currently I'm coding based on these two being in a separate library of their own; but they can be merged into compiler-rt if required. Thanks! -- Sanjoy Das http://playingwithpointers.com
On 07/28/2011 05:31 PM, Sanjoy Das wrote:> Hi llvmdev! > > I've been working on adding coroutines to LLVM. Mentioned below is the > implementation plan I'm following, for suggestions, flames and other > input. Using segmented stacks is a prerequisite.I think my only comment is that, while this would probably work, implementing it in C with a bit of assembly for saving/restoring registers also does. What are the benefits of doing it in llvm itself?> The idea is to associate every coroutine with a coroutine descriptor. A > coroutine descriptor consists of four words: w0, w1, w2 and w3. > > w0 always contains the _launcher_, and invoking a coroutine always > involves setting a predefined register (%eax or %rax on x86) to point to > the descriptor and calling into w0. The contents of w1, w2 and w3 are > decided by the state the coroutine is in. As of now, there are two > possibilities: > > A. The coroutine has just been created: > > w0 is set to __launch_coroutine, which starts the coroutine by mallocing > some initial stack space, calling into > __generic_morestack_set_initial_sp (in libgcc) and then finally calling > into the "main" function for that coroutine. Once the "main" function > returns, a cleanup routine frees the stack. > > Before calling the "main" function, it also sets w0 to __continue_coroutine. > > In this state: > > w1 contains a pointer to the "main" function > w2 contains a pointer that is passed as an argument to the "main" function > w3 contains the initial stack size. > > B. The coroutine has already been invoked once: > > w0 is set to __continue_coroutine. __continue_coroutine sets the stack > pointer correctly, and jumps to where the last coroutine switch left > off. In this state: > > w1 contains the IP to jump to. > w2 contains the stack pointer. > w3 is undefined. > > > > Directly based on previous discussions on the list, I think it makes > sense to add the following intrinsics to LLVM: > > // Returns the size of a coroutine descriptor. > i32 llvm.coroutine_descriptor_size() > > // Creates a new coroutine, which starts off with the function > // `function' being called with the argument `baton'. For this to > // work correctly, function must be of the type `void () (i8 *)'. > // `stack_size' will be the initial stack size. `descriptor' is a > // block of memory with at least as many bytes as returned by > // llvm.coroutine_descriptor_size(). > > void llvm.coroutine_new(i8 *descriptor, i8 *function, i8 *baton, i32 > stack_size) > > // Store the state of the current coroutine in `this' and switch to > // the one described by `new'. > void llvm.coroutine_switch(i8 *this, i8 *new) > > It should be possible to have the LLVM register allocator correctly > handle persisting the registers by replacing llvm.coroutine_switch with > a pseudo instruction which has every register in its Def list. This will > also have the advantage of not saving unneeded registers. > > One issue is deciding a home for __launch_coroutine and > __continue_coroutine. Currently I'm coding based on these two being in a > separate library of their own; but they can be merged into compiler-rt > if required. > > Thanks! >Cheers, Rafael
On Fri, Jul 29, 2011 at 03:01:12AM +0530, Sanjoy Das wrote:> I've been working on adding coroutines to LLVM. Mentioned below is the > implementation plan I'm following, for suggestions, flames and other > input. Using segmented stacks is a prerequisite.Please look at architectures with register window like SPARC. They are the hard case for efficient co-routines. Joerg
Am 04.08.2011 um 22:06 schrieb Rafael Ávila de Espíndola:> On 07/28/2011 05:31 PM, Sanjoy Das wrote: >> Hi llvmdev! >> >> I've been working on adding coroutines to LLVM. Mentioned below is the >> implementation plan I'm following, for suggestions, flames and other >> input. Using segmented stacks is a prerequisite. > > > I think my only comment is that, while this would probably work, > implementing it in C with a bit of assembly for saving/restoring > registers also does. > > What are the benefits of doing it in llvm itself?Hi, The benefits are quite simple. When you do it with setjmp/longjmp oder context (which is deprecated in OS X Lion) your program uses lots of memory because each coroutine need its own stack (I believe for context its 32k per thread). As ucontext is a kernel call, this uses a lot of time as well. The setjmp/longjmp implementations usually invalidate pointers to objects on other coroutines calling stacks. Also, the debugging with gdb isn't really nice. Implementing coroutines with C macros is not really an option either. Doing everything with llvm would allow the compiler to do everything without stack switching. You could turn each coroutine into a state machine. If a context change occurs, llvm would return from one coroutine to the "llvm scheduler" without unwinding the state machines calling stack (this should be possible because of the usage of segmented stacks). Then it would execute the other coroutine from its current state. This can also be imagined as splitting each coroutines into small tasks ( the program between two context switches would become one task). And then the "llvm scheduler" would execute a full task and after it is done it executes the next task (the one that the coroutine wanted to switch to). Also, there are lots of optimization opportunities if llvm knows what a coroutine is. For example, quite often it is possible that you don't need to switch to the other coroutine but can just inline some simple code. Toralf Niebuhr
Am 04.08.2011 um 22:06 schrieb Rafael Ávila de Espíndola:> On 07/28/2011 05:31 PM, Sanjoy Das wrote: >> Hi llvmdev! >> >> I've been working on adding coroutines to LLVM. Mentioned below is the >> implementation plan I'm following, for suggestions, flames and other >> input. Using segmented stacks is a prerequisite. > > > I think my only comment is that, while this would probably work, > implementing it in C with a bit of assembly for saving/restoring > registers also does. > > What are the benefits of doing it in llvm itself?Hi, The benefits are quite simple. When you do it with setjmp/longjmp or context (which is deprecated in OS X Lion) your program uses lots of memory because each coroutine need its own stack (I believe for context its 32k per thread). As ucontext is a kernel call, this uses a lot of time as well. The setjmp/longjmp implementations usually invalidate pointers to objects on other coroutines calling stacks. Also, the debugging with gdb isn't really nice. Implementing coroutines with C macros is not really an option either. Doing everything with llvm would allow the compiler to do everything without stack switching. You could turn each coroutine into a state machine. If a context change occurs, llvm would return from one coroutine to the "llvm scheduler" without unwinding the state machines calling stack (this should be possible because of the usage of segmented stacks). Then it would execute the other coroutine from its current state. This can also be imagined as splitting each coroutines into small tasks ( the program between two context switches would become one task). And then the "llvm scheduler" would execute a full task and after it is done it executes the next task (the one that the coroutine wanted to switch to). Also, there are lots of optimization opportunities if llvm knows what a coroutine is. For example, quite often it is possible that you don't need to switch to the other coroutine but can just inline some simple code. Toralf Niebuhr (sorry for double post….I used the wrong mail address)