thr3ads.net - llvm dev - [LLVMdev] [RFC] Coroutines [Jul 2011]

If this information is useful, please help other people find it:
Share via:

Sanjoy Das

2011-Jul-28 21:31 UTC

[LLVMdev] [RFC] Coroutines

Hi llvmdev!

I've been working on adding coroutines to LLVM. Mentioned below is the
implementation plan I'm following, for suggestions, flames and other
input. Using segmented stacks is a prerequisite.

The idea is to associate every coroutine with a coroutine descriptor. A
coroutine descriptor consists of four words: w0, w1, w2 and w3.

w0 always contains the _launcher_, and invoking a coroutine always
involves setting a predefined register (%eax or %rax on x86) to point to
the descriptor and calling into w0. The contents of w1, w2 and w3  are
decided by the state the coroutine is in. As of now, there are two
possibilities:

A. The coroutine has just been created:

w0 is set to __launch_coroutine, which starts the coroutine by mallocing
some initial stack space, calling into
__generic_morestack_set_initial_sp (in libgcc) and then finally calling
into the "main" function for that coroutine. Once the "main"
function
returns, a cleanup routine frees the stack.

Before calling the "main" function, it also sets w0 to
__continue_coroutine.

In this state:

w1 contains a pointer to the "main" function
w2 contains a pointer that is passed as an argument to the "main"
function
w3 contains the initial stack size.

B. The coroutine has already been invoked once:

w0 is set to __continue_coroutine. __continue_coroutine sets the stack
pointer correctly, and jumps to where the last coroutine switch left
off. In this state:

w1 contains the IP to jump to.
w2 contains the stack pointer.
w3 is undefined.



Directly based on previous discussions on the list, I think it makes
sense to add the following intrinsics to LLVM:

// Returns the size of a coroutine descriptor.
i32 llvm.coroutine_descriptor_size()

// Creates a new coroutine, which starts off with the function
// `function' being called with the argument `baton'. For this to
// work correctly, function must be of the type `void () (i8 *)'.
// `stack_size' will be the initial stack size. `descriptor' is a
// block of memory with at least as many bytes as returned by
// llvm.coroutine_descriptor_size().

void llvm.coroutine_new(i8 *descriptor, i8 *function, i8 *baton, i32
stack_size)

// Store the state of the current coroutine in `this' and switch to
// the one described by `new'.
void llvm.coroutine_switch(i8 *this, i8 *new)

It should be possible to have the LLVM register allocator correctly
handle persisting the registers by replacing llvm.coroutine_switch with
a pseudo instruction which has every register in its Def list. This will
also have the advantage of not saving unneeded registers.

One issue is deciding a home for __launch_coroutine and
__continue_coroutine. Currently I'm coding based on these two being in a
separate library of their own; but they can be merged into compiler-rt
if required.

Thanks!

-- 
Sanjoy Das
http://playingwithpointers.com

Rafael Ávila de Espíndola

2011-Aug-04 20:06 UTC

head link

[LLVMdev] [RFC] Coroutines

On 07/28/2011 05:31 PM, Sanjoy Das wrote:> Hi llvmdev!
>
> I've been working on adding coroutines to LLVM. Mentioned below is the
> implementation plan I'm following, for suggestions, flames and other
> input. Using segmented stacks is a prerequisite.

I think my only comment is that, while this would probably work, 
implementing it in C with a bit of assembly for saving/restoring 
registers also does.

What are the benefits of doing it in llvm itself?
> The idea is to associate every coroutine with a coroutine descriptor. A
> coroutine descriptor consists of four words: w0, w1, w2 and w3.
>
> w0 always contains the _launcher_, and invoking a coroutine always
> involves setting a predefined register (%eax or %rax on x86) to point to
> the descriptor and calling into w0. The contents of w1, w2 and w3  are
> decided by the state the coroutine is in. As of now, there are two
> possibilities:
>
> A. The coroutine has just been created:
>
> w0 is set to __launch_coroutine, which starts the coroutine by mallocing
> some initial stack space, calling into
> __generic_morestack_set_initial_sp (in libgcc) and then finally calling
> into the "main" function for that coroutine. Once the
"main" function
> returns, a cleanup routine frees the stack.
>
> Before calling the "main" function, it also sets w0 to
__continue_coroutine.
>
> In this state:
>
> w1 contains a pointer to the "main" function
> w2 contains a pointer that is passed as an argument to the "main"
function
> w3 contains the initial stack size.
>
> B. The coroutine has already been invoked once:
>
> w0 is set to __continue_coroutine. __continue_coroutine sets the stack
> pointer correctly, and jumps to where the last coroutine switch left
> off. In this state:
>
> w1 contains the IP to jump to.
> w2 contains the stack pointer.
> w3 is undefined.
>
>
>
> Directly based on previous discussions on the list, I think it makes
> sense to add the following intrinsics to LLVM:
>
> // Returns the size of a coroutine descriptor.
> i32 llvm.coroutine_descriptor_size()
>
> // Creates a new coroutine, which starts off with the function
> // `function' being called with the argument `baton'. For this to
> // work correctly, function must be of the type `void () (i8 *)'.
> // `stack_size' will be the initial stack size. `descriptor' is a
> // block of memory with at least as many bytes as returned by
> // llvm.coroutine_descriptor_size().
>
> void llvm.coroutine_new(i8 *descriptor, i8 *function, i8 *baton, i32
> stack_size)
>
> // Store the state of the current coroutine in `this' and switch to
> // the one described by `new'.
> void llvm.coroutine_switch(i8 *this, i8 *new)
>
> It should be possible to have the LLVM register allocator correctly
> handle persisting the registers by replacing llvm.coroutine_switch with
> a pseudo instruction which has every register in its Def list. This will
> also have the advantage of not saving unneeded registers.
>
> One issue is deciding a home for __launch_coroutine and
> __continue_coroutine. Currently I'm coding based on these two being in
a
> separate library of their own; but they can be merged into compiler-rt
> if required.
>
> Thanks!
>
Cheers,
Rafael

Joerg Sonnenberger

2011-Aug-05 01:00 UTC

head link

[LLVMdev] [RFC] Coroutines

On Fri, Jul 29, 2011 at 03:01:12AM +0530, Sanjoy Das
wrote:> I've been working on adding coroutines to LLVM. Mentioned below is the
> implementation plan I'm following, for suggestions, flames and other
> input. Using segmented stacks is a prerequisite.
Please look at architectures with register window like SPARC. They are
the hard case for efficient co-routines.

Joerg

Toralf Niebuhr

2011-Aug-05 15:52 UTC

head link

[LLVMdev] [RFC] Coroutines

Am 04.08.2011 um 22:06 schrieb Rafael Ávila de Espíndola:
> On 07/28/2011 05:31 PM, Sanjoy Das wrote:
>> Hi llvmdev!
>> 
>> I've been working on adding coroutines to LLVM. Mentioned below is
the
>> implementation plan I'm following, for suggestions, flames and
other
>> input. Using segmented stacks is a prerequisite.
> 
> 
> I think my only comment is that, while this would probably work, 
> implementing it in C with a bit of assembly for saving/restoring 
> registers also does.
> 
> What are the benefits of doing it in llvm itself?Hi,

The benefits are quite simple. 

When you do it with setjmp/longjmp oder context (which is deprecated in OS X
Lion) your program uses lots of memory because each coroutine need its own stack
(I believe for context its 32k per thread).
As ucontext is a kernel call, this uses a lot of time as well. The
setjmp/longjmp implementations usually invalidate pointers to objects on other
coroutines calling stacks.
Also, the debugging with gdb isn't really nice.
Implementing coroutines with C macros is not really an option either.

Doing everything with llvm would allow the compiler to do everything without
stack switching. You could turn each coroutine into a state machine. If a
context change occurs, llvm would return from one coroutine to the "llvm
scheduler" without unwinding the state machines calling stack (this should
be possible because of the usage of segmented stacks). Then it would execute the
other coroutine from its current state.
This can also be imagined as splitting each coroutines into small tasks ( the
program between two context switches would become one task). And then the
"llvm scheduler" would execute a full task and after it is done it
executes the next task (the one that the coroutine wanted to switch to).

Also, there are lots of optimization opportunities if llvm knows what a
coroutine is. For example, quite often it is possible that you don't need to
switch to the other coroutine but can just inline some simple code.

Toralf Niebuhr

Toralf Niebuhr

2011-Aug-05 16:43 UTC

head link

[LLVMdev] [RFC] Coroutines

Am 04.08.2011 um 22:06 schrieb Rafael Ávila de Espíndola:
> On 07/28/2011 05:31 PM, Sanjoy Das wrote:
>> Hi llvmdev!
>> 
>> I've been working on adding coroutines to LLVM. Mentioned below is
the
>> implementation plan I'm following, for suggestions, flames and
other
>> input. Using segmented stacks is a prerequisite.
> 
> 
> I think my only comment is that, while this would probably work, 
> implementing it in C with a bit of assembly for saving/restoring 
> registers also does.
> 
> What are the benefits of doing it in llvm itself?Hi,

The benefits are quite simple. 

When you do it with setjmp/longjmp or context (which is deprecated in OS X Lion)
your program uses lots of memory because each coroutine need its own stack (I
believe for context its 32k per thread).
As ucontext is a kernel call, this uses a lot of time as well. The
setjmp/longjmp implementations usually invalidate pointers to objects on other
coroutines calling stacks.
Also, the debugging with gdb isn't really nice.
Implementing coroutines with C macros is not really an option either.

Doing everything with llvm would allow the compiler to do everything without
stack switching. You could turn each coroutine into a state machine. If a
context change occurs, llvm would return from one coroutine to the "llvm
scheduler" without unwinding the state machines calling stack (this should
be possible because of the usage of segmented stacks). Then it would execute the
other coroutine from its current state.
This can also be imagined as splitting each coroutines into small tasks ( the
program between two context switches would become one task). And then the
"llvm scheduler" would execute a full task and after it is done it
executes the next task (the one that the coroutine wanted to switch to).

Also, there are lots of optimization opportunities if llvm knows what a
coroutine is. For example, quite often it is possible that you don't need to
switch to the other coroutine but can just inline some simple code.

Toralf Niebuhr

(sorry for double post….I used the wrong mail address)

Seemingly Similar Threads

Search for more maybe matching threads

llvm dev - Jul 2011 - [LLVMdev] [RFC] Coroutines

[LLVMdev] [RFC] Coroutines

[LLVMdev] [RFC] Coroutines

[LLVMdev] [RFC] Coroutines

[LLVMdev] [RFC] Coroutines

[LLVMdev] [RFC] Coroutines

Seemingly Similar Threads