On Mon, Apr 11, 2011 at 7:34 AM, Justin Holewinski <justin.holewinski at gmail.com> wrote:

> On Mon, Apr 11, 2011 at 9:07 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
>> Hi!
>>
>> Thanks for the feedback. For context, my implementation plan is here:
>> http://pastebin.com/e9JMZNCE
>>
>> First, about unwinding:
>>
>> On architectures like x86-64, where unwinding is based on DWARF info,
>> there shouldn't be any problems, since the DWARF info will be emitted
>> correctly. Otherwise, if the unwinding is done by following BP, it
>> should still be possible to have BP de-reference correctly (ref. the
>> "Frame Pointers" section in the implementation plan). SP will not
>> always have a correct value - I don't know if this is a problem.
>>
>> About co-routines:
>>
>> Here is a sketch of how I think co-routines can be implemented (I'll
>> merge this with the main implementation plan after I get some
>> feedback):
>>
>> Have a new instruction, called "yield", to return a value from a
>> co-routine while preserving its state. Thus, we immediately know which
>> functions are co-routines. Each co-routine will have a new stack.
>> Associate each co-routine with a thread-local global variable (called
>> saved_stack here; it will have to be mangled with the name of the
>> co-routine) which points to the start stack block for that co-routine.
>> This will be the first block in the chain of blocks to follow.
>>
>> The structure of the block will be similar to the structure of a
>> regular stack block, except that it will also have space to store two
>> registers - this_ip and this_sp.
>>
>> The prologue of a co-routine will jump to a function similar to
>> setup_new_block (setup_new_block_coroutine), which will work like
>> setup_new_block, except:
>>
>> 1. It will first check if saved_stack is NULL. If it is NULL, it will
>> allocate a new block and save it to saved_stack. If it isn't, it'll
>> simply restore saved_sp and saved_ip.
>>
>> 2. In case a new block was allocated, it will do pretty much what
>> setup_block does, after which it will adjust the SP to make space for
>> the saved registers.
>>
>> The destroy_block procedure will also have to be a little different
>> (mentioned below).
>>
>> There are four things (relevant to this discussion) a co-routine can
>> do:
>>
>> Yield
>>
>> This returns control to the calling function, without forgetting the
>> current state of the function. To do this, we save saved_ip and
>> saved_sp. Every yield will be padded with instructions pushing and
>> popping the registers live at that point. Then we set the return value
>> (register or memory), and restore saved_sp and saved_ip from the
>> current block. We can't simply return because the actual return
>> address has been hijacked to provide for block cleanup.
>>
>> Call to regular function
>>
>> Just a simple call - the callee's prologue will handle setting up its
>> own stack space etc.
>>
>> Call to co-routine
>>
>> This too should "just work", since all the heavy lifting is done in
>> the co-routine's prologue. However, the above approach will not work
>> for nested co-routines (i.e. calling the same co-routine body while
>> one call is still active, recursively). I'm not sure if having support
>> for nested co-routines would add any value.
>>
>> Return
>>
>> This will be a regular return. Since the return address has been
>> hijacked to point to another block of code (destroy_block_coroutine),
>> control will jump there instead.
>>
>> destroy_block_coroutine will free the linked list of stack blocks (we
>> have to free this, since we won't have a reference to this list
>> anymore), set saved_stack for this co-routine to NULL, and restore
>> saved_sp and saved_ip.
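(For concreteness, here is a rough C rendering of the scheme quoted above. The names saved_stack, this_ip/this_sp, setup_new_block_coroutine and destroy_block_coroutine come from the sketch itself; the block layout and everything else are illustrative assumptions, and the register-level steps that portable C can't express are left as comments.)

    #include <stddef.h>
    #include <stdlib.h>

    struct stack_block {
        struct stack_block *next;  /* chain of stack blocks */
        size_t size;               /* usable size of the data area */
        void *this_ip;             /* resume address at the last yield */
        void *this_sp;             /* coroutine SP at the last yield */
        /* ... block data follows the header ... */
    };

    /* One thread-local root per co-routine; the real name would be
       mangled with the co-routine's name. */
    static __thread struct stack_block *saved_stack_coro;

    /* Called from the co-routine's prologue. */
    static struct stack_block *setup_new_block_coroutine(size_t size) {
        if (saved_stack_coro == NULL) {
            /* First activation: do what setup_new_block does, then
               make room for the two saved registers. */
            struct stack_block *b = malloc(sizeof *b + size);
            b->next = NULL;
            b->size = size;
            b->this_ip = NULL;
            b->this_sp = NULL;
            saved_stack_coro = b;
            return b;  /* the prologue switches SP into this block */
        }
        /* Re-activation: restore saved_sp/saved_ip, i.e. switch SP to
           saved_stack_coro->this_sp and jump to ->this_ip (register-
           level; not expressible in C). */
        return saved_stack_coro;
    }

    /* The hijacked return address lands here on a real return. */
    static void destroy_block_coroutine(void) {
        struct stack_block *b = saved_stack_coro;
        while (b != NULL) {
            /* Free the whole chain - nothing else references it. */
            struct stack_block *n = b->next;
            free(b);
            b = n;
        }
        saved_stack_coro = NULL;
        /* restore the caller's saved_sp and saved_ip (register-level) */
    }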
> I'm wondering how much of this should be implemented as new LLVM
> functionality, and how much should be left to the front-end compiler.
> With some additional LLVM intrinsics (e.g. llvm.stack.new.block,
> llvm.stack.delete.block, etc.), a front-end could take care of the
> details of how co-routines are actually implemented. This would also
> give the front-end freedom to implement whatever semantics/conventions
> are necessary/required for the source language. I'm just not sure
> having LLVM dictate *how* co-routines are implemented is the best way
> to approach this.

I'm in agreement with Justin.

1) It's much easier to add new intrinsics to LLVM than it is to add new instructions. Library functions are even easier - in general, you want to use intrinsics in cases where the presence of the intrinsic will affect the way code is generated in the calling function.

2) The API that I was envisioning for creating stacks was something fairly simple (maybe too simple):

    // Create a new stack, where 'stack_handle' is some opaque object,
    // and 'func' is an LLVM function.
    stack_handle = llvm.createstack(func, data)

    // Switch to a different stack (i.e. yield)
    llvm.switchstack(stack_handle)

    // Deallocate a stack
    llvm.destroystack(stack_handle)

The frontend would be responsible for managing the lifetime of the stack_handle object. For something like Python generators, the stack_handle would be stored in the iterator object that is returned from the generator, and deallocated when the generator gets garbage-collected. For cooperative threads, the handle would be stored as a private member of a Thread or Coroutine class. Different languages would support different ways of using coroutines.

Similarly, the frontend would be responsible for passing data between the different coroutines, using either thread-local variables or some other method.

Also, I'm thinking the frontend would be responsible for propagating exceptions between stacks - so, for example, if my coroutine throws an exception that is not caught, but instead propagates all the way to the top of its stack, then that coroutine gets terminated, and depending on the language semantics, the same exception or another one gets thrown upon return from 'yield' in the other context. I would have no objection to implementing all of that logic in my frontend.

Because the internals of the stack_handle may be different on different platforms, I'm thinking that the frontend should only get a void* pointer - it doesn't need to know what's inside it. I think the frontend only needs a small number of operations: create, yield, destroy, and iterate through call frames (for garbage collection).

The 'yield' instruction seems somewhat inspired by Python's 'yield' statement, which unfortunately has some substantial drawbacks - such as the fact that the yield must be in the top-most call frame in order for the compiler to know that the function is a generator / coroutine. In other words, in Python you can't have a generator which calls a subroutine that does the yield. I'd like to avoid that limitation. Scheme and Lua, to name just two examples, have much more powerful models for doing continuations and coroutines, and I'd like to be able to support those. I think that can be done if we supply only the most minimal support on the LLVM side, and leave most of the details to the frontend.

-- Talin
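(To make the proposed API concrete, here is a sketch of how a frontend might lower a simple generator onto it. create_stack, switch_stack and destroy_stack stand in for the llvm.createstack/llvm.switchstack/llvm.destroystack intrinsics; current_stack is a hypothetical extra accessor - not part of the sketch above - that a coroutine would need in order to know where to switch back to. Everything here is an assumption about how such a lowering could look, not a definitive design.)

    #include <stdlib.h>

    typedef void *stack_handle;

    /* Stand-ins for the proposed intrinsics (hypothetical). */
    extern stack_handle create_stack(void (*func)(void *), void *data);
    extern void switch_stack(stack_handle to);
    extern stack_handle current_stack(void);   /* assumed, see above */
    extern void destroy_stack(stack_handle s);

    /* Per-generator state; the frontend owns its lifetime, e.g. it is
       freed when the iterator object is garbage-collected. */
    struct range_gen {
        stack_handle self, caller;
        int value;   /* yielded values are passed through memory */
        int done;
    };

    /* Runs on the generator's own stack. */
    static void range_body(void *data) {
        struct range_gen *g = data;
        for (int i = 0; i < 10; i++) {
            g->value = i;
            switch_stack(g->caller);   /* the 'yield' */
        }
        g->done = 1;
        switch_stack(g->caller);       /* final return */
    }

    static struct range_gen *range_new(void) {
        struct range_gen *g = calloc(1, sizeof *g);
        g->self = create_stack(range_body, g);  /* suspended until resumed */
        return g;
    }

    static int range_next(struct range_gen *g) {
        g->caller = current_stack();
        switch_stack(g->self);         /* run until the next yield */
        return g->value;
    }

    static void range_free(struct range_gen *g) {
        destroy_stack(g->self);
        free(g);
    }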
This reminds me of Kenneth's "context" proposal from some time back:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030787.html

I haven't compared the two too closely, but I thought I should throw it out there as food for thought.

Reid

On Mon, Apr 11, 2011 at 3:26 PM, Talin <viridia at gmail.com> wrote:
> [Talin's message quoted in full - snipped]
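(Kenneth's contexts look, at least superficially, like the POSIX <ucontext.h> model - fixed-size stacks included - so here is a minimal user-space coroutine written against that real OS API rather than the proposed LLVM one; the analogy is an assumption, and ucontext is deprecated in POSIX.1-2008 but still available on common platforms. Stack size and iteration counts are arbitrary.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, coro_ctx;

    static void coro_body(void) {
        for (int i = 0; i < 3; i++) {
            printf("yield %d\n", i);
            swapcontext(&coro_ctx, &main_ctx);  /* "yield" to main */
        }
    }

    int main(void) {
        char *stack = malloc(64 * 1024);
        getcontext(&coro_ctx);
        coro_ctx.uc_stack.ss_sp = stack;        /* fixed-size stack */
        coro_ctx.uc_stack.ss_size = 64 * 1024;
        coro_ctx.uc_link = &main_ctx;           /* resumed if body returns */
        makecontext(&coro_ctx, coro_body, 0);

        for (int i = 0; i < 3; i++)
            swapcontext(&main_ctx, &coro_ctx);  /* resume the coroutine */

        free(stack);
        return 0;
    }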
On Mon, Apr 11, 2011 at 12:38 PM, Reid Kleckner <reid.kleckner at gmail.com> wrote:

> This reminds me of Kenneth's "context" proposal from some time back:
>
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030787.html
>
> I haven't compared the two too closely, but I thought I should throw
> it out there as food for thought.

That's very interesting - I had not seen that before. It would pretty much solve all of my needs, although it only supports fixed-size stacks. If this were combined with what Sanjoy is proposing, that would be terrific.

There's one point there that I don't quite understand: the doc says that the context buffer you allocate would be large enough to store a snapshot of all the machine's registers. I would think that the machine registers would be more conveniently stored on the stack itself, and that the context buffer would merely contain a pointer to the end of the stack - that means a call to setcontext() would simply push all the current registers, change the stack pointer, and then pop each register value off of the stack before returning. However, that's a minor implementation detail and not something I really care about one way or the other.
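(The two representations being contrasted, as illustrative structs - the register count and layout here are assumptions, not anything from either proposal.)

    #include <stdint.h>

    #define N_CALLEE_SAVED 6   /* e.g. rbx, rbp, r12-r15 on x86-64 */

    /* As the context proposal describes it: the buffer holds a full
       snapshot of the machine registers. */
    struct context_snapshot {
        uintptr_t regs[N_CALLEE_SAVED];
        uintptr_t ip;
        uintptr_t sp;
    };

    /* The alternative suggested above: swap_context() pushes the live
       registers onto the outgoing stack and pops the incoming ones off
       the new stack, so the buffer reduces to a saved stack pointer. */
    struct context_minimal {
        uintptr_t sp;   /* everything else lives on the stack itself */
    };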
There's one other feature that I can think of - not strictly necessary, but it might help make context switching more efficient. Let's assume that I have a yield() function in my language which wraps around llvm.setcontext() and performs some additional language-specific housekeeping. So, for example:

    void yield() {
      llvm.swap_context(next_context, etc);
      // If the other context returned with an exception,
      // propagate the exception to the current context.
      if (pending_throwable) {
        throw pending_throwable;
      }
    }

The problem here is that I have to perform a check for a throwable, even though 99.9% of the time it will be NULL. One could get around this by playing games with the return address:

    void yield() {
      llvm.swap_context(next_context, etc);
    normal_exit:
      return;
    error_exit:
      throw pending_throwable;
    }

The idea is that when you want to pass an exception to the other thread, you would modify the return address on the stack to point to error_exit instead of normal_exit. Unfortunately, to get this to work you'd also have to convince LLVM's optimizer not to complain about unreachable code, so maybe this is more trouble than it's worth.

> Reid
>
> [rest of the quoted thread snipped]

-- Talin