Kenneth Uildriks
2010-Apr-07  19:14 UTC
[LLVMdev] Proposal: stack/context switching within a thread
Right now the functionality is available, sometimes, from the C
standard library.  But embedded environments (often running a limited
standard library) and server environments would benefit heavily from a
standard way to specify context switches within a single thread in the
style of makecontext/swapcontext/setcontext, and built-in support for
these operations would also open the way for optimizers to begin
handling these execution paths.
The use cases for these operations, and things like coroutines built
on top of them, will only increase in the future as developers look
for ways to get more concurrency while limiting the number of
high-overhead and difficult to manage native threads, locks, and
mutexes.
-------------- next part --------------
//===----------------------------------------------------------------------===//
//                         Stack and context switching
//===----------------------------------------------------------------------===//
4/7/2010 - Initial Revision
At the time of this writing, LLVM supports standard function calls on a thread
of execution, but does not support switching function contexts or stacks
within a thread of execution.  Such support would enable creating coroutines,
which in turn supports high performance, safer concurrency, and lower overhead
than native threads do, and enables concurrency on systems that lack native 
thread support.
Some C library implementations include support for stack and context switching
with functions such as makecontext, getcontext, and setcontext.  However,
these calls may not exist on some embedded systems, which may also lack native
thread support and therefore have a greater need for context switching.  Also,
built-in support for context switching allows such operations to be lowered
to inline assembly rather than a call into the C library.  Implementation of
kernels in IR would benefit from these intrinsics.  The C library functions 
depend on machine-specific structures to represent the context.  Finally, with 
intrinsic functions for handling context switches, optimizers can be made to 
recognize and optimize code whose flow of execution is impacted by these context
switches.
//===----------------------------------------------------------------------===//
// Implementation Approach
We will model the intrinsics after the C library functions.  In environments 
where the library functions are present, these intrinsics should behave 
identically to the library functions, so that code using them can interact 
gracefully with platform native code using the library functions.  In
environments where the library functions are absent, these intrinsics can be 
lowered to inline assembly language instructions with the proper effect.
We will constrain the definition of the function passed to llvm.makecontext,
and unconditionally pass the function's own context to it, in order to make
it easy for the optimizer to determine when the function uses
llvm.swapcontext to temporarily return execution to its "link" context
or
when it passes its own context as the "link" context to a newly
created context.  This should make it possible to optimize some common
coroutine patterns.
A context must be executing within at most one thread at any given time.
A context may execute in one thread, and later execute in a different thread.
; Returns the number of bytes needed to hold a stack context.  Since the
; context typically includes a copy of most or all machine registers plus
; additional data, mem2reg would offer little advantage; therefore, having the
; size not recognized in the IR as a constant, while it would block mem2reg for
; that particular alloca, would have little practical disadvantage.
declare i64 llvm.context.size() readonly
; pCtx shall be a pointer to a memory area of at least the number of bytes
; returned by llvm.context.size().  That memory area will be filled with
; stack context data.  A call to llvm.setcontext or llvm.swapcontext with
; that context data will cause execution to proceed as if from a return from
; this llvm.getcontext call.
declare void llvm.getcontext({}* %pCtx)
; pCtx shall be a pointer to context data generated by a call to llvm.getcontext
; or llvm.makecontext.  llvm.setcontext will cause execution to transfer to
; the point specified in the context data.
declare void llvm.setcontext({}* %pCtx)
; The first argument is a pointer to a memory area of at least the numnber of
; bytes returned by llvm.context.size().  That memory area will be filled with
; new context data.  The second argument is a pointer to the function where 
; execution of this context shall begin.  The third and fourth arguments define 
; the memory reserved for the context's local stack; this stack will not
grow,
; and overflow of this stack will lead to undefined behavior.  The fifth
; argument is a pointer to the "linked" or "previous"
context; when %func
; returns, this pointer is dereferenced and execution continues at the linked
; context.  When the context begins executing, %func is called with a pointer to
; its corresponding context as its first parameter, and the sixth parameter to
; llvm.makecontext as its second parameter.  If execution unwinds past %func, 
; undefined behavior results.  If the memory pointed to by %newCtx is
; overwritten by anything other than a call to llvm.swapcontext, a later return
; from %func will result in undefined behavior.
declare void llvm.makecontext({}* %newCtx, void({}*, {}*)* %func, 
i8* %stackbegin, i64 %stacksize, {}* %linkCtx, {}* %data)
; Retrieves the link context pointer from the given context.  The link context
; pointer is the same one that was passed to llvm.makecontext to create the
; given context.  If pCtx was populated by llvm.getcontext rather than
; llvm.makecontext, this function returns a null pointer.  If pCtx was
; populated only by llvm.swapcontext, the return value is undefined.
declare {}* llvm.getlinkcontext({}* %pCtx)
; The first argument shall point to a memory area of at least the number of
; bytes returned by llvm.context.size().  That memory area will be filled with
; stack context data corresponding to the current execution context.  The
; second argument shall point to context data generated by a call to
; llvm.getcontext or llvm.makecontext.  Execution will transfer to the context
; specified by pNewCtx.  Using llvm.swapcontext or llvm.setcontext to set
; the context to that stored in pThisCtx will cause execution to transfer back
; to this context as if from a return from llvm.swapcontext.  The linked
; context pointer corresponding to %pThisCtx is not modified.
declare void llvm.swapcontext({}* %pThisCtx, {}* %pNewCtx)
A simple example using these intrinsics:
define void @co1({}* %thisCtx, {}* %data) nounwind {
entry:
  %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx)
  ; Now call print messages.  After each print message, temporarily yield
  ; control back to the previous context.
  call void @printCo1FirstMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo1SecondMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo1ThirdMessage()
    
}
define void @co2({}* %thisCtx, {}* %data) nounwind {
entry:
  %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx)
  ; Now call print messages.  After each print message, temporarily yield
  ; control back to the previous context.
  call void @printCo2FirstMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo2SecondMessage()
  call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx)
  call void @printCo2ThirdMessage()
}
define i32 @main() nounwind {
entry:
  ; alloca space for the contexts.
  %ctxSize = call i64 @llvm.context.size()
  %p0 = alloca i8, i64 %ctxSize
  %mainCtx = bitcast i8* %p0 to {}*
  %p1 = alloca i8, i64 %ctxSize
  %ctx1 = bitcast i8* %p1 to {}*
  %p2 = alloca i8, i64 %ctxSize
  %ctx2 = bitcast i8* %p2 to {}*
  ; Stacks for the contexts
  %stack1 = alloca i8, i64 4096
  %stack2 = alloca i8, i64 4096
  ; Create contexts for co1 and co2.
  call void @llvm.makecontext({}* %ctx1, void({}*, {}*)* @co1, i8* %stack1, 
i64 4096, {}* %mainCtx, {}* null)
  call void @llvm.makecontext({}* %ctx2, void({}*, {}*)* @co2, i8* %stack2, 
i64 4096, {}* %mainCtx, {}* null)
  ; Run co1 and co2 in an alternating fashion.
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1)
  call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2)
}
This code should make the following sequence of messaging calls:
printCo1FirstMessage()
printCo2FirstMessage()
printCo1SecondMessage()
printCo2SecondMessage()
printCo1ThirdMessage()
printCo2ThirdMessage()
//===----------------------------------------------------------------------===//
// Moving forward
//
An easy first step would be to lower the intrinsics to corresponding calls to 
makecontext and friends.  Once that is working on test platforms with these
calls in their C libraries, we can start adding platform specific lowering
code to platforms that need it, and then as an option for all platforms to
allow using this functionality in kernels and other environments that lack a
standard C library.  In the meantime, optimization passes can be introduced that
know how to optimize across some context switches.
Jeffrey Yasskin
2010-Apr-10  17:54 UTC
[LLVMdev] Proposal: stack/context switching within a thread
I took the liberty of forwarding this to the Stackless Python list, since they switch stacks, and I got a response at http://thread.gmane.org/gmane.comp.python.stackless/4464/focus=4467. The upshot is that they really need the ability to allocate only a tiny amount of space for each thread and grow that as the thread actually uses more stack. The way they accomplish that now is by copying the entire stack to the heap on a context switch, and having all threads share the main C stack. This isn't quite as bad as it sounds because it only happens to threads that call into C extension modules. Pure Python threads operate entirely within heap Python frames. Still, it would be nice to support this use case. Kenneth, I don't want to insist that the first version of this be both a floor wax _and_ a dessert topping, but is there a natural extension to supporting what Stackless needs that we could add later? It looks like swapcontext() simply repoints the stack pointer and restores the registers, while Stackless wants it to be able to allocate memory and copy the stack. Maybe that implies a "mode" argument? Alternately, Stackless could probably work with a segmented stack mechanism like Ian Taylor implemented in gcc for Go. Do you see anything that would prevent us from layering segmented stacks on top of this context switching mechanism later? Thanks, Jeffrey On Wed, Apr 7, 2010 at 12:14 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:> Right now the functionality is available, sometimes, from the C > standard library. But embedded environments (often running a limited > standard library) and server environments would benefit heavily from a > standard way to specify context switches within a single thread in the > style of makecontext/swapcontext/setcontext, and built-in support for > these operations would also open the way for optimizers to begin > handling these execution paths. > > The use cases for these operations, and things like coroutines built > on top of them, will only increase in the future as developers look > for ways to get more concurrency while limiting the number of > high-overhead and difficult to manage native threads, locks, and > mutexes. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
Kenneth Uildriks
2010-Apr-10  18:32 UTC
[LLVMdev] Proposal: stack/context switching within a thread
On Sat, Apr 10, 2010 at 12:54 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:> I took the liberty of forwarding this to the Stackless Python list, > since they switch stacks, and I got a response at > http://thread.gmane.org/gmane.comp.python.stackless/4464/focus=4467. > The upshot is that they really need the ability to allocate only a > tiny amount of space for each thread and grow that as the thread > actually uses more stack. The way they accomplish that now is by > copying the entire stack to the heap on a context switch, and having > all threads share the main C stack. This isn't quite as bad as it > sounds because it only happens to threads that call into C extension > modules. Pure Python threads operate entirely within heap Python > frames. Still, it would be nice to support this use case. > > Kenneth, I don't want to insist that the first version of this be both > a floor wax _and_ a dessert topping, but is there a natural extension > to supporting what Stackless needs that we could add later? It looks > like swapcontext() simply repoints the stack pointer and restores the > registers, while Stackless wants it to be able to allocate memory and > copy the stack. Maybe that implies a "mode" argument? > > Alternately, Stackless could probably work with a segmented stack > mechanism like Ian Taylor implemented in gcc for Go. Do you see > anything that would prevent us from layering segmented stacks on top > of this context switching mechanism later? > > Thanks, > JeffreyAs I see it, the context switching mechanism itself needs to know where to point the stack register when switching. The C routines take an initial stack pointer when creating the context, and keep track of it from there. If we don't actually need to interoperate with contexts created from the C routines, we have a lot more freedom. Anyway, one approach would be to expose intrinsics to interrogate an inactive context, to get its initial stack pointer (the one it was created with) and its current stack pointer, and also to modify both before making the context active again. I don't see any reason why this scheme wouldn't also be compatible with segmented stacks. In fact, one could segment an inactive context stack at any time rather than copying the whole thing, as long as you can assume that there aren't any pointers into the context's stack living outside the context.
Apparently Analagous Threads
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread