Kenneth Uildriks
2010-Apr-07 19:14 UTC
[LLVMdev] Proposal: stack/context switching within a thread
Right now the functionality is available, sometimes, from the C standard library. But embedded environments (often running a limited standard library) and server environments would benefit heavily from a standard way to specify context switches within a single thread in the style of makecontext/swapcontext/setcontext, and built-in support for these operations would also open the way for optimizers to begin handling these execution paths. The use cases for these operations, and things like coroutines built on top of them, will only increase in the future as developers look for ways to get more concurrency while limiting the number of high-overhead and difficult to manage native threads, locks, and mutexes. -------------- next part -------------- //===----------------------------------------------------------------------===// // Stack and context switching //===----------------------------------------------------------------------===// 4/7/2010 - Initial Revision At the time of this writing, LLVM supports standard function calls on a thread of execution, but does not support switching function contexts or stacks within a thread of execution. Such support would enable creating coroutines, which in turn supports high performance, safer concurrency, and lower overhead than native threads do, and enables concurrency on systems that lack native thread support. Some C library implementations include support for stack and context switching with functions such as makecontext, getcontext, and setcontext. However, these calls may not exist on some embedded systems, which may also lack native thread support and therefore have a greater need for context switching. Also, built-in support for context switching allows such operations to be lowered to inline assembly rather than a call into the C library. Implementation of kernels in IR would benefit from these intrinsics. The C library functions depend on machine-specific structures to represent the context. Finally, with intrinsic functions for handling context switches, optimizers can be made to recognize and optimize code whose flow of execution is impacted by these context switches. //===----------------------------------------------------------------------===// // Implementation Approach We will model the intrinsics after the C library functions. In environments where the library functions are present, these intrinsics should behave identically to the library functions, so that code using them can interact gracefully with platform native code using the library functions. In environments where the library functions are absent, these intrinsics can be lowered to inline assembly language instructions with the proper effect. We will constrain the definition of the function passed to llvm.makecontext, and unconditionally pass the function's own context to it, in order to make it easy for the optimizer to determine when the function uses llvm.swapcontext to temporarily return execution to its "link" context or when it passes its own context as the "link" context to a newly created context. This should make it possible to optimize some common coroutine patterns. A context must be executing within at most one thread at any given time. A context may execute in one thread, and later execute in a different thread. ; Returns the number of bytes needed to hold a stack context. Since the ; context typically includes a copy of most or all machine registers plus ; additional data, mem2reg would offer little advantage; therefore, having the ; size not recognized in the IR as a constant, while it would block mem2reg for ; that particular alloca, would have little practical disadvantage. declare i64 llvm.context.size() readonly ; pCtx shall be a pointer to a memory area of at least the number of bytes ; returned by llvm.context.size(). That memory area will be filled with ; stack context data. A call to llvm.setcontext or llvm.swapcontext with ; that context data will cause execution to proceed as if from a return from ; this llvm.getcontext call. declare void llvm.getcontext({}* %pCtx) ; pCtx shall be a pointer to context data generated by a call to llvm.getcontext ; or llvm.makecontext. llvm.setcontext will cause execution to transfer to ; the point specified in the context data. declare void llvm.setcontext({}* %pCtx) ; The first argument is a pointer to a memory area of at least the numnber of ; bytes returned by llvm.context.size(). That memory area will be filled with ; new context data. The second argument is a pointer to the function where ; execution of this context shall begin. The third and fourth arguments define ; the memory reserved for the context's local stack; this stack will not grow, ; and overflow of this stack will lead to undefined behavior. The fifth ; argument is a pointer to the "linked" or "previous" context; when %func ; returns, this pointer is dereferenced and execution continues at the linked ; context. When the context begins executing, %func is called with a pointer to ; its corresponding context as its first parameter, and the sixth parameter to ; llvm.makecontext as its second parameter. If execution unwinds past %func, ; undefined behavior results. If the memory pointed to by %newCtx is ; overwritten by anything other than a call to llvm.swapcontext, a later return ; from %func will result in undefined behavior. declare void llvm.makecontext({}* %newCtx, void({}*, {}*)* %func, i8* %stackbegin, i64 %stacksize, {}* %linkCtx, {}* %data) ; Retrieves the link context pointer from the given context. The link context ; pointer is the same one that was passed to llvm.makecontext to create the ; given context. If pCtx was populated by llvm.getcontext rather than ; llvm.makecontext, this function returns a null pointer. If pCtx was ; populated only by llvm.swapcontext, the return value is undefined. declare {}* llvm.getlinkcontext({}* %pCtx) ; The first argument shall point to a memory area of at least the number of ; bytes returned by llvm.context.size(). That memory area will be filled with ; stack context data corresponding to the current execution context. The ; second argument shall point to context data generated by a call to ; llvm.getcontext or llvm.makecontext. Execution will transfer to the context ; specified by pNewCtx. Using llvm.swapcontext or llvm.setcontext to set ; the context to that stored in pThisCtx will cause execution to transfer back ; to this context as if from a return from llvm.swapcontext. The linked ; context pointer corresponding to %pThisCtx is not modified. declare void llvm.swapcontext({}* %pThisCtx, {}* %pNewCtx) A simple example using these intrinsics: define void @co1({}* %thisCtx, {}* %data) nounwind { entry: %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx) ; Now call print messages. After each print message, temporarily yield ; control back to the previous context. call void @printCo1FirstMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo1SecondMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo1ThirdMessage() } define void @co2({}* %thisCtx, {}* %data) nounwind { entry: %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx) ; Now call print messages. After each print message, temporarily yield ; control back to the previous context. call void @printCo2FirstMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo2SecondMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo2ThirdMessage() } define i32 @main() nounwind { entry: ; alloca space for the contexts. %ctxSize = call i64 @llvm.context.size() %p0 = alloca i8, i64 %ctxSize %mainCtx = bitcast i8* %p0 to {}* %p1 = alloca i8, i64 %ctxSize %ctx1 = bitcast i8* %p1 to {}* %p2 = alloca i8, i64 %ctxSize %ctx2 = bitcast i8* %p2 to {}* ; Stacks for the contexts %stack1 = alloca i8, i64 4096 %stack2 = alloca i8, i64 4096 ; Create contexts for co1 and co2. call void @llvm.makecontext({}* %ctx1, void({}*, {}*)* @co1, i8* %stack1, i64 4096, {}* %mainCtx, {}* null) call void @llvm.makecontext({}* %ctx2, void({}*, {}*)* @co2, i8* %stack2, i64 4096, {}* %mainCtx, {}* null) ; Run co1 and co2 in an alternating fashion. call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) } This code should make the following sequence of messaging calls: printCo1FirstMessage() printCo2FirstMessage() printCo1SecondMessage() printCo2SecondMessage() printCo1ThirdMessage() printCo2ThirdMessage() //===----------------------------------------------------------------------===// // Moving forward // An easy first step would be to lower the intrinsics to corresponding calls to makecontext and friends. Once that is working on test platforms with these calls in their C libraries, we can start adding platform specific lowering code to platforms that need it, and then as an option for all platforms to allow using this functionality in kernels and other environments that lack a standard C library. In the meantime, optimization passes can be introduced that know how to optimize across some context switches.
Jeffrey Yasskin
2010-Apr-10 17:54 UTC
[LLVMdev] Proposal: stack/context switching within a thread
I took the liberty of forwarding this to the Stackless Python list, since they switch stacks, and I got a response at http://thread.gmane.org/gmane.comp.python.stackless/4464/focus=4467. The upshot is that they really need the ability to allocate only a tiny amount of space for each thread and grow that as the thread actually uses more stack. The way they accomplish that now is by copying the entire stack to the heap on a context switch, and having all threads share the main C stack. This isn't quite as bad as it sounds because it only happens to threads that call into C extension modules. Pure Python threads operate entirely within heap Python frames. Still, it would be nice to support this use case. Kenneth, I don't want to insist that the first version of this be both a floor wax _and_ a dessert topping, but is there a natural extension to supporting what Stackless needs that we could add later? It looks like swapcontext() simply repoints the stack pointer and restores the registers, while Stackless wants it to be able to allocate memory and copy the stack. Maybe that implies a "mode" argument? Alternately, Stackless could probably work with a segmented stack mechanism like Ian Taylor implemented in gcc for Go. Do you see anything that would prevent us from layering segmented stacks on top of this context switching mechanism later? Thanks, Jeffrey On Wed, Apr 7, 2010 at 12:14 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:> Right now the functionality is available, sometimes, from the C > standard library. But embedded environments (often running a limited > standard library) and server environments would benefit heavily from a > standard way to specify context switches within a single thread in the > style of makecontext/swapcontext/setcontext, and built-in support for > these operations would also open the way for optimizers to begin > handling these execution paths. > > The use cases for these operations, and things like coroutines built > on top of them, will only increase in the future as developers look > for ways to get more concurrency while limiting the number of > high-overhead and difficult to manage native threads, locks, and > mutexes. > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >
Kenneth Uildriks
2010-Apr-10 18:32 UTC
[LLVMdev] Proposal: stack/context switching within a thread
On Sat, Apr 10, 2010 at 12:54 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:> I took the liberty of forwarding this to the Stackless Python list, > since they switch stacks, and I got a response at > http://thread.gmane.org/gmane.comp.python.stackless/4464/focus=4467. > The upshot is that they really need the ability to allocate only a > tiny amount of space for each thread and grow that as the thread > actually uses more stack. The way they accomplish that now is by > copying the entire stack to the heap on a context switch, and having > all threads share the main C stack. This isn't quite as bad as it > sounds because it only happens to threads that call into C extension > modules. Pure Python threads operate entirely within heap Python > frames. Still, it would be nice to support this use case. > > Kenneth, I don't want to insist that the first version of this be both > a floor wax _and_ a dessert topping, but is there a natural extension > to supporting what Stackless needs that we could add later? It looks > like swapcontext() simply repoints the stack pointer and restores the > registers, while Stackless wants it to be able to allocate memory and > copy the stack. Maybe that implies a "mode" argument? > > Alternately, Stackless could probably work with a segmented stack > mechanism like Ian Taylor implemented in gcc for Go. Do you see > anything that would prevent us from layering segmented stacks on top > of this context switching mechanism later? > > Thanks, > JeffreyAs I see it, the context switching mechanism itself needs to know where to point the stack register when switching. The C routines take an initial stack pointer when creating the context, and keep track of it from there. If we don't actually need to interoperate with contexts created from the C routines, we have a lot more freedom. Anyway, one approach would be to expose intrinsics to interrogate an inactive context, to get its initial stack pointer (the one it was created with) and its current stack pointer, and also to modify both before making the context active again. I don't see any reason why this scheme wouldn't also be compatible with segmented stacks. In fact, one could segment an inactive context stack at any time rather than copying the whole thing, as long as you can assume that there aren't any pointers into the context's stack living outside the context.
Maybe Matching Threads
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread