Jeffrey Yasskin
2010-Apr-11 21:09 UTC
[LLVMdev] Proposal: stack/context switching within a thread
Kenneth Uildriks <kennethuil at gmail.com> wrote:> As I see it, the context switching mechanism itself needs to know > where to point the stack register when switching. The C routines take > an initial stack pointer when creating the context, and keep track of > it from there. If we don't actually need to interoperate with > contexts created from the C routines, we have a lot more freedom.I guess the reason to interoperate with contexts from the C routines would be to support ucontext_t's passed into signal handlers? But then the LLVM intrinsics need to specify that their context's layout is the same as ucontext_t's, on platforms where ucontext_t exists.> Anyway, one approach would be to expose intrinsics to interrogate an > inactive context, to get its initial stack pointer (the one it was > created with) and its current stack pointer, and also to modify both > before making the context active again. > > I don't see any reason why this scheme wouldn't also be compatible > with segmented stacks. > ... > On the other hand, stack manipulation really ought to be handled by > the target, since only the target knows the details of how the stack > is laid out to begin with. Also, if we have stack manipulation calls > in the IR, optimization quickly becomes very difficult. Unless we > just allow optimizers to ignore the stack manipulations and assume > they're doing the "right" thing. > > On the gripping hand, we don't want the target emitting memory > allocation calls in order to grow the stack (unless a function pointer > to malloc or its equivalent is passed in from the IR).In gcc's split-stacks (http://gcc.gnu.org/ml/gcc/2009-02/msg00429.html; I got the name wrong earlier), Ian planned to call a known global name to allocate memory (http://gcc.gnu.org/ml/gcc/2009-02/msg00479.html). I'm not sure what he actually wound up doing on the gccgo branch. LLVM could also put the allocation/deallocation functions into the context, although it'd probably be better to just follow gcc.>> The way they accomplish that now is by >> copying the entire stack to the heap on a context switch, and having >> all threads share the main C stack. This isn't quite as bad as it >> sounds because it only happens to threads that call into C extension >> modules. Pure Python threads operate entirely within heap Python >> frames. Still, it would be nice to support this use case. > > This wouldn't hold in IR, since virtual registers regularly get > spilled to the stack.. every context, regardless of the language, > would have to have its stack saved. Also, this method would mean that > a context cannot be used in any native thread other than the one that > created it, right?Well, a frontend can generate code in continuation-passing style or do all of its user-level "stack" frame manipulation on the heap. Then it only uses a constant amount of C-stack space, which might not be part of the context that needs to be switched. Only foreign calls necessarily use a chunk of C stack. Stackless's approach does seem to prevent one coroutine's foreign code from using pointers into another coroutine's stack, and maybe they could/should create a new context each time they need to enter a foreign frame instead of trying to copy the stack...> 2. We should be able to support "hard switching" in Stackless Python > by adding a llvm.getcontextstacktop intrinsic. If, as in Kristján's > example, llvm.getcontext is used to create context A, and then > execution continues until context B is created with > llvm.swapcontext(B, A), the region of memory between > llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved > and later restored when B is resumed.Wait, what stack top does swapcontext get? I'd thought that A's and B's stack top would be the same since they're executing on the same stack.> Of course that usage would > throw a monkey wrench into a segmented stack scheme... it assumes that > context stack areas actually behave like contiguous stacks. Not only > that, it assumes that no pointers to a context's stack exist outside > of the context... when the context is inactive, a pointer into a > context's stack won't be valid! > > But in the case of Stackless Python, these caveats can be addressed > with a simple "Don't do that!", since it's all tied into the language.And users shouldn't need both stack copying and split stacks. Just one should suffice.> 3. I would need to run some benchmarks, but in some cases it might be > better to use mmap to swap stacks between contexts... that way nothing > would need to be copied.Presumably the user would deal with that in allocating their stacks and switching contexts, using the intrinsics LLVM provides? I don't see a reason yet for LLVM to get into the mmap business.> 4. I'm hoping that LLVM ends up growing optimization passes that > minimize the actual physical use of contexts in many use cases.That sounds very tricky...> Also, > we might be able to guarantee small stack usage with a pass that > forces recursive calls to spawn a new context and turns large alloca's > into malloc's, making it safer to have a bunch of little stacks > without any needed juggling.This sounds like a stopgap until real split stacks can be implemented. http://gcc.gnu.org/wiki/SplitStacks#Backward_compatibility describes some of the other difficulties in getting even this much to work. (foreign calls, and function pointers, at least)
Kenneth Uildriks
2010-Apr-11 21:41 UTC
[LLVMdev] Proposal: stack/context switching within a thread
On Sun, Apr 11, 2010 at 4:09 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:> Kenneth Uildriks <kennethuil at gmail.com> wrote: >> As I see it, the context switching mechanism itself needs to know >> where to point the stack register when switching. The C routines take >> an initial stack pointer when creating the context, and keep track of >> it from there. If we don't actually need to interoperate with >> contexts created from the C routines, we have a lot more freedom. > > I guess the reason to interoperate with contexts from the C routines > would be to support ucontext_t's passed into signal handlers? But then > the LLVM intrinsics need to specify that their context's layout is the > same as ucontext_t's, on platforms where ucontext_t exists.Or perhaps it can be an argument to the target code generator, if there's any need to switch "compatibility mode" off and on. All that the intrinsics require is that context creators be given a memory area of at least size llvm.context.size() to write contexts into, and that nothing besides the intrinsics mess with the context structure.> >> Anyway, one approach would be to expose intrinsics to interrogate an >> inactive context, to get its initial stack pointer (the one it was >> created with) and its current stack pointer, and also to modify both >> before making the context active again. >> >> I don't see any reason why this scheme wouldn't also be compatible >> with segmented stacks. >> ... >> On the other hand, stack manipulation really ought to be handled by >> the target, since only the target knows the details of how the stack >> is laid out to begin with. Also, if we have stack manipulation calls >> in the IR, optimization quickly becomes very difficult. Unless we >> just allow optimizers to ignore the stack manipulations and assume >> they're doing the "right" thing. >> >> On the gripping hand, we don't want the target emitting memory >> allocation calls in order to grow the stack (unless a function pointer >> to malloc or its equivalent is passed in from the IR). > > In gcc's split-stacks > (http://gcc.gnu.org/ml/gcc/2009-02/msg00429.html; I got the name wrong > earlier), Ian planned to call a known global name to allocate memory > (http://gcc.gnu.org/ml/gcc/2009-02/msg00479.html). I'm not sure what > he actually wound up doing on the gccgo branch. LLVM could also put > the allocation/deallocation functions into the context, although it'd > probably be better to just follow gcc. > >>> The way they accomplish that now is by >>> copying the entire stack to the heap on a context switch, and having >>> all threads share the main C stack. This isn't quite as bad as it >>> sounds because it only happens to threads that call into C extension >>> modules. Pure Python threads operate entirely within heap Python >>> frames. Still, it would be nice to support this use case. >> >> This wouldn't hold in IR, since virtual registers regularly get >> spilled to the stack.. every context, regardless of the language, >> would have to have its stack saved. Also, this method would mean that >> a context cannot be used in any native thread other than the one that >> created it, right? > > Well, a frontend can generate code in continuation-passing style or do > all of its user-level "stack" frame manipulation on the heap. Then it > only uses a constant amount of C-stack space, which might not be part > of the context that needs to be switched. Only foreign calls > necessarily use a chunk of C stack. Stackless's approach does seem to > prevent one coroutine's foreign code from using pointers into another > coroutine's stack, and maybe they could/should create a new context > each time they need to enter a foreign frame instead of trying to copy > the stack...I see what you mean. I'll have to look up the conditions under which an llvm call instruction can avoid creating a new physical stack frame, but you've convinced me that it could be made to work better than I thought when I wrote that.> >> 2. We should be able to support "hard switching" in Stackless Python >> by adding a llvm.getcontextstacktop intrinsic. If, as in Kristján's >> example, llvm.getcontext is used to create context A, and then >> execution continues until context B is created with >> llvm.swapcontext(B, A), the region of memory between >> llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved >> and later restored when B is resumed. > > Wait, what stack top does swapcontext get? I'd thought that A's and > B's stack top would be the same since they're executing on the same > stack.No, A's stack top would be whatever the stack pointer was when llvm.getcontext was called to create it. B's stack top would be whatever the stack pointer was when llvm.swapcontext was called to create it... it would be further from the common base than A's stack top. The region between them is what needs to be restored before B can become active again, assuming that A's stack space remained valid.> >> Of course that usage would >> throw a monkey wrench into a segmented stack scheme... it assumes that >> context stack areas actually behave like contiguous stacks. Not only >> that, it assumes that no pointers to a context's stack exist outside >> of the context... when the context is inactive, a pointer into a >> context's stack won't be valid! >> >> But in the case of Stackless Python, these caveats can be addressed >> with a simple "Don't do that!", since it's all tied into the language. > > And users shouldn't need both stack copying and split stacks. Just one > should suffice.Exactly.> >> 3. I would need to run some benchmarks, but in some cases it might be >> better to use mmap to swap stacks between contexts... that way nothing >> would need to be copied. > > Presumably the user would deal with that in allocating their stacks > and switching contexts, using the intrinsics LLVM provides? I don't > see a reason yet for LLVM to get into the mmap business.Me either. Stack copying, mmap'ing, slicing, or whatever should be done by the scheduler. LLVM would not include any part of a scheduler, just intrinsics to allow a scheduler to create and switch contexts.> >> 4. I'm hoping that LLVM ends up growing optimization passes that >> minimize the actual physical use of contexts in many use cases. > > That sounds very tricky...One common case would be for a coroutine to be inlined with the help of indirect branches.> >> Also, >> we might be able to guarantee small stack usage with a pass that >> forces recursive calls to spawn a new context and turns large alloca's >> into malloc's, making it safer to have a bunch of little stacks >> without any needed juggling. > > This sounds like a stopgap until real split stacks can be implemented. > http://gcc.gnu.org/wiki/SplitStacks#Backward_compatibility describes > some of the other difficulties in getting even this much to work. > (foreign calls, and function pointers, at least) >True. Some front-ends have more control over this than others, though. And users of a C-like front end would find this pass helpful even though they're ultimately responsible for only calling "safe" functions within a small-stack-space context. Anyway, I updated the document to take a lot of this discussion into account. I hope that the assumptions I made actually are universally applicable in (non-split-stack-enabled) LLVM. -------------- next part -------------- //===----------------------------------------------------------------------===// // Stack and context switching //===----------------------------------------------------------------------===// 4/07/2010 - Initial Revision 4/11/2010 - Introduced llvm.getcontextstacktop and llvm.stackgrowsdown, changed the definition of llvm.makecontext slightly, and added a discussion of context stack space use and possible strategies for minimizing total memory and address space reserved for stacks. At the time of this writing, LLVM supports standard function calls on a thread of execution, but does not support switching function contexts or stacks within a thread of execution. Such support would enable creating coroutines, which in turn supports high performance, safer concurrency, and lower overhead than native threads do, and enables concurrency on systems that lack native thread support. Some C library implementations include support for stack and context switching with functions such as makecontext, getcontext, and setcontext. However, these calls may not exist on some embedded systems, which may also lack native thread support and therefore have a greater need for context switching. Also, built-in support for context switching allows such operations to be lowered to inline assembly rather than a call into the C library. Implementation of kernels in IR would benefit from these intrinsics. The C library functions depend on machine-specific structures to represent the context. Finally, with intrinsic functions for handling context switches, optimizers can be made to recognize and optimize code whose flow of execution is impacted by these context switches. //===----------------------------------------------------------------------===// // Implementation Approach We will model the intrinsics after the C library functions. In environments where the library functions are present, these intrinsics should (optionally) behave compatibly to the library functions, so that code using them can interact gracefully with platform native code using the library functions. In environments where the library functions are absent, these intrinsics can be lowered to inline assembly language instructions with the proper effect. The function passed to llvm.makecontext will unconditionally have its own context passed to it, in order to make it easy for the optimizer to determine when the function uses llvm.swapcontext to temporarily return execution to its "link" context or when it passes its own context as the "link" context to a newly created context. This should make it possible to optimize some common coroutine patterns. A context must be executing within at most one thread at any given time. A context may execute in one thread, and later execute in a different thread. ; Returns the number of bytes needed to hold a stack context. Since the ; context typically includes a copy of most or all machine registers plus ; additional data, mem2reg would offer little advantage; therefore, having the ; size not recognized in the IR as a constant, while it would block mem2reg for ; that particular alloca, would have little practical disadvantage. declare i64 llvm.context.size() readonly ; pCtx shall be a pointer to a memory area of at least the number of bytes ; returned by llvm.context.size(). That memory area will be filled with ; stack context data. A call to llvm.setcontext or llvm.swapcontext with ; that context data will cause execution to proceed as if from a return from ; this llvm.getcontext call. declare void llvm.getcontext({}* %pCtx) ; pCtx shall be a pointer to context data generated by a call to llvm.getcontext ; or llvm.makecontext. llvm.setcontext will cause execution to transfer to ; the point specified in the context data. declare void llvm.setcontext({}* %pCtx) ; The first argument is a pointer to a memory area of at least the number of ; bytes returned by llvm.context.size(). That memory area will be filled with ; new context data. The second argument is a pointer to the function where ; execution of this context shall begin. The third argument defines the ; base of the context's local stack space; this stack space will not ; automatically grow, but no attempt will be made within LLVM to detect overflow ; of this stack space. The fourth argument is a pointer to the "linked" or ; "previous" context; when %func returns, this pointer is dereferenced and ; execution continues at the linked context. When the context begins executing, ; %func is called with a pointer to its corresponding context as its first ; parameter, and the values passed to llvm.makecontext after the %linkCtx ; parameter as its following parameters. %func must use the "C" calling ; convention. If execution unwinds past %func, undefined behavior results. If ; the memory pointed to by %newCtx is overwritten by anything other than a call ; to llvm.swapcontext, a later return from %func will result in undefined ; behavior. The number and types of parameters passed after the %linkCtx ; parameter must match the number and types of parameters expected by %func ; after its first parameter; otherwise, undefined behavior results. declare void llvm.makecontext({}* %newCtx, {}* %func, i8* %stackbegin, {}* %linkCtx, ...) ; Retrieves the link context pointer from the given context. The link context ; pointer is the same one that was passed to llvm.makecontext to create the ; given context. If pCtx was populated by llvm.getcontext rather than ; llvm.makecontext, this function returns a null pointer. If pCtx was ; populated only by llvm.swapcontext, the return value is undefined. declare {}* llvm.getlinkcontext({}* %pCtx) ; Retrieves the context's current stack pointer. declare {}* llvm.getcontextstacktop({}* %pCtx) ; Returns true if the stack grows downward (toward lower addresses), ; and false otherwise. If the stack does grow downward, the *highest* address ; of the area reserved for a context's stack must be passed to ; llvm.makecontext. declare i1 llvm.stackgrowsdown() ; The first argument shall point to a memory area of at least the number of ; bytes returned by llvm.context.size(). That memory area will be filled with ; stack context data corresponding to the current execution context. The ; second argument shall point to context data generated by a call to ; llvm.getcontext or llvm.makecontext. Execution will transfer to the context ; specified by pNewCtx. Using llvm.swapcontext or llvm.setcontext to set ; the context to that stored in pThisCtx will cause execution to transfer back ; to this context as if from a return from llvm.swapcontext. The linked ; context pointer stored in %pThisCtx is not modified. declare void llvm.swapcontext({}* %pThisCtx, {}* %pNewCtx) A simple example using these intrinsics: define void @co1({}* %thisCtx, i32 %startVal) nounwind { entry: %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx) ; Now call print messages. After each print message, temporarily yield ; control back to the previous context. call void @printCo1FirstMessage(i32 %startVal) call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo1SecondMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo1ThirdMessage() ret void } define void @co2({}* %thisCtx, i32 %startVal) nounwind { entry: %prevCtx = call {}* @llvm.getlinkcontext(%thisCtx) ; Now call print messages. After each print message, temporarily yield ; control back to the previous context. call void @printCo2FirstMessage(i32 %startVal) call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo2SecondMessage() call void @llvm.swapcontext({}* %thisCtx, {}* %prevCtx) call void @printCo2ThirdMessage() ret void } define i32 @main() nounwind { entry: ; alloca space for the context descriptors. %ctxSize = call i64 @llvm.context.size() %p0 = alloca i8, i64 %ctxSize %mainCtx = bitcast i8* %p0 to {}* %p1 = alloca i8, i64 %ctxSize %ctx1 = bitcast i8* %p1 to {}* %p2 = alloca i8, i64 %ctxSize %ctx2 = bitcast i8* %p2 to {}* ; Stacks for the contexts. If llvm.stackgrowsdown returns true, we need ; pointers to the highest addresses of the stack areas rather than the ; lowest addresses. %growsdown = call i1 @llvm.stackgrowsdown() %stackbuf1 = alloca i8, i64 4096 %stackhi1 = getelementptr i8* %stackbuf1, i64 4096 %stack1 = select i1 %growsdown, i8* %stackhi1, i8* %stackbuf1 %stackbuf2 = alloca i8, i64 4096 %stackhi2 = getelementptr i8* %stackbuf2, i64 4096 %stack2 = select i1 %growsdown, i8* %stackhi2, i8* %stackbuf2 ; Create contexts for co1 and co2. co1 and co2 each need an i32 ; initial value. call void @llvm.makecontext({}* %ctx1, bitcast void({}*, i32)* @co1 to {}*, i8* %stack1, {}* %mainCtx, i32 24601) call void @llvm.makecontext({}* %ctx2, bitcast void({}*, i32)* @co2 to {}*, i8* %stack2, {}* %mainCtx, i32 1776) ; Run co1 and co2 in an alternating fashion. call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx1) call void @llvm.swapcontext({}* %mainCtx, {}* %ctx2) ret i32 0 } This code should make the following sequence of messaging calls: printCo1FirstMessage(24601) printCo2FirstMessage(1776) printCo1SecondMessage() printCo2SecondMessage() printCo1ThirdMessage() printCo2ThirdMessage() //===----------------------------------------------------------------------===// // Stack management // When creating a new context with llvm.makecontext, you must pass in a pointer to the base of the context's stack. Function calls and returns will adjust the stack pointer from there in the usual manner. The context will not carry any information about the maximum stack space available to it, and no attempt will be made by LLVM to check for stack overflow in a context. In order to maintain a large number of contexts without overflowing any context stack areas or consuming excessive amounts of virtual memory or address space, schedulers will need to be able to make some assumptions about context stack behavior. A set of useful and (I belive) currently universally applicable assumptions in LLVM: 1. A function call will cause the stack pointer to move further from the original stack base. A function return will cause the stack pointer to move closer to the original stack base, and cause it to take on the value it had immediately before the function call. 2. If llvm.stackgrowsdown() returns true, the context's stack pointer will always compare less than its original base. Otherwise, the context's stack pointer will always compare greater than its original base. 3. The contiguous region of memory between a context's original base and its current stack pointer makes up the context's stack state. It is safe for a context to become active if the content of this region of memory is identical to the content of the same region of memory at the time the context last became inactive or at the time it was first created, whichever is later. 4. It is safe for a context to be active so long as no code outside of that context is modifying any part of the active context's stack state. This implies that multiple contexts may be active on different threads so long as no two active contexts share any part of their stack state. 5. A context created by llvm.getstackcontext() will have the same base as the context from which llvm.getstackcontext() was called. These assumptions directly imply a sixth assumption: If context B is created by executing llvm.getstackcontext() within context A, and the stack state of context A is known to be valid, the stack state of context B can be made valid by restoring the portion of B's stack state between B's stack pointer and A's stack pointer. These assumptions will not hold if a stack segmentation scheme is incorporated into the code running in any context. However, a stack segmentation scheme would remove many of the problems that give rise to the need for these assumptions. Assuming that the above assumptions do hold (i.e., no stack segmentation scheme), a few possible strategies for conserving context stack space: 1. Have multiple contexts share a safely large region of stack space. Ensure that no two contexts sharing stack space are running at the same time. a. When switching out a context, copy all or part of its stack state to a separate buffer, and copy it back when switching the context back in. b. Use mmap/munmap/remap_file_pages (where available) to map a context's stack state into its stack address range when it becomes active. Keep in mind that, if this scheme is in use, no pointer to any of a context's stack variables is safe to dereference unless that context is active. 2. Introduce a pass that forces recursive function calls to run in newly-created contexts and changes large or numerous alloca's into malloc/free combinations. Functions transformed in this way should be safer to run in small stack spaces, although calls to unmodified third-party functions might still pose problems. Of course we would want this pass to run after tail call optimization. 3. Introduce context-aware optimization passes to minimize the actual use of contexts by non-recursive code. For example, some coroutines can be inlined with the help of indirect branches. 4. Have the language front-end always emit code like that produced by the transormation pass described in (2), and ensure the language's runtime libraries also conform. It is not possible in LLVM to completely eliminate the use of stack space, however. //===----------------------------------------------------------------------===// // Moving forward // An easy first step would be to lower the intrinsics to corresponding calls to makecontext and friends. Once that is working on test platforms with these calls in their C libraries, we can start adding platform specific lowering code to platforms that need it, and then as an option for all platforms to allow using this functionality in kernels and other environments that lack a standard C library. In the meantime, optimization passes can be introduced that now how to optimize across some context switches.
Kenneth Uildriks
2010-Apr-11 23:23 UTC
[LLVMdev] Proposal: stack/context switching within a thread
>> >>> 2. We should be able to support "hard switching" in Stackless Python >>> by adding a llvm.getcontextstacktop intrinsic. If, as in Kristján's >>> example, llvm.getcontext is used to create context A, and then >>> execution continues until context B is created with >>> llvm.swapcontext(B, A), the region of memory between >>> llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved >>> and later restored when B is resumed. >> >> Wait, what stack top does swapcontext get? I'd thought that A's and >> B's stack top would be the same since they're executing on the same >> stack. > > No, A's stack top would be whatever the stack pointer was when > llvm.getcontext was called to > create it. B's stack top would be whatever the stack pointer was when > llvm.swapcontext was > called to create it... it would be further from the common base than > A's stack top. The region > between them is what needs to be restored before B can become active > again, assuming that > A's stack space remained valid.I forgot to mention that this depends on the assumption that the function that created context A did not return to its caller before the llvm.swapcontext that created context B was executed. And while I'm at it, what it the function that created context A made a tail call in the meantime?
Jeffrey Yasskin
2010-Apr-12 05:07 UTC
[LLVMdev] Proposal: stack/context switching within a thread
On Sun, Apr 11, 2010 at 2:41 PM, Kenneth Uildriks <kennethuil at gmail.com> wrote:> On Sun, Apr 11, 2010 at 4:09 PM, Jeffrey Yasskin <jyasskin at google.com> wrote: >> Kenneth Uildriks <kennethuil at gmail.com> wrote: >>> As I see it, the context switching mechanism itself needs to know >>> where to point the stack register when switching. The C routines take >>> an initial stack pointer when creating the context, and keep track of >>> it from there. If we don't actually need to interoperate with >>> contexts created from the C routines, we have a lot more freedom. >> >> I guess the reason to interoperate with contexts from the C routines >> would be to support ucontext_t's passed into signal handlers? But then >> the LLVM intrinsics need to specify that their context's layout is the >> same as ucontext_t's, on platforms where ucontext_t exists. > > Or perhaps it can be an argument to the target code generator, if > there's any need to switch "compatibility mode" off and on. All that > the intrinsics require is that context creators be given > a memory area of at least size llvm.context.size() to write contexts > into, and that nothing besides the intrinsics mess with the context > structure.Yeah, that sounds right.>>> 2. We should be able to support "hard switching" in Stackless Python >>> by adding a llvm.getcontextstacktop intrinsic. If, as in Kristján's >>> example, llvm.getcontext is used to create context A, and then >>> execution continues until context B is created with >>> llvm.swapcontext(B, A), the region of memory between >>> llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved >>> and later restored when B is resumed. >> >> Wait, what stack top does swapcontext get? I'd thought that A's and >> B's stack top would be the same since they're executing on the same >> stack. > > No, A's stack top would be whatever the stack pointer was when > llvm.getcontext was called to > create it. B's stack top would be whatever the stack pointer was when > llvm.swapcontext was > called to create it... it would be further from the common base than > A's stack top. The region > between them is what needs to be restored before B can become active > again, assuming that > A's stack space remained valid.Oops. Either I can't read, or I was confused by the fact that x86 stacks grow down. Probably the reading.> I forgot to mention that this depends on the assumption that the > function that created context A did not return to its caller before > the llvm.swapcontext that created context B was executed. And while > I'm at it, what it the function that created context A made a tail > call in the meantime?Yep. The opengroup manpages are pretty bad about describing the limits of setcontext(): http://www.opengroup.org/onlinepubs/007908775/xsh/getcontext.html. Could you sketch out the restrictions in your document? They may be identical to setjmp/longjmp, which opengroup does document: http://www.opengroup.org/onlinepubs/007908775/xsh/longjmp.html.> Me either. Stack copying, mmap'ing, slicing, or whatever should be > done by the scheduler. LLVM > would not include any part of a scheduler, just intrinsics to allow a > scheduler to create and switch contexts.+1> Anyway, I updated the document to take a lot of this discussion into > account. I hope that the assumptions I made actually are universally > applicable in (non-split-stack-enabled) LLVM.Thanks! Here are some thoughts on your additions. (just the "Stack management" section, right?) A working document like this may work better on a wiki, in http://codereview.appspot.com, or in a public repository rather than as a series of email attachments. :) "The context will not carry any information about the maximum stack space available to it" <- and this is the only line that would need to change to add split stacks, I think. "1. A function call will ..." -> "1. A non-tail function call will ..." ? Item 3 starts referring to "active" and "inactive" contexts, but I don't think you've introduced the terms. The opengroup manpages of swapcontext() and friends don't mention whether it's possible to use them to move a context from one thread to another. *sigh*. I suspect it works fine, but there's a chance it does the wrong thing to thread-local variables. Could you add that as an open question? It'd be nice for LLVM to allow it. Point 4 is a bit confusing. Normally, it's fine for a thread to share some of its stack space with another thread, but your wording seems to prohibit that. I'll forward your next draft back to the stackless folks, unless you want to pick up the thread with them. Thanks, Jeffrey
Reasonably Related Threads
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread
- [LLVMdev] Proposal: stack/context switching within a thread