thr3ads.net - llvm dev - [LLVMdev] Proposal: stack/context switching within a thread [Apr 2010]


Kenneth Uildriks

2010-Apr-10 21:34 UTC

[LLVMdev] Proposal: stack/context switching within a thread

On the other hand, stack manipulation really ought to be handled by
the target, since only the target knows the details of how the stack
is laid out to begin with.  Also, if we have stack manipulation calls
in the IR, optimization quickly becomes very difficult, unless we
just allow optimizers to ignore the stack manipulations and assume
they're doing the "right" thing.

On the gripping hand, we don't want the target emitting memory
allocation calls in order to grow the stack (unless a function pointer
to malloc or its equivalent is passed in from the IR).
> The way they accomplish that now is by
> copying the entire stack to the heap on a context switch, and having
> all threads share the main C stack. This isn't quite as bad as it
> sounds because it only happens to threads that call into C extension
> modules. Pure Python threads operate entirely within heap Python
> frames. Still, it would be nice to support this use case.
This wouldn't hold in IR, since virtual registers regularly get
spilled to the stack... every context, regardless of the language,
would have to have its stack saved.  Also, this method would mean that
a context cannot be used in any native thread other than the one that
created it, right?

Kenneth Uildriks

2010-Apr-11 16:01 UTC


[LLVMdev] Proposal: stack/context switching within a thread

Having read through Stackless Python's web pages a bit:

1. They're doing pretty much what I'd like to do, except that I don't
want to be tied to a particular language and I'd like to be able to
use the stack.  (Also, stack use is inescapable with LLVM, as far as I
can tell.)

2. We should be able to support "hard switching" in Stackless Python
by adding a llvm.getcontextstacktop intrinsic.  If, as in Kristján's
example, llvm.getcontext is used to create context A, and then
execution continues until context B is created with
llvm.swapcontext(B, A), the region of memory between
llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved
and later restored when B is resumed.  Of course that usage would
throw a monkey wrench into a segmented stack scheme... it assumes that
context stack areas actually behave like contiguous stacks.  Not only
that, it assumes that no pointers to a context's stack exist outside
of the context... when the context is inactive, a pointer into a
context's stack won't be valid!

But in the case of Stackless Python, these caveats can be addressed
with a simple "Don't do that!", since it's all tied into the
language.

3. I would need to run some benchmarks, but in some cases it might be
better to use mmap to swap stacks between contexts... that way nothing
would need to be copied.

4. I'm hoping that LLVM ends up growing optimization passes that
minimize the actual physical use of contexts in many use cases.  Also,
we might be able to guarantee small stack usage with a pass that
forces recursive calls to spawn a new context and turns large allocas
into mallocs, making it safer to have a bunch of little stacks
without any needed juggling.

Jeffrey Yasskin

2010-Apr-11 21:09 UTC


[LLVMdev] Proposal: stack/context switching within a thread

Kenneth Uildriks <kennethuil at gmail.com> wrote:
> As I see it, the context switching mechanism itself needs to know
> where to point the stack register when switching.  The C routines take
> an initial stack pointer when creating the context, and keep track of
> it from there.  If we don't actually need to interoperate with
> contexts created from the C routines, we have a lot more freedom.
I guess the reason to interoperate with contexts from the C routines
would be to support ucontext_t's passed into signal handlers? But then
the LLVM intrinsics need to specify that their context's layout is the
same as ucontext_t's, on platforms where ucontext_t exists.
> Anyway, one approach would be to expose intrinsics to interrogate an
> inactive context, to get its initial stack pointer (the one it was
> created with) and its current stack pointer, and also to modify both
> before making the context active again.
>
> I don't see any reason why this scheme wouldn't also be compatible
> with segmented stacks.
> ...
> On the other hand, stack manipulation really ought to be handled by
> the target, since only the target knows the details of how the stack
> is laid out to begin with.  Also, if we have stack manipulation calls
> in the IR, optimization quickly becomes very difficult.  Unless we
> just allow optimizers to ignore the stack manipulations and assume
> they're doing the "right" thing.
>
> On the gripping hand, we don't want the target emitting memory
> allocation calls in order to grow the stack (unless a function pointer
> to malloc or its equivalent is passed in from the IR).
In gcc's split-stacks
(http://gcc.gnu.org/ml/gcc/2009-02/msg00429.html; I got the name wrong
earlier), Ian planned to call a known global name to allocate memory
(http://gcc.gnu.org/ml/gcc/2009-02/msg00479.html). I'm not sure what
he actually wound up doing on the gccgo branch. LLVM could also put
the allocation/deallocation functions into the context, although it'd
probably be better to just follow gcc.
>> The way they accomplish that now is by
>> copying the entire stack to the heap on a context switch, and having
>> all threads share the main C stack. This isn't quite as bad as it
>> sounds because it only happens to threads that call into C extension
>> modules. Pure Python threads operate entirely within heap Python
>> frames. Still, it would be nice to support this use case.
>
> This wouldn't hold in IR, since virtual registers regularly get
> spilled to the stack... every context, regardless of the language,
> would have to have its stack saved.  Also, this method would mean that
> a context cannot be used in any native thread other than the one that
> created it, right?
Well, a frontend can generate code in continuation-passing style or do
all of its user-level "stack" frame manipulation on the heap. Then it
only uses a constant amount of C-stack space, which might not be part
of the context that needs to be switched. Only foreign calls
necessarily use a chunk of C stack. Stackless's approach does seem to
prevent one coroutine's foreign code from using pointers into another
coroutine's stack, and maybe they could/should create a new context
each time they need to enter a foreign frame instead of trying to copy
the stack...
> 2. We should be able to support "hard switching" in Stackless Python
> by adding a llvm.getcontextstacktop intrinsic.  If, as in Kristján's
> example, llvm.getcontext is used to create context A, and then
> execution continues until context B is created with
> llvm.swapcontext(B, A), the region of memory between
> llvm.getcontextstacktop(A) and llvm.getcontextstacktop(B) can be saved
> and later restored when B is resumed.
Wait, what stack top does swapcontext get? I'd thought that A's and
B's stack top would be the same since they're executing on the same
stack.
> Of course that usage would
> throw a monkey wrench into a segmented stack scheme... it assumes that
> context stack areas actually behave like contiguous stacks.  Not only
> that, it assumes that no pointers to a context's stack exist outside
> of the context... when the context is inactive, a pointer into a
> context's stack won't be valid!
>
> But in the case of Stackless Python, these caveats can be addressed
> with a simple "Don't do that!", since it's all tied into the language.
And users shouldn't need both stack copying and split stacks. Just one
should suffice.
> 3. I would need to run some benchmarks, but in some cases it might be
> better to use mmap to swap stacks between contexts... that way nothing
> would need to be copied.
Presumably the user would deal with that in allocating their stacks
and switching contexts, using the intrinsics LLVM provides? I don't
see a reason yet for LLVM to get into the mmap business.
> 4. I'm hoping that LLVM ends up growing optimization passes that
> minimize the actual physical use of contexts in many use cases.
That sounds very tricky...
> Also,
> we might be able to guarantee small stack usage with a pass that
> forces recursive calls to spawn a new context and turns large allocas
> into mallocs, making it safer to have a bunch of little stacks
> without any needed juggling.
This sounds like a stopgap until real split stacks can be implemented.
http://gcc.gnu.org/wiki/SplitStacks#Backward_compatibility describes
some of the other difficulties in getting even this much to work,
at least with foreign calls and function pointers.
