thr3ads.net - llvm dev - [LLVMdev] nonlocal go to -- how? [May 2008]

If this information is useful, please help other people find it:
Share via:

Hendrik Boom

2008-May-04 16:05 UTC

[LLVMdev] nonlocal go to -- how?

The languages I'm faced with compiling in the near future have nonlocal 
go to statements and nested procedures.

A procedure gets implemented as a structure containing its entry point 
and an environment pointer.  It is easy enough to call its entry point 
and pass the environment pointer as an extra argument (rather like the 
pointer to this or self in object-oriented code).  It's no problem.  The 
trampoline intrinsic is a neat way of packaging this so as to retrofit 
into C's one-address-only function pointer, but that's not necessary in
a
language where procedures are all known to have this behaviour.

The problem is with the go to statement.  Again, local go to's, that go 
somewhere within the same function are no particular problem -- though I 
haven't studied the interaction with alloca yet; that might offer a few 
surprises.  The questions I have are about goto's that exit from a 
function.  The traditional mechanism is to implement a label as an 
instruction-pointer/environment-pointer pair, just as for procedures.  
You just load the environment-pointer into the appropriate register, and 
jump to the address.  (again there are technical details about the size 
of the local stack at the destination, disposition of alloca storage, and 
the like, and nowadays, unwinding the stack).

I don't see a mechanism for this.

The closest I see is the mechanism for exceptions.  I cannot tell by 
reading the documentation for exception-handling whether it is 
sufficiently flexible to accomodate nonlocal goto's.  It's plain that on
modern systems (which dribble saved registers all over the sack) some 
kind of unwinding is necessary, and that the traditional model of loading 
a register and jumping is no longer sufficient.  What's not clear is 
whether the exception handling mechanism is sufficiently dynamic to 
permit an accurate specification of the jump target.  It's not 
necessarily the most local version of the label on the stack that's the 
target; it's the one whose stack frame is reached by the jumping 
procedure's array of static links.

-- hendrik

Hendrik Boom

2008-May-04 23:56 UTC

head link

[LLVMdev] nonlocal go to -- how?

On Sun, 04 May 2008 16:05:44 +0000, Hendrik Boom wrote:
> The languages I'm faced with compiling in the near future have nonlocal
> go to statements and nested procedures.
> 
> A procedure gets implemented as a structure containing its entry point
> and an environment pointer.  It is easy enough to call its entry point
> and pass the environment pointer as an extra argument (rather like the
> pointer to this or self in object-oriented code).  It's no problem. 
The
> trampoline intrinsic is a neat way of packaging this so as to retrofit
> into C's one-address-only function pointer, but that's not
necessary in
> a language where procedures are all known to have this behaviour.
> 
> The problem is with the go to statement.  Again, local go to's, that go
> somewhere within the same function are no particular problem -- though I
> haven't studied the interaction with alloca yet; that might offer a few
> surprises.  The questions I have are about goto's that exit from a
> function.  The traditional mechanism is to implement a label as an
> instruction-pointer/environment-pointer pair, just as for procedures.
> You just load the environment-pointer into the appropriate register, and
> jump to the address.  (again there are technical details about the size
> of the local stack at the destination, disposition of alloca storage,
> and the like, and nowadays, unwinding the stack).
> 
> I don't see a mechanism for this.
> 
> The closest I see is the mechanism for exceptions.  I cannot tell by
> reading the documentation for exception-handling whether it is
> sufficiently flexible to accomodate nonlocal goto's.  It's plain
that on
> modern systems (which dribble saved registers all over the sack) some
> kind of unwinding is necessary, and that the traditional model of
> loading a register and jumping is no longer sufficient.  What's not
> clear is whether the exception handling mechanism is sufficiently
> dynamic to permit an accurate specification of the jump target.  It's
> not necessarily the most local version of the label on the stack that's
> the target; it's the one whose stack frame is reached by the jumping
> procedure's array of static links.
> 
> -- hendrik
Well, I suppose it could be kind of done in not-quite C++, and therefore 
it could probably be made to work in LLVM, like this.

Labels are values, like:
 
struct Label{ void * level, *target; }

The level identifies the activation record in which the jump target 
occurs; the target  is the actual address of the instruction to be jumped 
to when the label is used.

Suppose the high-level language defined a label L in functino foo.  Then 
we build a Label object LL to represent it.

foo()
{
Label LL;
LL.level = &LL;
LL.target = &L;

...
...

L:
}


Someplace else, syntactically nested within foo, there's another function 
bar, which contains a goto L.

This goto gets translated into something like:
Translate the goto inso something like

  throw appropriate_static_link->LL;

where appropriate static link is whatever is needed to access the proper 
variable in the enclosing scope.

Around every call *from* any function that contains a Label, we have code 
like:

  try(
	the_call
  }
  catch(Label e)
  { if(e.level == LL.level)
	goto e.label;
    else throw e;
  }

Of course this coding requires that you can take addresses of labels and 
jump to them later in C++, which as far as I can recall is not one of its 
language features.

But LLVM is a lower-level language, so I'm hoping that something similar 
can be accomplished.  But I haven't noticed dynamic arguments to the br 
instruction.

-- hendrik

Gordon Henriksen

2008-May-05 02:16 UTC

head link

[LLVMdev] nonlocal go to -- how?

Hi Hendrick,

LLVM's CFG is purely static. This is necessary for efficient register  
allocation behavior. Your front-end could implement a nonlocal goto by  
using a tail call and a switch in the entry block of your branched-to  
function. This is essentially how LLVM implements the address-of-label  
extension, transforming 'goto x' into a switch statement.

// 'foo' makes a nonlocal goto into 'bar'.
int foo() {
   return bar_impl(2 /* non-default entry point */, undef, undef /*  
who knows? */);
}

// Invoke for a 'normal' call.
int bar(int x, int y) {
   return bar_impl(0 /* default entry point */, x, y);
}

// Implementation detail.
int bar_impl(int label, int x, int y) {
   switch (label) {
   case 1: goto L1;
   case 2: goto L2;
   default: goto entry;
   }
entry:
   ...
L1:
   ...
L2:
   ...
}

Your front-end would need to track whether it is necessary to emit  
such a shim (i.e., whether there are any incoming gotos).

If you use a 'tail call' instruction to invoke 'bar_impl', LLVM
will
use proper tail calls in some cases.

Of course, this doesn't support the full generality of indirect  
nonlocal goto, so may not be suitable for your application.

On 2008-05-04, at 19:56, Hendrik Boom wrote:
> On Sun, 04 May 2008 16:05:44 +0000, Hendrik Boom wrote:
>
>> The languages I'm faced with compiling in the near future have  
>> nonlocal
>> go to statements and nested procedures.
>>
>> A procedure gets implemented as a structure containing its entry  
>> point
>> and an environment pointer.  It is easy enough to call its entry  
>> point
>> and pass the environment pointer as an extra argument (rather like  
>> the
>> pointer to this or self in object-oriented code).  It's no  
>> problem.  The
>> trampoline intrinsic is a neat way of packaging this so as to  
>> retrofit
>> into C's one-address-only function pointer, but that's not  
>> necessary in
>> a language where procedures are all known to have this behaviour.
>>
>> The problem is with the go to statement.  Again, local go to's,  
>> that go
>> somewhere within the same function are no particular problem --  
>> though I
>> haven't studied the interaction with alloca yet; that might offer a
>> few
>> surprises.  The questions I have are about goto's that exit from a
>> function.  The traditional mechanism is to implement a label as an
>> instruction-pointer/environment-pointer pair, just as for procedures.
>> You just load the environment-pointer into the appropriate  
>> register, and
>> jump to the address.  (again there are technical details about the  
>> size
>> of the local stack at the destination, disposition of alloca storage,
>> and the like, and nowadays, unwinding the stack).
>>
>> I don't see a mechanism for this.
>>
>> The closest I see is the mechanism for exceptions.  I cannot tell by
>> reading the documentation for exception-handling whether it is
>> sufficiently flexible to accomodate nonlocal goto's.  It's
plain
>> that on
>> modern systems (which dribble saved registers all over the sack) some
>> kind of unwinding is necessary, and that the traditional model of
>> loading a register and jumping is no longer sufficient.  What's not
>> clear is whether the exception handling mechanism is sufficiently
>> dynamic to permit an accurate specification of the jump target. 
It's
>> not necessarily the most local version of the label on the stack  
>> that's
>> the target; it's the one whose stack frame is reached by the
jumping
>> procedure's array of static links.
>>
>> -- hendrik
>
> Well, I suppose it could be kind of done in not-quite C++, and  
> therefore
> it could probably be made to work in LLVM, like this.
>
> Labels are values, like:
>
> struct Label{ void * level, *target; }
>
> The level identifies the activation record in which the jump target
> occurs; the target  is the actual address of the instruction to be  
> jumped
> to when the label is used.
>
> Suppose the high-level language defined a label L in functino foo.   
> Then
> we build a Label object LL to represent it.
>
> foo()
> {
> Label LL;
> LL.level = &LL;
> LL.target = &L;
>
> ...
> ...
>
> L:
> }
>
>
> Someplace else, syntactically nested within foo, there's another  
> function
> bar, which contains a goto L.
>
> This goto gets translated into something like:
> Translate the goto inso something like
>
>  throw appropriate_static_link->LL;
>
> where appropriate static link is whatever is needed to access the  
> proper
> variable in the enclosing scope.
>
> Around every call *from* any function that contains a Label, we have  
> code
> like:
>
>  try(
> 	the_call
>  }
>  catch(Label e)
>  { if(e.level == LL.level)
> 	goto e.label;
>    else throw e;
>  }
>
> Of course this coding requires that you can take addresses of labels  
> and
> jump to them later in C++, which as far as I can recall is not one  
> of its
> language features.
>
> But LLVM is a lower-level language, so I'm hoping that something  
> similar
> can be accomplished.  But I haven't noticed dynamic arguments to the  
> br
> instruction.
>
> -- hendrik
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


— Gordon

Mike Stump

2008-May-05 02:53 UTC

head link

[LLVMdev] nonlocal go to -- how?

On May 4, 2008, at 9:05 AM, Hendrik Boom wrote:> The languages I'm faced with compiling in the near future have  
> nonlocal goto statements and nested procedures.
You want to return to a previous activation record (pascal speak) or  
stack frame?  If yes, then yes, EH will do that for you.  You'll want  
to understand what EH is in detail and how to map the semantics you  
want onto it.  That mapping could be simple (if you match the usual  
and customary EH semantics) or complex (if you want to stray farther  
away).  You'll want to understand how you want other languages to  
behave if you have the ability to have other languages on the stack.

For example, Objective-C on Apple's platform currently doesn't  
interoperate very well when C++ is on the stack (imagine an Objective- 
C runtime that uses setjmp/longjmp that bypasses the running of C++  
dtors for stack variables as you throw past them).  This setup, I'd  
claim is unfortunate, and you really want to avoid it.  It is better  
to map whatever semantics you want onto the normal platform EH abi, so  
that other languages, C++ included, just work.

Also, if you do it that way, you should be able to count on the  
optimizer getting rid of all the extra stuff and making the code  
`fast' in more trivial cases.
> The problem is with the go to statement.  Again, local go to's, that  
> go
> somewhere within the same function are no particular problem --  
> though I
> haven't studied the interaction with alloca yet; that might offer a  
> few
> surprises.
The semantics you generally want are all stack allocated variables are  
destroyed as the frame which contained them is destroyed.  Behavior  
other than this would be weird.  A special exception is made for the  
object being thrown and all the data that goes with it.
> The questions I have are about goto's that exit from a
> function.  The traditional mechanism is to implement a label as an
> instruction-pointer/environment-pointer pair, just as for procedures.
> You just load the environment-pointer into the appropriate register,  
> and
> jump to the address.
Additionally, one generally needs to deallocate stack variables as well.
> I don't see a mechanism for this.
See llvm-g++ and C++.  I believe that it works.
> The closest I see is the mechanism for exceptions.  I cannot tell by
> reading the documentation for exception-handling whether it is
> sufficiently flexible to accomodate nonlocal goto's.
It is.
> What's not clear is whether the exception handling mechanism is  
> sufficiently dynamic to
> permit an accurate specification of the jump target.
It is.
> It's not necessarily the most local version of the label on the  
> stack that's the
> target; it's the one whose stack frame is reached by the jumping
> procedure's array of static links.
You're allowed an arbitrary predicate that explains if this is the  
right place to wind up.

Duncan Sands

2008-May-05 07:56 UTC

head link

[LLVMdev] nonlocal go to -- how?

Hi,
> The problem is with the go to statement.  Again, local go to's, that go
> somewhere within the same function are no particular problem -- though I 
> haven't studied the interaction with alloca yet; that might offer a few
> surprises.  The questions I have are about goto's that exit from a 
> function.  The traditional mechanism is to implement a label as an 
> instruction-pointer/environment-pointer pair, just as for procedures.  
> You just load the environment-pointer into the appropriate register, and 
> jump to the address.  (again there are technical details about the size 
> of the local stack at the destination, disposition of alloca storage, and 
> the like, and nowadays, unwinding the stack).
> 
> I don't see a mechanism for this.
you can roll your unwinder using multiple return values (support for functions
that return multiple values was recently added): the first return value is the
usual function return value; the next one is the nesting depth to which you
want to go (I'm assuming that you can only go to a less nested function);
the last
one is a number indicating which label in the final function you want to branch
to.

A non-local goto (to function F, label L) becomes:
  ret { undef, F_static_depth, L_number }
After every call to this function you then check the second return value to see
if
it is a valid static depth (if not, no non-local goto happened and execution
continues normally) and if it is then either: (a) the static depth is that for
this function; you then execute a switch statement using L_number that branches
to the right label; or (b) the static depth is for a lexically less nested
function, and you again execute ret { undef, F_static_depth, L_number }
> The closest I see is the mechanism for exceptions.  I cannot tell by 
> reading the documentation for exception-handling whether it is 
> sufficiently flexible to accomodate nonlocal goto's.  It's plain
that on
> modern systems (which dribble saved registers all over the sack) some 
> kind of unwinding is necessary, and that the traditional model of loading 
> a register and jumping is no longer sufficient.  What's not clear is 
> whether the exception handling mechanism is sufficiently dynamic to 
> permit an accurate specification of the jump target.  It's not 
> necessarily the most local version of the label on the stack that's the
> target; it's the one whose stack frame is reached by the jumping 
> procedure's array of static links.
The problem with unwinding is that you need to have a personality
function and writing your own is a pain.  On my infinite list of
things to do is working out how to codegen invoke when there is no
personality function, and how to codegen unwind.

Ciao,

Duncan.

Jonathan S. Shapiro

2008-May-05 11:25 UTC

head link

[LLVMdev] nonlocal go to -- how?

Hendrick:

I suspect that exception handling is going to be the way to go here. The
following is partly tangential, but possibly relevant.

Largely for social reasons, we've been looking at doing a C-like surface
syntax for BitC. As part of that, we looked at multi-level break and
gotos to same-or-outer scope. In particular, we started by asking
whether either construct adds any new core semantics to the language,
given that we already have exception handling.

The answer is "no". As long as all gotos proceed to same-or-outer
scope,
a program containing gotos can be transformed without loss of meaning
into a program containing only exception catch blocks and raise/throw
statements. The catch block executes a switch statement to determine how
to resume computation at the appropriate label. It isn't pretty, and
it's not the way that I would want to actually implement it, but we did
find that outcome comfortable from a formal semantic specification
standpoint.

The nice thing about it is that it generalizes to nested procedure exits
in all cases where the target label of the goto appears in a lexically
nested scope.

However, Mike Stump is right -- you need to consider what happens when
other languages intervene on the call stack. This may not arise in your
case precisely because you are speaking about inner procedures, but do
think it through.

shap

Jonathan S. Shapiro

2008-May-05 11:28 UTC

head link

[LLVMdev] nonlocal go to -- how?

On Mon, 2008-05-05 at 09:56 +0200, Duncan Sands wrote:> you can roll your unwinder using multiple return values (support for
functions
> that return multiple values was recently added)...
Yes. In fact, this generalizes to exception handlers in general, which
demonstrates that exceptions are "pure" in the functional language
sense.

Slight modification to what Duncan wrote. What you conceptually want
here is to convert the return type into a discriminated union. One leg
is used for normal return, while the other is used for goto exit and
target label ID (I'm assuming that goto exit does not carry a return
value). But it's the same idea as the one Duncan was sketching.


shap

Hendrik Boom

2008-May-05 14:00 UTC

head link

[LLVMdev] nonlocal go to -- how?

On Mon, 05 May 2008 09:56:26 +0200, Duncan Sands wrote:
> Hi,
> 
>> The problem is with the go to statement.  Again, local go to's,
that go
>> somewhere within the same function are no particular problem -- though
>> I haven't studied the interaction with alloca yet; that might offer
a
>> few surprises.  The questions I have are about goto's that exit
from a
>> function.  The traditional mechanism is to implement a label as an
>> instruction-pointer/environment-pointer pair, just as for procedures.
>> You just load the environment-pointer into the appropriate register,
>> and jump to the address.  (again there are technical details about the
>> size of the local stack at the destination, disposition of alloca
>> storage, and the like, and nowadays, unwinding the stack).
>> 
>> I don't see a mechanism for this.
> 
> you can roll your unwinder using multiple return values (support for
> functions that return multiple values was recently added): the first
> return value is the usual function return value; the next one is the
> nesting depth to which you want to go (I'm assuming that you can only
go
> to a less nested function); the last one is a number indicating which
> label in the final function you want to branch to.
> 
> A non-local goto (to function F, label L) becomes:
>   ret { undef, F_static_depth, L_number }
> After every call to this function you then check the second return value
> to see if it is a valid static depth (if not, no non-local goto happened
> and execution continues normally) and if it is then either: (a) the
> static depth is that for this function; you then execute a switch
> statement using L_number that branches to the right label; or (b) the
> static depth is for a lexically less nested function, and you again
> execute ret { undef, F_static_depth, L_number }
Except that it's not the static depth you want, it's the dynamic stack-
nesting depth that at run-time corresponds to the function activation 
that is at the right static depth in the jumper's environment.  All 
invocations of one function are at the same static depth -- that's
what's
static about it.  This doesn't much affect the mechanisms you are 
presenting, though.

The problem with this is that it places a lot of overhead on every 
function call and return.  The alternative mechanisms involving 
exceptions, while more complex, involve overhead only for functions that 
actually involve labels (which are rare in practise).

Or am I wrong on this.? Does every call through which an exception might 
pass have to be implemented by an invoke, even though the code there 
knows nothing whatsoever about exceptions?  Might functions with no try, 
catch, or throw in their code still have to perform calls using invoke 
instead of call in case their callee might throw an exception straight 
through them to a faraway outer function?
> 
>> The closest I see is the mechanism for exceptions.  I cannot tell by
>> reading the documentation for exception-handling whether it is
>> sufficiently flexible to accomodate nonlocal goto's.  It's
plain that
>> on modern systems (which dribble saved registers all over the sack)
>> some kind of unwinding is necessary, and that the traditional model of
>> loading a register and jumping is no longer sufficient.  What's not
>> clear is whether the exception handling mechanism is sufficiently
>> dynamic to permit an accurate specification of the jump target. 
It's
>> not necessarily the most local version of the label on the stack
that's
>> the target; it's the one whose stack frame is reached by the
jumping
>> procedure's array of static links.
> 
> The problem with unwinding is that you need to have a personality
> function and writing your own is a pain.
At the moment I haven't seen (or if I have seen, I haven't understood in
detail) enough documentation on exceptions that I can figure out what 
code to generate or exactly what a personality function does.
> On my infinite list of things
> to do is working out how to codegen invoke when there is no personality
> function, and how to codegen unwind.
So this makes me ask, is the C++ exception handling implemented?  If so, 
I might be able to use it instead of my own.
 > 
> Ciao,
> 
> Duncan.
-- hendrik

Possibly Parallel Threads

Search for more apparently analagous threads

llvm dev - May 2008 - [LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

[LLVMdev] nonlocal go to -- how?

Possibly Parallel Threads