The languages I'm faced with compiling in the near future have nonlocal go to statements and nested procedures. A procedure gets implemented as a structure containing its entry point and an environment pointer. It is easy enough to call its entry point and pass the environment pointer as an extra argument (rather like the pointer to this or self in object-oriented code). It's no problem. The trampoline intrinsic is a neat way of packaging this so as to retrofit into C's one-address-only function pointer, but that's not necessary in a language where procedures are all known to have this behaviour. The problem is with the go to statement. Again, local go to's, that go somewhere within the same function are no particular problem -- though I haven't studied the interaction with alloca yet; that might offer a few surprises. The questions I have are about goto's that exit from a function. The traditional mechanism is to implement a label as an instruction-pointer/environment-pointer pair, just as for procedures. You just load the environment-pointer into the appropriate register, and jump to the address. (again there are technical details about the size of the local stack at the destination, disposition of alloca storage, and the like, and nowadays, unwinding the stack). I don't see a mechanism for this. The closest I see is the mechanism for exceptions. I cannot tell by reading the documentation for exception-handling whether it is sufficiently flexible to accomodate nonlocal goto's. It's plain that on modern systems (which dribble saved registers all over the sack) some kind of unwinding is necessary, and that the traditional model of loading a register and jumping is no longer sufficient. What's not clear is whether the exception handling mechanism is sufficiently dynamic to permit an accurate specification of the jump target. It's not necessarily the most local version of the label on the stack that's the target; it's the one whose stack frame is reached by the jumping procedure's array of static links. -- hendrik
On Sun, 04 May 2008 16:05:44 +0000, Hendrik Boom wrote:> The languages I'm faced with compiling in the near future have nonlocal > go to statements and nested procedures. > > A procedure gets implemented as a structure containing its entry point > and an environment pointer. It is easy enough to call its entry point > and pass the environment pointer as an extra argument (rather like the > pointer to this or self in object-oriented code). It's no problem. The > trampoline intrinsic is a neat way of packaging this so as to retrofit > into C's one-address-only function pointer, but that's not necessary in > a language where procedures are all known to have this behaviour. > > The problem is with the go to statement. Again, local go to's, that go > somewhere within the same function are no particular problem -- though I > haven't studied the interaction with alloca yet; that might offer a few > surprises. The questions I have are about goto's that exit from a > function. The traditional mechanism is to implement a label as an > instruction-pointer/environment-pointer pair, just as for procedures. > You just load the environment-pointer into the appropriate register, and > jump to the address. (again there are technical details about the size > of the local stack at the destination, disposition of alloca storage, > and the like, and nowadays, unwinding the stack). > > I don't see a mechanism for this. > > The closest I see is the mechanism for exceptions. I cannot tell by > reading the documentation for exception-handling whether it is > sufficiently flexible to accomodate nonlocal goto's. It's plain that on > modern systems (which dribble saved registers all over the sack) some > kind of unwinding is necessary, and that the traditional model of > loading a register and jumping is no longer sufficient. What's not > clear is whether the exception handling mechanism is sufficiently > dynamic to permit an accurate specification of the jump target. It's > not necessarily the most local version of the label on the stack that's > the target; it's the one whose stack frame is reached by the jumping > procedure's array of static links. > > -- hendrikWell, I suppose it could be kind of done in not-quite C++, and therefore it could probably be made to work in LLVM, like this. Labels are values, like: struct Label{ void * level, *target; } The level identifies the activation record in which the jump target occurs; the target is the actual address of the instruction to be jumped to when the label is used. Suppose the high-level language defined a label L in functino foo. Then we build a Label object LL to represent it. foo() { Label LL; LL.level = ≪ LL.target = &L; ... ... L: } Someplace else, syntactically nested within foo, there's another function bar, which contains a goto L. This goto gets translated into something like: Translate the goto inso something like throw appropriate_static_link->LL; where appropriate static link is whatever is needed to access the proper variable in the enclosing scope. Around every call *from* any function that contains a Label, we have code like: try( the_call } catch(Label e) { if(e.level == LL.level) goto e.label; else throw e; } Of course this coding requires that you can take addresses of labels and jump to them later in C++, which as far as I can recall is not one of its language features. But LLVM is a lower-level language, so I'm hoping that something similar can be accomplished. But I haven't noticed dynamic arguments to the br instruction. -- hendrik
Hi Hendrick, LLVM's CFG is purely static. This is necessary for efficient register allocation behavior. Your front-end could implement a nonlocal goto by using a tail call and a switch in the entry block of your branched-to function. This is essentially how LLVM implements the address-of-label extension, transforming 'goto x' into a switch statement. // 'foo' makes a nonlocal goto into 'bar'. int foo() { return bar_impl(2 /* non-default entry point */, undef, undef /* who knows? */); } // Invoke for a 'normal' call. int bar(int x, int y) { return bar_impl(0 /* default entry point */, x, y); } // Implementation detail. int bar_impl(int label, int x, int y) { switch (label) { case 1: goto L1; case 2: goto L2; default: goto entry; } entry: ... L1: ... L2: ... } Your front-end would need to track whether it is necessary to emit such a shim (i.e., whether there are any incoming gotos). If you use a 'tail call' instruction to invoke 'bar_impl', LLVM will use proper tail calls in some cases. Of course, this doesn't support the full generality of indirect nonlocal goto, so may not be suitable for your application. On 2008-05-04, at 19:56, Hendrik Boom wrote:> On Sun, 04 May 2008 16:05:44 +0000, Hendrik Boom wrote: > >> The languages I'm faced with compiling in the near future have >> nonlocal >> go to statements and nested procedures. >> >> A procedure gets implemented as a structure containing its entry >> point >> and an environment pointer. It is easy enough to call its entry >> point >> and pass the environment pointer as an extra argument (rather like >> the >> pointer to this or self in object-oriented code). It's no >> problem. The >> trampoline intrinsic is a neat way of packaging this so as to >> retrofit >> into C's one-address-only function pointer, but that's not >> necessary in >> a language where procedures are all known to have this behaviour. >> >> The problem is with the go to statement. Again, local go to's, >> that go >> somewhere within the same function are no particular problem -- >> though I >> haven't studied the interaction with alloca yet; that might offer a >> few >> surprises. The questions I have are about goto's that exit from a >> function. The traditional mechanism is to implement a label as an >> instruction-pointer/environment-pointer pair, just as for procedures. >> You just load the environment-pointer into the appropriate >> register, and >> jump to the address. (again there are technical details about the >> size >> of the local stack at the destination, disposition of alloca storage, >> and the like, and nowadays, unwinding the stack). >> >> I don't see a mechanism for this. >> >> The closest I see is the mechanism for exceptions. I cannot tell by >> reading the documentation for exception-handling whether it is >> sufficiently flexible to accomodate nonlocal goto's. It's plain >> that on >> modern systems (which dribble saved registers all over the sack) some >> kind of unwinding is necessary, and that the traditional model of >> loading a register and jumping is no longer sufficient. What's not >> clear is whether the exception handling mechanism is sufficiently >> dynamic to permit an accurate specification of the jump target. It's >> not necessarily the most local version of the label on the stack >> that's >> the target; it's the one whose stack frame is reached by the jumping >> procedure's array of static links. >> >> -- hendrik > > Well, I suppose it could be kind of done in not-quite C++, and > therefore > it could probably be made to work in LLVM, like this. > > Labels are values, like: > > struct Label{ void * level, *target; } > > The level identifies the activation record in which the jump target > occurs; the target is the actual address of the instruction to be > jumped > to when the label is used. > > Suppose the high-level language defined a label L in functino foo. > Then > we build a Label object LL to represent it. > > foo() > { > Label LL; > LL.level = ≪ > LL.target = &L; > > ... > ... > > L: > } > > > Someplace else, syntactically nested within foo, there's another > function > bar, which contains a goto L. > > This goto gets translated into something like: > Translate the goto inso something like > > throw appropriate_static_link->LL; > > where appropriate static link is whatever is needed to access the > proper > variable in the enclosing scope. > > Around every call *from* any function that contains a Label, we have > code > like: > > try( > the_call > } > catch(Label e) > { if(e.level == LL.level) > goto e.label; > else throw e; > } > > Of course this coding requires that you can take addresses of labels > and > jump to them later in C++, which as far as I can recall is not one > of its > language features. > > But LLVM is a lower-level language, so I'm hoping that something > similar > can be accomplished. But I haven't noticed dynamic arguments to the > br > instruction. > > -- hendrik > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev— Gordon
On May 4, 2008, at 9:05 AM, Hendrik Boom wrote:> The languages I'm faced with compiling in the near future have > nonlocal goto statements and nested procedures.You want to return to a previous activation record (pascal speak) or stack frame? If yes, then yes, EH will do that for you. You'll want to understand what EH is in detail and how to map the semantics you want onto it. That mapping could be simple (if you match the usual and customary EH semantics) or complex (if you want to stray farther away). You'll want to understand how you want other languages to behave if you have the ability to have other languages on the stack. For example, Objective-C on Apple's platform currently doesn't interoperate very well when C++ is on the stack (imagine an Objective- C runtime that uses setjmp/longjmp that bypasses the running of C++ dtors for stack variables as you throw past them). This setup, I'd claim is unfortunate, and you really want to avoid it. It is better to map whatever semantics you want onto the normal platform EH abi, so that other languages, C++ included, just work. Also, if you do it that way, you should be able to count on the optimizer getting rid of all the extra stuff and making the code `fast' in more trivial cases.> The problem is with the go to statement. Again, local go to's, that > go > somewhere within the same function are no particular problem -- > though I > haven't studied the interaction with alloca yet; that might offer a > few > surprises.The semantics you generally want are all stack allocated variables are destroyed as the frame which contained them is destroyed. Behavior other than this would be weird. A special exception is made for the object being thrown and all the data that goes with it.> The questions I have are about goto's that exit from a > function. The traditional mechanism is to implement a label as an > instruction-pointer/environment-pointer pair, just as for procedures. > You just load the environment-pointer into the appropriate register, > and > jump to the address.Additionally, one generally needs to deallocate stack variables as well.> I don't see a mechanism for this.See llvm-g++ and C++. I believe that it works.> The closest I see is the mechanism for exceptions. I cannot tell by > reading the documentation for exception-handling whether it is > sufficiently flexible to accomodate nonlocal goto's.It is.> What's not clear is whether the exception handling mechanism is > sufficiently dynamic to > permit an accurate specification of the jump target.It is.> It's not necessarily the most local version of the label on the > stack that's the > target; it's the one whose stack frame is reached by the jumping > procedure's array of static links.You're allowed an arbitrary predicate that explains if this is the right place to wind up.
Hi,> The problem is with the go to statement. Again, local go to's, that go > somewhere within the same function are no particular problem -- though I > haven't studied the interaction with alloca yet; that might offer a few > surprises. The questions I have are about goto's that exit from a > function. The traditional mechanism is to implement a label as an > instruction-pointer/environment-pointer pair, just as for procedures. > You just load the environment-pointer into the appropriate register, and > jump to the address. (again there are technical details about the size > of the local stack at the destination, disposition of alloca storage, and > the like, and nowadays, unwinding the stack). > > I don't see a mechanism for this.you can roll your unwinder using multiple return values (support for functions that return multiple values was recently added): the first return value is the usual function return value; the next one is the nesting depth to which you want to go (I'm assuming that you can only go to a less nested function); the last one is a number indicating which label in the final function you want to branch to. A non-local goto (to function F, label L) becomes: ret { undef, F_static_depth, L_number } After every call to this function you then check the second return value to see if it is a valid static depth (if not, no non-local goto happened and execution continues normally) and if it is then either: (a) the static depth is that for this function; you then execute a switch statement using L_number that branches to the right label; or (b) the static depth is for a lexically less nested function, and you again execute ret { undef, F_static_depth, L_number }> The closest I see is the mechanism for exceptions. I cannot tell by > reading the documentation for exception-handling whether it is > sufficiently flexible to accomodate nonlocal goto's. It's plain that on > modern systems (which dribble saved registers all over the sack) some > kind of unwinding is necessary, and that the traditional model of loading > a register and jumping is no longer sufficient. What's not clear is > whether the exception handling mechanism is sufficiently dynamic to > permit an accurate specification of the jump target. It's not > necessarily the most local version of the label on the stack that's the > target; it's the one whose stack frame is reached by the jumping > procedure's array of static links.The problem with unwinding is that you need to have a personality function and writing your own is a pain. On my infinite list of things to do is working out how to codegen invoke when there is no personality function, and how to codegen unwind. Ciao, Duncan.
Hendrick: I suspect that exception handling is going to be the way to go here. The following is partly tangential, but possibly relevant. Largely for social reasons, we've been looking at doing a C-like surface syntax for BitC. As part of that, we looked at multi-level break and gotos to same-or-outer scope. In particular, we started by asking whether either construct adds any new core semantics to the language, given that we already have exception handling. The answer is "no". As long as all gotos proceed to same-or-outer scope, a program containing gotos can be transformed without loss of meaning into a program containing only exception catch blocks and raise/throw statements. The catch block executes a switch statement to determine how to resume computation at the appropriate label. It isn't pretty, and it's not the way that I would want to actually implement it, but we did find that outcome comfortable from a formal semantic specification standpoint. The nice thing about it is that it generalizes to nested procedure exits in all cases where the target label of the goto appears in a lexically nested scope. However, Mike Stump is right -- you need to consider what happens when other languages intervene on the call stack. This may not arise in your case precisely because you are speaking about inner procedures, but do think it through. shap
On Mon, 2008-05-05 at 09:56 +0200, Duncan Sands wrote:> you can roll your unwinder using multiple return values (support for functions > that return multiple values was recently added)...Yes. In fact, this generalizes to exception handlers in general, which demonstrates that exceptions are "pure" in the functional language sense. Slight modification to what Duncan wrote. What you conceptually want here is to convert the return type into a discriminated union. One leg is used for normal return, while the other is used for goto exit and target label ID (I'm assuming that goto exit does not carry a return value). But it's the same idea as the one Duncan was sketching. shap
On Mon, 05 May 2008 09:56:26 +0200, Duncan Sands wrote:> Hi, > >> The problem is with the go to statement. Again, local go to's, that go >> somewhere within the same function are no particular problem -- though >> I haven't studied the interaction with alloca yet; that might offer a >> few surprises. The questions I have are about goto's that exit from a >> function. The traditional mechanism is to implement a label as an >> instruction-pointer/environment-pointer pair, just as for procedures. >> You just load the environment-pointer into the appropriate register, >> and jump to the address. (again there are technical details about the >> size of the local stack at the destination, disposition of alloca >> storage, and the like, and nowadays, unwinding the stack). >> >> I don't see a mechanism for this. > > you can roll your unwinder using multiple return values (support for > functions that return multiple values was recently added): the first > return value is the usual function return value; the next one is the > nesting depth to which you want to go (I'm assuming that you can only go > to a less nested function); the last one is a number indicating which > label in the final function you want to branch to. > > A non-local goto (to function F, label L) becomes: > ret { undef, F_static_depth, L_number } > After every call to this function you then check the second return value > to see if it is a valid static depth (if not, no non-local goto happened > and execution continues normally) and if it is then either: (a) the > static depth is that for this function; you then execute a switch > statement using L_number that branches to the right label; or (b) the > static depth is for a lexically less nested function, and you again > execute ret { undef, F_static_depth, L_number }Except that it's not the static depth you want, it's the dynamic stack- nesting depth that at run-time corresponds to the function activation that is at the right static depth in the jumper's environment. All invocations of one function are at the same static depth -- that's what's static about it. This doesn't much affect the mechanisms you are presenting, though. The problem with this is that it places a lot of overhead on every function call and return. The alternative mechanisms involving exceptions, while more complex, involve overhead only for functions that actually involve labels (which are rare in practise). Or am I wrong on this.? Does every call through which an exception might pass have to be implemented by an invoke, even though the code there knows nothing whatsoever about exceptions? Might functions with no try, catch, or throw in their code still have to perform calls using invoke instead of call in case their callee might throw an exception straight through them to a faraway outer function?> >> The closest I see is the mechanism for exceptions. I cannot tell by >> reading the documentation for exception-handling whether it is >> sufficiently flexible to accomodate nonlocal goto's. It's plain that >> on modern systems (which dribble saved registers all over the sack) >> some kind of unwinding is necessary, and that the traditional model of >> loading a register and jumping is no longer sufficient. What's not >> clear is whether the exception handling mechanism is sufficiently >> dynamic to permit an accurate specification of the jump target. It's >> not necessarily the most local version of the label on the stack that's >> the target; it's the one whose stack frame is reached by the jumping >> procedure's array of static links. > > The problem with unwinding is that you need to have a personality > function and writing your own is a pain.At the moment I haven't seen (or if I have seen, I haven't understood in detail) enough documentation on exceptions that I can figure out what code to generate or exactly what a personality function does.> On my infinite list of things > to do is working out how to codegen invoke when there is no personality > function, and how to codegen unwind.So this makes me ask, is the C++ exception handling implemented? If so, I might be able to use it instead of my own.> > Ciao, > > Duncan.-- hendrik