According to this page:
http://lwn.net/Articles/252125/
data coming from L1 is only about three times as expensive as data
coming from a register. So putting a register check after *every*
call is probably not going to be profitable, compared to a
thread-local global variable check after every invoke. If invokes
happen often on a thread, that variable will probably be in cache, and
if they don't happen often, the performance impact will be minimal.
Of course if most methods have variables with destructors, I'll end up
with a check of some kind after almost every (non-nounwind) call
anyway, so a register check would be better. On the other hand,
implementing the register check would seem to require native codegen
changes at callsites as opposed to an IR-modifying pass with a
possible new intrinsic or two.
Anyway, here's my new plan:
1. A thread local global variable, type i8*, initialized to zero.
2. At invoke callsites, right before the invoke, call a native method
(mysetjmp) that:
a. Saves ESI, EDI, EBX, EBP, ESP to a buffer alloca'd within the
method containing the invoke site.
b. Sets EAX to 0
c. Returns.
3. The return value of that native method (EAX) is checked, and if
nonzero, branch to unwind label. Otherwise, save the value of the
thread-local-global into the buffer, write the address of that
alloca'd buffer into the thread-local global and make the call.
4. After the call returns, copy the old thread-local-global value out
of the alloca'd buffer back to the thread-local-global.
The unwind instruction will then:
1. Load the thread-local-global value. If it's zero, there's nowhere
to unwind to, so abort.
2. Restore ESI, EDI, EBX, EBP, ESP, and the thread-local-global value
from the buffer.
3. Set EAX to 1.
4. Jump to 2c. (the return instruction for the native method mysetjmp).
The native method will return with all callee-saved registers restored
and a return value in EAX of 1, which will cause the following check
to branch to the unwind label.
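
The invoke and unwind steps above can be approximated in portable C,
with the standard setjmp/longjmp standing in for the native mysetjmp and
the hand-written register restore. This is a sketch under assumptions,
not the planned codegen: struct unwind_frame, unwind_top, do_unwind,
may_unwind and invoke_site are all hypothetical names, and jmp_buf saves
more than the five registers listed.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the buffer alloca'd in the function containing the invoke. */
struct unwind_frame {
    jmp_buf regs;               /* plays the role of saved ESI, EDI, EBX, EBP, ESP */
    struct unwind_frame *prev;  /* old value of the thread-local global */
};

/* The thread-local global, initialized to zero (plan step 1). */
static _Thread_local struct unwind_frame *unwind_top = NULL;

/* The "unwind instruction". */
static void do_unwind(void) {
    struct unwind_frame *f = unwind_top;
    if (f == NULL)
        abort();            /* nowhere to unwind to */
    unwind_top = f->prev;   /* restore the thread-local global from the buffer */
    longjmp(f->regs, 1);    /* restore registers; setjmp now returns 1 */
}

/* A callee that may unwind instead of returning. */
static void may_unwind(int fail) {
    if (fail)
        do_unwind();
}

/* An invoke site: returns 0 on normal return, -1 via the unwind label. */
static int invoke_site(int fail) {
    struct unwind_frame frame;

    if (setjmp(frame.regs) != 0)
        return -1;                 /* nonzero result: branch to the unwind label */

    frame.prev = unwind_top;       /* save the old thread-local value */
    unwind_top = &frame;           /* publish this buffer */
    may_unwind(fail);              /* the invoked call */
    assert(unwind_top == &frame);  /* callees must leave the chain balanced */
    unwind_top = frame.prev;       /* normal return: restore the old value */
    return 0;
}
```

Calling invoke_site(0) takes the normal path and invoke_site(1) comes
back through the unwind label; in both cases the thread-local global
ends up with the value it had before the invoke.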
Invoke sites only write five callee-saved registers to the stack, and
read/write one pointer to a single thread-local global variable, and
make one direct call. Unwind sites make one direct call, read five
callee-saved registers from the stack (some distance up, so those
memory values might not be warm) and read/write one pointer to a
single thread-local global variable.
The next step would be to replace the mysetjmp call with a new
intrinsic. I'd then have to save EIP as well and do an indirect jump to
it at the unwind site, instead of jumping to a constant offset within
the native mysetjmp. Turning the mylongjmp call (the unwind side) into a
new intrinsic would require no other modifications.
On Thu, Jul 16, 2009 at 11:44 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Thu, Jul 16, 2009 at 9:10 AM, Kenneth Uildriks <kennethuil at gmail.com> wrote:
>> 1. Which ones? I know that Windows uses it for the "this" pointer.
>
> The internal fastcc convention and the Windows fastcall convention off
> the top of my head.
>
>> Anyway, unless the callee is required to preserve it in a given
>> calling convention, that doesn't preclude us using it for a *return*
>> value. It would be checked after calls return, and wouldn't affect
>> the use of the register for passing values in before the call is made.
>> The callee would set it right before return.
>
> Right, so that sounds okay.
>
>> 2. Does LLVM support nested functions? I must have missed that.
>
> To the extent required to implement the gcc nested functions
> extension, yes. The specific relevant behavior here is that if a
> parameter is marked with the nest attribute, it gets passed in ECX.
>
> -Eli