tfpt review "/shelveset:Loops20;REDMOND\tomat"
Implements adaptive loop compilation. This feature needed major changes to local
variable handling and to the control flow implementation in the interpreter.
Local variables
Replaces the list of local variables with a LocalsVariable structure that
encapsulates a dictionary. It doesn't support variable shadowing yet,
but it at least detects it and throws NotSupportedException. Previously we
silently used the wrong indices into the variable array.
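To illustrate the detect-and-throw behavior described above, here is a minimal
sketch. The class name LocalTable, its members, and the use of string keys are
assumptions made for a self-contained example; they are not taken from the
shelveset.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the dictionary-backed structure described above.
// Keys are plain strings here only to keep the example self-contained.
public sealed class LocalTable
{
    private readonly Dictionary<string, int> _indices = new Dictionary<string, int>();

    // Assigns the next free slot to a variable. A second definition of the
    // same name is rejected instead of silently reusing or shifting an index.
    public int Define(string name)
    {
        if (_indices.ContainsKey(name))
        {
            throw new NotSupportedException("Variable shadowing is not supported: " + name);
        }
        int index = _indices.Count;
        _indices.Add(name, index);
        return index;
    }

    // Looks up the slot previously assigned to a variable.
    public int GetIndex(string name)
    {
        return _indices[name];
    }
}
```

Failing loudly on the second Define call is the point of the sketch: a wrong
index would otherwise only surface later as a corrupted local.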
Control flow
Reimplements interpreter goto instructions and exception handling. Goto
instructions used to encode all information describing the jump (a list of
finally blocks to be executed and the target stack depth). The loop compiler
needs to find all GotoExpressions within the loop that jump out of the loop and
associate them with the corresponding Goto instructions. This cannot be done in
the presence of reducible nodes, as they don't preserve node identity.
Therefore we need to move the jump information from the goto instruction to the
target label and track the current try and finally blocks.
GotoInstruction, EnterTryFinallyInstruction and LeaveExceptionHandlerInstruction
now derive from IndexedBranchInstruction. While OffsetInstruction holds a
relative offset, these instructions hold the index of the target label in the
table of RuntimeLabels. The RuntimeLabel struct comprises the target instruction
index, the target stack depth and the target continuation stack depth. That's
all that is needed for a jump to be executed. Jumps via label index are a little
bit slower than jumps to a relative offset since they need to look up the target
index in the label table. Also, the label table only has as many entries as
there are gotos and try-catch/try-finally blocks in the lambda. We can easily
convert other branch instructions into IndexedBranchInstructions if we find it
better.
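The difference between the two kinds of branch can be sketched as follows. The
three RuntimeLabel fields follow the description above, but their exact names,
and the BranchDemo helper, are assumptions for illustration.

```csharp
using System;

// Field names are guesses based on the description; the real struct may differ.
public struct RuntimeLabel
{
    public readonly int Index;                  // target instruction index
    public readonly int StackDepth;             // target data stack depth
    public readonly int ContinuationStackDepth; // target continuation stack depth

    public RuntimeLabel(int index, int stackDepth, int continuationStackDepth)
    {
        Index = index;
        StackDepth = stackDepth;
        ContinuationStackDepth = continuationStackDepth;
    }
}

// Hypothetical helper contrasting the two ways of computing a jump target.
public static class BranchDemo
{
    // Relative branch: the offset is stored in the instruction itself.
    public static int JumpByOffset(int currentIndex, int offset)
    {
        return currentIndex + offset;
    }

    // Indexed branch: one extra table lookup to find the target instruction.
    public static int JumpByLabel(RuntimeLabel[] labels, int labelIndex)
    {
        return labels[labelIndex].Index;
    }
}
```

The extra lookup in JumpByLabel is the cost mentioned above. In exchange, every
goto that targets the same label shares one table entry.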
Using indexed branch instructions moves the target stack depth to the label. We
also need to move the finally list out of the goto instruction. Since a single
label might be the target of multiple goto instructions/expressions, and these
could be nested in different try-finally blocks, we need to track the stack of
finally blocks that we enter and leave as we execute instructions.
EnterTryFinallyInstruction is added at the beginning of every try-finally block.
This instruction pushes a local continuation onto the stack of continuations
stored on InterpretedFrame. The top item of this stack is the current
continuation. A continuation is implemented as an integer index into the label
table. The continuation pushed by EnterTryFinally points to the finally clause.
GotoInstruction sets the current pending continuation and pending value (if it
transfers a value) and jumps to the current continuation if there is one.
A GotoInstruction is emitted at the end of the try-finally body. This
goto's target is the end of the entire try expression.
EnterFinallyInstruction is emitted at the beginning of the finally clause. It
removes the current continuation from the continuation stack, pushes the pending
continuation and value onto the data stack and invalidates them. If an
exception is thrown but not caught during execution of the finally clause, the
current pending continuation is canceled (and forgotten) and a new one is set.
LeaveFinallyInstruction is emitted at the end of the finally clause. It pops the
pending continuation (and pending value) from the data stack and yields to it.
The YieldToPendingContinuation operation compares the continuation stack depth
of the current continuation with the continuation stack depth of the pending
one. It jumps to the pending one only if its depth is less, i.e. when there is
no continuation (finally clause) to be executed before we can jump to the target
block. Otherwise it jumps to the current continuation.
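The interplay of the continuation stack and the pending continuation can be
modeled with a toy frame. This is a sketch under stated assumptions, not the
interpreter's code: labels are plain strings, a "jump" only records where we
land, and the EnterFinally/LeaveFinally pair is collapsed into one loop. It
also assumes the depth comparison means that a finally clause must run first
whenever the target label sits at a shallower continuation depth than the frame
currently has. All names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model of a frame that tracks entered try-finally blocks.
public sealed class ContinuationDemoFrame
{
    // Continuation stack: one entry per try-finally block currently entered.
    private readonly Stack<string> _finallyBlocks = new Stack<string>();
    private string _pendingTarget;
    private int _pendingTargetDepth;

    // Records the order in which finally clauses and the target are reached.
    public readonly List<string> Trace = new List<string>();

    public void EnterTryFinally(string finallyName)
    {
        _finallyBlocks.Push(finallyName);
    }

    // Goto: remember the pending target, then yield.
    public void Goto(string target, int targetContinuationDepth)
    {
        _pendingTarget = target;
        _pendingTargetDepth = targetContinuationDepth;
        Yield();
    }

    private void Yield()
    {
        // While the target lives at a shallower depth than the frame, the
        // innermost finally clause runs first (popping its continuation),
        // and leaving that clause yields again.
        while (_finallyBlocks.Count > _pendingTargetDepth)
        {
            Trace.Add(_finallyBlocks.Pop());
        }
        // No finally clause is left in the way: jump to the pending target.
        Trace.Add(_pendingTarget);
    }
}
```

For a goto that leaves two nested try-finally blocks, the trace in this model
is the inner finally, then the outer finally, then the target label.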
Whenever an exception occurs we catch it in the Interpreter.Run method. We look
for the exception handler that should be executed.
If we find one, we perform the same steps as if we had just executed a
GotoInstruction targeting the exception handler: we set the current pending
continuation to the label that points to the handler and set the pending value
to the exception object. Finally, we jump to the current continuation.
If there is no catch or fault handler, we do the same as if there were one with
instruction index Int32.MaxValue. That emulates a jump to the end of the
instruction sequence. If this jump is not interrupted by another exception
raised from some finally/fault block, or by a goto jumping out of a finally
block, we finish instruction execution and return from the Run method with the
current InstructionIndex set to the special value Int32.MaxValue. That indicates
that we should rethrow the exception, and so we do.
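The use of Int32.MaxValue as a "no handler" target can be sketched with a
stripped-down run loop. The handler table layout and all names here are
assumptions for illustration. The sketch also skips the finally clauses that
the real jump would execute on its way out.

```csharp
using System;

// Hypothetical handler table entry covering instructions [Start, End).
public struct ToyHandler
{
    public readonly int Start;        // first covered instruction index
    public readonly int End;          // one past the last covered index
    public readonly int HandlerIndex; // where the handler body begins

    public ToyHandler(int start, int end, int handlerIndex)
    {
        Start = start;
        End = end;
        HandlerIndex = handlerIndex;
    }
}

public static class ToyInterpreter
{
    // Sentinel target that lies past the end of any instruction sequence.
    public const int RethrowIndex = int.MaxValue;

    public static int FindHandler(ToyHandler[] handlers, int faultingIndex)
    {
        foreach (ToyHandler h in handlers)
        {
            if (faultingIndex >= h.Start && faultingIndex < h.End)
            {
                return h.HandlerIndex;
            }
        }
        return RethrowIndex; // behave as if a handler lived at the very end
    }

    public static void Run(Action[] instructions, ToyHandler[] handlers)
    {
        int index = 0;
        Exception pending = null;
        while (index < instructions.Length)
        {
            try
            {
                instructions[index]();
                index++;
            }
            catch (Exception e)
            {
                pending = e;
                index = FindHandler(handlers, index);
            }
        }
        // Leaving the loop with the sentinel index means nothing handled it.
        if (index == RethrowIndex)
        {
            throw pending;
        }
    }
}
```

Treating the unhandled case as a jump to a sentinel index lets it share the
same path as a handled exception, with the rethrow decided only after the loop
exits.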
Moves InterpretedFrame chaining from IronRuby to the interpreter. The frames are
linked into a stack by the Interpreter.Run method so that each CLR frame of this
method corresponds to an interpreted stack frame in the interpreted stack. The
two traces can be combined into one. A static
ThreadLocal<InterpretedFrame> variable is updated upon entry to and exit from
the Run method.
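A minimal sketch of the chaining follows. It uses System.Threading.ThreadLocal<T>
as a stand-in for whichever ThreadLocal type the interpreter actually uses.
ChainedFrame and its members are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical frame type that links itself into a per-thread stack.
public sealed class ChainedFrame
{
    // One "current frame" slot per thread.
    private static readonly ThreadLocal<ChainedFrame> CurrentFrame =
        new ThreadLocal<ChainedFrame>();

    public readonly string Name;
    public ChainedFrame Parent { get; private set; }

    public ChainedFrame(string name)
    {
        Name = name;
    }

    public static ChainedFrame Current
    {
        get { return CurrentFrame.Value; }
    }

    // Links this frame on entry and unlinks it on exit, even when the body
    // throws, so the chain always mirrors the CLR frames of Run.
    public void Run(Action body)
    {
        Parent = CurrentFrame.Value;
        CurrentFrame.Value = this;
        try
        {
            body();
        }
        finally
        {
            CurrentFrame.Value = Parent;
        }
    }
}
```

Walking Parent from ChainedFrame.Current yields the interpreted stack for the
current thread, which is what allows it to be merged with the CLR trace.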
Loop compiler
Adds a new EnterLoopInstruction that is injected at the beginning of a loop
generated from LoopExpression. This instruction has a counter that increments
each time it is executed. If the counter reaches CompilationThreshold, a
compilation is started on a background thread. The instruction holds on to the
LoopExpression to compile. The loop needs to be massaged before we can compile
it to a lambda. The lambda we produce looks like:
int lambda(InterpretedFrame frame) {
    T$1 loc$1 = (T$1)frame.Data[$index1];
    ...
    T$n loc$n = (T$n)frame.Data[$indexN];
    StrongBox<object> closure_loc$1 = frame.Closure[$index1];
    ...
    StrongBox<object> closure_loc$M = frame.Closure[$indexM];
    try {
        ... loc$1 = value ...
        ... closure_loc$1.Value = (object)value;
        // for each goto label (value), where label is outside the loop:
        ... return frame.Goto(labelIndex, value);
    } finally {
        // write back
        frame.Data[$index1] = (object)loc$1;
    }
    return $breakOffset;
}
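As a concrete instance of the template, the following hand-written method shows
the copy-in / write-back pattern for a loop equivalent to
"while (i < limit) { sum += i; i++; }". It is only an illustration: a plain
object[] stands in for frame.Data, slots 0 and 1 are assumed to hold i and sum,
and the returned 0 stands in for $breakOffset.

```csharp
using System;

public static class CompiledLoopDemo
{
    public static int SumLoop(object[] data, int limit)
    {
        // Copy the boxed locals out of the frame into typed CLR locals.
        int i = (int)data[0];
        int sum = (int)data[1];
        try
        {
            while (i < limit)
            {
                sum += i;
                i++;
            }
        }
        finally
        {
            // Write back, so the interpreter sees the updated values even
            // if the loop body throws.
            data[0] = i;
            data[1] = sum;
        }
        return 0; // stand-in for $breakOffset
    }
}
```

The loop body works on unboxed locals and only boxes once on the way out.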
When the lambda is ready, the EnterLoopInstruction is replaced by a
CompiledLoopInstruction that holds a delegate to the compiled lambda and
calls it upon execution.
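The counting and replacement steps can be sketched as below. All types are toy
stand-ins. Compilation is faked by a constant delegate, and the way work is
scheduled is passed in as a delegate so that the example stays deterministic.
The counter is not thread-safe in this sketch.

```csharp
using System;

public abstract class ToyInstruction
{
    public abstract string Run();
}

// Stand-in for CompiledLoopInstruction: just calls the compiled delegate.
public sealed class ToyCompiledLoop : ToyInstruction
{
    private readonly Func<string> _compiled;

    public ToyCompiledLoop(Func<string> compiled)
    {
        _compiled = compiled;
    }

    public override string Run()
    {
        return _compiled();
    }
}

// Stand-in for EnterLoopInstruction: counts executions and, at the
// threshold, schedules work that overwrites its own slot in the code array.
public sealed class ToyEnterLoop : ToyInstruction
{
    private readonly ToyInstruction[] _code;
    private readonly int _index;
    private readonly int _threshold;
    private readonly Action<Action> _schedule;
    private int _count;

    public ToyEnterLoop(ToyInstruction[] code, int index, int threshold, Action<Action> schedule)
    {
        _code = code;
        _index = index;
        _threshold = threshold;
        _schedule = schedule;
    }

    public override string Run()
    {
        if (++_count == _threshold)
        {
            _schedule(() => { _code[_index] = new ToyCompiledLoop(() => "compiled"); });
        }
        // Until the slot is replaced, the loop keeps running interpreted.
        return "interpreted";
    }
}
```

Passing "work => work()" as the scheduler compiles synchronously, which is
convenient for tests. A scheduler that queues the work on a thread pool would
give the background behavior described above.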
Perf impact
The interpreter throughput with compilation disabled is about 5% worse on
Pystone with this change. About 1% is accounted for by tracking the interpreted
stack chain; the rest is probably due to the more expensive try-finally blocks
(the continuation stack is allocated, continuations are pushed/popped on entry
to and exit from try and finally blocks, etc.).
-X:NoAdaptiveCompilation is now better than adaptive compilation only by 4-7%
(for compilation threshold 2 and 32, respectively), it used to be about 4 times
better.
Misc
Special-cases adaptive compilation for CompilationThreshold 0 and 1. In both
cases the compilation is synchronous. This allows us to easily test and debug
the loop compiler and the lambda compiler.
Implements instruction provider for FinallyFlowControlExpression - the
interpreter handles jumps from finally directly, so we don't need to
rewrite the tree.
FlowControlRewriter should reduce all extensible nodes within the tree. It might
miss some goto expressions or finally clauses otherwise (e.g. { label: try {
REDUCIBLE } finally { REDUCIBLE; } }, where any of the REDUCIBLEs reduces to
"goto label").
Ruby, Python:
CatchBlock defines a scope for its exception variable, which wasn't
taken into account in the Python and Ruby AST generators and rewriters. They
declared the variable in the containing block, duplicating the variable
definition and depending on variable shadowing. Removes the duplicate
declarations.
Removes "compileLoops" argument passed to LightCompile. All loops are
adaptively compiled now.
Python
Adds missing debug info around for-loop initialization (see test_traceback.py
run:test_throw_while_yield)
Increases test_memory limit to 18k since the loop is adaptively compiled now. We
might want to disable adaptive compilation during this test.
Disables test_dict.py run:test_container_iterator. Filed bug:
http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=25419
Disables test_traceback.py run:test_throw_while_yield. Filed bug:
http://ironpython.codeplex.com/WorkItem/View.aspx?WorkItemId=25428
Ruby:
Fixes mangling of "me" name.
Disabled one test case in core/kernel/caller_spec.rb. The behavior that made
this test accidentally pass was incorrect.
Tomas