On Apr 8, 2011, at 3:29 AM, Nicolas Capens wrote:
> x86 processors use macro-op fusion to merge together two instructions and
> execute them as one. So it's beneficial for the compiler to emit them as a
> pair.
>
> Currently only compare and jump instructions get fused though. And I was
> wondering whether it also makes sense to fuse move and arithmetic instructions
> together, to form non-destructive instructions (which x86 lacks for regular
> instructions). For instance:
> 8B C3 mov eax, ebx
> 03 C1 add eax, ecx
> becomes
> 8B C3 03 C1 add eax, ebx, ecx
>
> There's no difference in the binary encoding; it's just considered
> one instruction at a logical level and inside the hardware (I'm assuming
> x86's RISC internals actually use non-destructive micro-operations).

Most x86 implementations use register renaming these days, so micro-operations
are non-destructive, but they don't refer to architectural registers. They
refer to a larger set of physical registers.

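To make the renaming point concrete, here is a minimal sketch in Python. It is a toy model, not a description of any specific microarchitecture: the `rename` function, the `(op, dst, srcs)` tuple format, and the `p0`, `p1`, ... register names are all made up for illustration. Every write is given a fresh physical register, so the renamed `add` no longer overwrites one of its inputs.

```python
from itertools import count

def rename(instrs, live_in):
    """Rewrite (op, dst, srcs) tuples on architectural registers into
    micro-ops on physical registers (hypothetical toy model)."""
    fresh = count()
    # Architectural registers that are live on entry get a physical register.
    table = {reg: f"p{next(fresh)}" for reg in live_in}
    uops = []
    for op, dst, srcs in instrs:
        # Read the current mappings of the sources before remapping dst.
        phys_srcs = tuple(table[s] for s in srcs)
        # Every write allocates a new physical register.
        table[dst] = f"p{next(fresh)}"
        uops.append((op, table[dst], phys_srcs))
    return uops

# mov eax, ebx ; add eax, ecx
uops = rename(
    [("mov", "eax", ("ebx",)), ("add", "eax", ("eax", "ecx"))],
    live_in=["ebx", "ecx"],
)
for op, dst, srcs in uops:
    print(op, dst, "<-", ", ".join(srcs))
```

In this model the `add` reads `p2` and `p1` and writes `p3`, so the destructive two-operand form only exists at the architectural level.
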
Register copies are mostly free to execute, but they increase code size and
consume decoder resources. To my knowledge, they are not fused in the way you
describe.

Intel's optimization reference manual describes which instructions can be
fused. The Sandy Bridge processors fuse more pairs than previous generations,
but the second instruction is always a conditional branch.

There is no need to define pseudo-instructions to support this. If you want to
experiment, you could add a late pass that tries to form fusable pairs by
pushing instructions down to the conditional branch. This should happen after
register allocation, because that is where code is often inserted before a
branch.
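
As a rough illustration of what such a late pass would do, here is a sketch in Python over a toy instruction model. Nothing here is LLVM API: the `Inst` class, the `sink_compare_to_branch` function, and the explicit `defs`/`uses` register sets are assumptions made for the example, and memory dependencies are ignored. The idea is to move the flag-setting compare down so it sits directly before the conditional branch, provided none of the instructions it hops over conflict with it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Inst:
    text: str                       # assembly text, for display only
    defs: frozenset = frozenset()   # registers written, including "eflags"
    uses: frozenset = frozenset()   # registers read, including "eflags"

def sink_compare_to_branch(block):
    """Return the block with the flag producer moved next to the final
    conditional branch when that is safe (toy model, registers only)."""
    if len(block) < 2 or "eflags" not in block[-1].uses:
        return list(block)
    # Find the last instruction before the branch that writes the flags.
    idx = None
    for i in range(len(block) - 2, -1, -1):
        if "eflags" in block[i].defs:
            idx = i
            break
    if idx is None:
        return list(block)
    cmp, between = block[idx], block[idx + 1:-1]
    for inst in between:
        # Any register dependency between cmp and a skipped instruction
        # makes the move unsafe; leave the block unchanged.
        if (inst.uses & cmp.defs) or (inst.defs & cmp.uses) or (inst.defs & cmp.defs):
            return list(block)
    return block[:idx] + between + [cmp, block[-1]]

# A reload inserted between the compare and the branch.
block = [
    Inst("cmp eax, ebx", frozenset({"eflags"}), frozenset({"eax", "ebx"})),
    Inst("mov ecx, [esp+4]", frozenset({"ecx"}), frozenset({"esp"})),
    Inst("jl .loop", frozenset(), frozenset({"eflags"})),
]
for inst in sink_compare_to_branch(block):
    print(inst.text)
```

A real pass would also have to check memory operands and confirm that the resulting pair is one the target processor actually fuses.
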
I would be interested to see the performance impact of such a pass.
/jakob