thr3ads.net - llvm dev - [LLVMdev] Macro-op fusion experiment [Apr 2011]

Jakob Stoklund Olesen

2011-Apr-08 17:27 UTC

[LLVMdev] Macro-op fusion experiment

On Apr 8, 2011, at 9:56 AM, NAKAMURA Takumi wrote:
>>>                 8B C3 mov eax, ebx
>>>                 03 C1 add eax, ecx
>>> becomes
>>>                 8B C3 03 C1 add eax, ebx, ecx
>
> In my understanding, twoaddr pass tends to emit such a sequence.

Yes, it always does, and the coalescer tries very hard to eliminate the copy.

> Though I don't have sandybridge, I have not measured.
> Prior processors(intel and amd) might spend 1 ALU to execute "mov",
> then mov - add must have dependency.

I think you will find it is more complicated than that. A 'mov' usually
doesn't need an ALU resource.

You should read about the 'reservation station' style register renaming.

http://en.wikipedia.org/wiki/Register_renaming
http://www.intel.com/Assets/PDF/manual/248966.pdf

/jakob
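
For readers unfamiliar with the two-address conversion discussed above, here is a minimal sketch of the idea in Python. It is not LLVM's actual pass: the `ThreeAddr` type, the `to_two_address` helper, and the `COMMUTATIVE` set are hypothetical names chosen for illustration, and only register operands in Intel syntax are handled.

```python
from typing import List, NamedTuple

# Hypothetical three-address instruction: dst = op(src1, src2).
class ThreeAddr(NamedTuple):
    op: str
    dst: str
    src1: str
    src2: str

# Assumed set of commutative operations, for illustration only.
COMMUTATIVE = {"add", "and", "or", "xor", "imul"}

def to_two_address(inst: ThreeAddr) -> List[str]:
    """Lower dst = op(src1, src2) to x86-style two-address form."""
    if inst.dst == inst.src1:
        # Destination already holds the first source: no copy needed.
        return [f"{inst.op} {inst.dst}, {inst.src2}"]
    if inst.dst == inst.src2:
        if inst.op in COMMUTATIVE:
            # Swap the operands instead of copying.
            return [f"{inst.op} {inst.dst}, {inst.src1}"]
        # A real compiler would need a temporary here; out of scope for this sketch.
        raise NotImplementedError("non-commutative op with dst == src2")
    # General case: copy the first source, then operate in place.
    return [f"mov {inst.dst}, {inst.src1}", f"{inst.op} {inst.dst}, {inst.src2}"]

print(to_two_address(ThreeAddr("add", "eax", "ebx", "ecx")))
```

The general case produces exactly the `mov` plus `add` pair quoted in the thread; the first two branches correspond to situations where no copy is emitted at all.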

Nicolas Capens

2011-Apr-17 16:24 UTC

Hi Jakob,

As far as I know, an x86 'mov' instruction always 

On 08 Apr 2011, at 19:27, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> 
> On Apr 8, 2011, at 9:56 AM, NAKAMURA Takumi wrote:
> 
>>>>                8B C3 mov eax, ebx
>>>>                03 C1 add eax, ecx
>>>> becomes
>>>>                8B C3 03 C1 add eax, ebx, ecx
>> 
>> In my understanding, twoaddr pass tends to emit such a sequence.
> 
> Yes, it always does, and the coalescer tries very hard to eliminate the
> copy.
> 
>> Though I don't have sandybridge, I have not measured.
>> Prior processors(intel and amd) might spend 1 ALU to execute "mov",
>> then mov - add must have dependency.
> 
> I think you will find it is more complicated than that. A 'mov'
> usually doesn't need an ALU resource.
> 
> You should read about the 'reservation station' style register
> renaming.
> 
> http://en.wikipedia.org/wiki/Register_renaming
> http://www.intel.com/Assets/PDF/manual/248966.pdf
> 
> /jakob
>

Nicolas Capens

2011-Apr-17 16:59 UTC

Hi Jakob,

As far as I know, an x86 'mov' instruction always uses an ALU resource.
According to Agner Fog's documents (http://www.agner.org/optimize/), it can
execute on port 0, 1 or 5 on recent architectures though. So it's not that
likely to be resource limited. But it still occupies an instruction slot
throughout the entire pipeline, costing power and potentially limiting other
actual arithmetic instructions from scheduling optimally. Also, it has a
latency of 1 cycle, while non-destructive instructions would shorten the
latency of dependent instructions.
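
As a rough illustration of the latency point, the sketch below computes the length of a dependency chain under a deliberately simple model. The 1-cycle 'mov' latency is the figure stated above; the 1-cycle 'add' latency and the fused three-operand 'add3' instruction are assumptions made for this example, and `critical_path` is a hypothetical helper, not part of any tool.

```python
from typing import Dict, List, Tuple

# (mnemonic, destination register, source registers)
Inst = Tuple[str, str, List[str]]

def critical_path(insts: List[Inst], latency: Dict[str, int]) -> int:
    """Cycle at which the last result is ready, assuming unlimited
    execution resources and registers that are ready at cycle 0."""
    ready: Dict[str, int] = {}
    finish = 0
    for mnemonic, dst, srcs in insts:
        # An instruction starts once all of its sources are ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dst] = start + latency[mnemonic]
        finish = max(finish, ready[dst])
    return finish

# Assumed latencies; "add3" is a hypothetical fused non-destructive add.
LAT = {"mov": 1, "add": 1, "add3": 1}

unfused = [("mov", "eax", ["ebx"]), ("add", "eax", ["eax", "ecx"])]
fused = [("add3", "eax", ["ebx", "ecx"])]

print(critical_path(unfused, LAT), critical_path(fused, LAT))
```

Under these assumptions the dependent pair takes 2 cycles and the fused form takes 1. The model ignores register renaming, port contention, and decode limits, so it shows only the dependency argument, not real hardware behaviour.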

My immediate concern is getting a reasonable estimate for how often this
macro-op fusion could be performed. This could then be used to evaluate
whether it's worth the added decoder complexity.

Cheers,
Nicolas

On Fri, Apr 8, 2011 at 7:27 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Apr 8, 2011, at 9:56 AM, NAKAMURA Takumi wrote:
>
> >>>                 8B C3 mov eax, ebx
> >>>                 03 C1 add eax, ecx
> >>> becomes
> >>>                 8B C3 03 C1 add eax, ebx, ecx
> >
> > In my understanding, twoaddr pass tends to emit such a sequence.
>
> Yes, it always does, and the coalescer tries very hard to eliminate the
> copy.
>
> > Though I don't have sandybridge, I have not measured.
> > Prior processors(intel and amd) might spend 1 ALU to execute "mov",
> > then mov - add must have dependency.
>
> I think you will find it is more complicated than that. A 'mov' usually
> doesn't need an ALU resource.
>
> You should read about the 'reservation station' style register renaming.
>
> http://en.wikipedia.org/wiki/Register_renaming
> http://www.intel.com/Assets/PDF/manual/248966.pdf
>
> /jakob
>

Jakob Stoklund Olesen

2011-Apr-17 18:19 UTC

On Apr 17, 2011, at 9:59 AM, Nicolas Capens wrote:
> My immediate concern is getting a reasonable estimate for how often this
> macro-op fusion could be performed. This could then be used to evaluate
> whether it's worth the added decoder complexity.

In that case, just look at the generated code. I don't think any pass is
inserting instructions between 'mov' and two-address arithmetic
instructions.

/jakob
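
One way to "look at the generated code" systematically is to scan an assembly listing for a register-to-register 'mov' immediately followed by a two-address arithmetic instruction with the same destination. The sketch below is only a starting point under several assumptions: Intel-syntax input, register-only operands, and a hypothetical `ARITH` set of candidate mnemonics. `count_fusible` is an illustrative helper, not an existing LLVM tool.

```python
import re
from typing import Tuple

# Assumed set of two-address arithmetic mnemonics to consider.
ARITH = {"add", "sub", "and", "or", "xor", "imul"}

# Matches "op reg, reg" only; memory and immediate operands are ignored.
REG = r"[a-z][a-z0-9]*"
INST = re.compile(rf"^\s*([a-z]+)\s+({REG})\s*,\s*({REG})\s*$")

def count_fusible(asm: str) -> Tuple[int, int]:
    """Return (fusible mov+op pairs, total reg-reg arithmetic ops)."""
    lines = [line for line in asm.splitlines() if line.strip()]
    parsed = [INST.match(line) for line in lines]
    pairs = total = 0
    # Pair each instruction with the one directly before it.
    for prev, cur in zip([None] + parsed, parsed):
        if cur is None or cur.group(1) not in ARITH:
            continue
        total += 1
        if (prev is not None and prev.group(1) == "mov"
                and prev.group(2) == cur.group(2)):
            pairs += 1
    return pairs, total

sample = """
mov eax, ebx
add eax, ecx
sub edx, esi
mov edi, esi
xor edi, eax
"""
print(count_fusible(sample))
```

Any line that does not parse (labels, directives, memory operands) breaks adjacency, so the count is conservative. Dividing the first number by the second gives a crude estimate of how often the fusion would apply within the instructions the script recognises.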
