thr3ads.net - llvm dev - [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences [Dec 2014]

Caldarale, Charles R

2014-Dec-22 02:55 UTC

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Herbie Robinson
> Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
> > On 12/21/14 4:27 AM, Kuperstein, Michael M wrote:
> > Which performance guidelines are you referring to?
> Table C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014.
> It hasn't changed.  It still lists push and pop instructions as 2-3 times more expensive than mov.
And verified by Agner Fog's independent measurements: 
http://www.agner.org/optimize/instruction_tables.pdf

The relevant Haswell numbers are on pages 186 - 187.

 -Chuck

Kuperstein, Michael M

2014-Dec-22 07:21 UTC

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

The official Intel guide has less resolution here than we need.
Consider that the MOV instruction described in C-21 uses only the ALU execution unit, not memory.

If we look at Agner's table for Haswell, the relevant form of the MOV instruction (m,r, i.e. a store from a register to memory) has a latency of 3 cycles and a reciprocal throughput of 1, the same as a register PUSH.
The same applies to other recent x86 processors.
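
To make the comparison concrete, here is a minimal sketch of the throughput argument for the store side of a call sequence. It only multiplies the reciprocal throughputs quoted above by the number of outgoing arguments; the function name and the constants' names are made up for illustration, and the model deliberately ignores latency, the stack-pointer adjustment, and any other micro-architectural effect.

```python
# Reciprocal throughputs (cycles per instruction) as quoted in this thread
# from Agner Fog's Haswell tables. They are inputs to the sketch, not
# measurements made by this code.
MOV_STORE_RT = 1.0   # mov m,r  (store a register to a stack slot)
PUSH_REG_RT = 1.0    # push r   (push a register)


def throughput_bound(n_instructions, recip_throughput):
    """Lower bound on cycles for n independent instructions of one kind."""
    return n_instructions * recip_throughput


if __name__ == "__main__":
    for n_args in (1, 3, 6):
        mov_cycles = throughput_bound(n_args, MOV_STORE_RT)
        push_cycles = throughput_bound(n_args, PUSH_REG_RT)
        print(f"{n_args} args: mov {mov_cycles:.1f} cycles, push {push_cycles:.1f} cycles")
```

Under these assumptions the two sequences tie for any argument count, which is the point being made: the store form of MOV is the fair comparison for PUSH, not the register-only form.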

Herbie Robinson

2014-Dec-22 18:41 UTC

[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

But the r,m move has a reciprocal throughput of 0.5 while the pop is 1.

It sounds like this should be optional as you have already proposed.
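
As a rough illustration of why the answer differs between the store and load directions, the sketch below tabulates the four Haswell reciprocal throughputs mentioned in this thread and reports which instruction is cheaper in each direction. The table layout and the `cheaper` helper are hypothetical, and reciprocal throughput is only one of several factors a real cost model would weigh.

```python
# Reciprocal throughputs (cycles per instruction) quoted in this thread for
# Haswell, from Agner Fog's tables. Keys are (direction, mnemonic).
RECIP_THROUGHPUT = {
    ("store", "mov"): 1.0,   # mov m,r
    ("store", "push"): 1.0,  # push r
    ("load", "mov"): 0.5,    # mov r,m
    ("load", "pop"): 1.0,    # pop r
}


def cheaper(direction):
    """Return the mnemonic(s) with the lowest reciprocal throughput."""
    costs = {op: rt for (d, op), rt in RECIP_THROUGHPUT.items() if d == direction}
    best = min(costs.values())
    return sorted(op for op, rt in costs.items() if rt == best)


if __name__ == "__main__":
    for direction in ("store", "load"):
        print(direction, "->", cheaper(direction))
```

On these figures the store side is a tie while the load side favors MOV, which is consistent with keeping the transformation optional.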

