Caldarale, Charles R
2014-Dec-22 02:55 UTC
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Herbie Robinson
> Subject: Re: [LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences

> > On 12/21/14 4:27 AM, Kuperstein, Michael M wrote:
> > Which performance guidelines are you referring to?

> Table C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014.
> It hasn't changed. It still lists push and pop instructions as 2-3 times more expensive than mov.

And verified by Agner Fog's independent measurements:
http://www.agner.org/optimize/instruction_tables.pdf

The relevant Haswell numbers are on pages 186 - 187.

-Chuck
Kuperstein, Michael M
2014-Dec-22 07:21 UTC
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
The official Intel guide has less resolution here than we need.
Consider that the MOV instruction described in C-21 only uses the ALU execution unit, not memory.

If we look at Agner's table for Haswell, the relevant form of the MOV instruction (m,r) has a latency of 3 cycles and a reciprocal throughput of 1, same as a register PUSH.
The same applies for other recent x86 processors.
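One rough way to read these figures: for a run of independent instructions, reciprocal throughput gives a lower bound of about n times that value in cycles. The sketch below applies that to argument setup using only the Haswell numbers quoted in this message. The function name and the three-argument call are illustrative assumptions, and the model ignores the stack-pointer adjustment, dependencies, and every other pipeline effect.

```python
# Haswell reciprocal throughputs (cycles per instruction) as quoted in this
# thread from Agner Fog's tables; they are not re-measured here.
STORE_RECIP_THROUGHPUT = {
    "mov m,r": 1.0,  # store a register to memory
    "push r": 1.0,   # push a register onto the stack
}


def throughput_bound_cycles(instr, count):
    """Lower bound on cycles to issue `count` independent copies of `instr`."""
    return STORE_RECIP_THROUGHPUT[instr] * count


# Hypothetical call that passes three register values on the stack.
n_args = 3
mov_cost = throughput_bound_cycles("mov m,r", n_args)
push_cost = throughput_bound_cycles("push r", n_args)
print(mov_cost, push_cost)
```

Under this simplified model the two sequences tie, which is the point being made about the store form of MOV versus a register PUSH.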
Herbie Robinson
2014-Dec-22 18:41 UTC
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
But the r,m move has a reciprocal throughput of 0.5 while the pop is 1. It sounds like this should be optional, as you have already proposed.
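The load-side comparison can be put in the same terms. This is a minimal sketch that uses only the two reciprocal throughputs quoted in this message; the helper name is hypothetical, and a ratio of table entries says nothing about how a real call sequence behaves once other instructions compete for the same ports.

```python
# Haswell reciprocal throughputs (cycles per instruction) as quoted in this
# thread; the values come from the discussion, not from new measurements.
LOAD_RECIP_THROUGHPUT = {
    "mov r,m": 0.5,  # load a register from memory
    "pop r": 1.0,    # pop a register from the stack
}


def relative_cost(instr, baseline):
    """Ratio of reciprocal throughputs: values above 1 mean `instr` is slower."""
    return LOAD_RECIP_THROUGHPUT[instr] / LOAD_RECIP_THROUGHPUT[baseline]


pop_vs_load = relative_cost("pop r", "mov r,m")
print(pop_vs_load)
```

By this measure a POP costs twice what a MOV load does, which is why the asymmetry matters on the load side even though the store side ties.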