search for: instruction_tables

Displaying 8 results from an estimated 8 matches for "instruction_tables".

2016 Jan 21
2
Adding support for self-modifying branches to LLVM?
On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote: > AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf Generally speaking a correctly-predicted not-taken branch is basically identical to a nop, and a correctly-predicted taken branch has an extra overhead similar to an "add" or other extremely cheap op...
2019 May 13
3
How shall I evaluate the latency of each instruction in LLVM IR?
Inspired by https://www.agner.org/optimize/instruction_tables.pdf, which gives the latency and reciprocal throughput of each instruction on the different x86 microarchitectures: is anybody making the effort to do a similar job for LLVM IR? Thanks!
2014 Dec 22
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
...C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014. > It hasn't changed. It still lists push and pop instructions as 2-3 times more expensive than mov. And verified by Agner Fog's independent measurements: http://www.agner.org/optimize/instruction_tables.pdf The relevant Haswell numbers are on pages 186 - 187. -Chuck
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
..." is not enough: XCHG is of course slow for register-register operations too, otherwise I would not have spent time to write in. See https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures or Agner Fog's http://www.agner.org/optimize/instruction_tables.pdf > Remember, too, that klibc is optimized for size. Remember that the linker aligns functions on 16 byte boundaries! With XCHG, these functions have a code size of 29 bytes; with MOV they grow by 1 byte. >> PS: I doubt that a current GCC emits calls of the routines >> in t...
2016 Jan 19
4
Adding support for self-modifying branches to LLVM?
Hi, I’m thinking about using LLVM to implement a limited form of self-modifying code. Before diving into that, I’d like to get some feedback from you all. *The goal:* I’d like to add “optional” code to a program that I can enable at runtime and that has zero (i.e., as close to zero as I can get) overhead when not enabled. *Existing solutions:* Currently, I can guard optional code using a
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Hi, both https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S and https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S use the following code sequences for shift counts greater than 31: 1: 1: xorl %edx,%edx shrl %cl,%edx shl %cl,%eax xorl %eax,%eax
2016 Jan 21
3
Adding support for self-modifying branches to LLVM?
...es.com>> wrote: > On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote: >> AFAIK, the cost of a well-predicted, not-taken branch is the same as a nop on every x86 made in the last many years. See http://www.agner.org/optimize/instruction_tables.pdf Generally speaking a correctly-predicted not-taken branch is basically identical to a nop, and a correctly-predicted taken branch has an extra overhead similar to an "add" or other extremely cheap operation. > Specifically...
2014 Dec 21
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
Which performance guidelines are you referring to? I'm not that familiar with decade-old CPUs, but to the best of my knowledge, this is not true on current hardware. There is one specific circumstance where PUSHes should be avoided - for Atom/Silvermont processors, the memory form of PUSH is inefficient, so the register-freeing optimization below may not be profitable (see 14.3.3.6 and