Displaying 8 results from an estimated 8 matches for "instruction_tables".
2016 Jan 21
2
Adding support for self-modifying branches to LLVM?
On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>
> AFAIK, the cost of a well-predicted, not-taken branch is the same as a
> nop on every x86 made in the last many years.
> See http://www.agner.org/optimize/instruction_tables.pdf
> Generally speaking a correctly-predicted not-taken branch is basically
> identical to a nop, and a correctly-predicted taken branch has an
> extra overhead similar to an "add" or other extremely cheap op...
2019 May 13
3
How shall I evaluate the latency of each instruction in LLVM IR?
Inspired by https://www.agner.org/optimize/instruction_tables.pdf, which
gives the latency and reciprocal throughput of each instruction on the
different x86 microarchitectures: is anybody making the effort to do a
similar job for LLVM IR?
Thanks!
2014 Dec 22
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
...C-21 in "Intel(r) 64 and IA-32 Architectures Optimization Reference Manual", September 2014.
> It hasn't changed. It still lists push and pop instructions as 2-3 times more expensive than mov.
And verified by Agner Fog's independent measurements:
http://www.agner.org/optimize/instruction_tables.pdf
The relevant Haswell numbers are on pages 186 - 187.
-Chuck
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
..." is not enough: XCHG is of course slow for register-
register operations too, otherwise I would not have spent time to write in.
See https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
or Agner Fog's http://www.agner.org/optimize/instruction_tables.pdf
> Remember, too, that klibc is optimized for size.
Remember that the linker aligns functions on 16 byte boundaries!
With XCHG, these functions have a code size of 29 bytes; with MOV
they grow by 1 byte.
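
The size argument above can be checked with a little arithmetic. The following is a minimal sketch, assuming (as the message states) that functions are placed on 16-byte boundaries; the helper name `aligned_size` is hypothetical and only illustrates the rounding.

```python
def aligned_size(size: int, alignment: int = 16) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


xchg_size = 29            # bytes with XCHG, as stated in the message
mov_size = xchg_size + 1  # "with MOV they grow by 1 byte"

# Both variants occupy the same number of bytes once padded to 16.
print(aligned_size(xchg_size), aligned_size(mov_size))  # prints "32 32"
```

Under that assumption the extra byte disappears into padding that would be emitted anyway.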
>> PS: I doubt that a current GCC emits calls of the routines
>> in t...
2016 Jan 19
4
Adding support for self-modifying branches to LLVM?
Hi,
I’m thinking about using LLVM to implement a limited form of self-modifying
code. Before diving into that, I’d like to get some feedback from you all.
*The goal:* I’d like to add “optional” code to a program that I can enable
at runtime and that has zero (i.e., as close to zero as I can get) overhead
when not enabled.
*Existing solutions:* Currently, I can guard optional code using a
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Hi,
both
https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
and
https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
use the following code sequences for shift counts greater than 31:
1:                      1:
    xorl %edx,%edx          shrl %cl,%edx
    shl %cl,%eax            xorl %eax,%eax
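
What these two sequences compute can be modelled in a few lines. This is only a sketch under stated assumptions: the 64-bit value is taken to live in edx:eax (high:low), the left column is taken to be the left shift and the right column the logical right shift, and the final register swap is assumed from the thread subject (XCHG) because the snippet is cut off before it. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shl64_count_over_31(lo: int, hi: int, count: int):
    """Model the left column: 64-bit left shift for 32 <= count <= 63."""
    if not 32 <= count <= 63:
        raise ValueError("model only covers shift counts 32..63")
    edx = 0                               # xorl %edx,%edx
    eax = (lo << (count & 31)) & MASK32   # shl %cl,%eax (x86 uses count mod 32)
    eax, edx = edx, eax                   # assumed swap, not shown in the snippet
    return eax, edx                       # (low half, high half)


def lshr64_count_over_31(lo: int, hi: int, count: int):
    """Model the right column: 64-bit logical right shift for 32 <= count <= 63."""
    if not 32 <= count <= 63:
        raise ValueError("model only covers shift counts 32..63")
    edx = hi >> (count & 31)              # shrl %cl,%edx (x86 uses count mod 32)
    eax = 0                               # xorl %eax,%eax
    eax, edx = edx, eax                   # assumed swap, not shown in the snippet
    return eax, edx                       # (low half, high half)
```

In this model the swap only moves the two results into their final registers, which is the step the thread argues could be done without XCHG.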
2016 Jan 21
3
Adding support for self-modifying branches to LLVM?
...es.com>> wrote:
>
>
>
> On 01/19/2016 09:04 PM, Sean Silva via llvm-dev wrote:
>>
>> AFAIK, the cost of a well-predicted, not-taken branch is the same
>> as a nop on every x86 made in the last many years. See
>> http://www.agner.org/optimize/instruction_tables.pdf
>> Generally speaking a correctly-predicted not-taken branch is
>> basically identical to a nop, and a correctly-predicted taken
>> branch has an extra overhead similar to an "add" or other
>> extremely cheap operation.
> Specifically...
2014 Dec 21
2
[LLVMdev] [RFC] [X86] Mov to push transformation in x86-32 call sequences
Which performance guidelines are you referring to?
I'm not that familiar with decade-old CPUs, but to the best of my knowledge, this is not true on current hardware.
There is one specific circumstance where PUSHes should be avoided - for Atom/Silvermont processors, the memory form of PUSH is inefficient, so the register-freeing optimization below may not be profitable (see 14.3.3.6 and