Stefan Kanthak
2019-Aug-15 04:42 UTC
[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Hi,

both
https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
and
https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
use the following code sequences for shift counts greater than 31:

    1:                          1:
        xorl  %edx,%edx             shrl  %cl,%edx
        shl   %cl,%eax              xorl  %eax,%eax
           ^
        xchgl %edx,%eax             xchgl %edx,%eax
        ret                         ret

At least and especially on Intel processors XCHG was and
still is a rather slow instruction and should be avoided.
Use the following better code sequences instead:

    1:                          1:
        shll  %cl,%eax              shrl  %cl,%edx
        movl  %eax,%edx             movl  %edx,%eax
        xorl  %eax,%eax             xorl  %edx,%edx
        ret                         ret

regards
Stefan Kanthak

PS: I doubt that a current GCC emits calls to the routines
    in the /usr/klibc/arch/i386 subdirectory any more.
H. Peter Anvin
2019-Aug-19 22:51 UTC
[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
On 8/14/19 9:42 PM, Stefan Kanthak wrote:
> Hi,
>
> both
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
> and
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
> use the following code sequences for shift counts greater than 31:
>
>     1:                          1:
>         xorl  %edx,%edx             shrl  %cl,%edx
>         shl   %cl,%eax              xorl  %eax,%eax
>            ^
>         xchgl %edx,%eax             xchgl %edx,%eax
>         ret                         ret
>
> At least and especially on Intel processors XCHG was and
> still is a rather slow instruction and should be avoided.
> Use the following better code sequences instead:
>
>     1:                          1:
>         shll  %cl,%eax              shrl  %cl,%edx
>         movl  %eax,%edx             movl  %edx,%eax
>         xorl  %eax,%eax             xorl  %edx,%edx
>         ret                         ret
>
> regards
> Stefan Kanthak
>

XCHG is slow for register-memory operations due to implicit locking, but
should be fine for register-register.

Remember, too, that klibc is optimized for size.

> PS: I doubt that a current GCC emits calls to the routines
> in the /usr/klibc/arch/i386 subdirectory any more.

Which, of course, is even better.

	-hpa
Stefan Kanthak
2019-Aug-20 00:35 UTC
[klibc] Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM:

> On 8/14/19 9:42 PM, Stefan Kanthak wrote:
>> Hi,
>>
>> both
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
>> and
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
>> use the following code sequences for shift counts greater than 31:
>>
>>     1:                          1:
>>         xorl  %edx,%edx             shrl  %cl,%edx
>>         shl   %cl,%eax              xorl  %eax,%eax
>>            ^
>>         xchgl %edx,%eax             xchgl %edx,%eax
>>         ret                         ret
>>
>> At least and especially on Intel processors XCHG was and
>> still is a rather slow instruction and should be avoided.
>> Use the following better code sequences instead:
>>
>>     1:                          1:
>>         shll  %cl,%eax              shrl  %cl,%edx
>>         movl  %eax,%edx             movl  %edx,%eax
>>         xorl  %eax,%eax             xorl  %edx,%edx
>>         ret                         ret
>>
>> regards
>> Stefan Kanthak
>>
>
> XCHG is slow for register-memory operations due to implicit locking, but
> should be fine for register-register.

"but should be fine" is not enough: XCHG is of course slow for register-register
operations too, otherwise I would not have spent time writing in.

See
https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
or Agner Fog's http://www.agner.org/optimize/instruction_tables.pdf

> Remember, too, that klibc is optimized for size.

Remember that the linker aligns functions on 16-byte boundaries!
With XCHG, these functions have a code size of 29 bytes; with MOV
they grow by 1 byte.

>> PS: I doubt that a current GCC emits calls to the routines
>> in the /usr/klibc/arch/i386 subdirectory any more.
>
> Which, of course, is even better.

... and means that you can get rid of this subdirectory!

regards
Stefan