thr3ads.net - similar to: "Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/_

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

2019 Aug 20

1

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM:
> On 8/14/19 9:42 PM, Stefan Kanthak wrote:
>> Hi,
>>
>> both
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
>> and
>> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

2019 Aug 19

0

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

On 8/14/19 9:42 PM, Stefan Kanthak wrote:
> Hi,
>
> both
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S
> and
> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
> use the following code sequences for shift counts greater 31:
>
> 1: 1:
>
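The excerpt above is cut off before the assembly sequences it refers to, so they are not reproduced here. As background only, the general technique behind a 64-bit logical right shift on a 32-bit target can be sketched in C. The function name `lshr64_sketch` and its split into `lo`/`hi` halves are illustrative assumptions, not klibc's code; the branch for counts above 31 is the case the thread discusses.

```c
#include <stdint.h>

/* Hypothetical helper (not klibc's implementation): logical right shift of a
 * 64-bit value held as two 32-bit halves, valid for counts 0..63. */
uint64_t lshr64_sketch(uint32_t lo, uint32_t hi, unsigned count)
{
    if (count > 31) {
        /* Counts 32..63: the high half moves down into the low half,
         * and the high half becomes zero. */
        lo = hi >> (count - 32);
        hi = 0;
    } else if (count > 0) {
        /* Counts 1..31: bits shifted out of the high half enter the low half. */
        lo = (lo >> count) | (hi << (32 - count));
        hi >>= count;
    }
    /* count == 0 leaves both halves unchanged. */
    return ((uint64_t)hi << 32) | lo;
}
```

In this sketch the large-count case needs only a move and a clear, with no exchange of the two halves; whether that matches the sequences criticised in the thread cannot be told from the truncated excerpt.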

[LLVMdev] Scheduling quirks

2014 Jan 18

2

[LLVMdev] Scheduling quirks

Hello all! When I compile the following more or less stupid functions with clang++ -O3 -S test.cpp ===>

int test_register(int x)
{
    x ^= (x >> 2);
    x ^= (x >> 3);
    x = x ^ (x >> 4);
    int y = x;
    x >>= 5;
    x ^= y; // almost the same but explicit
    return x;
}

int test_scheduler(int x)
{
    return ((x>>2) & 15) ^ ((x>>3) & 31);
}

[klibc 24/43] i386 support for klibc

2006 Jun 26

0

[klibc 24/43] i386 support for klibc

The parts of klibc specific to the i386 architecture.

Signed-off-by: H. Peter Anvin <hpa at zytor.com>

---
commit bd0599e5290ca1a16bb7a68f7c362d395c612eb3
tree 8f33afdd02a14c22e7a3984da2bad13184e3f729
parent 84f6a72f42cf41e32daa59871a0b5424572093e4
author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:21 -0700
committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 20

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

On 20 October 2017 at 09:24, Ingo Molnar <mingo at kernel.org> wrote:
>
> * Thomas Garnier <thgarnie at google.com> wrote:
>
>> Change the assembly code to use only relative references of symbols for the
>> kernel to be PIE compatible.
>>
>> Position Independent Executable (PIE) support will allow to extended the
>> KASLR randomization range below

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

Change the assembly code to use only relative references of symbols for the kernel to be PIE compatible.

Position Independent Executable (PIE) support will allow to extended the KASLR randomization range below the -2G memory limit.

Signed-off-by: Thomas Garnier <thgarnie at google.com>
---
arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++-----
arch/x86/crypto/aesni-intel_asm.S

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

2016 Jan 12

3

[PATCH 3/4] x86,asm: Re-work smp_store_mb()

On Mon, Nov 02, 2015 at 04:06:46PM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 12:15 PM, Davidlohr Bueso <dave at stgolabs.net> wrote:
> >
> > So I ran some experiments on an IvyBridge (2.8GHz) and the cost of XCHG is
> > constantly cheaper (by at least half the latency) than MFENCE. While there
> > was a decent amount of variation, this difference

[LLVMdev] [llvm-commits] rotate

2012 Jul 31

0

[LLVMdev] [llvm-commits] rotate

On Tue, Jul 31, 2012 at 8:42 AM, Cameron McInally <cameron.mcinally at nyu.edu> wrote:
> Andy,
>
> Here is the left circular shift operator patch. I apologize to the reviewer
> in advance. The patch has a good bit of fine detail. Any
> comments/criticisms?
>
> Some caveats...
>
> 1) This is just the bare minimum needed to make the left circular shift
> operator

kernel: locore.s doesn't assemble (fillkpt, $PAGE_SHIFT, $PTESHIFT)

2003 Aug 22

2

kernel: locore.s doesn't assemble (fillkpt, $PAGE_SHIFT, $PTESHIFT)

since august 8th, 2003 the kernel on my i386 pentiumIII won't compile. the problem arises in locore.s with the definition of the constants $PAGE_SHIFT and $PTESHIFT used in `shr' and `shl' instructions within the macros `fillkpt' and `fillkptphys'. i've tried to cvsup(1) RELENG_4 and RELENG_4_8 every day for over a week now, but kernel builds (as part of a buildworld)

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 20

0

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

* Thomas Garnier <thgarnie at google.com> wrote:
> Change the assembly code to use only relative references of symbols for the
> kernel to be PIE compatible.
>
> Position Independent Executable (PIE) support will allow to extended the
> KASLR randomization range below the -2G memory limit.
> diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 23

2

[LLVMdev] X86TargetLowering::LowerToBT

I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

2017 Oct 11

32

[PATCH v1 00/27] x86: PIE support and option to extend KASLR randomization

Changes:
- patch v1:
  - Simplify ftrace implementation.
  - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
  - Use --emit-relocs instead of -pie to reduce dynamic relocation space on mapped memory. It also simplifies the relocation process.
  - Move the start the module section next to the kernel. Remove the need for -mcmodel=large on modules. Extends

BUGS in code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

2018 Nov 25

3

BUGS in code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

bswapdi2 for i386 is correct

Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into eax. Those are each bswapped. The ABI for the return is edx contains bits [63:32] and eax contains [31:0]. This is opposite of how the register were loaded.

~Craig

On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <craig.topper at gmail.com> wrote:
> bswapsi2 on the x86-64 isn't using
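The register argument in this reply can be restated in portable C: a 64-bit byte swap is two 32-bit byte swaps whose results trade places. The sketch below is only an illustration of that reasoning; the names `bswap32_sketch` and `bswap64_sketch` are hypothetical and this is not the compiler-rt or libgcc source.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
static uint32_t bswap32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hypothetical helper: 64-bit byte swap built from two 32-bit swaps. */
uint64_t bswap64_sketch(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* source bits 31:0  */
    uint32_t hi = (uint32_t)(x >> 32);  /* source bits 63:32 */

    /* The swapped low half becomes result bits 63:32 and the swapped
     * high half becomes result bits 31:0. */
    return ((uint64_t)bswap32_sketch(lo) << 32) | bswap32_sketch(hi);
}
```

Loading the low half into the register that returns the high half, as the reply describes, performs this exchange without any extra move.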

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

2018 Mar 13

32

[PATCH v2 00/27] x86: PIE support and option to extend KASLR randomization

Changes:
- patch v2:
  - Adapt patch to work post KPTI and compiler changes
  - Redo all performance testing with latest configs and compilers
  - Simplify mov macro on PIE (MOVABS now)
  - Reduce GOT footprint
- patch v1:
  - Simplify ftrace implementation.
  - Use gcc mstack-protector-guard-reg=%gs with PIE when possible.
- rfc v3:
  - Use --emit-relocs instead of -pie to reduce

[LLVMdev] rotate

2012 Jul 29

3

[LLVMdev] rotate

Nice! Clever compiler..

On 07/28/2012 08:55 PM, Michael Gottesman wrote:
> I can get clang/llvm to emit a rotate instruction on x86-64 when compiling C by just using -Os and the rotate from Hacker's Delight i.e.,
>
> ======
> #include<stdlib.h>
> #include<stdint.h>
>
> uint32_t ror(uint32_t input, size_t rot_bits)
> {
> return (input>>
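The quoted function is cut off mid-expression, so its exact body is not known. For orientation, one widely used way to write a 32-bit rotate right in C without undefined behaviour is sketched below. The name `ror32_sketch` and the masking of the count are assumptions for illustration, not the code from the message, and whether a compiler turns it into a single rotate instruction depends on the compiler version and flags.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value right by count bits.
 * Both shift amounts are masked to 0..31, so a count of 0 or 32 never
 * produces a shift by the full width, which C leaves undefined. */
uint32_t ror32_sketch(uint32_t value, unsigned count)
{
    count &= 31;
    return (value >> count) | (value << ((32 - count) & 31));
}
```

For example, `ror32_sketch(0x12345678u, 8)` moves the low byte `0x78` to the top of the word.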

[LLVMdev] lli/JIT missing libgcc symbols on Mingw32/x86

2008 Jun 05

1

[LLVMdev] lli/JIT missing libgcc symbols on Mingw32/x86

Hello, I have a bytecode doing 64-bit division, and on Mingw32/x86 lli complains it cannot resolve __udivdi3 when running it. Those symbols are all part of libgcc and all present in lli, but they cannot be found by SearchForAddressOfSymbol (not in any DLL). To work around that, I explicitly define them in Win32/DynamicLibrary.inc if the current target is Mingw32 (patch attached). Anybody had
