thr3ads.net - similar to: "[LLVMdev] Scheduling quirks"

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 20

1

On 20 October 2017 at 09:24, Ingo Molnar <mingo at kernel.org> wrote: > > * Thomas Garnier <thgarnie at google.com> wrote: > >> Change the assembly code to use only relative references of symbols for the >> kernel to be PIE compatible. >> >> Position Independent Executable (PIE) support will allow extending the >> KASLR randomization range below

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

2019 Aug 15

2

Hi, both https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S and https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S use the following code sequences for shift counts greater than 31: 1: 1: xorl %edx,%edx shrl %cl,%edx shl %cl,%eax xorl %eax,%eax
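As a sketch of what those routines compute, assuming the 64-bit operand arrives as two 32-bit halves (as it does on i386), here is the same split written in C. The helper names `lshr64` and `shl64` are hypothetical and are not klibc's. The `count >= 32` branch corresponds to the path quoted in the post: the surviving half moves across and the other half becomes zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit logical right shift built from 32-bit halves.
   Valid for counts 0..63. */
static uint64_t lshr64(uint32_t hi, uint32_t lo, unsigned count)
{
    assert(count < 64);
    if (count >= 32) {
        /* Count > 31: the high word lands in the low word, high becomes 0. */
        lo = hi >> (count - 32);
        hi = 0;
    } else if (count > 0) {
        /* Bits shifted out of the high word flow into the low word. */
        lo = (lo >> count) | (hi << (32 - count));
        hi >>= count;
    }
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: the matching 64-bit left shift (what __ashldi3 does). */
static uint64_t shl64(uint32_t hi, uint32_t lo, unsigned count)
{
    assert(count < 64);
    if (count >= 32) {
        /* Count > 31: the low word lands in the high word, low becomes 0. */
        hi = lo << (count - 32);
        lo = 0;
    } else if (count > 0) {
        hi = (hi << count) | (lo >> (32 - count));
        lo <<= count;
    }
    return ((uint64_t)hi << 32) | lo;
}
```

The explicit `count - 32` is needed in C because shifting a 32-bit value by 32 or more is undefined; the x86 32-bit shift instructions mask the count to 5 bits, which is why the assembly can shift by `%cl` directly.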

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

2019 Aug 20

1

"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM: > On 8/14/19 9:42 PM, Stefan Kanthak wrote: >> Hi, >> >> both >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S >> and >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 11

1

Change the assembly code to use only relative references of symbols for the kernel to be PIE compatible. Position Independent Executable (PIE) support will allow extending the KASLR randomization range below the -2G memory limit. Signed-off-by: Thomas Garnier <thgarnie at google.com> --- arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++----- arch/x86/crypto/aesni-intel_asm.S

[LLVMdev] Checked arithmetic

2008 Mar 26

2

Hi Chris, > Why not define an "add with overflow" intrinsic that returns its value and > overflow bit as an i1? what's the point? We have this today with apint codegen (if you turn on LegalizeTypes). For example, this function define i1 @cc(i32 %x, i32 %y) { %xx = zext i32 %x to i33 %yy = zext i32 %y to i33 %s = add i33 %xx, %yy %tmp = lshr i33 %s, 32 %b = trunc
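A C rendering of the IR above, assuming the truncated last line goes on to truncate the shifted sum to i1, as the function's i1 return type suggests. C has no 33-bit integer, so `uint64_t` stands in for i33; the function name `add_carries` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit add that also reports the carry-out,
   using the widen-then-inspect-bit-32 trick from the IR. */
static bool add_carries(uint32_t x, uint32_t y, uint32_t *sum)
{
    uint64_t wide = (uint64_t)x + (uint64_t)y; /* zext both operands, add   */
    *sum = (uint32_t)wide;                     /* low 32 bits are the sum   */
    return (wide >> 32) != 0;                  /* bit 32 is the carry flag  */
}
```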

[LLVMdev] X86TargetLowering::LowerToBT

2015 Jan 23

2

I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t
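For reference, the source-level patterns being compared look roughly like this. The function names are hypothetical, and which instruction (BT or TEST) a given compiler picks for each is not guaranteed; the comments only restate the expectation from the post.

```c
#include <stdbool.h>
#include <stdint.h>

/* Variable mask produced by a shift: the case the post associates with BT.
   n is masked with 31 so the C is defined for any n; BT with a register
   operand likewise takes the bit offset modulo the operand size. */
static bool bit_set_variable(uint32_t x, unsigned n)
{
    return (x & (UINT32_C(1) << (n & 31))) != 0;
}

/* Equivalent form that shifts the value instead of building a mask. */
static bool bit_set_shifted(uint32_t x, unsigned n)
{
    return ((x >> (n & 31)) & 1u) != 0;
}

/* Constant mask: the case where the post expects TEST with an immediate. */
static bool bit5_set(uint32_t x)
{
    return (x & 0x20u) != 0;
}
```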

[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support

2017 Oct 20

0

* Thomas Garnier <thgarnie at google.com> wrote: > Change the assembly code to use only relative references of symbols for the > kernel to be PIE compatible. > > Position Independent Executable (PIE) support will allow extending the > KASLR randomization range below the -2G memory limit. > diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S

Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S

2019 Aug 19

0

On 8/14/19 9:42 PM, Stefan Kanthak wrote: > Hi, > > both > https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S > and > https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S > use the following code sequences for shift counts greater than 31: > > 1: 1: >

[klibc 24/43] i386 support for klibc

2006 Jun 26

0

The parts of klibc specific to the i386 architecture. Signed-off-by: H. Peter Anvin <hpa at zytor.com> --- commit bd0599e5290ca1a16bb7a68f7c362d395c612eb3 tree 8f33afdd02a14c22e7a3984da2bad13184e3f729 parent 84f6a72f42cf41e32daa59871a0b5424572093e4 author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:21 -0700 committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun

[RFC] New pass: LoopExitValues

2015 Sep 01

2

On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix

[PATCH] efi: leaving long mode in kernel_jump routine

2015 Aug 04

13

Syslinux 6.03 (efi64) fails to boot a 32-bit kernel. The way Syslinux leaves long mode in the kernel_jump assembly routine does not follow the AMD64 specifications. More precisely: 1. After setting a new GADT, `cs` has to be refreshed by doing a long jump, but it is not. 2. Other segments have to be updated, but they are not. 3. Disabling paging has to be done before disabling long mode, but the

[PATCH v2 00/11] xen: Initial kexec/kdump implementation

2012 Nov 20

12

Hi, This set of patches contains an initial kexec/kdump implementation for Xen v2 (the previous version was posted to only a few people by mistake; sorry for that). Currently only dom0 is supported; however, almost all the infrastructure required for domU support is ready. Jan Beulich suggested merging the Xen x86 assembler code with the bare-metal x86 code. This could simplify the kernel code and reduce its size a bit.

[RFC] PT.2 Add IR level interprocedural outliner for code size.

2017 Sep 28

0

On Wed, Sep 27, 2017 at 6:07 PM, Matthias Braun <mbraun at apple.com> wrote: > > On Sep 27, 2017, at 3:23 PM, Davide Italiano via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > I think that, given previous discussion on the topic, we might want

[X86][AVX512] RFC: make i1 illegal in the Codegen

2017 Jan 24

7

Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,
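A scalar reference model of what a masked gather does, written as a hypothetical C sketch (the name `masked_gather_v8i32` is made up). The LLVM intrinsic also takes a pass-through vector that supplies the result for disabled lanes; with the all-true mask in the example every lane loads, so the pass-through never shows up.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an 8-lane masked gather of i32:
   enabled lanes load through their pointer; disabled lanes take the
   pass-through value and their pointer is never dereferenced. */
static void masked_gather_v8i32(int32_t out[8], int32_t *const ptrs[8],
                                const bool mask[8], const int32_t passthru[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = mask[i] ? *ptrs[i] : passthru[i];
}
```

Modelling the mask as one bool per lane mirrors the `<8 x i1>` operand, which is what the K registers hold natively on AVX-512.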

[LLVMdev] Compiling integer mod

2012 Mar 27

1

For the simple C program below I show the output of clang and the output of the VS compiler (I am on Windows). Maybe this is obvious to you, but is it really faster to do 2 multiplications, 3 movl instructions, 2 shifts, 1 add, and 1 subtract than to do 1 mov, 1 cdq, and 1 idiv? I ran into this while trying to understand why my code runs slower with LLVM than a comparable program on Windows.
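The longer clang sequence is the usual division-by-constant technique: multiply by a precomputed reciprocal, keep the high half of the product, shift, correct for negative inputs, then form the remainder as x - q*d. The post's program is not shown, so the divisor 10 below is a hypothetical choice; the magic constant 0x66666667 (ceil(2^34 / 10)) is valid for that divisor only. On most x86 implementations a 32-bit idiv has considerably higher latency than imul, which is why compilers prefer this sequence.

```c
#include <stdint.h>

/* Sketch: x % 10 without a divide instruction (divisor 10 is illustrative).
   Assumes >> on a negative signed value is an arithmetic shift, as it is
   with GCC and Clang (the C standard leaves it implementation-defined). */
static int32_t mod10(int32_t x)
{
    int64_t prod = (int64_t)x * 0x66666667;  /* multiply by ~2^34/10        */
    int32_t q = (int32_t)(prod >> 32) >> 2;  /* high half, then shift by 2  */
    q -= x >> 31;                            /* +1 if x < 0: round toward 0 */
    return x - q * 10;                       /* remainder = x - q*d         */
}
```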

[LLVMdev] Checked arithmetic

2008 Mar 26

0

On Wed, 26 Mar 2008, Duncan Sands wrote: > Hi Chris, > >> Why not define an "add with overflow" intrinsic that returns its value and >> overflow bit as an i1? > > what's the point? We have this today with apint codegen (if you turn on > LegalizeTypes). For example, this function The desired code is something like: foo: addl %eax, %ecx jo
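At the source level, the signed check that "addl; jo" implements can be written portably as below; the helper name `add_overflows` is hypothetical. GCC and Clang also provide `__builtin_add_overflow`, which compilers can lower to an add followed by a jump on the overflow flag.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: signed add that reports overflow instead of
   invoking undefined behaviour. The range test runs before the add, so
   the addition itself can never overflow. */
static bool add_overflows(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return true;  /* *sum is left untouched on overflow */
    *sum = a + b;
    return false;
}
```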

[LLVMdev] Checked arithmetic

2008 Mar 26

0

On Wed, 26 Mar 2008, Jonathan S. Shapiro wrote: > I want to background process this for a bit, but it would be helpful to > discuss some approaches first. > > There would appear to be three approaches: > > 1. Introduce a CC register class into the IR. This seems to be a > fairly major overhaul. > > 2. Introduce a set of scalar and fp computation quasi-instructions
