similar to: [LLVMdev] Scheduling quirks

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Scheduling quirks"

2017 Oct 20
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
On 20 October 2017 at 09:24, Ingo Molnar <mingo at kernel.org> wrote: > > * Thomas Garnier <thgarnie at google.com> wrote: > >> Change the assembly code to use only relative references of symbols for the >> kernel to be PIE compatible. >> >> Position Independent Executable (PIE) support will allow extending the >> KASLR randomization range below
2019 Aug 15
2
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
Hi, both https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S and https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S use the following code sequences for shift counts greater than 31: 1: 1: xorl %edx,%edx shrl %cl,%edx shl %cl,%eax xorl %eax,%eax
2019 Aug 20
1
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
"H. Peter Anvin" <hpa at zytor.com> wrote August 20, 2019 12:51 AM: > On 8/14/19 9:42 PM, Stefan Kanthak wrote: >> Hi, >> >> both >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S >> and >> https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S
2017 Oct 11
1
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
Change the assembly code to use only relative references of symbols for the kernel to be PIE compatible. Position Independent Executable (PIE) support will allow extending the KASLR randomization range below the -2G memory limit. Signed-off-by: Thomas Garnier <thgarnie at google.com> --- arch/x86/crypto/aes-x86_64-asm_64.S | 45 ++++++++----- arch/x86/crypto/aesni-intel_asm.S
2008 Mar 26
2
[LLVMdev] Checked arithmetic
Hi Chris, > Why not define an "add with overflow" intrinsic that returns its value and > overflow bit as an i1? what's the point? We have this today with apint codegen (if you turn on LegalizeTypes). For example, this function define i1 @cc(i32 %x, i32 %y) { %xx = zext i32 %x to i33 %yy = zext i32 %y to i33 %s = add i33 %xx, %yy %tmp = lshr i33 %s, 32 %b = trunc
2015 Jan 23
2
[LLVMdev] X86TargetLowering::LowerToBT
I suspect that this is because the mask in your example is the result of a variable shift, which (a) has its own performance and flags hazards pre-SHLX and (b) requires additional µops to do with TEST. I expect that ICC is putting a dummy TEST or XOR ahead of the BT to break the false flags dependency, as well. If the mask were constant, I expect ICC would generate TEST instead (but I don’t
2017 Oct 20
0
[PATCH v1 01/27] x86/crypto: Adapt assembly for PIE support
* Thomas Garnier <thgarnie at google.com> wrote: > Change the assembly code to use only relative references of symbols for the > kernel to be PIE compatible. > > Position Independent Executable (PIE) support will allow extending the > KASLR randomization range below the -2G memory limit. > diff --git a/arch/x86/crypto/aes-x86_64-asm_64.S
2019 Aug 19
0
Slow XCHG in arch/i386/libgcc/__ashrdi3.S and arch/i386/libgcc/__lshrdi3.S
On 8/14/19 9:42 PM, Stefan Kanthak wrote: > Hi, > > both > https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__ashldi3.S > and > https://git.kernel.org/pub/scm/libs/klibc/klibc.git/plain/usr/klibc/arch/i386/libgcc/__lshrdi3.S > use the following code sequences for shift counts greater than 31: > > 1: 1: >
2006 Jun 26
0
[klibc 24/43] i386 support for klibc
The parts of klibc specific to the i386 architecture. Signed-off-by: H. Peter Anvin <hpa at zytor.com> --- commit bd0599e5290ca1a16bb7a68f7c362d395c612eb3 tree 8f33afdd02a14c22e7a3984da2bad13184e3f729 parent 84f6a72f42cf41e32daa59871a0b5424572093e4 author H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun 2006 16:58:21 -0700 committer H. Peter Anvin <hpa at zytor.com> Sun, 25 Jun
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: +2.9% faster with the L.E.V. pass -Os size: 1.5% smaller with the L.E.V. pass In the case of Coremark, the benefit comes mainly from the matrix
2015 Aug 04
13
[PATCH] efi: leaving long mode in kernel_jump routine
Syslinux 6.03 (efi64) fails to boot a 32-bit kernel. The way Syslinux leaves long mode in the kernel_jump assembly routine does not follow the AMD64 specifications. More precisely: 1. After setting a new GADT, `cs` has to be refreshed by doing a long jump, but it is not 2. Other segments have to be updated, but they are not 3. Disabling paging has to be done before disabling long mode, but the
2012 Nov 20
12
[PATCH v2 00/11] xen: Initial kexec/kdump implementation
Hi, This set of patches contains the initial kexec/kdump implementation for Xen v2 (previous versions were posted to a few people by mistake; sorry for that). Currently only dom0 is supported; however, almost all infrastructure required for domU support is ready. Jan Beulich suggested merging the Xen x86 assembler code with the baremetal x86 code. This could simplify the kernel code and slightly reduce its size.
2017 Sep 28
0
[RFC] PT.2 Add IR level interprocedural outliner for code size.
On Wed, Sep 27, 2017 at 6:07 PM, Matthias Braun <mbraun at apple.com> wrote: > > On Sep 27, 2017, at 3:23 PM, Davide Italiano via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > On Wed, Sep 27, 2017 at 9:28 AM, Jessica Paquette via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > > I think that, given previous discussion on the topic, we might want
2017 Jan 24
7
[X86][AVX512] RFC: make i1 illegal in the Codegen
Hi All, AVX-512 introduced the K mask registers and masked operations which make a natural choice for legalizing vectors of i1's. For example, define <8 x i32> @foo(<8 x i32>%a, <8 x i32*> %p) { %r = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %p, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>,
2012 Mar 27
1
[LLVMdev] Compiling integer mod
For the simple C program below I show the output of clang and the output of the VS compiler (I am on Windows). Maybe this is obvious to you, but is it really faster to do 2 multiplications, 3 movl instructions, 2 shifts, 1 add, and 1 subtract than to do 1 mov, 1 cdq, and 1 idiv? I ran into this while trying to understand why my code runs slower with llvm than a comparable program on Windows.
2008 Mar 26
0
[LLVMdev] Checked arithmetic
On Wed, 26 Mar 2008, Duncan Sands wrote: > Hi Chris, > >> Why not define an "add with overflow" intrinsic that returns its value and >> overflow bit as an i1? > > what's the point? We have this today with apint codegen (if you turn on > LegalizeTypes). For example, this function The desired code is something like: foo: addl %eax, %ecx jo
2008 Mar 26
0
[LLVMdev] Checked arithmetic
On Wed, 26 Mar 2008, Jonathan S. Shapiro wrote: > I want to background process this for a bit, but it would be helpful to > discuss some approaches first. > > There would appear to be three approaches: > > 1. Introduce a CC register class into the IR. This seems to be a > fairly major overhaul. > > 2. Introduce a set of scalar and fp computation quasi-instructions