similar to: [PATCH] Optimized assembler version of md5_process() for x86-64

Displaying 20 results from an estimated 8000 matches similar to: "[PATCH] Optimized assembler version of md5_process() for x86-64"

2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care, so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As
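
The reporter's source is not included in this excerpt; a minimal backtracking solver of the kind described might look like the following (hypothetical reconstruction in C++, not the actual test case):

    // Hypothetical reconstruction of an "8 queens puzzle" counter; the
    // reporter's actual test case is not shown in the snippet above.
    #include <cstdio>

    static int count;
    static int cols[8]; // cols[r] = column of the queen placed in row r

    static bool safe(int row, int col) {
      for (int r = 0; r < row; ++r) {
        int d = cols[r] - col;
        if (d == 0 || d == row - r || d == r - row)
          return false; // same column or same diagonal
      }
      return true;
    }

    static void solve(int row) {
      if (row == 8) { ++count; return; }
      for (int col = 0; col < 8; ++col)
        if (safe(row, col)) {
          cols[row] = col;
          solve(row + 1);
        }
    }

    int main() {
      solve(0);
      std::printf("%d solutions\n", count); // prints 92
      return 0;
    }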
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this "8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a big hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed, as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
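
For reference, the kernel that regressed is, approximately, the following compressed-sparse-row loop (paraphrased from the public scimark2 C source; details may differ):

    // Paraphrase of scimark2's "Sparse matmult" kernel: a compressed-
    // sparse-row multiply. Variable names approximate the public C source
    // rather than reproducing it exactly.
    void sparse_matmult(int M, double *y, const double *val, const int *row,
                        const int *col, const double *x) {
      for (int r = 0; r < M; ++r) {
        double sum = 0.0;
        // row[r]..row[r+1] delimit the nonzeros of row r; col[i] is the
        // column of val[i], giving an indirect load from x.
        for (int i = row[r]; i < row[r + 1]; ++i)
          sum += x[col[i]] * val[i];
        y[r] = sum;
      }
    }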
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
Hi, I found this code in X86DisassemblerDecoder.h:

    #define EA_BASES_32BIT \
      ENTRY(EAX)           \
      ENTRY(ECX)           \
      ENTRY(EDX)           \
      ENTRY(EBX)           \
      ENTRY(sib)           \
      ENTRY(EBP)           \
      ENTRY(ESI)           \
      ENTRY(EDI)           \
      ENTRY(R8D)           \
      ENTRY(R9D)           \
      ENTRY(R10D)          \
      ENTRY(R11D)          \
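
ENTRY here is the usual X-macro pattern: the header expands the same list several times under different definitions of ENTRY. A self-contained sketch (abbreviated list and illustrative names, not the exact LLVM code):

    // Illustrative sketch of the X-macro pattern used by the decoder
    // header (abbreviated list; _DEMO names are made up). Each expansion
    // site defines ENTRY, expands the list, then undefines it.
    #define EA_BASES_32BIT_DEMO \
      ENTRY(EAX)                \
      ENTRY(ECX)                \
      ENTRY(EDX)                \
      ENTRY(EBX)

    #define ENTRY(x) EA_BASE_##x,
    enum EABase32 { EA_BASES_32BIT_DEMO }; // EA_BASE_EAX, EA_BASE_ECX, ...
    #undef ENTRY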
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads zmm22 with the 64-bit constants 8,9,10,11,12,13,14,15, which are indexes here, not values loaded from those locations; and zmm2 contains the constant 4000. So vpmuludq zmm14, zmm10, zmm2 will multiply the index values by 4000, since for array b the stride is 4000. zmm14=
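
In scalar terms, that multiply computes one element offset per lane; a hedged C++ equivalent (the 8 lanes and the 4000 stride are taken from the snippet, everything else is assumed):

    // Scalar equivalent of the per-lane address computation described
    // above: each vector lane multiplies a row index by the stride of
    // array b. Illustrative only.
    void gather_offsets(const long idx[8], long offsets[8]) {
      const long stride = 4000; // stride of array b, per the snippet
      for (int lane = 0; lane < 8; ++lane)
        offsets[lane] = idx[lane] * stride; // what vpmuludq does per lane
    }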
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: 2.9% faster with the L.E.V. pass; -Os size: 1.5% smaller with the L.E.V. pass. In the case of Coremark, the benefit comes mainly from the matrix
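
For context, a hedged before/after illustration of the loop-exit-values idea (not code from the patch):

    // In sum_before, p stays live across the loop only so its final value
    // can be read afterwards; in sum_after, that exit value is recomputed
    // from the trip count outside the loop instead.
    long sum_before(const int *base, int n, const int **out) {
      const int *p = base;
      long sum = 0;
      for (int i = 0; i < n; ++i) sum += *p++;
      *out = p; // exit value kept live through the loop
      return sum;
    }

    long sum_after(const int *base, int n, const int **out) {
      long sum = 0;
      for (int i = 0; i < n; ++i) sum += base[i];
      *out = base + n; // exit value materialized after the loop
      return sum;
    }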
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott, Sorry for the late reply, I was out of office during the holidays. 1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed. 2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > Hello LLVM Community, > > To improve Interprocedural Register Allocation (IPRA) we are trying to > force caller > saved registers for local functions (which have linkage type local). To > achieve it > I have modified TargetFrameLowering::determineCalleeSaves() to return > early for >
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community, To improve Interprocedural Register Allocation (IPRA) we are trying to force caller-saved registers for local functions (which have linkage type local). To achieve it I have modified TargetFrameLowering::determineCalleeSaves() to return early for functions which satisfy if (F->hasLocalLinkage() && !F->hasAddressTaken()), and also to reflect the fact that for local
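
A sketch of the change being described, assuming 2016-era LLVM signatures (includes and the remainder of the function are omitted):

    // Sketch of the modification described above, not the actual patch;
    // assumes MachineFunction::getFunction still returned a pointer.
    void TargetFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                                   BitVector &SavedRegs,
                                                   RegScavenger *RS) const {
      const Function *F = MF.getFunction();
      // For a local function whose address never escapes, skip marking any
      // callee-saved registers: all registers become caller-saved.
      if (F->hasLocalLinkage() && !F->hasAddressTaken())
        return;
      // ... the usual callee-saved computation follows ...
    }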
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello, > > To solve this bug locally I have given preference to tail call > optimization over local function related optimization in IPRA. I have added > the following method to achieve this: > >
2004 Sep 10
3
patch
So here is a quick patch solving the problem; now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz

    --- lpc_asm.nasm.orig   Wed Jul 18 02:23:40 2001
    +++ lpc_asm.nasm        Sat Nov 17 21:09:46 2001
    @@ -59,10 +59,10 @@
     ;      ALIGN 16
     cident FLAC__lpc_compute_autocorrelation_asm_ia32
    -       ;[esp + 24] == autoc[]
    -       ;[esp + 20] == lag
    -       ;[esp + 16] ==
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello, > > To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this: > > bool isEligibleForTailCallOptimization(Function *F) { > CallingConv::ID CC =
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello, To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this:

    bool isEligibleForTailCallOptimization(Function *F) {
      CallingConv::ID CC = F->getCallingConv();
      if (CC == CallingConv::Fast || CC == CallingConv::GHC ||
          CC == CallingConv::HiPE)
        return true;
      return false;
    }
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to http://llvm.org/docs/CodeGenerator.html#tail-call-section, adding a new CC for the purpose of local function optimization seems a good idea, because tail call optimization only takes place when both caller and callee have the fastcc, GHC, or HiPE calling convention. -Vivek On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com> wrote: >
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
Sent from my iPhone > On Jun 28, 2016, at 12:53 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > > >> On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: >> >>> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: >>> >>> Hello, >>> >>> To solve
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development From: Scott
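
A hedged sketch of how such a mask is interpreted, mirroring the semantics described above (LLVM's MachineOperand::clobbersPhysReg works this way):

    // A register-mask operand is a bit vector over all physical registers:
    // a SET bit means the register is PRESERVED across the call, so a
    // CLEAR bit means it is clobbered.
    #include <cstdint>

    bool clobbersPhysReg(const uint32_t *RegMask, unsigned PhysReg) {
      return !(RegMask[PhysReg / 32] & (1u << (PhysReg % 32)));
    }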
2008 Nov 22
5
[RFC][PATCH] Gfxboot COMBOOT module
So here it is. Ugly and far from acceptable shape, but nonetheless it seems to work. Parts are borrowed from the syslinux core and of course the gfxboot patch for syslinux 3.63. Syntax: gfxboot.com <bootlogo file> - Sebastian

    --- /dev/null   2007-09-21 23:50:58.000000000 +0200
    +++ syslinux-3.73-pre6/modules/gfxboot.asm      2008-11-22 19:01:10.000000000 +0100
    @@ -0,0 +1,883 @@
    +           absolute 0
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, Here is my first speedup patch. Around 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas right now how to do it. It does not leave much space for parallel stuff, copying memory from lot of
2008 Apr 16
0
[LLVMdev] Being able to know the jitted code-size before emitting
Comments below. On Apr 15, 2008, at 4:24 AM, Nicolas Geoffray wrote: > OK, here's a new patch that adds the infrastructure and the > implementation for X86, ARM and PPC of GetInstSize and > GetFunctionSize. Both functions are virtual functions defined in > TargetInstrInfo.h. > > For X86, I moved some commodity functions from X86CodeEmitter to > X86InstrInfo. >
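
The interface under discussion would look roughly like this (reconstructed from the mail, not the committed patch; parameter types are guesses, and forward declarations stand in for the real LLVM headers):

    // Rough shape of the GetInstSize / GetFunctionSize interface named in
    // the quoted mail; signatures are assumptions.
    class MachineInstr;
    class MachineFunction;

    class TargetInstrInfo {
    public:
      virtual ~TargetInstrInfo() {}
      // Bytes of machine code the JIT would emit for one instruction.
      virtual unsigned GetInstSize(const MachineInstr *MI) const = 0;
      // Total emitted size of a function, known before emitting it.
      virtual unsigned GetFunctionSize(const MachineFunction &MF) const = 0;
    };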
2020 Jul 31
2
Issue with inline assembly, function inlining, and position independent code
Code: https://godbolt.org/z/T397fo I'm running some performance experiments on an x86-64 Linux system, where I've modified LLVM to reserve a register, and I'd like to use that register in my code. Currently, I'm using %r12d, which is callee-saved, so I don't need to worry about compatibility with existing libraries or system calls. For security reasons, the generated binaries
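
One common way to touch such a reserved register from C or C++ is extended inline asm; a hedged sketch (helper names are made up; assumes the modified LLVM really keeps %r12 out of register allocation, as described above):

    // Reading and writing the reserved register via GCC/Clang extended
    // inline asm. Hypothetical helpers, not from the linked code.
    static inline unsigned read_r12d() {
      unsigned v;
      asm volatile("movl %%r12d, %0" : "=r"(v));
      return v;
    }

    static inline void write_r12d(unsigned v) {
      // Declaring the clobber keeps this correct even with an unmodified
      // compiler, where %r12 is ordinarily allocatable.
      asm volatile("movl %0, %%r12d" : : "r"(v) : "r12");
    }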