similar to: [PATCH] Optimized assembler version of md5_process() for x86-64

Displaying 20 results from an estimated 8000 matches similar to: "[PATCH] Optimized assembler version of md5_process() for x86-64"

2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
I submitted the problem report to clang's bugzilla but no one seems to care, so I have to send it to the mailing list. clang 3.7 svn (trunk 229055 at the time I reported this problem) generates slower code than 3.5 (Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)) for the following code. It is an "8 queens puzzle" solver written as an educational example. As
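
The reporter's source is not included in this excerpt; a minimal backtracking solver of the kind described might look like the following (hypothetical reconstruction in C++, not the actual test case):

    // Hypothetical reconstruction of an "8 queens puzzle" counter; the
    // reporter's actual test case is not shown in the snippet above.
    #include <cstdio>

    static int count;
    static int cols[8]; // cols[r] = column of the queen placed in row r

    static bool safe(int row, int col) {
      for (int r = 0; r < row; ++r) {
        int d = cols[r] - col;
        if (d == 0 || d == row - r || d == r - row)
          return false; // same column or same diagonal
      }
      return true;
    }

    static void solve(int row) {
      if (row == 8) { ++count; return; }
      for (int col = 0; col < 8; ++col)
        if (safe(row, col)) {
          cols[row] = col;
          solve(row + 1);
        }
    }

    int main() {
      solve(0);
      std::printf("%d solutions\n", count); // prints 92
      return 0;
    }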
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this "8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a big hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed, as well as others.
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
Using the SciMark 2.0 code from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current
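
For reference, the kernel that regressed is, approximately, the following compressed-sparse-row loop (paraphrased from the public scimark2 C source; details may differ):

    // Paraphrase of scimark2's "Sparse matmult" kernel: a compressed-
    // sparse-row multiply. Variable names approximate the public C source
    // rather than reproducing it exactly.
    void sparse_matmult(int M, double *y, const double *val, const int *row,
                        const int *col, const double *x) {
      for (int r = 0; r < M; ++r) {
        double sum = 0.0;
        // row[r]..row[r+1] delimit the nonzeros of row r; col[i] is the
        // column of val[i], giving an indirect load from x.
        for (int i = row[r]; i < row[r + 1]; ++i)
          sum += x[col[i]] * val[i];
        y[r] = sum;
      }
    }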
2013 Sep 12
1
[LLVMdev] bug in X86 disasm code?
Hi, I found this code in X86DisassemblerDecoder.h:

    #define EA_BASES_32BIT \
      ENTRY(EAX)           \
      ENTRY(ECX)           \
      ENTRY(EDX)           \
      ENTRY(EBX)           \
      ENTRY(sib)           \
      ENTRY(EBP)           \
      ENTRY(ESI)           \
      ENTRY(EDI)           \
      ENTRY(R8D)           \
      ENTRY(R9D)           \
      ENTRY(R10D)          \
      ENTRY(R11D)          \
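
ENTRY here is the usual X-macro pattern: the header expands the same list several times under different definitions of ENTRY. A self-contained sketch (abbreviated list and illustrative names, not the exact LLVM code):

    // Illustrative sketch of the X-macro pattern used by the decoder
    // header (abbreviated list; _DEMO names are made up). Each expansion
    // site defines ENTRY, expands the list, then undefines it.
    #define EA_BASES_32BIT_DEMO \
      ENTRY(EAX)                \
      ENTRY(ECX)                \
      ENTRY(EDX)                \
      ENTRY(EBX)

    #define ENTRY(x) EA_BASE_##x,
    enum EABase32 { EA_BASES_32BIT_DEMO }; // EA_BASE_EAX, EA_BASE_ECX, ...
    #undef ENTRY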
2017 Jul 01
2
KNL Assembly Code for Matrix Multiplication
Thank you. It means that vmovdqa64 zmm22, zmmword ptr [rip + .LCPI0_0] # zmm22 = [8,9,10,11,12,13,14,15] loads zmm22 with the 64-bit constants 8,9,10,11,12,13,14,15, which are indexes here, not values loaded from those locations; and zmm2 contains the constant 4000. So vpmuludq zmm14, zmm10, zmm2 will multiply the index values by 4000, since for array b the stride is 4000. zmm14=
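
In scalar terms, that multiply computes one element offset per lane; a hedged C++ equivalent (the 8 lanes and the 4000 stride are taken from the snippet, everything else is assumed):

    // Scalar equivalent of the per-lane address computation described
    // above: each vector lane multiplies a row index by the stride of
    // array b. Illustrative only.
    void gather_offsets(const long idx[8], long offsets[8]) {
      const long stride = 4000; // stride of array b, per the snippet
      for (int lane = 0; lane < 8; ++lane)
        offsets[lane] = idx[lane] * stride; // what vpmuludq does per lane
    }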
2015 Sep 01
2
[RFC] New pass: LoopExitValues
On Mon, Aug 31, 2015 at 5:52 PM, Jake VanAdrighem <jvanadrighem at gmail.com> wrote: > Do you have some specific performance measurements? Averaging 4 runs of 10000 iterations each of Coremark on my X86_64 desktop showed: -O2 performance: 2.9% faster with the L.E.V. pass; -Os size: 1.5% smaller with the L.E.V. pass. In the case of Coremark, the benefit comes mainly from the matrix
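
For context, a hedged before/after illustration of the loop-exit-values idea (not code from the patch):

    // In sum_before, p stays live across the loop only so its final value
    // can be read afterwards; in sum_after, that exit value is recomputed
    // from the trip count outside the loop instead.
    long sum_before(const int *base, int n, const int **out) {
      const int *p = base;
      long sum = 0;
      for (int i = 0; i < n; ++i) sum += *p++;
      *out = p; // exit value kept live through the loop
      return sum;
    }

    long sum_after(const int *base, int n, const int **out) {
      long sum = 0;
      for (int i = 0; i < n; ++i) sum += base[i];
      *out = base + n; // exit value materialized after the loop
      return sum;
    }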
2020 Jan 10
2
Register Dataflow Analysis on X86
Hi Scott, Sorry for the late reply, I was out of office during the holidays. 1. A def node can reach either a use node, or another def node. In the highlighted phi node (p3224), the def (d3225) reaches another def (1598) in statement (s1597), that’s why it’s needed. 2. The reason why the def of R11 in s1578 is not connected directly to the use in s1725 is that there may be an intervening
2016 Jun 25
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Sat, Jun 25, 2016 at 11:03 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > Hello LLVM Community, > > To improve Interprocedural Register Allocation (IPRA) we are trying to > force caller > saved registers for local functions (which have linkage type local). To > achieve it > I have modified TargetFrameLowering::determineCalleeSaves() to return > early for >
2016 Jun 25
3
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello LLVM Community, To improve Interprocedural Register Allocation (IPRA) we are trying to force caller-saved registers for local functions (which have linkage type local). To achieve it I have modified TargetFrameLowering::determineCalleeSaves() to return early for functions which satisfy if (F->hasLocalLinkage() && !F->hasAddressTaken()), and also to reflect the fact that for local
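
A sketch of the change being described, assuming 2016-era LLVM signatures (includes and the remainder of the function are omitted):

    // Sketch of the modification described above, not the actual patch;
    // assumes MachineFunction::getFunction still returned a pointer.
    void TargetFrameLowering::determineCalleeSaves(MachineFunction &MF,
                                                   BitVector &SavedRegs,
                                                   RegScavenger *RS) const {
      const Function *F = MF.getFunction();
      // For a local function whose address never escapes, skip marking any
      // callee-saved registers: all registers become caller-saved.
      if (F->hasLocalLinkage() && !F->hasAddressTaken())
        return;
      // ... the usual callee-saved computation follows ...
    }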
2016 Jun 28
0
Tail call optimization is getting affected due to local function related optimization with IPRA
On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: > > On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello, > > To solve this bug locally I have given preference to tail call > optimization over local function related optimization in IPRA. I have added > the following method to achieve this: > >
2004 Sep 10
3
patch
So here is a quick patch solving the problem; now it should be PIC. -- Miroslav Lichvar lichvarm@phoenix.inf.upol.cz

    --- lpc_asm.nasm.orig   Wed Jul 18 02:23:40 2001
    +++ lpc_asm.nasm        Sat Nov 17 21:09:46 2001
    @@ -59,10 +59,10 @@
     ;      ALIGN 16
     cident FLAC__lpc_compute_autocorrelation_asm_ia32
    -       ;[esp + 24] == autoc[]
    -       ;[esp + 20] == lag
    -       ;[esp + 16] ==
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > Hello, > > To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this: > > bool isEligibleForTailCallOptimization(Function *F) { > CallingConv::ID CC =
2016 Jun 27
0
Tail call optimization is getting affected due to local function related optimization with IPRA
Hello, To solve this bug locally I have given preference to tail call optimization over local function related optimization in IPRA. I have added the following method to achieve this:

    bool isEligibleForTailCallOptimization(Function *F) {
      CallingConv::ID CC = F->getCallingConv();
      if (CC == CallingConv::Fast || CC == CallingConv::GHC ||
          CC == CallingConv::HiPE)
        return true;
      return false;
    }
2016 Jun 26
3
Tail call optimization is getting affected due to local function related optimization with IPRA
According to http://llvm.org/docs/CodeGenerator.html#tail-call-section, adding a new CC for the purpose of local function optimization seems a good idea, because tail call optimization only takes place when both caller and callee have the fastcc, GHC, or HiPE calling convention. -Vivek On Sun, Jun 26, 2016 at 1:26 AM, vivek pandya <vivekvpandya at gmail.com> wrote: >
2016 Jun 28
2
Tail call optimization is getting affected due to local function related optimization with IPRA
Sent from my iPhone > On Jun 28, 2016, at 12:53 PM, vivek pandya <vivekvpandya at gmail.com> wrote: > > > >> On Tue, Jun 28, 2016 at 8:11 PM, Mehdi Amini <mehdi.amini at apple.com> wrote: >> >>> On Jun 27, 2016, at 12:25 PM, vivek pandya <vivekvpandya at gmail.com> wrote: >>> >>> Hello, >>> >>> To solve
2019 Dec 23
2
Register Dataflow Analysis on X86
Hi Scott, That #1073741833 is a register mask. They are treated as aggregate registers (essentially sets of registers), so if it includes R9D and R11D, it will be treated as being aliased with both. These separate defs are there because they reach disjoint registers. -- Krzysztof Parzyszek kparzysz at quicinc.com<mailto:kparzysz at quicinc.com> AI tools development From: Scott
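
A hedged sketch of how such a mask is interpreted, mirroring the semantics described above (LLVM's MachineOperand::clobbersPhysReg works this way):

    // A register-mask operand is a bit vector over all physical registers:
    // a SET bit means the register is PRESERVED across the call, so a
    // CLEAR bit means it is clobbered.
    #include <cstdint>

    bool clobbersPhysReg(const uint32_t *RegMask, unsigned PhysReg) {
      return !(RegMask[PhysReg / 32] & (1u << (PhysReg % 32)));
    }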
2008 Nov 22
5
[RFC][PATCH] Gfxboot COMBOOT module
So here it is. Ugly and far from acceptable shape, but nonetheless it seems to work. Parts are borrowed from the syslinux core and of course the gfxboot patch for syslinux 3.63. Syntax: gfxboot.com <bootlogo file> - Sebastian

    --- /dev/null   2007-09-21 23:50:58.000000000 +0200
    +++ syslinux-3.73-pre6/modules/gfxboot.asm      2008-11-22 19:01:10.000000000 +0100
    @@ -0,0 +1,883 @@
    +           absolute 0
2005 Mar 23
3
[PATCH] promised MMX patches rc1
Hello, Here is my first speedup patch. Around 10-11%. No IDCT yet. Please feel free to comment on my code or, even better, think about improvements. :) I believe my routines are not so bad; maybe one day they will be even faster. What needs to be optimized is the loop filter function. I have no ideas right now how to do it. It does not leave much space for parallel stuff, copying memory from lot of
2008 Apr 16
0
[LLVMdev] Being able to know the jitted code-size before emitting
Comments below. On Apr 15, 2008, at 4:24 AM, Nicolas Geoffray wrote: > OK, here's a new patch that adds the infrastructure and the > implementation for X86, ARM and PPC of GetInstSize and > GetFunctionSize. Both functions are virtual functions defined in > TargetInstrInfo.h. > > For X86, I moved some commodity functions from X86CodeEmitter to > X86InstrInfo. >
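
The interface under discussion would look roughly like this (reconstructed from the mail, not the committed patch; parameter types are guesses, and forward declarations stand in for the real LLVM headers):

    // Rough shape of the GetInstSize / GetFunctionSize interface named in
    // the quoted mail; signatures are assumptions.
    class MachineInstr;
    class MachineFunction;

    class TargetInstrInfo {
    public:
      virtual ~TargetInstrInfo() {}
      // Bytes of machine code the JIT would emit for one instruction.
      virtual unsigned GetInstSize(const MachineInstr *MI) const = 0;
      // Total emitted size of a function, known before emitting it.
      virtual unsigned GetFunctionSize(const MachineFunction &MF) const = 0;
    };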
2020 Jul 31
2
Issue with inline assembly, function inlining, and position independent code
Code: https://godbolt.org/z/T397fo I'm running some performance experiments on an x86-64 Linux system, where I've modified LLVM to reserve a register, and I'd like to use that register in my code. Currently, I'm using %r12d, which is callee-saved, so I don't need to worry about compatibility with existing libraries or system calls. For security reasons, the generated binaries
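
One common way to touch such a reserved register from C or C++ is extended inline asm; a hedged sketch (helper names are made up; assumes the modified LLVM really keeps %r12 out of register allocation, as described above):

    // Reading and writing the reserved register via GCC/Clang extended
    // inline asm. Hypothetical helpers, not from the linked code.
    static inline unsigned read_r12d() {
      unsigned v;
      asm volatile("movl %%r12d, %0" : "=r"(v));
      return v;
    }

    static inline void write_r12d(unsigned v) {
      // Declaring the clobber keeps this correct even with an unmodified
      // compiler, where %r12 is ordinarily allocatable.
      asm volatile("movl %0, %%r12d" : : "r"(v) : "r12");
    }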