search for: lea

Displaying 20 results from an estimated 491 matches for "lea".

2012 Sep 28
2
[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom
Hi, Here is an update on our proposal to improve the uses of LEA on Atom processors. 1. Disable current generation of LEAs Due to a 3-cycle stall between the ALU and the AGU, any address generation done using math instructions will cause a stall on loads and stores which are within 3 cycles of the address generation. Consequently, the heuristics for using LEAs e...
2013 Sep 30
0
[LLVMdev] [PROPOSAL] Improve uses of LEA on Atom
Was there any development on this? I noticed that clang still produces a lea for the testcase in llvm.org/pr13320. On 28 September 2012 11:36, Nowicki, Tyler <tyler.nowicki at intel.com> wrote: > Hi, > > > > Here is an update on our proposal to improve the uses of LEA on Atom > processors. > > > > 1. Disable current generation of LEAs >...
2020 May 22
2
[PATCH] Optimized assembler version of md5_process() for x86-64
...push %rbx + push %r12 + push %r13 # not really useful (r13 is unused) + push %r14 + push %r15 + + # rdi = arg #1 (ctx, MD5_CTX pointer) + # rsi = arg #2 (ptr, data pointer) + # rdx = arg #3 (nbr, number of 16-word blocks to process) + mov %rdi, %rbp # rbp = ctx + shl $6, %rdx # rdx = nbr in bytes + lea (%rsi,%rdx), %rdi # rdi = end + mov 0*4(%rbp), %eax # eax = ctx->A + mov 1*4(%rbp), %ebx # ebx = ctx->B + mov 2*4(%rbp), %ecx # ecx = ctx->C + mov 3*4(%rbp), %edx # edx = ctx->D + # end is 'rdi' + # ptr is 'rsi' + # A is 'eax' + # B is 'ebx' + # C is '...
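The two instructions `shl $6, %rdx` and `lea (%rsi,%rdx), %rdi` in this snippet turn a block count into an end pointer. A minimal C sketch of the same arithmetic, assuming 64-byte blocks (16 words of 4 bytes, as the snippet's comments suggest); the function name `md5_blocks_end` is hypothetical and not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the prologue arithmetic in the snippet.
   ptr corresponds to %rsi, nbr to %rdx, and the result to %rdi ("end"). */
const uint8_t *md5_blocks_end(const uint8_t *ptr, size_t nbr) {
    size_t nbytes = nbr << 6;   /* shl $6, %rdx: 64 bytes per block */
    return ptr + nbytes;        /* lea (%rsi,%rdx), %rdi: base + index */
}
```

Here the `lea` acts as a three-operand add: it writes the sum to `%rdi` without modifying `%rsi` or `%rdx`.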
2017 Dec 27
1
Convert MachineInstr to MCInst in AsmPrinter.cpp
Hello everyone, In the file *lib/CodeGen/AsmPrinter/AsmPrinter.cpp*, I would like to obtain an MCInst corresponding to its MachineInstr. Can anyone tell me a way to do that? If that is not possible, then, I would like to know if a given MachineInstr is an *lea* instruction and I would like to know if the symbol involved with this lea instruction is a jump-table. For instance, given a MachineInstr, I would like to know if it is of the following form. *leaq LJTI0_0(%rip), %rax* Also, say, I want to add custom labels (some string) while generating assembl...
2012 Aug 10
0
[LLVMdev] RFC: Adding pass in X86PassConfig::addPreEmitPass for LEA optimization on Atom
Hi, We are getting ready to implement several heuristics for correctly using LEAs to avoid stalls in the address generator on Atom. Our plan is to: 1. Disable LEA generation on Atom in X86ISelDAGToDAG:: SelectLEAAddr() for all but a few pseudo-instructions 2. Identify loads and stores in an X86PassConfig::addPreEmitPass() pass and examine several preceding instr...
2013 Sep 17
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
Hi all. I'm looking for advice on how to deal with inefficient code generation for Intel Nehalem/Westmere architecture on 64-bit platform for the attached test.cpp (LLVM IR is in test.cpp.ll). The inner loop has 11 iterations and is eventually unrolled. Test.lea.s is the assembly code of the outer loop. It simply has 11 loads, 11 FP add, 11 FP mul, 1 FP store and lea+mov for index computation, cmp and jump. The problem is that lea is on the critical path because it's dispatched on the same port as all FP add operations (port 1). Intel Architecture Code An...
2015 Feb 13
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...mov r15, rax shl rax, 20h mov rsi, offset __mh_execute_header add rsi, rax sar rsi, 20h ; size_t mov edi, 4 ; size_t call _calloc lea edx, [r15-1] movsxd r8, edx mov ecx, r15d add ecx, 0FFFFFFFEh js loc_100000DFA test r15d, r15d mov r11d, [rax+r8*4] jle loc_100000EAE mov...
2013 Sep 12
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...ug in Clang > itself just seems perverse to me. (And we shouldn't let a CodeGen bug > dictate how we implement our functions either). Looking at the assembly there's something wrong with SI that is not getting saved anywhere after CPUID and 0x20 bit test before it gets overwritten by LEA. 332: mov eax,0x7 337: mov rsi,rbx 33a: cpuid 33c: xchg rsi,rbx 33f: and esi,0x20 342: shr esi,0x5 345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc> 34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3> 353: cmove rb...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
The regressions in the performance of generated code, introduced by the llvm 3.6 release, don't seem to be limited to this "8 queens puzzle" solver test case. See... http://www.phoronix.com/scan.php?page=article&item=llvm-clang-3.5-3.6-rc1&num=1 where a big hit in the performance of the Sparse Matrix Multiply test of the SciMark v2.0 benchmark was observed as well a...
2011 Sep 21
1
[LLVMdev] Instruction Selection
...estion about instruction selection for a backend I'm writing. The target has two register classes, RC1 and RC2. The instruction set is far from orthogonal. The ADD instruction is two address with both register/immediate and register/memory forms. The register operand is in the RC1 class. The LEA instruction is three address with the destination register in the RC2 class. There are two forms: register/immediate in which the register is RC2 class and register/register in which one register is RC1 and the other RC2. The algorithm used to generate the instruction selection table gives a relat...
2015 Feb 14
2
[LLVMdev] trunk's optimizer generates slower code than 3.5
...e from http://math.nist.gov/scimark2/scimark2_1c.zip compiled with the same... make CFLAGS="-O3 -march=native" I am able to reproduce the 22% performance regression in the run time of the Sparse matmult benchmark. For 10 runs of the scimark2 benchmark, I get 998.439+/-0.4828 with the release llvm clang 3.5.1 compiler and 1217.363+/-1.1004 for the current clang 3.6svn from 3.6 branch. Not good. Jack On Sat, Feb 14, 2015 at 11:19 AM, Jack Howarth <howarth.mailing.lists at gmail.com> wrote: > Do any of the build-bots routinely run the SciMark v2.0 benchm...
2013 Oct 02
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
...> > > > I’m looking for advice on how to deal with inefficient code generation > for Intel Nehalem/Westmere architecture on 64-bit platform for the attached > test.cpp (LLVM IR is in test.cpp.ll). > > The inner loop has 11 iterations and is eventually unrolled. > > Test.lea.s is the assembly code of the outer loop. It simply has 11 loads, > 11 FP add, 11 FP mul, 1 FP store and lea+mov for index computation, cmp and > jump. > > The problem is that lea is on the critical path because it’s dispatched on the > same port as all FP add operations (port 1). >...
2013 Oct 03
2
[LLVMdev] Codegen performance issue: LEA vs. INC.
...re. It sounds like it should be taught about profitability. In cases where profitability can only be determined with something machinetracemetric then it probably should leave it to a more sophisticated pass like regalloc. In this case, we probably need a profitability target hook which knows about lea. We should also consider disabling its dumb pseudo scheduling code when we enable MI scheduler. Evan Sent from my iPad > On Oct 2, 2013, at 8:38 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote: > > This sounds like llvm.org/pr13320. > >> On 17 September...
2012 Jun 24
1
how to find out lea instruction causes skype crash when starting
Hi david, I found that you signed off on a patch titled "x86: emulate lea with two register operands correctly". In this patch, you described that Skype executes a lea instruction and will crash when starting if it does not get the exception. I have used a tool named mentorKG.exe to make a LICENSE.TXT, but the software crashes when starting. I used your patch and found it works w...
2013 Oct 05
0
[LLVMdev] Codegen performance issue: LEA vs. INC.
...nds like it should be taught about profitability. In cases where profitability can only be determined with something machinetracemetric then it probably should leave it to a more sophisticated pass like regalloc. > > In this case, we probably need a profitability target hook which knows about lea. We should also consider disabling its dumb pseudo scheduling code when we enable MI scheduler. Sorry, I set this aside to look at closely and never got back to it. The lea->cmp problem is fixed by switching to the MI scheduler. Please run with -mllvm -misched-bench to confirm. Leaving...
2013 Sep 13
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Pretty sure you need to check EAX>=7 from cpuid leaf 0 before calling leaf 7 and you need to pass ECX=0 to leaf 7. See lib/Target/X86/X86Subtarget.cpp which uses a GetX86CpuIDAndInfoEx function to pass EAX and ECX to cpuid. I don't think it explains your compiler bug though. On Thu, Sep 12, 2013 at 2:12 PM, Adam Strzelecki <ono at...
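The ordering this reply describes (read the maximum supported leaf from leaf 0, then query leaf 7 with ECX=0) might be sketched in C as below. This is an illustration under assumptions, not the LLVM code: it relies on the `__get_cpuid_max` and `__cpuid_count` helpers from the GCC/Clang `<cpuid.h>` header, and the names `avx2_from_leaf7` and `host_has_avx2` are hypothetical. The `0x20` mask is bit 5 of EBX in leaf 7, subleaf 0, which reports AVX2.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Pure decision logic (hypothetical helper).
   max_leaf:  EAX returned by CPUID leaf 0.
   leaf7_ebx: EBX returned by CPUID leaf 7, subleaf 0. */
int avx2_from_leaf7(uint32_t max_leaf, uint32_t leaf7_ebx) {
    if (max_leaf < 7)
        return 0;                      /* leaf 7 is not implemented */
    return (leaf7_ebx & 0x20u) != 0;   /* EBX bit 5: AVX2 */
}

/* Hardware query (hypothetical helper); reports 0 on non-x86 builds. */
int host_has_avx2(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    unsigned int max_leaf = __get_cpuid_max(0, 0);  /* leaf 0 -> max leaf */
    if (max_leaf < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);        /* EAX=7, ECX=0 */
    return avx2_from_leaf7(max_leaf, ebx);
#else
    return 0;
#endif
}
```

Keeping the bit test separate from the hardware query lets the decision logic be checked on any machine, including ones without leaf 7.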
2014 Mar 25
3
[LLVMdev] Getting the Debugging JIT-ed Code with GDB example to work
...is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > For bug reporting instructions, please see: > <http://bugs.launchpad.net/gdb-linaro/>... > Reading symbols from > /home/zdevito/clang+llvm-3.4-x86_64-unknown-ubuntu12.04/bin/lli...(no > debugging symbols found)...done. > (gdb) b showdebug.c:6 > No symbol table is loaded. Use the "file" command. >...
2004 Sep 10
3
patch
...mov edi, [esp + 28] ; edi == autoc + mov esi, [esp + 16] ; esi == data inc ecx ; we are looping <= limit so we add one to the counter ; for(sample = 0; sample <= limit; sample++) { @@ -97,7 +98,11 @@ ; each iteration is 11 bytes so we need (-eax)*11, so we do (-12*eax + eax) lea edx, [eax + eax*2] neg edx - lea edx, [eax + edx*4 + .jumper1_0] + lea edx, [eax + edx*4 + .jumper1_0 - .get_eip1] + call .get_eip1 +.get_eip1: + pop ebx + add edx, ebx inc edx ; compensate for the shorter opcode on the last iteration inc edx ; compensate for the shorter opcode on the l...
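The comment in this patch ("each iteration is 11 bytes so we need (-eax)*11, so we do (-12*eax + eax)") describes a multiply by -11 built from two `lea` instructions and a `neg`. A small C sketch of just that arithmetic, leaving out the `.jumper1_0` label offset; the function name `minus_eleven_times` is hypothetical, and unsigned 32-bit arithmetic is used to mirror register wraparound:

```c
#include <stdint.h>

/* Mirrors: lea edx, [eax + eax*2] ; neg edx ; lea edx, [eax + edx*4] */
uint32_t minus_eleven_times(uint32_t eax) {
    uint32_t edx = eax + eax * 2u;  /* edx = 3*eax */
    edx = 0u - edx;                 /* edx = -3*eax */
    edx = eax + edx * 4u;           /* edx = eax - 12*eax = -11*eax */
    return edx;
}
```

The scale factor in an x86 address can only be 1, 2, 4, or 8, which is why the multiplication is split into a times-3 step and a times-4 step.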
2009 Aug 30
3
experimental patch for libtheora1.1beta3
...v retrieving revision 1.10 diff -u Makefile --- Makefile 12 Feb 2009 03:21:56 -0000 1.10 +++ Makefile 25 Aug 2009 14:46:39 -0000 @@ -2,14 +2,14 @@ COMMENT= open video codec -DISTNAME= libtheora-1.0 +DISTNAME= libtheora-1.1beta3 CATEGORIES= multimedia MASTER_SITES= http://downloads.xiph.org/releases/theora/ EXTRACT_SUFX= .tar.bz2 SHARED_LIBS+= theora 3.1 SHARED_LIBS+= theoradec 1.0 -SHARED_LIBS+= theoraenc 1.1 +SHARED_LIBS+= theoraenc 1.2 HOMEPAGE= http://www.theora.org/ @@ -30,5 +30,8 @@ CONFIGURE_ARGS= --disable-examples CONFIGURE_ENV= ac_cv_prog_HAVE_DOXYGEN=false \ ac_cv...
2008 Feb 11
2
[LLVMdev] "make check" failures: leaq in fold-mul-lohi.ll, stride-nine-with-base-reg.ll, stride-reuse.ll
I'm seeing the following failures with "make check" (x86-32 linux): FAIL: test/CodeGen/X86/fold-mul-lohi.ll Failed with exit(1) at line 2 while running: llvm-as < test/CodeGen/X86/fold-mul-lohi.ll | llc -march=x86-64 | not grep lea leaq B, %rsi leaq A, %r8 leaq P, %rsi child process exited abnormally FAIL: test/CodeGen/X86/stride-nine-with-base-reg.ll Failed with exit(1) at line 2 while running: llvm-as < test/CodeGen/X86/stride-nine-with-base-reg.ll | llc -march=x86-64 | not grep lea...