search for: addq

Displaying 20 results from an estimated 206 matches for "addq".

2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file "module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul,@function adjmul: .cfi_startproc leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...saveopt,-sha,sse2,sse3,-avx512dq, > Assembly: > .text > .file "module_KFxOBX_i4_after.ll" > .globl adjmul > .align 16, 0x90 > .type adjmul,@function > adjmul: > .cfi_startproc > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx > sarq $63, %rcx ...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
...bly: >> .text >> .file "module_KFxOBX_i4_after.ll" >> .globl adjmul >> .align 16, 0x90 >> .type adjmul,@function >> adjmul: >> .cfi_startproc >> leaq (%rdi,%r8), %rdx >> addq %rsi, %r8 >> testb $1, %cl >> cmoveq %rdi, %rdx >> cmoveq %rsi, %r8 >> movq %rdx, %rax >> sarq $63, %rax >> shrq $62, %rax >> addq %rdx, %rax >> sarq $2, %rax >> mo...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...es of code the code in main is also sometimes different (not just inlined) to the_func clang -DITER -O2 clang -DITER -O3 gives: the_func: leaq 12(%rdi), %rcx leaq 4(%rdi), %rax cmpq %rax, %rcx cmovaq %rcx, %rax movq %rdi, %rsi notq %rsi addq %rax, %rsi shrq $2, %rsi incq %rsi xorl %edx, %edx movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8 andq %rsi, %rax pxor %xmm0, %xmm0 je .LBB0_1 # BB#2: # %vector.body.preheader leaq...
2016 Jan 04
2
Fwd: Strength reduction in loops
Here is a simple loop: long foo(int len, long* s) { long sum = 0; for (int i=0; i<len; i++) sum += s[i*12]; return sum; } There is a multiplication in each loop iteration. Can this be turned into addition, and is there already a pass that does? (https://en.wikipedia.org/wiki/Strength_reduction uses this very situation as an example in the opening paragraph: "In
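The transformation being asked about can be pictured with a hand-written sketch. This is not the output of any particular LLVM pass, and `foo_reduced` is a hypothetical name introduced here for illustration. It replaces the per-iteration multiply `i*12` with a running element offset that grows by 12 on each iteration.

```c
#include <stddef.h>

/* Loop from the post: one multiplication (i * 12) per iteration. */
long foo(int len, long *s) {
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += s[i * 12];
    return sum;
}

/* Hand strength-reduced sketch (hypothetical name, not compiler output):
   the index multiply becomes an offset advanced by addition. */
long foo_reduced(int len, long *s) {
    long sum = 0;
    size_t off = 0;               /* always equals i * 12 */
    for (int i = 0; i < len; i++) {
        sum += s[off];
        off += 12;                /* addition replaces the multiply */
    }
    return sum;
}
```

Both functions read the same elements in the same order, so they should return the same sum for any valid input. An offset is used rather than a bumped pointer so that no pointer is ever formed beyond the end of the array.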
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...ng at the assembler reveals an AVX512 instruction which shouldn't be there. Assembly: .text .file "module" .globl main .align 16, 0x90 .type main,@function main: .cfi_startproc movq 8(%rsp), %r10 leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx...
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...Assembly: > .text > .file "module" > .globl main > .align 16, 0x90 > .type main,@function > main: > .cfi_startproc > movq 8(%rsp), %r10 > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx ...
2015 Oct 27
4
How can I tell llvm, that a branch is preferred ?
...or "switch". And __builtin_expect does nothing, of that I am sure. Unfortunately, llvm has a knack for ordering the single most crucial part of my code in exactly the opposite way from what I want. It emits (x86_64): cmpq %r15, (%rax,%rdx) jne LBB0_3 Ltmp18: leaq 8(%rax,%rdx), %rcx jmp LBB0_4 LBB0_3: addq $8, %rcx LBB0_4: whereas I want: cmpq %r15, (%rax,%rdx) jeq LBB0_3 addq $8, %rcx jmp LBB0_4 LBB0_3: leaq 8(%rax,%rdx), %rcx LBB0_4: since that saves me executing a jump 99.9% of the time. Is there anything I can do? Ciao Nat!
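For reference, a minimal sketch of how the GCC/Clang builtin `__builtin_expect` is usually applied to a comparison like the one in the post. The `unlikely` macro and the `count_mismatches` function are hypothetical names for illustration, not code from the thread. The hint is only advisory, so the compiler may still choose a different block layout, which would be consistent with what the poster reports.

```c
#include <stddef.h>

/* Hypothetical convenience macro around the GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical scan: a match against key is hinted as the rare case,
   so the mismatch path is the one we would like to fall through. */
size_t count_mismatches(const long *table, size_t n, long key) {
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(table[i] == key))
            continue;   /* rare path: the slot matches */
        misses++;       /* expected hot path: the slot does not match */
    }
    return misses;
}
```

The hint changes only code layout and never the result, so checking whether it had any effect means inspecting the generated assembly (for example with `-S`).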
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile the attached IR with LLVM 3.6 (llc -march=x86-64 -o f.S f.ll), it generates an aligned ADDPS with an unaligned address. See the attached f.S; here is an extract: addq $12, %r9 # $12 is not a multiple of 16, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2 # =>This Inner Loop Header: Depth=1 movq offset_array3(,%...
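The alignment concern can be checked with simple arithmetic. Legacy SSE `addps` with a memory operand requires a 16-byte-aligned address, and adding 12 bytes to a 16-byte-aligned base breaks that. The sketch below illustrates the check only; `addr_is_aligned16` is a hypothetical helper and is not taken from the attached f.ll or f.S.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr sits on a 16-byte boundary,
   which a memory operand of non-VEX addps must satisfy. */
int addr_is_aligned16(uintptr_t addr) {
    return (addr % 16u) == 0;
}
```

For a 16-byte-aligned base such as 0x1000, `addr_is_aligned16(0x1000 + 12)` is false, while `addr_is_aligned16(0x1000 + 16)` is true.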
2013 Apr 03
2
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...m I doing something wrong? Tyler float dotproduct(float *A, float *B, int n) { float sum = 0; for(int i = 0; i < n; ++i) { sum += A[i] * B[i]; } return sum; } clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - <loop body> .LBB1_1: movss (%rdi), %xmm1 addq $4, %rdi mulss (%rsi), %xmm1 addq $4, %rsi decl %edx addss %xmm1, %xmm0 jne .LBB1_1 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130403/529c8ae3/att...
2010 Jan 22
2
[LLVMdev] Exception handling question
...# %entry subq $56, %rsp .Llabel294: .LBB153_1: movq %rdi, 24(%rsp) movq %rsi, 48(%rsp) movl %edx, 44(%rsp) movq %rcx, 32(%rsp) .LBB153_2: # %.try_body movq 32(%rsp), %rdi .Llabel291: addq $16, %rdi xorb %al, %al call _Unwind_RaiseException .Llabel292: jmp .LBB153_4 .LBB153_3: # %.finally_pad .Llabel293: movq %rax, 16(%rsp) testq %rdx, %rdx setne %al movzbl %al, %eax movq %rax,...
2018 May 11
2
best way to represent function call with new stack in LLVM IR?
..." # set the new base pointer for this function\0A movq %rsi, %rbp\0A # store stack pointer of this function for later\0A movq %rsp, (%rsi)\0A # save this new stack pointer for use later\0A movq %rsp, r11\0A # compute the new stack pointer for this function\0A subq %rdi, %rsi\0A addq %rsp, %rsi \0A movq %rsi, %rsp\0A # copy args that were passed via the old stack to the new stack\0A # %r11 marches towards %rdi which is the source addresses\0A1:\0A cmpq %rdi, %r11\0A je 2\0A movq (%r11), %r12\0A movq %r12, (%rsi)\0A addq $$0x8, %rsi\0A addq $$0x8, %r11\0A jmp...
2010 May 19
1
[LLVMdev] Scheduled Instructions go missing
All, I'm working on a new scheduler. I have a basic block for which my scheduler generates bad code. The C code looks like: int j, *p; if ((j = *p++) != 0) {...} My scheduler emits (x86, AT&T): mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04, p je ... Notice there is no test instruction. The default list scheduler generates: mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04, p test %rax je ... The sequences generated by both schedulers after scheduling and before emission are the same. Specifically, the test i...
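To make the dependence explicit, here is a small compilable sketch of the C fragment from the post. The wrapper name `load_advance_test` and its parameters are hypothetical. The branch condition depends only on the loaded value `j`, not on the pointer increment. On x86, `add` writes the flags, so a `je` placed directly after `addq` would presumably consume flags produced by the increment rather than by the load, which is why the missing `test` matters.

```c
/* Hypothetical wrapper around the fragment  if ((j = *p++) != 0) {...}  */
int load_advance_test(int **pp, int *j) {
    int *p = *pp;
    *j = *p++;        /* load the value, then advance p by sizeof(int) */
    *pp = p;          /* write the advanced pointer back */
    return *j != 0;   /* condition is on the loaded value only */
}
```

Calling it on an array whose first element is 0 returns 0 even though the pointer has moved to a nonzero address, which is the case the flags from the increment would get wrong.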
2007 Apr 30
2
[PATCH 0/12] Early USB debug port and i386 boot cleanups
Modern hardware relies primarily on memory mapped I/O, which is typically at addresses that are not mapped by the kernel's initial page tables, which currently makes them unusable for early debugging print support. So this patch set digs in and fixes the early page tables on both arch/i386 and arch/x86_64 so that set_fixmap works with our initial boot page tables. All that is needed is that
2010 Sep 01
5
[LLVMdev] equivalent IR, different asm
...shq %rbx subq $8, %rsp movq %rsi, %rbx movq %rdi, %r14 movq %rdx, %rdi movq %rcx, %rsi callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE movq %rax, %rcx shrq $32, %rcx testl %ecx, %ecx je LBB0_2 ## BB#1: imull (%rbx), %eax cltd idivl %ecx movl %eax, (%r14) LBB0_2: addq $8, %rsp popq %rbx popq %r14 ret $ llc opt-fail.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90 __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE: ## @_ZN7WebCore6kolo...
2013 Apr 03
0
[LLVMdev] Packed instructions generaetd by LoopVectorize?
...*B, int n) { > float sum = 0; > for(int i = 0; i < n; ++i) { > sum += A[i] * B[i]; > } > return sum; > } > > clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - > > <loop body> > .LBB1_1: > movss (%rdi), %xmm1 > addq $4, %rdi > mulss (%rsi), %xmm1 > addq $4, %rsi > decl %edx > addss %xmm1, %xmm0 > jne .LBB1_1 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachm...
2010 Sep 01
0
[LLVMdev] equivalent IR, different asm
...rdx, %rdi > movq %rcx, %rsi > callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE > movq %rax, %rcx > shrq $32, %rcx > testl %ecx, %ecx > je LBB0_2 > ## BB#1: > imull (%rbx), %eax > cltd > idivl %ecx > movl %eax, (%r14) > LBB0_2: > addq $8, %rsp > popq %rbx > popq %r14 > ret > > > $ llc opt-fail.ll -o - > > .section __TEXT,__text,regular,pure_instructions > .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE > .align 4, 0x90 > __ZN7WebCore6kolos1ERiS0_PKNS_20Rende...
2010 Jan 22
0
[LLVMdev] Exception handling question
....globl f .type f,@function f: # @f .Leh_func_begin1: # BB#0: # %e subq $8, %rsp .Llabel4: .Llabel1: callq g .Llabel2: # BB#1: # %c addq $8, %rsp ret .LBB1_2: # %u .Llabel3: addq $8, %rsp ret .size f, .-f .Leh_func_end1: .section .gcc_except_table,"a",@progbits .align 4 GCC_except_table1: .byte 0...
2020 May 09
2
[llvm-mca] Resource consumption of ProcResGroups
Hi, I’m trying to work out the behavior of llvm-mca on instructions with ProcResGroups. My current understanding is: When an instruction requests a port group (e.g., HWPort015) and all of its atomic sub-resources (e.g., HWPort0,HWPort1,HWPort5), HWPort015 is marked as “reserved” and is issued in parallel with HWPort0, HWPort1, and HWPort5, blocking future instructions from reserving HWPort015