thr3ads.net - search: "addq"

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

2

avx512 JIT backend generates wrong code on <4 x float>

...,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file "module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul,@function adjmul: .cfi_startproc leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx...
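The `sarq $63` / `shrq $62` / `addq` / `sarq $2` run in the listing above is the usual shift-based form of a signed 64-bit division by 4 that rounds toward zero. A minimal C sketch of the same arithmetic follows; the function name `div4_shift` is hypothetical, and the code assumes the compiler implements `>>` on negative signed values as an arithmetic shift (true for GCC and Clang on x86-64, but implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical helper mirroring the instruction sequence in the snippet.
 * Assumes >> on a negative int64_t is an arithmetic shift. */
int64_t div4_shift(int64_t x) {
    uint64_t sign = (uint64_t)(x >> 63);   /* sarq $63: all ones if x < 0, else 0 */
    int64_t bias = (int64_t)(sign >> 62);  /* shrq $62: 3 if x < 0, else 0 */
    return (x + bias) >> 2;                /* addq, then sarq $2 */
}
```

Adding the bias of 3 before shifting is what makes negative inputs round toward zero, matching C's `x / 4`, instead of toward negative infinity.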

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

0

avx512 JIT backend generates wrong code on <4 x float>

...saveopt,-sha,sse2,sse3,-avx512dq, > Assembly: > .text > .file "module_KFxOBX_i4_after.ll" > .globl adjmul > .align 16, 0x90 > .type adjmul,@function > adjmul: > .cfi_startproc > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx > sarq $63, %rcx ...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

1

avx512 JIT backend generates wrong code on <4 x float>

...bly: >> .text >> .file "module_KFxOBX_i4_after.ll" >> .globl adjmul >> .align 16, 0x90 >> .type adjmul,@function >> adjmul: >> .cfi_startproc >> leaq (%rdi,%r8), %rdx >> addq %rsi, %r8 >> testb $1, %cl >> cmoveq %rdi, %rdx >> cmoveq %rsi, %r8 >> movq %rdx, %rax >> sarq $63, %rax >> shrq $62, %rax >> addq %rdx, %rax >> sarq $2, %rax >> mo...

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

2014 Jul 23

4

[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops

...es of code the code in main is also sometimes different (not just inlined) to the_func clang -DITER -O2 clang -DITER -O3 gives: the_func: leaq 12(%rdi), %rcx leaq 4(%rdi), %rax cmpq %rax, %rcx cmovaq %rcx, %rax movq %rdi, %rsi notq %rsi addq %rax, %rsi shrq $2, %rsi incq %rsi xorl %edx, %edx movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8 andq %rsi, %rax pxor %xmm0, %xmm0 je .LBB0_1 # BB#2: # %vector.body.preheader leaq...

Fwd: Strength reduction in loops

2016 Jan 04

2

Fwd: Strength reduction in loops

Here is a simple loop: long foo(int len, long* s) { long sum = 0; for (int i=0; i<len; i++) sum += s[i*12]; return sum; } There is a multiplication in each loop iteration. Can this be turned into addition, and is there already a pass that does? (https://en.wikipedia.org/wiki/Strength_reduction uses this very situation as an example in the opening paragraph: "In
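As a rough illustration of what the question is asking for, here is the loop from the post next to a hand strength-reduced variant in which the per-iteration multiply `i*12` is replaced by a running index that grows by 12. The name `foo_reduced` is hypothetical; this is a sketch of the transformation, not a claim about what any particular LLVM pass emits.

```c
#include <stddef.h>

/* Loop as written in the post: index computed with a multiply each iteration. */
long foo(int len, long *s) {
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += s[i * 12];
    return sum;
}

/* Hand strength-reduced form (hypothetical name): the multiply becomes
 * an addition on a running index that steps by 12. */
long foo_reduced(int len, long *s) {
    long sum = 0;
    size_t idx = 0;
    for (int i = 0; i < len; i++) {
        sum += s[idx];
        idx += 12;
    }
    return sum;
}
```

Both functions read the same elements in the same order, so they should return identical sums for any valid `len` and `s`.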

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

2

AVX512 instruction generated when JIT compiling for an avx2 architecture

...ng at the assembler reveals an AVX512 instruction which shouldn't be there. Assembly: .text .file "module" .globl main .align 16, 0x90 .type main,@function main: .cfi_startproc movq 8(%rsp), %r10 leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx...

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

2

AVX512 instruction generated when JIT compiling for an avx2 architecture

...Assembly: > .text > .file "module" > .globl main > .align 16, 0x90 > .type main,@function > main: > .cfi_startproc > movq 8(%rsp), %r10 > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx ...

How can I tell llvm, that a branch is preferred ?

2015 Oct 27

4

How can I tell llvm, that a branch is preferred ?

...or "switch". And __builtin_expect does nothing, that I am sure of. Unfortunately llvm has this knack for ordering my one most crucial part of code exactly the opposite of how I want it. It does: (x86_64) cmpq %r15, (%rax,%rdx) jne LBB0_3 Ltmp18: leaq 8(%rax,%rdx), %rcx jmp LBB0_4 LBB0_3: addq $8, %rcx LBB0_4: when I want: cmpq %r15, (%rax,%rdx) je LBB0_3 addq $8, %rcx jmp LBB0_4 LBB0_3: leaq 8(%rax,%rdx), %rcx LBB0_4: since that saves me executing a jump 99.9% of the time. Is there anything I can do? Ciao Nat!
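For readers unfamiliar with the hint mentioned in the post, the following is a minimal sketch of how a branch expectation is usually written with the GCC/Clang builtin `__builtin_expect`. The macro name `LIKELY` and the function `find_key` are illustrative assumptions and do not come from the thread; as the poster reports, the hint does not guarantee any particular block layout.

```c
#include <stddef.h>

/* Conventional wrapper around the GCC/Clang builtin; the macro name is
 * an assumption, not taken from the post. */
#define LIKELY(x) __builtin_expect(!!(x), 1)

/* Hypothetical lookup: returns the index of the first slot equal to key,
 * or -1 if none matches. The mismatch case is marked as the expected one. */
long find_key(const long *slots, size_t n, long key) {
    for (size_t i = 0; i < n; i++) {
        if (LIKELY(slots[i] != key))
            continue;
        return (long)i;
    }
    return -1;
}
```

The hint only changes the branch weights the optimizer sees; the function returns the same result with or without it.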

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

2015 Jul 29

2

[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address

When I compile attached IR with LLVM 3.6 llc -march=x86-64 -o f.S f.ll it generates an aligned ADDPS with unaligned address. See attached f.S, here an extract: addq $12, %r9 # $12 is not a multiple of 4, thus for xmm0 this is unaligned xorl %esi, %esi .align 16, 0x90 .LBB0_1: # %loop2 # =>This Inner Loop Header: Depth=1 movq offset_array3(,%...

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

2

[LLVMdev] Packed instructions generaetd by LoopVectorize?

...m I doing something wrong? Tyler float dotproduct(float *A, float *B, int n) { float sum = 0; for(int i = 0; i < n; ++i) { sum += A[i] * B[i]; } return sum; } clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - <loop body> .LBB1_1: movss (%rdi), %xmm1 addq $4, %rdi mulss (%rsi), %xmm1 addq $4, %rsi decl %edx addss %xmm1, %xmm0 jne .LBB1_1 ...

[LLVMdev] Exception handling question

2010 Jan 22

2

[LLVMdev] Exception handling question

...# %entry subq $56, %rsp .Llabel294: .LBB153_1: movq %rdi, 24(%rsp) movq %rsi, 48(%rsp) movl %edx, 44(%rsp) movq %rcx, 32(%rsp) .LBB153_2: # %.try_body movq 32(%rsp), %rdi .Llabel291: addq $16, %rdi xorb %al, %al call _Unwind_RaiseException .Llabel292: jmp .LBB153_4 .LBB153_3: # %.finally_pad .Llabel293: movq %rax, 16(%rsp) testq %rdx, %rdx setne %al movzbl %al, %eax movq %rax,...

best way to represent function call with new stack in LLVM IR?

2018 May 11

2

best way to represent function call with new stack in LLVM IR?

..." # set the new base pointer for this function\0A movq %rsi, %rbp\0A # store stack pointer of this function for later\0A movq %rsp, (%rsi)\0A # save this new stack pointer for use later\0A movq %rsp, r11\0A # compute the new stack pointer for this function\0A subq %rdi, %rsi\0A addq %rsp, %rsi \0A movq %rsi, %rsp\0A # copy args that were passed via the old stack to the new stack\0A # %r11 marches towards %rdi which is the source addresses\0A1:\0A cmpq %rdi, %r11\0A je 2\0A movq (%r11), %r12\0A movq %r12, (%rsi)\0A addq $$0x8, %rsi\0A addq $$0x8, %r11\0A jmp...

[LLVMdev] Scheduled Instructions go missing

2010 May 19

1

[LLVMdev] Scheduled Instructions go missing

All, I'm working on a new scheduler. I have a basic block for which my scheduler generates bad code. The C code looks like int j, *p; if ((j = *p++) != 0) {...} My scheduler emits (x86, AT&T) mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04, p je ... Notice there is no test instruction. The default list scheduler generates mov p, %rax mov (%rax), %rax mov %rax, j addq $0x04, p test %rax je ... The sequences generated by both schedulers after scheduling and before emission are the same. Specifically, the test i...

[PATCH 0/12] Early USB debug port and i386 boot cleanups

2007 Apr 30

2

[PATCH 0/12] Early USB debug port and i386 boot cleanups

Modern hardware relies primarily on memory-mapped I/O, which is typically at addresses that are not mapped by the kernel's initial page tables, which currently makes them unusable for early debugging print support. So this patch set digs in and fixes the early page tables on both arch/i386 and arch/x86_64 so that set_fixmap works with our initial boot page tables. All that is needed is that

[LLVMdev] equivalent IR, different asm

2010 Sep 01

5

[LLVMdev] equivalent IR, different asm

...shq %rbx subq $8, %rsp movq %rsi, %rbx movq %rdi, %r14 movq %rdx, %rdi movq %rcx, %rsi callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE movq %rax, %rcx shrq $32, %rcx testl %ecx, %ecx je LBB0_2 ## BB#1: imull (%rbx), %eax cltd idivl %ecx movl %eax, (%r14) LBB0_2: addq $8, %rsp popq %rbx popq %r14 ret $ llc opt-fail.ll -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE .align 4, 0x90 __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE: ## @_ZN7WebCore6kolo...

[LLVMdev] Packed instructions generaetd by LoopVectorize?

2013 Apr 03

0

[LLVMdev] Packed instructions generaetd by LoopVectorize?

...*B, int n) { > float sum = 0; > for(int i = 0; i < n; ++i) { > sum += A[i] * B[i]; > } > return sum; > } > > clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o - > > <loop body> > .LBB1_1: > movss (%rdi), %xmm1 > addq $4, %rdi > mulss (%rsi), %xmm1 > addq $4, %rsi > decl %edx > addss %xmm1, %xmm0 > jne .LBB1_1 ...

[LLVMdev] equivalent IR, different asm

2010 Sep 01

0

[LLVMdev] equivalent IR, different asm

...rdx, %rdi > movq %rcx, %rsi > callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE > movq %rax, %rcx > shrq $32, %rcx > testl %ecx, %ecx > je LBB0_2 > ## BB#1: > imull (%rbx), %eax > cltd > idivl %ecx > movl %eax, (%r14) > LBB0_2: > addq $8, %rsp > popq %rbx > popq %r14 > ret > > > $ llc opt-fail.ll -o - > > .section __TEXT,__text,regular,pure_instructions > .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE > .align 4, 0x90 > __ZN7WebCore6kolos1ERiS0_PKNS_20Rende...

[LLVMdev] Exception handling question

2010 Jan 22

0

[LLVMdev] Exception handling question

....globl f .type f,@function f: # @f .Leh_func_begin1: # BB#0: # %e subq $8, %rsp .Llabel4: .Llabel1: callq g .Llabel2: # BB#1: # %c addq $8, %rsp ret .LBB1_2: # %u .Llabel3: addq $8, %rsp ret .size f, .-f .Leh_func_end1: .section .gcc_except_table,"a",@progbits .align 4 GCC_except_table1: .byte 0...

[llvm-mca] Resource consumption of ProcResGroups

2020 May 09

2

[llvm-mca] Resource consumption of ProcResGroups

Hi, I’m trying to work out the behavior of llvm-mca on instructions with ProcResGroups. My current understanding is: When an instruction requests a port group (e.g., HWPort015) and all of its atomic sub-resources (e.g., HWPort0,HWPort1,HWPort5), HWPort015 is marked as “reserved” and is issued in parallel with HWPort0, HWPort1, and HWPort5, blocking future instructions from reserving HWPort015
