Displaying 20 results from an estimated 206 matches for "addq".
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
Assembly:
.text
.file "module_KFxOBX_i4_after.ll"
.globl adjmul
.align 16, 0x90
.type adjmul,@function
adjmul:
.cfi_startproc
leaq (%rdi,%r8), %rdx
addq %rsi, %r8
testb $1, %cl
cmoveq %rdi, %rdx
cmoveq %rsi, %r8
movq %rdx, %rax
sarq $63, %rax
shrq $62, %rax
addq %rdx, %rax
sarq $2, %rax
movq %r8, %rcx
sarq $63, %rcx
shrq $62, %rcx
addq %r8, %rcx...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...saveopt,-sha,sse2,sse3,-avx512dq,
> Assembly:
> .text
> .file "module_KFxOBX_i4_after.ll"
> .globl adjmul
> .align 16, 0x90
> .type adjmul,@function
> adjmul:
> .cfi_startproc
> leaq (%rdi,%r8), %rdx
> addq %rsi, %r8
> testb $1, %cl
> cmoveq %rdi, %rdx
> cmoveq %rsi, %r8
> movq %rdx, %rax
> sarq $63, %rax
> shrq $62, %rax
> addq %rdx, %rax
> sarq $2, %rax
> movq %r8, %rcx
> sarq $63, %rcx
> ...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
...bly:
>> .text
>> .file "module_KFxOBX_i4_after.ll"
>> .globl adjmul
>> .align 16, 0x90
>> .type adjmul,@function
>> adjmul:
>> .cfi_startproc
>> leaq (%rdi,%r8), %rdx
>> addq %rsi, %r8
>> testb $1, %cl
>> cmoveq %rdi, %rdx
>> cmoveq %rsi, %r8
>> movq %rdx, %rax
>> sarq $63, %rax
>> shrq $62, %rax
>> addq %rdx, %rax
>> sarq $2, %rax
>> mo...
2014 Jul 23
4
[LLVMdev] the clang 3.5 loop optimizer seems to jump in unintentional for simple loops
...es of code
the code in main is also sometimes different from the_func (not just inlined)
clang -DITER -O2
clang -DITER -O3
gives:
the_func:
leaq 12(%rdi), %rcx
leaq 4(%rdi), %rax
cmpq %rax, %rcx
cmovaq %rcx, %rax
movq %rdi, %rsi
notq %rsi
addq %rax, %rsi
shrq $2, %rsi
incq %rsi
xorl %edx, %edx
movabsq $9223372036854775800, %rax # imm = 0x7FFFFFFFFFFFFFF8
andq %rsi, %rax
pxor %xmm0, %xmm0
je .LBB0_1
# BB#2: # %vector.body.preheader
leaq...
2016 Jan 04
2
Fwd: Strength reduction in loops
Here is a simple loop:
long foo(int len, long* s) {
long sum = 0;
for (int i=0; i<len; i++)
sum += s[i*12];
return sum;
}
There is a multiplication in each loop iteration. Can this be turned
into addition, and is there already a pass that does?
(https://en.wikipedia.org/wiki/Strength_reduction uses this very
situation as an example in the opening paragraph:
"In
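The transformation asked about in this post can be written out by hand. The sketch below is only an illustration: `foo` is the loop from the post, while `foo_reduced` is a hypothetical name for a version in which the per-iteration multiplication `i*12` is replaced by an index that grows by 12 on each step. It says nothing about which LLVM pass, if any, performs the rewrite.

```c
#include <stddef.h>

/* Loop as posted: the index i*12 is recomputed with a multiply each time. */
long foo(int len, long *s) {
    long sum = 0;
    for (int i = 0; i < len; i++)
        sum += s[i * 12];
    return sum;
}

/* Hand-written strength-reduced form (hypothetical name): the multiply
   is replaced by a running index that is advanced by 12 per iteration. */
long foo_reduced(int len, long *s) {
    long sum = 0;
    size_t idx = 0;              /* always equals i * 12 */
    for (int i = 0; i < len; i++) {
        sum += s[idx];
        idx += 12;               /* addition instead of multiplication */
    }
    return sum;
}
```

Both functions read the same elements in the same order, so they should return the same sum for any valid input.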
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...ng at the assembler reveals an AVX512 instruction which shouldn't
be there.
Assembly:
.text
.file "module"
.globl main
.align 16, 0x90
.type main,@function
main:
.cfi_startproc
movq 8(%rsp), %r10
leaq (%rdi,%r8), %rdx
addq %rsi, %r8
testb $1, %cl
cmoveq %rdi, %rdx
cmoveq %rsi, %r8
movq %rdx, %rax
sarq $63, %rax
shrq $62, %rax
addq %rdx, %rax
sarq $2, %rax
movq %r8, %rcx
sarq $63, %rcx
shrq $62, %rcx
addq %r8, %rcx...
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...Assembly:
> .text
> .file "module"
> .globl main
> .align 16, 0x90
> .type main,@function
> main:
> .cfi_startproc
> movq 8(%rsp), %r10
> leaq (%rdi,%r8), %rdx
> addq %rsi, %r8
> testb $1, %cl
> cmoveq %rdi, %rdx
> cmoveq %rsi, %r8
> movq %rdx, %rax
> sarq $63, %rax
> shrq $62, %rax
> addq %rdx, %rax
> sarq $2, %rax
> movq %r8, %rcx
> ...
2015 Oct 27
4
How can I tell llvm, that a branch is preferred ?
...or "switch". And __builtin_expect does nothing, that I am sure of.
Unfortunately llvm has this knack for ordering my single most crucial piece
of code exactly the opposite of how I want it. It does (x86_64):
cmpq %r15, (%rax,%rdx)
jne LBB0_3
Ltmp18:
leaq 8(%rax,%rdx), %rcx
jmp LBB0_4
LBB0_3:
addq $8, %rcx
LBB0_4:
when I want,
cmpq %r15, (%rax,%rdx)
je LBB0_3
addq $8, %rcx
jmp LBB0_4
LBB0_3:
leaq 8(%rax,%rdx), %rcx
LBB0_4:
since that saves me executing a jump 99.9% of the time. Is there
anything I can do?
Ciao
Nat!
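For reference, the usual way to state such a preference in C source is the GCC/Clang builtin `__builtin_expect`. The post reports that it had no effect here, so the sketch below only shows the mechanism and is not a fix. The function `advance`, its parameters, and the `likely`/`unlikely` wrapper macros are hypothetical stand-ins for the compare-and-advance code in the post.

```c
/* Conventional wrappers around the GCC/Clang builtin; the names are
   an assumption, not taken from the post. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for the posted code: the match is the rare case,
   so the common path (cur + 1) is the one we would like as fall-through. */
long *advance(long *table, long idx, long key, long *cur) {
    if (unlikely(table[idx] == key))
        return &table[idx + 1];  /* rare: corresponds to the leaq path */
    return cur + 1;              /* common: corresponds to addq $8, %rcx */
}
```

The hint only feeds the compiler's branch-probability estimate; block layout in the emitted code remains the backend's decision.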
2015 Jul 29
2
[LLVMdev] x86-64 backend generates aligned ADDPS with unaligned address
When I compile the attached IR with LLVM 3.6
llc -march=x86-64 -o f.S f.ll
it generates an aligned ADDPS with an unaligned address. See the attached f.S,
here is an extract:
addq $12, %r9 # $12 is not a multiple of 16, thus for xmm0 this is unaligned
xorl %esi, %esi
.align 16, 0x90
.LBB0_1: # %loop2
# =>This Inner Loop Header: Depth=1
movq offset_array3(,%...
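The complaint rests on the rule that the legacy SSE form of ADDPS with a memory operand requires a 16-byte-aligned address. A minimal sketch of that check in C follows; `is_aligned16` is a hypothetical helper name, not something from the post or from LLVM.

```c
#include <stdint.h>

/* Returns nonzero if p is aligned to 16 bytes, which is what a memory
   operand of a non-VEX ADDPS needs. Hypothetical helper for illustration. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```

An offset of 12 bytes from a 16-byte-aligned base fails this check, while an offset of 16 passes.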
2013 Apr 03
2
[LLVMdev] Packed instructions generated by LoopVectorize?
...m I doing something wrong?
Tyler
float dotproduct(float *A, float *B, int n) {
float sum = 0;
for(int i = 0; i < n; ++i) {
sum += A[i] * B[i];
}
return sum;
}
clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
<loop body>
.LBB1_1:
movss (%rdi), %xmm1
addq $4, %rdi
mulss (%rsi), %xmm1
addq $4, %rsi
decl %edx
addss %xmm1, %xmm0
jne .LBB1_1
2010 Jan 22
2
[LLVMdev] Exception handling question
...# %entry
subq $56, %rsp
.Llabel294:
.LBB153_1:
movq %rdi, 24(%rsp)
movq %rsi, 48(%rsp)
movl %edx, 44(%rsp)
movq %rcx, 32(%rsp)
.LBB153_2: # %.try_body
movq 32(%rsp), %rdi
.Llabel291:
addq $16, %rdi
xorb %al, %al
call _Unwind_RaiseException
.Llabel292:
jmp .LBB153_4
.LBB153_3: # %.finally_pad
.Llabel293:
movq %rax, 16(%rsp)
testq %rdx, %rdx
setne %al
movzbl %al, %eax
movq %rax,...
2018 May 11
2
best way to represent function call with new stack in LLVM IR?
..." # set the new base pointer for this
function\0A movq %rsi, %rbp\0A # store stack pointer of this function
for later\0A movq %rsp, (%rsi)\0A # save this new stack pointer for use
later\0A movq %rsp, %r11\0A # compute the new stack pointer for this
function\0A subq %rdi, %rsi\0A addq %rsp, %rsi \0A movq %rsi,
%rsp\0A # copy args that were passed via the old stack to the new
stack\0A # %r11 marches towards %rdi which is the source
addresses\0A1:\0A cmpq %rdi, %r11\0A je 2\0A movq (%r11), %r12\0A
movq %r12, (%rsi)\0A addq $$0x8, %rsi\0A addq $$0x8, %r11\0A jmp...
2010 May 19
1
[LLVMdev] Scheduled Instructions go missing
All,
I'm working on a new scheduler. I have a basic block for
which my scheduler generates bad code. The C code looks
like
int j, *p;
if ((j = *p++) != 0) {...}
My scheduler emits (x86, AT&T)
mov p, %rax
mov (%rax), %rax
mov %rax, j
addq $0x04, p
je ...
Notice there is no test instruction. The default list
scheduler generates
mov p, %rax
mov (%rax), %rax
mov %rax, j
addq $0x04, p
test %rax, %rax
je ...
The sequences generated by both schedulers after scheduling
and before emission are the same. Specifically,
the test i...
2007 Apr 30
2
[PATCH 0/12] Early USB debug port and i386 boot cleanups
Modern hardware relies primarily on memory-mapped I/O, which is typically
at addresses that are not mapped by the kernel's initial page tables,
which currently makes them unusable for early debugging print support.
So this patch set digs in and fixes the early page tables on both
arch/i386 and arch/x86_64 so that set_fixmap works with our initial boot
page tables. All that is needed is that
2010 Sep 01
5
[LLVMdev] equivalent IR, different asm
...shq %rbx
subq $8, %rsp
movq %rsi, %rbx
movq %rdi, %r14
movq %rdx, %rdi
movq %rcx, %rsi
callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE
movq %rax, %rcx
shrq $32, %rcx
testl %ecx, %ecx
je LBB0_2
## BB#1:
imull (%rbx), %eax
cltd
idivl %ecx
movl %eax, (%r14)
LBB0_2:
addq $8, %rsp
popq %rbx
popq %r14
ret
$ llc opt-fail.ll -o -
.section __TEXT,__text,regular,pure_instructions
.globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
.align 4, 0x90
__ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE: ## @_ZN7WebCore6kolo...
2013 Apr 03
0
[LLVMdev] Packed instructions generated by LoopVectorize?
...*B, int n) {
> float sum = 0;
> for(int i = 0; i < n; ++i) {
> sum += A[i] * B[i];
> }
> return sum;
> }
>
> clang dotproduct.cpp -O3 -fvectorize -march=atom -S -o -
>
> <loop body>
> .LBB1_1:
> movss (%rdi), %xmm1
> addq $4, %rdi
> mulss (%rsi), %xmm1
> addq $4, %rsi
> decl %edx
> addss %xmm1, %xmm0
> jne .LBB1_1
2010 Sep 01
0
[LLVMdev] equivalent IR, different asm
...rdx, %rdi
> movq %rcx, %rsi
> callq __ZN7WebCore4viziEPKNS_20RenderBoxModelObjectEPNS_10StyleImageE
> movq %rax, %rcx
> shrq $32, %rcx
> testl %ecx, %ecx
> je LBB0_2
> ## BB#1:
> imull (%rbx), %eax
> cltd
> idivl %ecx
> movl %eax, (%r14)
> LBB0_2:
> addq $8, %rsp
> popq %rbx
> popq %r14
> ret
>
>
> $ llc opt-fail.ll -o -
>
> .section __TEXT,__text,regular,pure_instructions
> .globl __ZN7WebCore6kolos1ERiS0_PKNS_20RenderBoxModelObjectEPNS_10StyleImageE
> .align 4, 0x90
> __ZN7WebCore6kolos1ERiS0_PKNS_20Rende...
2010 Jan 22
0
[LLVMdev] Exception handling question
....globl f
.type f,@function
f: # @f
.Leh_func_begin1:
# BB#0: # %e
subq $8, %rsp
.Llabel4:
.Llabel1:
callq g
.Llabel2:
# BB#1: # %c
addq $8, %rsp
ret
.LBB1_2:
# %u
.Llabel3:
addq $8, %rsp
ret
.size f, .-f
.Leh_func_end1:
.section .gcc_except_table,"a",@progbits
.align 4
GCC_except_table1:
.byte 0...
2020 May 09
2
[llvm-mca] Resource consumption of ProcResGroups
Hi,
I’m trying to work out the behavior of llvm-mca on instructions with ProcResGroups. My current understanding is:
When an instruction requests a port group (e.g., HWPort015) and all of its atomic sub-resources (e.g., HWPort0,HWPort1,HWPort5), HWPort015 is marked as “reserved” and is issued in parallel with HWPort0, HWPort1, and HWPort5, blocking future instructions from reserving HWPort015