Displaying 20 results from an estimated 1417 matches for "rdx".
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...l,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
Assembly:
.text
.file "module_KFxOBX_i4_after.ll"
.globl adjmul
.align 16, 0x90
.type adjmul,@function
adjmul:
.cfi_startproc
leaq (%rdi,%r8), %rdx
addq %rsi, %r8
testb $1, %cl
cmoveq %rdi, %rdx
cmoveq %rsi, %r8
movq %rdx, %rax
sarq $63, %rax
shrq $62, %rax
addq %rdx, %rax
sarq $2, %rax
movq %r8, %rcx
sarq $63, %rcx
shrq $62, %rcx
addq %r8,...
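The sarq $63 / shrq $62 / addq / sarq $2 run above is the usual branch-free rounding for a signed division by four. A minimal C sketch of the same computation, assuming gcc/clang-style arithmetic right shifts on signed values (the function name is ours, not from the module):
~~~
#include <stdint.h>

/* n / 4 with truncation toward zero, no branch: add a bias of 3 when n is
 * negative, then shift.  Mirrors the rax/rdx sequence in the listing above. */
static int64_t sdiv4(int64_t n)
{
    int64_t bias = (int64_t)((uint64_t)(n >> 63) >> 62);  /* 3 if n < 0, else 0 */
    return (n + bias) >> 2;                               /* arithmetic shift */
}
~~~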
2019 Aug 30
1
New lazyload rdx key type: list(eagerKey=, lazyKeys=)
Prior to R-3.6.0 the keys in the lazyload key files, e.g.
pkg/data/Rdata.rdx or pkg/R/pkg.rdx, seemed to all be 2-long integer
vectors. Now they can be lists. The ones I have seen have two components:
"eagerKey", a 2-long integer vector, and "lazyKeys", a named list of
2-long integer vectors.
> rdx <- readRDS(system.file(package="surviva...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...d,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq,
> Assembly:
> .text
> .file "module_KFxOBX_i4_after.ll"
> .globl adjmul
> .align 16, 0x90
> .type adjmul,@function
> adjmul:
> .cfi_startproc
> leaq (%rdi,%r8), %rdx
> addq %rsi, %r8
> testb $1, %cl
> cmoveq %rdi, %rdx
> cmoveq %rsi, %r8
> movq %rdx, %rax
> sarq $63, %rax
> shrq $62, %rax
> addq %rdx, %rax
> sarq $2, %rax
> movq %r8, %rcx
> sarq...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
...2dq,
>> Assembly:
>> .text
>> .file "module_KFxOBX_i4_after.ll"
>> .globl adjmul
>> .align 16, 0x90
>> .type adjmul,@function
>> adjmul:
>> .cfi_startproc
>> leaq (%rdi,%r8), %rdx
>> addq %rsi, %r8
>> testb $1, %cl
>> cmoveq %rdi, %rdx
>> cmoveq %rsi, %r8
>> movq %rdx, %rax
>> sarq $63, %rax
>> shrq $62, %rax
>> addq %rdx, %rax
>> sarq $2, %r...
2018 Sep 11
2
Byte-wide stores aren't coalesced if interspersed with other stores
Andres:
FWIW, codegen will do the merge if you turn on global alias analysis for it
"-combiner-global-alias-analysis". That said, we should be able to do this
merging earlier.
-Nirav
On Mon, Sep 10, 2018 at 8:33 PM, Andres Freund via llvm-dev <
llvm-dev@lists.llvm.org> wrote:
> Hi,
>
> On 2018-09-10 13:42:21 -0700, Andres Freund wrote:
> > I have, in postgres,
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...ed.
Looking at the assembler reveals an AVX512 instruction which shouldn't
be there.
Assembly:
.text
.file "module"
.globl main
.align 16, 0x90
.type main,@function
main:
.cfi_startproc
movq 8(%rsp), %r10
leaq (%rdi,%r8), %rdx
addq %rsi, %r8
testb $1, %cl
cmoveq %rdi, %rdx
cmoveq %rsi, %r8
movq %rdx, %rax
sarq $63, %rax
shrq $62, %rax
addq %rdx, %rax
sarq $2, %rax
movq %r8, %rcx
sarq $63, %rcx
shrq $62, %rcx
addq %r8,...
2013 Aug 20
0
[LLVMdev] Memory optimizations for LLVM JIT
...IT is not as good as that generated by clang or llc.
Here is an example:
--------------------------------------------------------------------
source fragment ==> clang or llc
struct {
uint64_t a[10];
} *p;
mov 0x8(%rax),%rdx
p->a[2] = p->a[1]; mov %rdx,0x10(%rax)
p->a[3] = p->a[1]; ==> mov %rdx,0x18(%rax)
p->a[4] = p->a[2]; mov %rdx,0x20(%rax)
p->a[5] = p->a[4]; mov %rdx,0x28(%rax)
-------------------...
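For reference, the fragment completed into a compilable form (the wrapping function is an assumption); clang and llc load p->a[1] into %rdx once and reuse it for all four stores, which is the code quality the JIT output discussed here falls short of:
~~~
#include <stdint.h>

struct S { uint64_t a[10]; };

/* Hypothetical wrapper around the source fragment shown above. */
void copy_lanes(struct S *p)
{
    p->a[2] = p->a[1];
    p->a[3] = p->a[1];
    p->a[4] = p->a[2];
    p->a[5] = p->a[4];
}
~~~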
2017 Oct 11
1
[PATCH v1 06/27] x86/entry/64: Adapt assembly for PIE support
...entry)
movl %ecx, %eax /* zero extend */
cmpq %rax, RIP+8(%rsp)
je .Lbstep_iret
- cmpq $.Lgs_change, RIP+8(%rsp)
+ leaq .Lgs_change(%rip), %rcx
+ cmpq %rcx, RIP+8(%rsp)
jne .Lerror_entry_done
/*
@@ -1383,10 +1388,10 @@ ENTRY(nmi)
* resume the outer NMI.
*/
- movq $repeat_nmi, %rdx
+ leaq repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja 1f
- movq $end_repeat_nmi, %rdx
+ leaq end_repeat_nmi(%rip), %rdx
cmpq 8(%rsp), %rdx
ja nested_nmi_out
1:
@@ -1440,7 +1445,8 @@ nested_nmi:
pushq %rdx
pushfq
pushq $__KERNEL_CS
- pushq $repeat_nmi
+ leaq repeat_nmi(%rip), %rdx
+ pus...
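The hunks above all apply the same rewrite: an absolute movq $sym, %reg becomes leaq sym(%rip), %reg so the address is computed relative to the instruction pointer. A minimal C sketch of the same idea under -fpie/-fpic (the symbol name is hypothetical, not from the kernel patch):
~~~
/* Position-independent code materializes a local symbol's address as
 *     leaq marker(%rip), %rax
 * rather than the absolute  movq $marker, %rax  form, which would need a
 * relocation the image cannot satisfy once loaded at an arbitrary address. */
static char marker;              /* stands in for a label such as repeat_nmi */

const void *marker_address(void)
{
    return &marker;
}
~~~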
2013 Oct 22
1
[LLVMdev] System call miscompilation using the fast register allocator
...%val = alloca i32, align 4
store i32 1, i32* %val, align 4
%0 = ptrtoint i32* %val to i64
call void asm sideeffect "", "{r8}"(i64 4) nounwind
call void asm sideeffect "", "{r10}"(i64 %0) nounwind
call void asm sideeffect "", "{rdx}"(i64 3) nounwind
call void asm sideeffect "", "{rsi}"(i64 1) nounwind
call void asm sideeffect "", "{rdi}"(i64 -1) nounwind
%1 = call i64 asm sideeffect "", "={rdi}"() nounwind
%2 = call i64 asm sideeffect "", &...
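The empty asm sideeffect calls above pin constants and a pointer into specific registers, the shape of a raw syscall setup. A rough GNU C analogue of that IR (an assumption for illustration, not the original reproducer):
~~~
#include <stdint.h>

void pin_syscall_args(void)
{
    int32_t val = 1;
    register int64_t r8  asm("r8")  = 4;
    register int64_t r10 asm("r10") = (int64_t)&val;
    register int64_t rdx asm("rdx") = 3;
    register int64_t rsi asm("rsi") = 1;
    register int64_t rdi asm("rdi") = -1;
    /* The empty asm forces each value to be live in exactly that register,
     * which is what stresses the fast register allocator in the report. */
    asm volatile("" :: "r"(r8), "r"(r10), "r"(rdx), "r"(rsi), "r"(rdi));
}
~~~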
2013 Aug 20
4
[LLVMdev] Memory optimizations for LLVM JIT
...IT is not as good as that generated
by clang or llc.
Here is an example:
--------------------------------------------------------------------
source fragment ==> clang or llc
struct {
uint64_t a[10];
} *p;
mov 0x8(%rax),%rdx
p->a[2] = p->a[1]; mov %rdx,0x10(%rax)
p->a[3] = p->a[1]; ==> mov %rdx,0x18(%rax)
p->a[4] = p->a[2]; mov %rdx,0x20(%rax)
p->a[5] = p->a[4]; mov %rdx,0x28(%rax)
-------------------...
2016 Jun 23
2
AVX512 instruction generated when JIT compiling for an avx2 architecture
...re.
>
> Assembly:
> .text
> .file "module"
> .globl main
> .align 16, 0x90
> .type main,@function
> main:
> .cfi_startproc
> movq 8(%rsp), %r10
> leaq (%rdi,%r8), %rdx
> addq %rsi, %r8
> testb $1, %cl
> cmoveq %rdi, %rdx
> cmoveq %rsi, %r8
> movq %rdx, %rax
> sarq $63, %rax
> shrq $62, %rax
> addq %rdx, %rax
> sarq $2, %rax
> mo...
2018 Sep 11
2
Byte-wide stores aren't coalesced if interspersed with other stores
...ss all the
> previous stores) that allow it to do its job.
>
> In the case at hand, with a manual 64bit store (this is on a 64bit
> target), llvm then combines 8 byte-wide stores into one.
>
>
> Without -combiner-global-alias-analysis it generates:
>
> movb $0, 1(%rdx)
> movl 4(%rsi,%rdi), %ebx
> movq %rbx, 8(%rcx)
> movb $0, 2(%rdx)
> movl 8(%rsi,%rdi), %ebx
> movq %rbx, 16(%rcx)
> movb $0, 3(%rdx)
> movl 12(%rsi,%rdi), %ebx
> movq %rbx, 24(%rcx)
>...
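A plausible C shape for the code behind that assembly, interleaving byte-wide stores into one buffer with wider copies into another (a guess at the pattern for illustration, not Andres's actual postgres code):
~~~
#include <stdint.h>
#include <stddef.h>

/* Each iteration clears one flag byte (the movb $0, k(%rdx) stores) and
 * widens a 32-bit value into a 64-bit slot (the movl/movq pairs).  Without
 * -combiner-global-alias-analysis the byte stores are not merged into a
 * single wider store. */
void fill(char *null_flags, const uint32_t *src, uint64_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        null_flags[i] = 0;   /* byte-wide store */
        dst[i] = src[i];     /* interspersed wider store */
    }
}
~~~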
2015 Mar 03
2
[LLVMdev] Need a clue to improve the optimization of some C code
...ss ?
Thanks for any feedback.
Ciao
Nat!
P.S. In case someone is interested, here is the assembler code and the IR that produced it.
Relevant LLVM generated x86_64 assembler portion with -Os
~~~
testq %r12, %r12
je LBB0_5
## BB#1:
movq -8(%r12), %rcx
movq (%rcx), %rax
movq -8(%rax), %rdx
andq %r15, %rdx
cmpq %r15, (%rax,%rdx)
je LBB0_2
## BB#3:
addq $8, %rcx
jmp LBB0_4
LBB0_2:
leaq 8(%rdx,%rax), %rcx
LBB0_4:
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
callq *(%rcx)
movq %rax, %rbx
LBB0_5:
~~~
Better/tighter assembler code would be (saves 2 instructions, one jump les...
2015 Jul 24
2
[LLVMdev] SIMD for sdiv <2 x i64>
...sdiv <2 x i64> %sub.ptr.sub.i6.i.i.i.i, <i64 24, i64 24>
>>
>> Assembly:
>> vpsubq %xmm6, %xmm5, %xmm5
>> vmovq %xmm5, %rax
>> movabsq $3074457345618258603, %rbx # imm = 0x2AAAAAAAAAAAAAAB
>> imulq %rbx
>> movq %rdx, %rcx
>> movq %rcx, %rax
>> shrq $63, %rax
>> shrq $2, %rcx
>> addl %eax, %ecx
>> vpextrq $1, %xmm5, %rax
>> imulq %rbx
>> movq %rdx, %rax
>> shrq $63, %rax
>> shrq $2, %rdx
>...
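The movabsq constant 0x2AAAAAAAAAAAAAAB is the multiply-high "magic number" for a signed division by 24 (2^66/24, rounded up). A minimal C sketch of the scalar computation each lane goes through, assuming gcc/clang support for __int128 and arithmetic right shifts of signed values:
~~~
#include <stdint.h>

/* n / 24 without a divide: take the high 64 bits of n * M, shift right by
 * two, and add one when the product is negative to round toward zero. */
static int64_t sdiv24(int64_t n)
{
    const int64_t M = 0x2AAAAAAAAAAAAAABLL;            /* ceil(2^66 / 24) */
    int64_t hi = (int64_t)(((__int128)n * M) >> 64);   /* multiply-high   */
    return (hi >> 2) + (int64_t)((uint64_t)hi >> 63);  /* shift + sign fix */
}
~~~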
2015 Jul 24
0
[LLVMdev] SIMD for sdiv <2 x i64>
...%rbx
movq %rbx, %rdi
vmovaps %xmm2, 96(%rsp) # 16-byte Spill
vmovaps %xmm5, 64(%rsp) # 16-byte Spill
vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
callq _Znam
movq %rax, 128(%rsp)
movq 16(%r12), %rsi
movq %rax, %rdi
movq %rbx, %rdx
callq memmove
vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
vmovaps 64(%rsp), %xmm5 # 16-byte Reload
vmovaps 96(%rsp), %xmm2 # 16-byte Reload
vmovdqa .LCPI582_0(%rip), %xmm4
.LBB582_4: # %invoke.cont
vmovaps %xmm2, 96(%rsp)...
2016 Oct 27
1
PIC and mcmodel=large on x86 doesn't use any relocations
...re() {
// Large Memory Model code sequences from AMD64 abi
// Figure 3.22: Position-Independent Global Data Load and Store
//
// Assume that %r15 has been loaded with GOT address by
// function prologue.
// movabs $Lsrc@GOTOFF,%rax ; R_X86_64_GOTOFF64
// movabs $Ldst@GOTOFF,%rdx ; R_X86_64_GOTOFF64
// movl (%rax,%r15),%ecx
// movl %ecx,(%rdx,%r15)
dst = src;
// movabs $dptr@GOT,%rax ; R_X86_64_GOT64
// movabs $Ldst@GOTOFF,%rdx ; R_X86_64_GOTOFF64
// movq (%rax,%r15),%rax
// leaq (%rdx,%r15),%rcx
// movq %rcx,(%rax)
dptr = &dst;...
2015 Oct 27
4
How can I tell llvm, that a branch is preferred ?
...d branch, correct? I see nothing in the specs for "branch"
or "switch". And __builtin_expect does nothing, of that I am sure.
Unfortunately llvm has a knack for ordering the one most crucial part
of my code exactly the opposite of how I want it; it does: (x86_64)
cmpq %r15, (%rax,%rdx)
jne LBB0_3
Ltmp18:
leaq 8(%rax,%rdx), %rcx
jmp LBB0_4
LBB0_3:
addq $8, %rcx
LBB0_4:
when I want,
cmpq %r15, (%rax,%rdx)
je LBB0_3
addq $8, %rcx
jmp LBB0_4
LBB0_3:
leaq 8(%rax,%rdx), %rcx
LBB0_4:
since that saves me executing a jump 99.9% of the time. Is there
anything I can do?...
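For reference, the conventional way to state the preference from C is __builtin_expect, which the poster already reports having no effect here; the sketch below only shows the usual annotation pattern (the names are illustrative, not from the poster's code):
~~~
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hint that the compare almost always matches, so the matching path should
 * be laid out as the fall-through block. */
long dispatch(const long *slot, long key, long cached, long fallback)
{
    if (likely(*slot == key))
        return cached;
    return fallback;
}
~~~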
2014 Oct 13
2
[LLVMdev] Unexpected spilling of vector register during lane extraction on some x86_64 targets
..._dso_handle+0x18>
400511: vcvttps2dq %xmm1,%xmm1
400515: vpmullw 0x183(%rip),%xmm1,%xmm1 # 4006a0
<__dso_handle+0x28>
40051d: vpsubd %xmm1,%xmm0,%xmm0
400521: vmovq %xmm0,%rax
400526: movslq %eax,%rcx
400529: sar $0x20,%rax
40052d: vpextrq $0x1,%xmm0,%rdx
400533: movslq %edx,%rsi
400536: sar $0x20,%rdx
40053a: vmovss 0x4006c0(,%rcx,4),%xmm0
400543: vinsertps $0x10,0x4006c0(,%rax,4),%xmm0,%xmm0
40054e: vinsertps $0x20,0x4006c0(,%rsi,4),%xmm0,%xmm0
400559: vinsertps $0x30,0x4006c0(,%rdx,4),%xmm0,%xmm0
400564: vmulps 0x14...
2017 Apr 19
3
[cfe-dev] FE_INEXACT being set for an exact conversion from float to unsigned long long
...o I can find where the sequence is emitted.
>
>
> $ more llvm/lib/Target/X86//README-X86-64.txt
> …
> Are we better off using branches instead of cmove to implement FP to
> unsigned i64?
>
> _conv:
> ucomiss LC0(%rip), %xmm0
> cvttss2siq %xmm0, %rdx
> jb L3
> subss LC0(%rip), %xmm0
> movabsq $-9223372036854775808, %rax
> cvttss2siq %xmm0, %rdx
> xorq %rax, %rdx
> L3:
> movq %rdx, %rax
> ret
>
> instead of
>
> _conv:
> movs...
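The branchy sequence quoted from README-X86-64.txt is the standard float-to-uint64_t conversion on x86-64 without AVX-512, where only a signed cvttss2si exists. A C sketch of the idiom (a paraphrase for illustration, not the compiler's actual lowering):
~~~
#include <stdint.h>

/* Assumes 0 <= x < 2^64.  Values below 2^63 convert directly; larger ones
 * have 2^63 (exactly representable in float) subtracted first, and the top
 * bit is put back afterwards.  The thread is about sequences like this
 * setting FE_INEXACT even when the conversion is exact. */
static uint64_t float_to_u64(float x)
{
    const float two63 = 9223372036854775808.0f;   /* 2^63 */
    if (x < two63)
        return (uint64_t)(int64_t)x;
    return (uint64_t)(int64_t)(x - two63) ^ 0x8000000000000000ULL;
}
~~~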
2015 Jul 24
1
[LLVMdev] SIMD for sdiv <2 x i64>
...ovaps %xmm2, 96(%rsp) # 16-byte Spill
> vmovaps %xmm5, 64(%rsp) # 16-byte Spill
> vmovdqa %xmm6, 16(%rsp) # 16-byte Spill
> callq _Znam
> movq %rax, 128(%rsp)
> movq 16(%r12), %rsi
> movq %rax, %rdi
> movq %rbx, %rdx
> callq memmove
> vmovdqa 16(%rsp), %xmm6 # 16-byte Reload
> vmovaps 64(%rsp), %xmm5 # 16-byte Reload
> vmovaps 96(%rsp), %xmm2 # 16-byte Reload
> vmovdqa .LCPI582_0(%rip), %xmm4
> .LBB582_4: # %invoke.cont...