thr3ads.net - search: "sarq"

[LLVMdev] i1* function argument on x86-64

2015 Jul 27

3

[LLVMdev] i1* function argument on x86-64

I am running into a problem with 'i1*' as a function's argument which seems to have appeared since I switched to LLVM 3.6 (but can have other source, of course). If I look at the assembler that the MCJIT generates for an x86-64 target I see that the array 'i1*' is taken as a sequence of 1 bit wide elements. (I guess that's correct). However, I used to call the function

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

2

avx512 JIT backend generates wrong code on <4 x float>

...quot;module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul, at function adjmul: .cfi_startproc leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx sarq $2, %rcx movq %rax, %rdx shlq $5, %rdx leaq 16(%r9,%rdx), %rsi orq $16, %rdx...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

0

avx512 JIT backend generates wrong code on <4 x float>

...mul > .align 16, 0x90 > .type adjmul, at function > adjmul: > .cfi_startproc > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx > sarq $63, %rcx > shrq $62, %rcx > addq %r8, %rcx > sarq $2, %rcx > movq %rax, %rdx > shlq $5, %rdx >...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

1

avx512 JIT backend generates wrong code on <4 x float>

...type adjmul, at function >> adjmul: >> .cfi_startproc >> leaq (%rdi,%r8), %rdx >> addq %rsi, %r8 >> testb $1, %cl >> cmoveq %rdi, %rdx >> cmoveq %rsi, %r8 >> movq %rdx, %rax >> sarq $63, %rax >> shrq $62, %rax >> addq %rdx, %rax >> sarq $2, %rax >> movq %r8, %rcx >> sarq $63, %rcx >> shrq $62, %rcx >> addq %r8, %rcx >> sarq $2, %rcx >> movq...

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

2

AVX512 instruction generated when JIT compiling for an avx2 architecture

...ot;module" .globl main .align 16, 0x90 .type main, at function main: .cfi_startproc movq 8(%rsp), %r10 leaq (%rdi,%r8), %rdx addq %rsi, %r8 testb $1, %cl cmoveq %rdi, %rdx cmoveq %rsi, %r8 movq %rdx, %rax sarq $63, %rax shrq $62, %rax addq %rdx, %rax sarq $2, %rax movq %r8, %rcx sarq $63, %rcx shrq $62, %rcx addq %r8, %rcx sarq $2, %rcx movq (%r10), %r8 movq 8(%r10), %r10 movq %r8, %rdi shrq $32, %rdi...

AVX512 instruction generated when JIT compiling for an avx2 architecture

2016 Jun 23

2

AVX512 instruction generated when JIT compiling for an avx2 architecture

...t function > main: > .cfi_startproc > movq 8(%rsp), %r10 > leaq (%rdi,%r8), %rdx > addq %rsi, %r8 > testb $1, %cl > cmoveq %rdi, %rdx > cmoveq %rsi, %r8 > movq %rdx, %rax > sarq $63, %rax > shrq $62, %rax > addq %rdx, %rax > sarq $2, %rax > movq %r8, %rcx > sarq $63, %rcx > shrq $62, %rcx > addq %r8, %rcx > sarq $2, %rcx > movq (%r10), %r8 >...

Probs with smbfs

2003 Jun 27

2

Probs with smbfs

Hi all I am having trouble with my SMBFS and it is the following Every time I try to connect to other machine in my network, throught the command MOUNT, the folowing ERROR appears. I've already tried to see the manpage but i had not success. [root@backup_sp bin] mount -t smbfs //sarq/c /mnt/windows Password: ERROR: smbfs filesystem not supported by the kernel Please refer to the smbnt (8) manual page smbmnt failed: 255 I want to remember that service smb is running and last week, it was working properly. Please i need this help I get very please about your attention. Thank...

[LLVMdev] Autovectorization questions

2014 Mar 12

2

[LLVMdev] Autovectorization questions

...-march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive and loop body looks like: .LBB1_2: # %for.body # =>This Inner Loop Header: Depth=1 cltq vmovsd (%rsi,%rax,8), %xmm0 movq %r9, %r10 sarq $32, %r10 vaddsd (%rdi,%r10,8), %xmm0, %xmm0 vmovsd %xmm0, (%rdi,%r10,8) addq %r8, %r9 addl %ecx, %eax decl %edx jne .LBB1_2 so vector instructions for scalars (vaddsd, vmovsd) were used in the loop and no real gather/scatter emitte...

[LLVMdev] Reserved call frame

2011 Sep 09

1

[LLVMdev] Reserved call frame

...f the testcase with the reservedCallFrame enabled: # BB#26: # %L51 movq 536(%rsp), %rbp movq $-5, 536(%rsp) movq 552(%rsp), %rsi movq 560(%rsp), %rdx movq 544(%rsp), %rcx movq %rdi, (%rsp) movq $2, 8(%rsp) sarq $4, %rbp leaq *128*(%rsp), %rdi movq %rbp, %r8 movq *40*(%rsp), %r9 # 8-byte Reload callq bs_put_big_integer # a function call with 5 arguments movq 128(%rsp), %rax movq 136(%rsp), %rcx cmpq $0, 144(%rsp) movq %rcx, 560(%rsp...

[LLVMdev] Optimization bug - spurious shift in partial word test

2013 Oct 30

1

[LLVMdev] Optimization bug - spurious shift in partial word test

...lets say >0, by shifting left to get the sign bit into the msb and testing llvm is inserting a spurious right shift instruction. For example this IR: ... %0 = load i64* %a.addr, align 8 %shl = shl i64 %0, 28 %cmp = icmp sgt i64 %shl, 0 ... results in ... shlq $28, %rdi sarq $28, %rdi ; <<< spurious shift testq %rdi, %rdi gcc doesnt have this problem. It just emits the shift and test. The reason appears to be that the instruction combining pass decides that the shift and test is equivalent to a test on the partial word, in this case an I36. &gt...

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

0

[LLVMdev] SIMD for sdiv <2 x i64>

...nt vmovaps %xmm2, 96(%rsp) # 16-byte Spill vmovdqa 48(%rsp), %xmm0 # 16-byte Reload vpsubq %xmm0, %xmm2, %xmm0 vpextrq $1, %xmm0, %rax movabsq $3074457345618258603, %rcx # imm = 0x2AAAAAAAAAAAAAAB imulq %rcx movq %rdx, %rax shrq $63, %rax sarq $2, %rdx addq %rax, %rdx vmovq %rdx, %xmm1 vmovq %xmm0, %rax imulq %rcx movq %rdx, %rax shrq $63, %rax sarq $2, %rdx addq %rax, %rdx vmovq %rdx, %xmm0 vpunpcklqdq %xmm1, %xmm0, %xmm1 # xmm1 = xmm0[0],xmm1[0] vpxor %xmm4, %xmm1,...

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

1

[LLVMdev] SIMD for sdiv <2 x i64>

...16-byte Spill > vmovdqa 48(%rsp), %xmm0 # 16-byte Reload > vpsubq %xmm0, %xmm2, %xmm0 > vpextrq $1, %xmm0, %rax > movabsq $3074457345618258603, %rcx # imm = 0x2AAAAAAAAAAAAAAB > imulq %rcx > movq %rdx, %rax > shrq $63, %rax > sarq $2, %rdx > addq %rax, %rdx > vmovq %rdx, %xmm1 > vmovq %xmm0, %rax > imulq %rcx > movq %rdx, %rax > shrq $63, %rax > sarq $2, %rdx > addq %rax, %rdx > vmovq %rdx, %xmm0 > vpunpcklqdq %xmm1, %xmm0, %xmm1...

[LLVMdev] SIMD for sdiv <2 x i64>

2015 Jul 24

2

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there is no alternative AVX/2 instructions for int64? The same thing

[LLVMdev] Autovectorization questions

2014 Mar 12

4

[LLVMdev] Autovectorization questions

...oop body looks like: >> >> .LBB1_2: # %for.body >> # =>This Inner Loop Header: Depth=1 >> cltq >> vmovsd (%rsi,%rax,8), %xmm0 >> movq %r9, %r10 >> sarq $32, %r10 >> vaddsd (%rdi,%r10,8), %xmm0, %xmm0 >> vmovsd %xmm0, (%rdi,%r10,8) >> addq %r8, %r9 >> addl %ecx, %eax >> decl %edx >> jne .LBB1_2 >> >> so vector instructions for scalars...

[LLVMdev] better code for IV

2014 Feb 19

2

[LLVMdev] better code for IV

...xorl %r9d, %r9d movabsq $4294967296, %r8 # imm = 0x100000000 .align 16, 0x90 .LBB0_1: # %L_entry # =>This Inner Loop Header: Depth=1 movq %r9, %rax sarq $32, %rax movss (%rdi,%rax,4), %xmm0 addss (%rsi,%rax,4), %xmm0 movss %xmm0, (%rdx,%rax,4) addq %r8, %r9 decq %rcx jne .LBB0_1 # BB#2: Ret This is what I want to get: ArrayAdd2:...

[RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

2019 Jun 13

2

[RFC] Coding Standards: "prefer `int` for regular arithmetic, use `unsigned` only for bitmask and when you intend to rely on wrapping behavior."

FWIW, the talks linked by Mehdi really do talk about these things and why I don't think the really are the correct trade-off. Even if you imagine an unsigned type that doesn't allow wrapping, I think this is a really bad type. The problem is that you have made the most common value of the type (zero in every study I'm aware of) be a boundary condition. Today, it wraps to a huge value

RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

2018 Mar 23

5

RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

...-speculated state value of `-1`): ``` ... .LBB0_4: # %danger cmovneq %r8, %rax # Conditionally update predicate state. shlq $47, %rax orq %rax, %rsp callq other_function movq %rsp, %rax sarq 63, %rax # Sign extend the high bit to all bits. ``` This first puts the predicate state into the high bits of `%rsp` before calling the function and then reads it back out of high bits of `%rsp` afterward. When correctly executing (speculatively or not), these are all no-ops. Wh...

search for: sarq