Displaying 6 results from an estimated 6 matches for "cltq".
2015 Jun 26 · 2 · [LLVMdev] Can LLVM vectorize <2 x i32> type
...xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax
cltq
vmovq %rax, %xmm4
vmovq %xmm2, %rax
cltq
vmovq %rax, %xmm5
vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
vpcmpgtq %xmm3, %xmm4, %xmm3
vptest %xmm3, %xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm...
2014 Mar 12 · 2 · [LLVMdev] Autovectorization questions
...n compiled the sample with
$> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
and the loop body looks like:
.LBB1_2: # %for.body
# =>This Inner Loop Header: Depth=1
cltq
vmovsd (%rsi,%rax,8), %xmm0
movq %r9, %r10
sarq $32, %r10
vaddsd (%rdi,%r10,8), %xmm0, %xmm0
vmovsd %xmm0, (%rdi,%r10,8)
addq %r8, %r9
addl %ecx, %eax
decl %edx
jne .LBB1_2
so vector instructions for scal...
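The original test.c is not shown in the excerpt; as a hedged sketch, a loop of roughly this shape (names and strides are assumptions, not taken from the thread) is the kind that leaves a 32-bit induction value needing the `cltq` sign extension (sign-extend %eax into %rax) before each 64-bit address computation:

```c
#include <stddef.h>

/* Hypothetical reconstruction: a strided accumulation whose 32-bit
 * index expression must be sign-extended (cltq) to form a 64-bit
 * address, blocking straightforward vectorization. */
void accum(double *restrict dst, const double *restrict src,
           int n, int k, int m) {
    for (int i = 0; i < n; i++)
        dst[i * k] += src[i * m];   /* int index -> cltq before addressing */
}
```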
2014 Mar 12 · 4 · [LLVMdev] Autovectorization questions
...t-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
>>
>> and loop body looks like:
>>
>> .LBB1_2: # %for.body
>> # =>This Inner Loop Header: Depth=1
>> cltq
>> vmovsd (%rsi,%rax,8), %xmm0
>> movq %r9, %r10
>> sarq $32, %r10
>> vaddsd (%rdi,%r10,8), %xmm0, %xmm0
>> vmovsd %xmm0, (%rdi,%r10,8)
>> addq %r8, %r9
>> addl %ecx, %eax
>>...
2015 Jun 24 · 2 · [LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
generates really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
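For readers who want to reproduce the question, one way (an assumption on my part, not taken from the thread) to obtain a `mul <2 x i32>` in the IR is Clang/GCC vector extensions:

```c
#include <stdint.h>

/* Sketch using the Clang/GCC vector_size extension: a two-lane
 * 32-bit integer multiply, the operation asked about above. */
typedef int32_t v2i32 __attribute__((vector_size(8)));

v2i32 mul2(v2i32 a, v2i32 b) {
    /* Lowers to LLVM IR of the form: %mul = mul <2 x i32> %a, %b */
    return a * b;
}
```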
2015 Aug 18 · 2 · RFC for a design change in LoopStrengthReduce / ScalarEvolution
> Of course, and the point is that, for example, on x86_64, the zext here is free. I'm still trying to understand the problem...
>
> In the example you provided in your previous e-mail, we choose the solution:
>
> `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> `V` -> `trunc({i64 0,+,1}) + VStart`
>
> instead of the actually-better
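The point that the zext is free on x86_64 can be illustrated with a small C sketch (function and names are hypothetical): any write to a 32-bit register already clears the upper 32 bits, so an unsigned 32-bit index costs no extra instruction, whereas a signed one needs a `cltq`/`movslq`:

```c
#include <stdint.h>

/* On x86_64, zero-extending a 32-bit value to 64 bits is free:
 * writing a 32-bit register implicitly clears the upper 32 bits,
 * so this cast typically emits nothing, unlike the sign extension
 * (cltq/movslq) required for a signed 32-bit index. */
int32_t load_at(const int32_t *Global, uint32_t V) {
    return Global[(uint64_t)V];   /* zext(V) as the GEP index */
}
```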
2013 Oct 15 · 0 · [LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...l %cx, %ecx
>> +; movw %cx, (%rsi)
>> +; movslq %ecx, %rcx
>> +;
>> +; We can't produce the above sequence without special SD-level
>> +; heuristics. Now we produce this:
>> +; CHECK: movw %ax, (%rsi)
>> +; CHECK: cwtl
>> +; CHECK: cltq
>>
>> Modified: llvm/trunk/test/CodeGen/X86/pr1505b.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr1505b.ll?rev=192750&r1=192749&r2=192750&view=diff
>> ==============================================================================
&...
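The cwtl/cltq pair in the CHECK lines corresponds to a chain of sign extensions: cwtl widens %ax into %eax (16 to 32 bits) and cltq widens %eax into %rax (32 to 64 bits). A minimal function of the assumed shape (not the actual test source) that requires both widenings:

```c
#include <stdint.h>

/* Hypothetical sketch of the extension chain being scheduled:
 * int16 -> int32 (cwtl), then int32 -> int64 (cltq). */
int64_t widen(int16_t x) {
    int32_t y = x;       /* cwtl: sign-extend 16 -> 32 bits */
    return (int64_t)y;   /* cltq: sign-extend 32 -> 64 bits */
}
```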