Displaying 6 results from an estimated 6 matches for "cltq".
2015 Jun 26 · 2 · [LLVMdev] Can LLVM vectorize <2 x i32> type
...xmm0, %xmm2
vpsrlq $32, %xmm7, %xmm4
vpmuludq %xmm4, %xmm0, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpsrlq $32, %xmm0, %xmm4
vpmuludq %xmm7, %xmm4, %xmm4
vpsllq $32, %xmm4, %xmm4
vpaddq %xmm4, %xmm2, %xmm2
vpextrq $1, %xmm2, %rax
cltq
vmovq %rax, %xmm4
vmovq %xmm2, %rax
cltq
vmovq %rax, %xmm5
vpunpcklqdq %xmm4, %xmm5, %xmm4 # xmm4 = xmm5[0],xmm4[0]
vpcmpgtq %xmm3, %xmm4, %xmm3
vptest %xmm3, %xmm3
je .LBB10_66
# BB#5: # %for.body.preheader
vpaddq %xmm...
2014 Mar 12 · 2 · [LLVMdev] Autovectorization questions
...n compiled the sample with
$> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
and the loop body looks like:
.LBB1_2: # %for.body
# =>This Inner Loop Header: Depth=1
cltq
vmovsd (%rsi,%rax,8), %xmm0
movq %r9, %r10
sarq $32, %r10
vaddsd (%rdi,%r10,8), %xmm0, %xmm0
vmovsd %xmm0, (%rdi,%r10,8)
addq %r8, %r9
addl %ecx, %eax
decl %edx
jne .LBB1_2
so vector instructions for scal...
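The original test.c is not shown in the excerpt; as a hedged sketch, a loop of roughly this shape (names and strides are assumptions, not taken from the thread) is the kind that leaves a 32-bit induction value needing the `cltq` sign extension (sign-extend %eax into %rax) before each 64-bit address computation:

```c
#include <stddef.h>

/* Hypothetical reconstruction: a strided accumulation whose 32-bit
 * index expression must be sign-extended (cltq) to form a 64-bit
 * address, blocking straightforward vectorization. */
void accum(double *restrict dst, const double *restrict src,
           int n, int k, int m) {
    for (int i = 0; i < n; i++)
        dst[i * k] += src[i * m];   /* int index -> cltq before addressing */
}
```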
2014 Mar 12 · 4 · [LLVMdev] Autovectorization questions
...t-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive
>>
>> and loop body looks like:
>>
>> .LBB1_2: # %for.body
>> # =>This Inner Loop Header: Depth=1
>> cltq
>> vmovsd (%rsi,%rax,8), %xmm0
>> movq %r9, %r10
>> sarq $32, %r10
>> vaddsd (%rdi,%r10,8), %xmm0, %xmm0
>> vmovsd %xmm0, (%rdi,%r10,8)
>> addq %r8, %r9
>> addl %ecx, %eax
>>...
2015 Jun 24 · 2 · [LLVMdev] Can LLVM vectorize <2 x i32> type
Hi,
Is LLVM able to generate code for the following?
%mul = mul <2 x i32> %1, %2, where %1 and %2 are of <2 x i32> type.
I am running it on a Haswell processor with LLVM-3.4.2. It seems that it
generates really complicated code with vpaddq, vpmuludq, vpsllq, and
vpsrlq.
Thanks,
Zhi
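For readers who want to reproduce the question, one way (an assumption on my part, not taken from the thread) to obtain a `mul <2 x i32>` in the IR is Clang/GCC vector extensions:

```c
#include <stdint.h>

/* Sketch using the Clang/GCC vector_size extension: a two-lane
 * 32-bit integer multiply, the operation asked about above. */
typedef int32_t v2i32 __attribute__((vector_size(8)));

v2i32 mul2(v2i32 a, v2i32 b) {
    /* Lowers to LLVM IR of the form: %mul = mul <2 x i32> %a, %b */
    return a * b;
}
```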
2015 Aug 18 · 2 · RFC for a design change in LoopStrengthReduce / ScalarEvolution
> Of course, and the point is that, for example, on x86_64, the zext here is free. I'm still trying to understand the problem...
>
> In the example you provided in your previous e-mail, we choose the solution:
>
> `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> `V` -> `trunc({i64 0,+,1}) + VStart`
>
> instead of the actually-better
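The point that the zext is free on x86_64 can be illustrated with a small C sketch (function and names are hypothetical): any write to a 32-bit register already clears the upper 32 bits, so an unsigned 32-bit index costs no extra instruction, whereas a signed one needs a `cltq`/`movslq`:

```c
#include <stdint.h>

/* On x86_64, zero-extending a 32-bit value to 64 bits is free:
 * writing a 32-bit register implicitly clears the upper 32 bits,
 * so this cast typically emits nothing, unlike the sign extension
 * (cltq/movslq) required for a signed 32-bit index. */
int32_t load_at(const int32_t *Global, uint32_t V) {
    return Global[(uint64_t)V];   /* zext(V) as the GEP index */
}
```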
2013 Oct 15 · 0 · [LLVMdev] [llvm-commits] r192750 - Enable MI Sched for x86.
...l %cx, %ecx
>> +; movw %cx, (%rsi)
>> +; movslq %ecx, %rcx
>> +;
>> +; We can't produce the above sequence without special SD-level
>> +; heuristics. Now we produce this:
>> +; CHECK: movw %ax, (%rsi)
>> +; CHECK: cwtl
>> +; CHECK: cltq
>>
>> Modified: llvm/trunk/test/CodeGen/X86/pr1505b.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr1505b.ll?rev=192750&r1=192749&r2=192750&view=diff
>> ==============================================================================
&...
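The cwtl/cltq pair in the CHECK lines corresponds to a chain of sign extensions: cwtl widens %ax into %eax (16 to 32 bits) and cltq widens %eax into %rax (32 to 64 bits). A minimal function of the assumed shape (not the actual test source) that requires both widenings:

```c
#include <stdint.h>

/* Hypothetical sketch of the extension chain being scheduled:
 * int16 -> int32 (cwtl), then int32 -> int64 (cltq). */
int64_t widen(int16_t x) {
    int32_t y = x;       /* cwtl: sign-extend 16 -> 32 bits */
    return (int64_t)y;   /* cltq: sign-extend 32 -> 64 bits */
}
```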