search for: vandps

Displaying 7 results from an estimated 7 matches for "vandps".

2011 Jun 04
[LLVMdev] AVX Status?
...ems to be some code for this because
>> xor<8 x i32> %m, %m
>> works, probably because it can get rid of all bitcasts.
>
> And it can use xorps to implement the operation.

Yes, that makes sense. But why does the same not work with "and" and "or" (-> VANDPS/VORPS)? Anyway, I am looking forward to testing your patches. Would it be possible to send around a notification when the stuff goes upstream? Thanks a lot :).

Best, Ralf
2011 Jun 03
[LLVMdev] AVX Status?
...s
>> }
>
> That would be nice indeed

Some lowering code would be needed to convert from i1 masks to i8 masks (the so-called packed vs. sparse mask issue). I don't think I've added anything to do this as our vectorizer doesn't generate code this way.

>> -> VCMPPS, VANDPS, BLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for the select for SSE, unfortunately I did not have time to
>> look at it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> N...
2011 Jun 07
[LLVMdev] AVX Status?
...because
>>> xor<8 x i32> %m, %m
>>> works, probably because it can get rid of all bitcasts.
>>
>> And it can use xorps to implement the operation.
>
> Yes, that makes sense. But why does the same not work with "and" and
> "or" (-> VANDPS/VORPS)?

It can. Maybe the pattern for ANDPS isn't there yet. I'd have to dig deeper into the failure. The fact that there are inconsistencies like this is one of the motivations behind the SIMD reorg. There are plenty of such inconsistencies in the existing SSE spec. Hopefully after...
2011 Jun 01
[LLVMdev] AVX Status?
...<8 x float> %a, <8 x float> %b, <8 x i1> %m) nounwind readnone {
entry:
  %cmp = fcmp ugt <8 x float> %a, %b
  %mask = and <8 x i1> %cmp, %m
  %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
  ret <8 x float> %res
}
-> VCMPPS, VANDPS, BLENDVPS

Nadav Rotem sent around a patch a few weeks ago in which he implemented codegen for the select for SSE, unfortunately I did not have time to look at it in more depth so far. Can anybody comment on the current status of AVX?

Best, Ralf
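The compare / and / select sequence in the IR above can be modelled lane by lane. The following is only a rough Python sketch of the semantics, not LLVM's lowering or the poster's code: the helper names (`cmp_ugt`, `and_masks`, `blend`, `masked_select`) are hypothetical, and it assumes each mask lane is widened to a 32-bit all-ones or all-zeros value, as a compare such as VCMPPS would produce, with the blend step keyed on the top bit of each lane.

```python
import math

ALL_ONES = 0xFFFFFFFF  # assumed encoding of a "true" lane in a 32-bit-per-lane mask


def cmp_ugt(a, b):
    """Model of fcmp ugt: a lane is true if a > b or either operand is NaN."""
    # `not (x <= y)` is True for x > y and also whenever x or y is NaN.
    return [ALL_ONES if not (x <= y) else 0 for x, y in zip(a, b)]


def and_masks(m1, m2):
    """Bitwise AND of two lane masks (the step VANDPS would perform)."""
    return [p & q for p, q in zip(m1, m2)]


def blend(if_clear, if_set, mask):
    """Per lane: take if_set when the mask's top bit is 1, else if_clear."""
    return [s if (m >> 31) & 1 else c for c, s, m in zip(if_clear, if_set, mask)]


def masked_select(a, b, m):
    """select(and(fcmp ugt a, b; m), a, b), one lane at a time."""
    mask = and_masks(cmp_ugt(a, b), m)
    return blend(b, a, mask)


if __name__ == "__main__":
    a = [1.0, 5.0, 3.0, math.nan]
    b = [2.0, 4.0, 1.0, 0.0]
    m = [ALL_ONES, ALL_ONES, 0, ALL_ONES]
    print(masked_select(a, b, m))
```

A lane only takes its value from `a` when both the comparison and the incoming mask `m` are true, which is why the third lane in the usage example falls back to `b` even though 3.0 > 1.0.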
2011 Jun 02
[LLVMdev] AVX Status?
...ounwind readnone {
> entry:
>   %cmp = fcmp ugt <8 x float> %a, %b
>   %mask = and <8 x i1> %cmp, %m
>   %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>   ret <8 x float> %res
> }

That would be nice indeed

> -> VCMPPS, VANDPS, BLENDVPS
>
> Nadav Rotem sent around a patch a few weeks ago in which he implemented
> codegen for the select for SSE, unfortunately I did not have time to
> look at it in more depth so far.
>
> Can anybody comment on the current status of AVX?

No codegen support yet (although s...
2011 Jun 03
[LLVMdev] AVX Status?
...%cmp = fcmp ugt <8 x float> %a, %b
>> %mask = and <8 x i1> %cmp, %m
>> %res = select <8 x i1> %mask, <8 x float> %a, <8 x float> %b
>> ret <8 x float> %res
>> }
>
> That would be nice indeed
>
>> -> VCMPPS, VANDPS, BLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for the select for SSE, unfortunately I did not have time to
>> look at it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> N...
2015 Jul 14
[LLVMdev] Poor register allocation (constants causing spilling)
...ter, and it has spilled a value to the stack. It would have been cheaper to simply fold the constant load into the 3 uses. This is not the only example. Later on we can see this:

    vmovaps .LCPI0_1(%rip), %xmm6   # xmm6 = [2147483648,2147483648,...]
    vxorps %xmm6, %xmm2, %xmm3
    ...
    vandps %xmm6, %xmm5, %xmm2
    ...
    vmovaps %xmm1, -56(%rsp)        # 16-byte Spill
    vmovaps %xmm6, %xmm1
    ...
    vmovaps -56(%rsp), %xmm0        # 16-byte Reload
    ...
    vxorps %xmm1, %xmm3, %xmm4
    ...

Here, we have a spill and reload to keep the constant in a register for a single us...
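For context on the constant in that listing: 2147483648 is 0x80000000, the sign bit of a 32-bit float. The sketch below shows in Python what XOR and AND with that bit pattern do to a single float lane. It only illustrates the bit operations and makes no claim about what the original program computes; `f32_bits`, `bits_f32`, `xor_sign` and `and_sign` are hypothetical helper names.

```python
import struct

SIGN_MASK = 2147483648  # 0x80000000, the per-lane constant shown in the listing


def f32_bits(x):
    """Reinterpret a float as its 32-bit IEEE-754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def xor_sign(x):
    """XOR with the sign mask flips the sign bit, i.e. negates the value."""
    return bits_f32(f32_bits(x) ^ SIGN_MASK)


def and_sign(x):
    """AND with the sign mask keeps only the sign bit of the value."""
    return f32_bits(x) & SIGN_MASK


if __name__ == "__main__":
    print(xor_sign(1.5), hex(and_sign(-3.0)), hex(and_sign(3.0)))
```

Because both operations read the same 16-byte constant, a compiler may either keep it live in a register or fold the load into each instruction as a memory operand, which is the trade-off the post is discussing.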