search for: vblendvps

Displaying 6 results from an estimated 6 matches for "vblendvps".

2011 Jun 01
4
[LLVMdev] AVX Status?
...%a, <8 x float> %b, i8 1) nounwind readnone %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone ret <8 x float> %res } This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). On the other hand, this does not work: define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone { entry: %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone...
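The working case in the snippet above (a compare feeding a variable blend) can be modeled lane by lane in plain C. This is only a scalar sketch of the semantics, not the code LLVM generates: the function names `cmp_lt_ps`, `blendv_ps`, and `test1_model` are hypothetical, and the mask is represented here as raw 32-bit patterns rather than as `<8 x float>`. It assumes compare predicate 1 means ordered less-than, and that the blend picks the second source wherever the mask lane's sign bit is set.

```c
#include <stdint.h>

#define LANES 8  /* a 256-bit register holds 8 single-precision lanes */

/* Scalar model of VCMPPS with predicate 1 (less-than, ordered):
   each mask lane becomes all-ones when a < b, otherwise zero.
   A NaN in either input compares false, so the lane stays zero. */
void cmp_lt_ps(const float a[LANES], const float b[LANES],
               uint32_t mask[LANES]) {
    for (int i = 0; i < LANES; ++i)
        mask[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Scalar model of VBLENDVPS: only the sign bit (bit 31) of each
   mask lane matters; set selects b, clear selects a. */
void blendv_ps(const float a[LANES], const float b[LANES],
               const uint32_t mask[LANES], float out[LANES]) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (mask[i] >> 31) ? b[i] : a[i];
}

/* Hypothetical stand-in for the snippet's working function:
   compare, then blend using the compare result as the mask. */
void test1_model(const float a[LANES], const float b[LANES],
                 float out[LANES]) {
    uint32_t mask[LANES];
    cmp_lt_ps(a, b, mask);
    blendv_ps(a, b, mask, out);
}
```

With these operands the combination selects `b` in every lane where `a < b`, so for non-NaN inputs the result is the lane-wise maximum of the two vectors.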
2011 Jun 02
0
[LLVMdev] AVX Status?
...8 1) nounwind readnone >   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> > %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone >   ret <8 x float> %res > } > > This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). > > On the other hand, this does not work: > > define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) > nounwind readnone { > entry: >   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, > <8 x fl...
2011 Jun 03
1
[LLVMdev] AVX Status?
...> %res = tail call<8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> >> %a,<8 x float> %b,<8 x float> %cmp) nounwind readnone >> ret<8 x float> %res >> } >> >> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). >> >> On the other hand, this does not work: >> >> define<8 x float> @test2(<8 x float> %a,<8 x float> %b,<8 x i32> %m) >> nounwind readnone { >> entry: >> %cmp = tail call<8 x float> @llvm.x86.avx.cmp.ps.256(<8 x...
2012 Jul 27
3
[LLVMdev] X86 FMA4
...o be entered allowing you to mix/match with normal SSE and take advantage of the lack of implicit arguments/nice non-destructive encoding if you choose to (in case you can't tell, I like the non-destructive encoding a lot). * This is important since SSE instructions with implicit operands (e.g. vblendvps) now have an explicit operand when instantiated as a 128-bit AVX instruction. ** NOTE The dirty state is not synonymous with the upper bits of all of the ymm registers being zero. See the Intel AVX optimization guide. > As I am sure you are aware, we cannot use SSE (movaps) instructions in an A...
2012 Jul 27
0
[LLVMdev] X86 FMA4
Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)
2012 Jul 27
2
[LLVMdev] X86 FMA4
Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load µop, 3 latency, with a reciprocal throughput of 0.5. vmovaps - store takes 1 store µop, 1 load µop for address calculation, 3 latency, with a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a