thr3ads.net - search: "vblendvp"

Displaying 6 results from an estimated 6 matches for "vblendvp".

2011 Jun 01

[LLVMdev] AVX Status?

...%a, <8 x float> %b, i8 1) nounwind readnone %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone ret <8 x float> %res } This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). On the other hand, this does not work: define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone { entry: %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone...
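
The pattern quoted in the snippet above (a compare whose result feeds a variable blend) can be modelled lane by lane in portable C. This is only a scalar sketch of the semantics, not the code LLVM emits: the helper names `cmp_lt_ps` and `blendv_ps` are hypothetical, and it assumes that the `i8 1` predicate means "less than" (consistent with the VCMPLTPS mentioned in the post) and that the blend inspects only the sign bit of each mask lane.

```c
#include <stdint.h>

#define LANES 8  /* a 256-bit register holds eight 32-bit floats */

/* Scalar model of the compare with predicate 1 (less-than): each mask
   lane becomes all-ones bits when a[i] < b[i], all-zero bits otherwise. */
static void cmp_lt_ps(const float *a, const float *b, uint32_t *mask) {
    for (int i = 0; i < LANES; ++i)
        mask[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
}

/* Scalar model of the variable blend: only bit 31 of each mask lane is
   examined. If it is set the lane is taken from b, otherwise from a. */
static void blendv_ps(const float *a, const float *b,
                      const uint32_t *mask, float *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (mask[i] >> 31) ? b[i] : a[i];
}
```

Under these assumptions, feeding the compare mask into the blend with the same `a` and `b` yields the larger of the two values in each lane (for inputs that are not NaN).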

2011 Jun 02

[LLVMdev] AVX Status?

...8 1) nounwind readnone > %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> > %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone > ret <8 x float> %res > } > > This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). > > On the other hand, this does not work: > > define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) > nounwind readnone { > entry: > %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, > <8 x f...

2011 Jun 03

[LLVMdev] AVX Status?

...>> %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> >> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone >> ret <8 x float> %res >> } >> >> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). >> >> On the other hand, this does not work: >> >> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) >> nounwind readnone { >> entry: >> %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8...

2012 Jul 27

[LLVMdev] X86 FMA4

...o be entered allowing you to mix/match with normal SSE and take advantage of the lack of implicit arguments/nice non-destructive encoding if you choose to (in case you can't tell, I like the non-destructive encoding a lot). * This is important since SSE instructions with implicit operands (e.g. blendvps, which becomes vblendvps) now have an explicit operand when instantiated as a 128-bit AVX instruction. ** NOTE The dirty state is not synonymous with the upper bits of all of the ymm registers being zero. See the Intel AVX optimization guide. > As I am sure you are aware, we cannot use SSE (movaps) instructions in an...

2012 Jul 27

[LLVMdev] X86 FMA4

Hey Michael, Thanks for the legwork! It appears that the stats you listed are for movaps [SSE], not vmovaps [AVX]. I would *assume* that vmovaps(m128) is closer to vmovaps(m256), since they are both AVX instructions. Although, yes, I agree that this is not clear from Agner's report. Please correct me if I am misunderstanding. As I am sure you are aware, we cannot use SSE (movaps)

2012 Jul 27

[LLVMdev] X86 FMA4

Just looked up the numbers from Agner Fog for Sandy Bridge for vmovaps/etc for loading/storing from memory. vmovaps - load takes 1 load mu op, 3 latency, with a reciprocal throughput of 0.5. vmovaps - store takes 1 store mu op, 1 load mu op for address calculation, 3 latency, with a reciprocal throughput of 1. He does not list vmovsd, but movsd has the same stats as vmovaps, so I feel it is a
