search for: blendp

Displaying 10 results from an estimated 10 matches for "blendp".

Did you mean: blend
2009 Jun 17
0
[LLVMdev] Regular Expressions
On Tuesday 16 June 2009 19:35, David Greene wrote: > So which is more intuitive and less error-prone? > > defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, > i32i8imm, "blend", "blend", "f32", 4>; > > or > > defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, > i32i8imm, "...
2009 Jun 17
3
[LLVMdev] Regular Expressions
...2 : X86ValueType { let VT = v4f32; let RegClass = VR128; let suffix = "ps"; } class X86_v8f32 : X86ValueType { let VT = v8f32; let RegClass = VR256; let suffix = "ps"; } Ok, you get the picture. Now let's look at how we would write instruction patterns: defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, i32i8imm, "blend", "blend", "f32">; defm BLENDPD : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0D, i32i8imm, "blend", "blend", &q...
2009 Jun 17
2
[LLVMdev] Regular Expressions
On Jun 16, 2009, at 5:49 PM, David Greene wrote: > On Tuesday 16 June 2009 19:35, David Greene wrote: > >> So which is more intuitive and less error-prone? >> >> defm BLENDPS : >> sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C, >> i32i8imm, "blend", "blend", "f32", 4>; >> >> or >> >> defm BLENDPS : >> sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C...
2009 Jun 15
0
[LLVMdev] Regular Expressions
On Jun 15, 2009, at 11:33 AM, David Greene wrote: > To reduce redundancy, developers must be able to write generic > patterns > like this: > > [(set DSTREGCLASS:$dst, // rr, rrr > (xor (INTSRCTYPE (bitconvert (SRCTYPE SRCREGCLASS:$src1))), > (INTSRCTYPE (bitconvert (SRCTYPE SRCREGCLASS:$src2)))))], > > The substitution then fills in the appropriate types,
2009 Jun 15
2
[LLVMdev] Regular Expressions
Chris Lattner wrote: > However, I don't see any reason to base this off of strings. Instead > of passing down "f32" as a string, why not do something like this > pseudo code: > > class X86ValueType { > RegisterClass RegClass; > ... > } > > def X86_f32 : X86ValueType { > let RegClass = FR32; > ... }; > def X86_i32 :
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...citly said that we currently lack of SSE4.1 blend support. Unfortunately, this seems to be one of the main reasons for the slowdown we are seeing. Here is a list of what we found so far that we think is causing most of the slowdown: 1) shufps is always emitted in cases where we could emit a single blendps; in these cases, blendps is preferable because it has better reciprocal throughput (this is true on all modern Intel and AMD cpus). Things get worse when it comes to lowering shuffles where the shuffle mask indices refer to elements from both input vectors in each lane. For example, a shuffle mas...
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...gt; of SSE4.1 blend support. Unfortunately, this seems to be one of the > main reasons for the slowdown we are seeing. > > Here is a list of what we found so far that we think is causing most > of the slowdown: > 1) shufps is always emitted in cases where we could emit a single > blendps; in these cases, blendps is preferable because it has better > reciprocal throughput (this is true on all modern Intel and AMD cpus). > > Yep. I think this is actually super easy. I'll add support for blendps shortly. > > > Things get worse when it comes to lowering shuff...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...port. Unfortunately, this seems to be one of the >> main reasons for the slowdown we are seeing. >> >> Here is a list of what we found so far that we think is causing most >> of the slowdown: >> 1) shufps is always emitted in cases where we could emit a single >> blendps; in these cases, blendps is preferable because it has better >> reciprocal throughput (this is true on all modern Intel and AMD cpus). > > > Yep. I think this is actually super easy. I'll add support for blendps > shortly. Thanks Chandler! > >> 3) When a shuffle pe...
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ort. Unfortunately, this seems to be one of the >> main reasons for the slowdown we are seeing. >> >> Here is a list of what we found so far that we think is causing most >> of the slowdown: >> 1) shufps is always emitted in cases where we could emit a single >> blendps; in these cases, blendps is preferable because it has better >> reciprocal throughput (this is true on all modern Intel and AMD cpus). >> >> Things get worse when it comes to lowering shuffles where the shuffle >> mask indices refer to elements from both input vectors in e...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...4 x float> %2 >>> } >>> >>> >>> llc -march=x86-64 -mattr=+avx test.ll -o - >>> >>> test: # @test >>> vxorps %xmm2, %xmm2, %xmm2 >>> vmovss %xmm0, %xmm2, %xmm2 >>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3] >>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0] >>> retl >>> >>> test2: # @test2 >>> vinsertps $48, %xmm1, %xmm0, %xmm0 #...