Displaying 10 results from an estimated 10 matches for "blendp".

[LLVMdev] Regular Expressions

2009 Jun 17

0

On Tuesday 16 June 2009 19:35, David Greene wrote:
> So which is more intuitive and less error-prone?
>
> defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C,
> i32i8imm, "blend", "blend", "f32", 4>;
>
> or
>
> defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C,
> i32i8imm, "...

[LLVMdev] Regular Expressions

2009 Jun 17

3

...2 : X86ValueType {
  let VT = v4f32;
  let RegClass = VR128;
  let suffix = "ps";
}

class X86_v8f32 : X86ValueType {
  let VT = v8f32;
  let RegClass = VR256;
  let suffix = "ps";
}

Ok, you get the picture. Now let's look at how we would write instruction patterns:

defm BLENDPS : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C,
    i32i8imm, "blend", "blend", "f32">;
defm BLENDPD : sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0D,
    i32i8imm, "blend", "blend", "...

[LLVMdev] Regular Expressions

2009 Jun 17

2

On Jun 16, 2009, at 5:49 PM, David Greene wrote:
> On Tuesday 16 June 2009 19:35, David Greene wrote:
>
>> So which is more intuitive and less error-prone?
>>
>> defm BLENDPS :
>> sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C,
>> i32i8imm, "blend", "blend", "f32", 4>;
>>
>> or
>>
>> defm BLENDPS :
>> sse41_avx_fp_binary_vector_osta_vintrinsic_rmi_rrmi<0x0C...

[LLVMdev] Regular Expressions

2009 Jun 15

0

On Jun 15, 2009, at 11:33 AM, David Greene wrote:
> To reduce redundancy, developers must be able to write generic patterns
> like this:
>
> [(set DSTREGCLASS:$dst, // rr, rrr
>   (xor (INTSRCTYPE (bitconvert (SRCTYPE SRCREGCLASS:$src1))),
>        (INTSRCTYPE (bitconvert (SRCTYPE SRCREGCLASS:$src2)))))],
>
> The substitution then fills in the appropriate types,

[LLVMdev] Regular Expressions

2009 Jun 15

2

Chris Lattner wrote:
> However, I don't see any reason to base this off of strings. Instead
> of passing down "f32" as a string, why not do something like this
> pseudo code:
>
> class X86ValueType {
>   RegisterClass RegClass;
>   ...
> }
>
> def X86_f32 : X86ValueType {
>   let RegClass = FR32;
>   ... };
> def X86_i32 :

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

5

...citly said that we currently lack of SSE4.1 blend support. Unfortunately, this seems to be one of the main reasons for the slowdown we are seeing.

Here is a list of what we found so far that we think is causing most of the slowdown:
1) shufps is always emitted in cases where we could emit a single blendps; in these cases, blendps is preferable because it has better reciprocal throughput (this is true on all modern Intel and AMD cpus).

Things get worse when it comes to lowering shuffles where the shuffle mask indices refer to elements from both input vectors in each lane. For example, a shuffle mas...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

2

...> of SSE4.1 blend support. Unfortunately, this seems to be one of the
> main reasons for the slowdown we are seeing.
>
> Here is a list of what we found so far that we think is causing most
> of the slowdown:
> 1) shufps is always emitted in cases where we could emit a single
> blendps; in these cases, blendps is preferable because it has better
> reciprocal throughput (this is true on all modern Intel and AMD cpus).
>
> Yep. I think this is actually super easy. I'll add support for blendps shortly.
>
>
> Things get worse when it comes to lowering shuff...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 10

13

...port. Unfortunately, this seems to be one of the
>> main reasons for the slowdown we are seeing.
>>
>> Here is a list of what we found so far that we think is causing most
>> of the slowdown:
>> 1) shufps is always emitted in cases where we could emit a single
>> blendps; in these cases, blendps is preferable because it has better
>> reciprocal throughput (this is true on all modern Intel and AMD cpus).
>
>
> Yep. I think this is actually super easy. I'll add support for blendps
> shortly.

Thanks Chandler!

>
>> 3) When a shuffle pe...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 09

1

...ort. Unfortunately, this seems to be one of the
>> main reasons for the slowdown we are seeing.
>>
>> Here is a list of what we found so far that we think is causing most
>> of the slowdown:
>> 1) shufps is always emitted in cases where we could emit a single
>> blendps; in these cases, blendps is preferable because it has better
>> reciprocal throughput (this is true on all modern Intel and AMD cpus).
>>
>> Things get worse when it comes to lowering shuffles where the shuffle
>> mask indices refer to elements from both input vectors in e...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 08

2

...4 x float> %2
>>> }
>>>
>>>
>>> llc -march=x86-64 -mattr=+avx test.ll -o -
>>>
>>> test: # @test
>>> vxorps %xmm2, %xmm2, %xmm2
>>> vmovss %xmm0, %xmm2, %xmm2
>>> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>>> retl
>>>
>>> test2: # @test2
>>> vinsertps $48, %xmm1, %xmm0, %xmm0 #...
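
For readers following the blendps discussion in the results above, here is a minimal Python sketch of when a two-input shuffle mask can be expressed as a single blend, and what immediate it would need. It is not LLVM's lowering code. The function name `blend_immediate` is hypothetical, the mask uses the shufflevector index convention (0..n-1 for the first input, n..2n-1 for the second), and undefined mask elements are not handled.

```python
from typing import Optional, Sequence


def blend_immediate(mask: Sequence[int]) -> Optional[int]:
    """Return the blend immediate for a two-input shuffle mask, or None.

    Hypothetical helper for illustration only. A mask is a blend when every
    result element keeps its own position and only the source vector varies.
    """
    n = len(mask)
    imm = 0
    for pos, idx in enumerate(mask):
        if not 0 <= idx < 2 * n:
            raise ValueError(f"mask index {idx} out of range for {n} elements")
        if idx % n != pos:
            return None  # element changes position, so a blend cannot do it
        if idx >= n:
            imm |= 1 << pos  # bit set: take this element from the second input
    return imm


# xmm2[0,1], xmm0[2], xmm2[3] with xmm2 as first input and xmm0 as second
print(blend_immediate([0, 1, 6, 3]))
# elements 0 and 1 swap positions, so this is not a blend
print(blend_immediate([1, 0, 6, 7]))
```

The first call corresponds to the `vblendps $4` line in the listing above: only bit 2 is set, so element 2 comes from the second input and the rest from the first.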