search for: insertps

Displaying 20 results from an estimated 22 matches for "insertps".

2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
...com> wrote: > > I'm working on some of the AVX insert/extract instructions.  They're > > stupid.  They do not operate on ymm registers, meaning we have to > > use VINSERTF128/VEXTRACTF128 and then do the real operation. > > > > Anyway, I'm looking at how INSERTPS and friends work and noticed that > > there are special SelectionDAG nodes for them and corresponding TableGen > > dag operators (X86insrtps, for example). > > > > What's the reason for using special dag operators as opposed to > > intrinsics? > > INSERTPS is...
2009 Dec 02
0
[LLVMdev] More AVX Advice Needed
...> > I'm working on some of the AVX insert/extract instructions.  They're >> > stupid.  They do not operate on ymm registers, meaning we have to >> > use VINSERTF128/VEXTRACTF128 and then do the real operation. >> > >> > Anyway, I'm looking at how INSERTPS and friends work and noticed that >> > there are special SelectionDAG nodes for them and corresponding TableGen >> > dag operators (X86insrtps, for example). >> > >> > What's the reason for using special dag operators as opposed to >> > intrinsics? ...
2009 Dec 02
1
[LLVMdev] More AVX Advice Needed
...king on some of the AVX insert/extract instructions.  They're > >> > stupid.  They do not operate on ymm registers, meaning we have to > >> > use VINSERTF128/VEXTRACTF128 and then do the real operation. > >> > > >> > Anyway, I'm looking at how INSERTPS and friends work and noticed that > >> > there are special SelectionDAG nodes for them and corresponding > >> > TableGen dag operators (X86insrtps, for example). > >> > > >> > What's the reason for using special dag operators as opposed to > ...
2014 Jun 11
2
[LLVMdev] constraining two virtual registers to be the same physical register
On 06/10/2014 05:51 PM, Pete Cooper wrote: > Hi Reed > > You can do this on the instruction itself by telling it 2 operands > must be the same register. For example, from X86: > > let Constraints = "$src1 = $dst" in > defm INSERTPS : SS41I_insertf32<0x21, "insertps">; > > Thanks, Hi Pete, Sorry. I should have been more specific. I'm looking for a way to do this in c++. I'm aware of how it is done in tablegen. Reed > Pete >> On Jun 10, 2014, at 5:38 PM, reed kotler <rkotler at mi...
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
I'm working on some of the AVX insert/extract instructions. They're stupid. They do not operate on ymm registers, meaning we have to use VINSERTF128/VEXTRACTF128 and then do the real operation. Anyway, I'm looking at how INSERTPS and friends work and noticed that there are special SelectionDAG nodes for them and corresponding TableGen dag operators (X86insrtps, for example). What's the reason for using special dag operators as opposed to intrinsics? -Dave
2014 Sep 09
5
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler, Thanks for fixing the problem with the insertps mask. Generally the new shuffle lowering looks promising, however there are some cases where the codegen is now worse causing runtime performance regressions in some of our internal codebase. You have already mentioned how the new shuffle lowering is missing some features; for example, you explic...
2009 Dec 02
0
[LLVMdev] More AVX Advice Needed
...d Greene <dag at cray.com> wrote: > I'm working on some of the AVX insert/extract instructions.  They're > stupid.  They do not operate on ymm registers, meaning we have to > use VINSERTF128/VEXTRACTF128 and then do the real operation. > > Anyway, I'm looking at how INSERTPS and friends work and noticed that > there are special SelectionDAG nodes for them and corresponding TableGen > dag operators (X86insrtps, for example). > > What's the reason for using special dag operators as opposed to intrinsics? INSERTPS isn't an intrinsic because there'...
2014 Jun 11
2
[LLVMdev] constraining two virtual registers to be the same physical register
Does anyone know if there is a way to constrain two virtual registers to be allocated to the same physical register? Tia. Reed
2014 Sep 10
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7] > > llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx): > vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3] > vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3] > > > 2) On SSE4.1, we should try not to emit an insertps if the shuffle > mask identifies a blend. At the moment the new lowering logic is very > aggressively emitting insertps instead of cheaper blendps. > > Example: > ;;; > define <4 x float> @bar(<4 x float> %A, <4 x float> %B) { > %1 = shufflevector <4 x f...
2014 Sep 05
3
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote: > Unfortunately, another team, while doing internal testing has seen the > new path generating illegal insertps masks. A sample here: > > vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] > vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] > vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] > vinsertps $416, %xmm1, %...
2014 Sep 10
13
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...ter >> reciprocal throughput (this is true on all modern Intel and AMD cpus). > > > Yep. I think this is actually super easy. I'll add support for blendps > shortly. Thanks Chandler! > >> 3) When a shuffle performs an insert at index 0 we always generate an >> insertps, while a movss would do a better job. >> ;;; >> define <4 x float> @baz(<4 x float> %A, <4 x float> %B) { >> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4, >> i32 1, i32 2, i32 3> >> ret <4 x float...
2014 Sep 08
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com <mailto:rob.lougher at gmail.com>> >>>> wrote: >>>>> >>>>> Unfortunately, another team, while doing internal testing has seen the >>>>> new path generating illegal insertps masks. A sample here: >>>>> >>>>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >>>>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >>>>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xm...
2014 Sep 04
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all, As you may have noticed, there is a new vector shuffle lowering path in the X86 backend. You can try it out with the '-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm -x86-experimental-vector-shuffle-lowering' to clang. Please test it out! There may be some correctness bugs, I'm still fuzz testing it to shake them out. But I expect fairly few
2014 Sep 05
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...<mailto:chandlerc at gmail.com>> wrote: >> >> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> >> wrote: >>> >>> Unfortunately, another team, while doing internal testing has seen the >>> new path generating illegal insertps masks. A sample here: >>> >>> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >>> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >>> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] >...
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
...s like this: a0 : f32 = extract_vector_elt ( A, 0) b0 : f32 = extract_vector_elt ( B, 0) r0 : f32 = fadd a0, b0 result : v4f32 = insert_vector_elt ( A, r0, 0 ) (with A and B of type v4f32). The 'insert_vector_elt' is custom lowered into either X86ISD::MOVSS or X86ISD::INSERTPS depending on the target's SSE feature level. To start I checked if this bug was caused simply by the lack of specific tablegen patterns to match the complex sequence described above into a single ADDSS instruction. However X86InstrSSE.td already defines an instruction X86::ADDSSrr as a commut...
2014 Sep 09
1
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...---------------------------- > > Thanks, > -Quentin > >> On Sep 9, 2014, at 6:13 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com <mailto:andrea.dibiagio at gmail.com>> wrote: >> >> Hi Chandler, >> >> Thanks for fixing the problem with the insertps mask. >> >> Generally the new shuffle lowering looks promising, however there are >> some cases where the codegen is now worse causing runtime performance >> regressions in some of our internal codebase. >> >> You have already mentioned how the new shuffle lowe...
2010 Aug 31
5
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...WebCore5mouniEPNS_15GraphicsContextEPNS_30GraphicsContextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_ movss 8(%rsp), %xmm1 movss 12(%rsp), %xmm0 subss 20(%rsp), %xmm0 subss 16(%rsp), %xmm1 ## kill: XMM1<def> XMM1<kill> XMM1<def> insertps $16, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0],xmm1[2,3] movq 16(%rsp), %xmm0 addq $24, %rsp ret $ opt -std-compile-opts unopt-fail.ll -o - | llc -o - .section __TEXT,__text,regular,pure_instructions .globl __ZN7WebCore15GraphicsContext19roundToDevicePixelsERKNS_9FloatRectE .align 4, 0x90 __Z...
2014 Sep 06
2
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
...uth <chandlerc at gmail.com> wrote: >> >> >> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> >> wrote: >> >> >> Unfortunately, another team, while doing internal testing has seen the >> new path generating illegal insertps masks. A sample here: >> >> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3] >> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3] >> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3] >> vinsertps...
2010 Aug 31
0
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...PNS_30GraphicsContextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_ > movss 8(%rsp), %xmm1 > movss 12(%rsp), %xmm0 > subss 20(%rsp), %xmm0 > subss 16(%rsp), %xmm1 > ## kill: XMM1<def> XMM1<kill> > XMM1<def> > insertps $16, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0],xmm1[2,3] > movq 16(%rsp), %xmm0 > addq $24, %rsp > ret > > > $ opt -std-compile-opts unopt-fail.ll -o - | llc -o - > > .section __TEXT,__text,regular,pure_instructions > .globl > __ZN7WebCore15GraphicsContext19roundTo...
2010 Aug 31
2
[LLVMdev] "equivalent" .ll files diverge after optimizations are applied
...ntextPlatformPrivateERKNS_9FloatRectERNS_10FloatPointES8_ >> movss 8(%rsp), %xmm1 >> movss 12(%rsp), %xmm0 >> subss 20(%rsp), %xmm0 >> subss 16(%rsp), %xmm1 >> ## kill: XMM1<def> XMM1<kill> XMM1<def> >> insertps $16, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[0],xmm1[2,3] >> movq 16(%rsp), %xmm0 >> addq $24, %rsp >> ret >> >> >> $ opt -std-compile-opts unopt-fail.ll -o - | llc -o - >> >> .section __TEXT,__text,regular,pure_instructions >> .globl __ZN7W...
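
Note on the insertps immediates quoted in the results above: several snippets show the instruction with an immediate ($16 in the 2010 thread; $256 and $416 in the 2014 reports of "illegal insertps masks"). The sketch below is not taken from any of the quoted messages. It decodes the immediate for the register-source form of INSERTPS, where the 8-bit value is split into a source element (bits 7:6), a destination element (bits 5:4), and a zero mask (bits 3:0). The function names and the default register names are hypothetical, chosen only for illustration.

```python
def decode_insertps_imm(imm):
    """Split an INSERTPS immediate (register-source form) into its fields."""
    if not 0 <= imm <= 0xFF:
        # The immediate is a single byte, so values such as 256 or 416
        # cannot be encoded at all.
        raise ValueError(f"insertps immediate {imm} does not fit in 8 bits")
    return {
        "src_elt": (imm >> 6) & 0x3,  # bits 7:6: element read from the source register
        "dst_elt": (imm >> 4) & 0x3,  # bits 5:4: destination element that is overwritten
        "zero_mask": imm & 0xF,       # bits 3:0: destination elements forced to zero
    }


def describe(imm, dst="xmm1", src="xmm0"):
    """Render the result lanes in a style similar to llc's asm comments."""
    fields = decode_insertps_imm(imm)
    lanes = []
    for i in range(4):
        if fields["zero_mask"] & (1 << i):
            lanes.append("zero")  # zeroing is applied after the insert, so it wins
        elif i == fields["dst_elt"]:
            lanes.append(f"{src}[{fields['src_elt']}]")
        else:
            lanes.append(f"{dst}[{i}]")
    return ",".join(lanes)


if __name__ == "__main__":
    print(describe(16))  # the immediate from "insertps $16, %xmm0, %xmm1"
    for bad in (256, 416):
        try:
            decode_insertps_imm(bad)
        except ValueError as err:
            print(err)
```

For $16 this yields xmm0[0] in lane 1 with the other lanes kept from xmm1, which agrees with the "xmm1 = xmm1[0],xmm0[0],xmm1[2,3]" comment in the 2010 snippets (llc collapses adjacent lanes; the sketch prints them one by one). The range check only shows that $256 and $416 cannot fit in the one-byte immediate; whether that is the exact sense of "illegal" meant in the 2014 thread is an assumption.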