Displaying 20 results from an estimated 29 matches for "insert_vector_elt".

2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...le thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). > > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally,...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...o - >> ldr r0, [r1] >> ldr r1, [r2] >> vmov s1, r1 >> vmov s0, r0 >> Here each ldr, vmov sequences could have been replaced by a simple >> vld1.32. >> >> ** Proposed Solution ** >> Lower to more vector friendly code (using a sequence of >> insert_vector_elt), when bit casts will not be free. >> The attached patch demonstrates that, but is missing the proper check to >> know what DAG combine will do (see TODO). >> > > I think you're approaching this backwards: the obvious thing to do is to > generate the insert_vector_elt...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...le thumbv7-apple-ios3 motivating_example.ll -o - > ldr r0, [r1] > ldr r1, [r2] > vmov s1, r1 > vmov s0, r0 > Here each ldr, vmov sequences could have been replaced by a simple vld1.32. > > ** Proposed Solution ** > Lower to more vector friendly code (using a sequence of > insert_vector_elt), when bit casts will not be free. > The attached patch demonstrates that, but is missing the proper check to > know what DAG combine will do (see TODO). > I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally,...
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...ple.ll shows such a case: llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o - ldr r0, [r1] ldr r1, [r2] vmov s1, r1 vmov s0, r0 Here each ldr, vmov sequences could have been replaced by a simple vld1.32. ** Proposed Solution ** Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free. The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO). Thanks for your help. Cheers, -Quentin
2009 Dec 02
1
[LLVMdev] More AVX Advice Needed
...rd?"  It looks just like an > > intrinsic to me. > > X86insrtps is roughly equivalent to the LLVM IR instruction > insertelement, so there's no need for an IR intrinsic. > X86ISD::INSERTPS is an extra instruction for ISel; it's used inside > the custom lowering for INSERT_VECTOR_ELT and VECTOR_SHUFFLE. Yes, that's how I found out about it. :) Why not just use ISD::INSERT_VECTOR_ELT? And what's the difference between vector_extract and extractelt in TargetSelectionDAG.td? Ditto vector_insert vs. insertelt. -Dave
2016 Aug 02
2
Instruction selection problems due to SelectionDAGBuilder
...r3_wo_getSetCCResultType) Initial selection DAG: BB#15 'foo:vector.ph' SelectionDAG has 41 nodes: t0: ch = EntryToken t4: i32 = Constant<0> t3: i64,ch = CopyFromReg t0, Register:i64 %vreg12 t6: v8i64 = insert_vector_elt undef:v8i64, t3, Constant:i64<0> t7: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t6, undef:v8i64 t15: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<1>, Constant:i64<2>, Constant:i64<3>, Constant:i64<4>, Constant:i64<...
2009 Dec 02
0
[LLVMdev] More AVX Advice Needed
...ow is X86insrtps "standard?"  It looks just like an > intrinsic to me. X86insrtps is roughly equivalent to the LLVM IR instruction insertelement, so there's no need for an IR intrinsic. X86ISD::INSERTPS is an extra instruction for ISel; it's used inside the custom lowering for INSERT_VECTOR_ELT and VECTOR_SHUFFLE. -Eli
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
On Wednesday 02 December 2009 16:51, Eli Friedman wrote: > On Wed, Dec 2, 2009 at 2:44 PM, David Greene <dag at cray.com> wrote: > > I'm working on some of the AVX insert/extract instructions.  They're > > stupid.  They do not operate on ymm registers, meaning we have to > > use VINSERTF128/VEXTRACTF128 and then do the real operation. > > > > Anyway,
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...ions in the dst registers. The way that I ended up handling this was to have different register classes for 1, 2, 3 and 4 component vectors. This made the generic cases very simple but still made swizzling fairly difficult. In order to get swizzling to work you only need to handle three SDNodes, insert_vector_elt, extract_vector_elt and build_vector while expanding the rest. For those three nodes I then custom lowered them to a target specific node with an extra integer constant per register that would encode the swizzle mask in 32bits. The correct swizzles can then be generated in the asm printer by decodi...
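The idea of attaching an integer constant that encodes a swizzle can be illustrated with a small Python sketch. The 2-bits-per-lane layout and the names `encode_swizzle` and `decode_swizzle` are assumptions made for illustration only; the message does not say how the mask is actually packed into its 32 bits.

```python
# Toy model of a swizzle mask packed into an integer.
# ASSUMPTION: 2 bits per lane, lane 0 in the lowest bits. The original
# post only says the mask fits in 32 bits, not how it is laid out.
COMPONENTS = "xyzw"


def encode_swizzle(swizzle: str) -> int:
    """Pack a four-component swizzle such as 'zxyw' into an integer."""
    if len(swizzle) != 4:
        raise ValueError("expected exactly four components")
    mask = 0
    for lane, comp in enumerate(swizzle):
        # str.index raises ValueError for anything outside x, y, z, w.
        mask |= COMPONENTS.index(comp) << (2 * lane)
    return mask


def decode_swizzle(mask: int) -> str:
    """Recover the swizzle string, as an asm printer might when emitting text."""
    return "".join(COMPONENTS[(mask >> (2 * lane)) & 0b11] for lane in range(4))
```

With this layout, the identity swizzle `xyzw` round-trips through the integer, which is the property a printer-side decoder would rely on.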
2016 Mar 18
2
generate vectorized code
...o a <4xi32>, builds two vectors with the 8 ints, > > This might sound like a dumb question, but how does one build a vector of ints out of regular ints in IR? See: http://llvm.org/docs/LangRef.html#vector-operations In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT" in SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG. I usually find good examples by grepping in the lit tests. Another way is to write the function in clang, and run it with -O3 -emit-llvm -S to get a good starting point. -- Mehdi > > sum...
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote: > > Well, how many possible permutations are there? Is it possible to > model each case as a separate physical register? > > Evan > I don't think so. There are 4x4x4x4 = 256 permutations. For example: * xyzw: default * zxyw * yyyy: splat Even if I can model each of these 256 cases as a separate physical register, how can I model the use of r0.xyzw in
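The 4x4x4x4 count can be checked with a short Python sketch. The variable names are illustrative and not taken from the thread.

```python
from itertools import product

COMPONENTS = "xyzw"

# Every lane independently selects one of four source components, so
# repeats such as "yyyy" (a splat) are allowed.
all_swizzles = ["".join(p) for p in product(COMPONENTS, repeat=4)]

# Swizzles that use each component exactly once are permutations in the
# strict sense; they are a small subset of the total.
strict_permutations = [s for s in all_swizzles if len(set(s)) == 4]

print(len(all_swizzles), len(strict_permutations))
```

The 256 figure therefore counts selections with repetition; only 24 of them reorder the components without duplicating any.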
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
...dss %xmm1, %xmm0 The first step is to understand why the compiler produces the redundant instructions. For the example above, the DAG sequence looks like this: a0 : f32 = extract_vector_elt ( A, 0) b0 : f32 = extract_vector_elt ( B, 0) r0 : f32 = fadd a0, b0 result : v4f32 = insert_vector_elt ( A, r0, 0 ) (with A and B of type v4f32). The 'insert_vector_elt' is custom lowered into either X86ISD::MOVSS or X86ISD::INSERTPS depending on the target's SSE feature level. To start I checked if this bug was caused simply by the lack of specific tablegen patterns to match the comp...
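The four-node DAG sequence quoted above can be modelled in a few lines of Python to show what it computes. This is a toy model with plain lists and Python floats standing in for v4f32 values, not SelectionDAG code; `scalar_add_lane0` is a hypothetical name.

```python
def extract_vector_elt(vec, idx):
    """Read one lane of a vector."""
    return vec[idx]


def insert_vector_elt(vec, elt, idx):
    """Return a copy of vec with lane idx replaced by elt."""
    out = list(vec)
    out[idx] = elt
    return out


def scalar_add_lane0(a, b):
    # Mirrors the quoted sequence node by node.
    a0 = extract_vector_elt(a, 0)
    b0 = extract_vector_elt(b, 0)
    r0 = a0 + b0
    return insert_vector_elt(a, r0, 0)


A = [1.0, 2.0, 3.0, 4.0]
B = [10.0, 20.0, 30.0, 40.0]
result = scalar_add_lane0(A, B)
```

Only lane 0 changes and the upper lanes of A pass through unchanged, which is why the whole sequence is a candidate for a single scalar add on the vector register.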
2010 Jul 28
3
[LLVMdev] Subregister coalescing
...gister & ops, *but* not vector loads. All the vector register elements are directly accessible, so VI1 reg (Vector Integer 1) has I4, I5, I6 and I7 as its (integer) subregisters. Subregisters of same reg *never* overlap. Therefore, vector loads are lowered to scalar loads followed by a chain of INSERT_VECTOR_ELTs. Then we select those to INSERT_SUBREG, everything fine to that point. Status before live analysis is (non-related instrs removed): 36 %reg16388<def> = LDWr %reg16384, 0; mem:LD4[<unknown>] 68 %reg16392<def> = INSERT_SUBREG %reg16392<undef>, %reg16388<kill>, 1 76 %r...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...to understand why the compiler produces the > redundant instructions. > > For the example above, the DAG sequence looks like this: > > a0 : f32 = extract_vector_elt ( A, 0) > b0 : f32 = extract_vector_elt ( B, 0) > r0 : f32 = fadd a0, b0 > result : v4f32 = insert_vector_elt ( A, r0, 0 ) > > (with A and B of type v4f32). > > The 'insert_vector_elt' is custom lowered into either X86ISD::MOVSS or > X86ISD::INSERTPS depending on the target's SSE feature level. > > To start I checked if this bug was caused simply by the lack of > spec...
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
...nce the two sides are the same, it will generate quad word moves > to copy the values. I think this specific issue can be fixed without extending the IL-level syntax; DAGCombiner could easily be made a lot more clever about cases like this. For example, before legalization, we can transform an INSERT_VECTOR_ELT inserting an element into a constant vector or a SCALAR_TO_VECTOR into a BUILD_VECTOR, and we can transform BUILD_VECTOR into CONCAT_VECTORS or EXTRACT_SUBVECTOR for relevant cases. We can also make the lowering significantly more clever about dealing with insertelement. If what we currently have...
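The suggested fold of an INSERT_VECTOR_ELT on a constant vector into a BUILD_VECTOR can be sketched as a toy rewrite in Python. The classes and the `fold` helper below are hypothetical stand-ins, not LLVM's DAGCombiner, and the sketch covers only the case where the base is already a BUILD_VECTOR and the index is a constant; the SCALAR_TO_VECTOR case mentioned in the message is left out.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BuildVector:
    operands: Tuple[object, ...]


@dataclass(frozen=True)
class InsertVectorElt:
    vec: object
    elt: object
    idx: object  # an int models a constant index; anything else is variable


def fold_insert(node):
    """Rewrite insert(BUILD_VECTOR, elt, constant idx) into a BUILD_VECTOR."""
    if (isinstance(node, InsertVectorElt)
            and isinstance(node.vec, BuildVector)
            and isinstance(node.idx, int)
            and 0 <= node.idx < len(node.vec.operands)):
        ops = list(node.vec.operands)
        ops[node.idx] = node.elt
        return BuildVector(tuple(ops))
    return node  # leave anything else untouched


def fold(node):
    """Fold a chain of inserts, innermost first."""
    if isinstance(node, InsertVectorElt):
        node = InsertVectorElt(fold(node.vec), node.elt, node.idx)
        return fold_insert(node)
    return node


chain = InsertVectorElt(
    InsertVectorElt(BuildVector((0, 0, 0, 0)), "a", 1), "b", 3)
print(fold(chain))
```

A chain of inserts with constant indices collapses into one BUILD_VECTOR node, while an insert with a variable index is returned unchanged because the affected lane is not known.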
2016 Mar 18
4
generate vectorized code
...>> >> This might sound like a dumb question, but how does one build a vector of >> ints out of regular ints in IR? >> >> >> See: http://llvm.org/docs/LangRef.html#vector-operations >> >> In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT" >> in SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG. >> >> I usually find good examples by grepping in the lit tests. Another way is >> to write the function in clang, and run it with -O3 -emit-llvm -S to get a >> good s...
2010 Jul 28
0
[LLVMdev] Subregister coalescing
On Jul 28, 2010, at 12:25 PM, Carlos Sánchez de La Lama wrote: > Which after register coalescing gets transformed into: > > 36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>] > 76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>] > 124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2 > 132
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
Hi, The current definition of shuffle vector is <result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x i32> <mask> ; yields <n x <ty>> The first two operands of a 'shufflevector' instruction are vectors with types that match each other and types that match the result of the instruction. The third
2016 Mar 18
2
generate vectorized code
> On Mar 18, 2016, at 1:37 PM, Rail Shafigulin <rail at esenciatech.com> wrote: > >> I think you created a cycle, this is easy to do with SelectionDAG :) >> Basically SelectionDAG will iterate until it does not see anything to change. So if you insert a transformation on a pattern A, that generates pattern B, while you have another transformation that matches B and
2009 May 20
2
[LLVMdev] [PATCH] Add new phase to legalization to handle vector operations
...RINSIC_WO_CHAIN: + case ISD::INTRINSIC_W_CHAIN: + case ISD::INTRINSIC_VOID: + case ISD::CopyToReg: + case ISD::CopyFromReg: + case ISD::AssertSext: + case ISD::AssertZext: + // Node cannot be illegal if types are legal + break; + case ISD::BUILD_VECTOR: + case ISD::INSERT_VECTOR_ELT: + case ISD::EXTRACT_VECTOR_ELT: + case ISD::CONCAT_VECTORS: + case ISD::EXTRACT_SUBVECTOR: + case ISD::VECTOR_SHUFFLE: + case ISD::SCALAR_TO_VECTOR: + case ISD::BIT_CONVERT: + case ISD::LOAD: + case ISD::STORE: + // These are intentionally not handled here; the point o...