Displaying 20 results from an estimated 29 matches for "insert_vector_elt".
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...le thumbv7-apple-ios3 motivating_example.ll -o -
> ldr r0, [r1]
> ldr r1, [r2]
> vmov s1, r1
> vmov s0, r0
> Here each ldr, vmov sequence could have been replaced by a simple vld1.32.
>
> ** Proposed Solution **
> Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free.
> The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO).
>
> I think you're approaching this backwards: the obvious thing to do is to generate the insert_vector_elt sequence unconditionally,...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...o -
>> ldr r0, [r1]
>> ldr r1, [r2]
>> vmov s1, r1
>> vmov s0, r0
>> Here each ldr, vmov sequence could have been replaced by a simple
>> vld1.32.
>>
>> ** Proposed Solution **
>> Lower to more vector friendly code (using a sequence of
>> insert_vector_elt), when bit casts will not be free.
>> The attached patch demonstrates that, but is missing the proper check to
>> know what DAG combine will do (see TODO).
>>
>
> I think you're approaching this backwards: the obvious thing to do is to
> generate the insert_vector_elt...
2013 Jul 01
0
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...le thumbv7-apple-ios3 motivating_example.ll -o -
> ldr r0, [r1]
> ldr r1, [r2]
> vmov s1, r1
> vmov s0, r0
> Here each ldr, vmov sequence could have been replaced by a simple vld1.32.
>
> ** Proposed Solution **
> Lower to more vector friendly code (using a sequence of
> insert_vector_elt), when bit casts will not be free.
> The attached patch demonstrates that, but is missing the proper check to
> know what DAG combine will do (see TODO).
>
I think you're approaching this backwards: the obvious thing to do is to
generate the insert_vector_elt sequence unconditionally,...
2013 Jul 01
3
[LLVMdev] Advices Required: Best practice to share logic between DAG combine and target lowering?
...ple.ll shows such a case:
llc -O3 -mtriple thumbv7-apple-ios3 motivating_example.ll -o -
ldr r0, [r1]
ldr r1, [r2]
vmov s1, r1
vmov s0, r0
Here each ldr, vmov sequence could have been replaced by a simple vld1.32.
** Proposed Solution **
Lower to more vector friendly code (using a sequence of insert_vector_elt), when bit casts will not be free.
The attached patch demonstrates that, but is missing the proper check to know what DAG combine will do (see TODO).
Thanks for your help.
Cheers,
-Quentin
2009 Dec 02
1
[LLVMdev] More AVX Advice Needed
...rd?" It looks just like an
> > intrinsic to me.
>
> X86insrtps is roughly equivalent to the LLVM IR instruction
> insertelement, so there's no need for an IR intrinsic.
> X86ISD::INSERTPS is an extra instruction for ISel; it's used inside
> the custom lowering for INSERT_VECTOR_ELT and VECTOR_SHUFFLE.
Yes, that's how I found out about it. :)
Why not just use ISD::INSERT_VECTOR_ELT?
And what's the difference between vector_extract and extractelt in
TargetSelectionDAG.td? Ditto vector_insert vs. insertelt.
-Dave
2016 Aug 02
2
Instruction selection problems due to SelectionDAGBuilder
...r3_wo_getSetCCResultType)
Initial selection DAG: BB#15 'foo:vector.ph'
SelectionDAG has 41 nodes:
t0: ch = EntryToken
t4: i32 = Constant<0>
t3: i64,ch = CopyFromReg t0, Register:i64 %vreg12
t6: v8i64 = insert_vector_elt undef:v8i64, t3, Constant:i64<0>
t7: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t6, undef:v8i64
t15: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<1>,
Constant:i64<2>, Constant:i64<3>, Constant:i64<4>, Constant:i64<...
2009 Dec 02
0
[LLVMdev] More AVX Advice Needed
...ow is X86insrtps "standard?" It looks just like an
> intrinsic to me.
X86insrtps is roughly equivalent to the LLVM IR instruction
insertelement, so there's no need for an IR intrinsic.
X86ISD::INSERTPS is an extra instruction for ISel; it's used inside
the custom lowering for INSERT_VECTOR_ELT and VECTOR_SHUFFLE.
-Eli
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
On Wednesday 02 December 2009 16:51, Eli Friedman wrote:
> On Wed, Dec 2, 2009 at 2:44 PM, David Greene <dag at cray.com> wrote:
> > I'm working on some of the AVX insert/extract instructions. They're
> > stupid. They do not operate on ymm registers, meaning we have to
> > use VINSERTF128/VEXTRACTF128 and then do the real operation.
> >
> > Anyway,
2009 Feb 16
0
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
...ions in the dst registers. The way that I ended up handling this
was to have different register classes for 1, 2, 3 and 4 component
vectors. This made the generic cases very simple but still made
swizzling fairly difficult.
In order to get swizzling to work you only need to handle three
SDNodes, insert_vector_elt, extract_vector_elt and build_vector while
expanding the rest. For those three nodes I then custom lowered them to
a target specific node with an extra integer constant per register that
would encode the swizzle mask in 32bits. The correct swizzles can then
be generated in the asm printer by decodi...
2016 Mar 18
2
generate vectorized code
...o a <4xi32>, builds two vectors with the 8 ints,
>
> This might sound like a dumb question, but how does one build a vector of ints out of regular ints in IR?
See: http://llvm.org/docs/LangRef.html#vector-operations
In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT" in SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG.
I usually find good examples by grepping in the lit tests. Another way is to write the function in clang, and run it with -O3 -emit-llvm -S to get a good starting point.
--
Mehdi
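
As a minimal sketch of the "write the function in clang" route, the C below builds a vector of ints from regular ints. It assumes the GCC/Clang vector extension (`vector_size`); the names `v4i32`, `make_v4i32` and `lane` are hypothetical and do not come from the thread. Compiled with the -O3 -emit-llvm -S flags mentioned above, the initializer would typically appear as a chain of "insertelement" instructions, though the exact IR depends on the compiler version.

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit ints in one 16-byte vector.
 * The type name v4i32 is illustrative only. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Hypothetical helper: build a vector from four scalar ints.
 * A brace initializer with non-constant lanes is the source-level
 * counterpart of inserting one element per lane. */
v4i32 make_v4i32(int32_t a, int32_t b, int32_t c, int32_t d) {
    v4i32 v = {a, b, c, d};
    return v;
}

/* Hypothetical helper: read lane i back out (the extract direction).
 * The caller must keep i in the range 0..3. */
int32_t lane(v4i32 v, int i) {
    return v[i];
}
```

Calling `lane(make_v4i32(1, 2, 3, 4), 2)` reads back the third scalar that was inserted.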
>
> sum...
2009 Feb 16
2
[LLVMdev] Modeling GPU vector registers, again (with my implementation)
Evan Cheng-2 wrote:
>
> Well, how many possible permutations are there? Is it possible to
> model each case as a separate physical register?
>
> Evan
>
I don't think so. There are 4x4x4x4 = 256 permutations. For example:
* xyzw: default
* zxyw
* yyyy: splat
Even if I can model each of these 256 cases as a separate physical register,
how can I model the use of r0.xyzw in
2013 Dec 05
3
[LLVMdev] X86 - Help on fixing a poor code generation bug
...dss %xmm1, %xmm0
The first step is to understand why the compiler produces the
redundant instructions.
For the example above, the DAG sequence looks like this:
a0 : f32 = extract_vector_elt ( A, 0)
b0 : f32 = extract_vector_elt ( B, 0)
r0 : f32 = fadd a0, b0
result : v4f32 = insert_vector_elt ( A, r0, 0 )
(with A and B of type v4f32).
The 'insert_vector_elt' is custom lowered into either X86ISD::MOVSS or
X86ISD::INSERTPS depending on the target's SSE feature level.
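
For illustration only, source of roughly the following shape could give rise to the extract / fadd / insert sequence described above. This is a sketch assuming the GCC/Clang vector extension; the names `v4f32` and `add_lane0` are hypothetical, and the original poster's actual test case is not shown in this excerpt.

```c
/* GCC/Clang vector extension: four floats in one 16-byte vector.
 * The type name v4f32 is illustrative only. */
typedef float v4f32 __attribute__((vector_size(16)));

/* Hypothetical reproducer: add lane 0 of b into lane 0 of a and leave
 * lanes 1..3 of a untouched. Conceptually this reads lane 0 of both
 * inputs, adds the two scalars, and writes the sum back into lane 0
 * of a. */
v4f32 add_lane0(v4f32 a, v4f32 b) {
    a[0] = a[0] + b[0];
    return a;
}
```

Whether a given compiler version produces exactly that DAG for this function is an assumption; the point is only to show the lane-0 update pattern in source form.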
To start I checked if this bug was caused simply by the lack of
specific tablegen patterns to match the comp...
2010 Jul 28
3
[LLVMdev] Subregister coalescing
...gister & ops, *but* not vector loads. All the vector register elements
are directly accessible, so VI1 reg (Vector Integer 1) has I4, I5, I6 and
I7 as its (integer) subregisters. Subregisters of the same reg *never*
overlap.
Therefore, vector loads are lowered to scalar loads followed by a chain
of INSERT_VECTOR_ELTs. Then we select those to INSERT_SUBREG, everything
fine to that point.
Status before live analysis is (non-related instrs removed):
36 %reg16388<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
68 %reg16392<def> = INSERT_SUBREG %reg16392<undef>, %reg16388<kill>, 1
76 %r...
2013 Dec 05
0
[LLVMdev] X86 - Help on fixing a poor code generation bug
...to understand why the compiler produces the
> redundant instructions.
>
> For the example above, the DAG sequence looks like this:
>
> a0 : f32 = extract_vector_elt ( A, 0)
> b0 : f32 = extract_vector_elt ( B, 0)
> r0 : f32 = fadd a0, b0
> result : v4f32 = insert_vector_elt ( A, r0, 0 )
>
> (with A and B of type v4f32).
>
> The 'insert_vector_elt' is custom lowered into either X86ISD::MOVSS or
> X86ISD::INSERTPS depending on the target's SSE feature level.
>
> To start I checked if this bug was caused simply by the lack of
> spec...
2008 Sep 30
0
[LLVMdev] Generalizing shuffle vector
...nce the two sides are the same, it will generate quad word moves
> to copy the values.
I think this specific issue can be fixed without extending the
IL-level syntax; DAGCombiner could easily be made a lot more clever
about cases like this. For example, before legalization, we can
transform an INSERT_VECTOR_ELT inserting an element into a constant
vector or a SCALAR_TO_VECTOR into a BUILD_VECTOR, and we can transform
BUILD_VECTOR into CONCAT_VECTORS or EXTRACT_SUBVECTOR for relevant
cases. We can also make the lowering significantly more clever about
dealing with insertelement.
If what we currently have...
2016 Mar 18
4
generate vectorized code
...>>
>> This might sound like a dumb question, but how does one build a vector of
>> ints out of regular ints in IR?
>>
>>
>> See: http://llvm.org/docs/LangRef.html#vector-operations
>>
>> In short, the IR has "insertelement", which maps to "INSERT_VECTOR_ELT"
>> in SDAG and "extractelement", which maps to "EXTRACT_VECTOR_ELT" in SDAG.
>>
>> I usually find good examples by grepping in the lit tests. Another way is
>> to write the function in clang, and run it with -O3 -emit-llvm -S to get a
>> good s...
2010 Jul 28
0
[LLVMdev] Subregister coalescing
On Jul 28, 2010, at 12:25 PM, Carlos Sánchez de La Lama wrote:
> Which after register coalescing gets transformed into:
>
> 36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
> 76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
> 124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
> 132
2008 Sep 30
4
[LLVMdev] Generalizing shuffle vector
Hi,
The current definition of shuffle vector is
<result> = shufflevector <n x <ty>> <v1>, <n x <ty>> <v2>, <n x
i32> <mask> ; yields <n x <ty>>
The first two operands of a 'shufflevector' instruction are vectors
with types that match each other and types that match the result of
the instruction. The third
2016 Mar 18
2
generate vectorized code
> On Mar 18, 2016, at 1:37 PM, Rail Shafigulin <rail at esenciatech.com> wrote:
>
>> I think you created a cycle, this is easy to do with SelectionDAG :)
>> Basically SelectionDAG will iterate until it does not see anything to change. So if you insert a transformation on a pattern A, that generates pattern B, while you have another transformation that matches B and
2009 May 20
2
[LLVMdev] [PATCH] Add new phase to legalization to handle vector operations
...RINSIC_WO_CHAIN:
+ case ISD::INTRINSIC_W_CHAIN:
+ case ISD::INTRINSIC_VOID:
+ case ISD::CopyToReg:
+ case ISD::CopyFromReg:
+ case ISD::AssertSext:
+ case ISD::AssertZext:
+ // Node cannot be illegal if types are legal
+ break;
+ case ISD::BUILD_VECTOR:
+ case ISD::INSERT_VECTOR_ELT:
+ case ISD::EXTRACT_VECTOR_ELT:
+ case ISD::CONCAT_VECTORS:
+ case ISD::EXTRACT_SUBVECTOR:
+ case ISD::VECTOR_SHUFFLE:
+ case ISD::SCALAR_TO_VECTOR:
+ case ISD::BIT_CONVERT:
+ case ISD::LOAD:
+ case ISD::STORE:
+ // These are intentionally not handled here; the point o...