Displaying 20 results from an estimated 259 matches for "insertelements".
Did you mean: insertelement
2014 Mar 17
2
[LLVMdev] Improving SLPVectorizer for Julia
I'm working on some small improvements to SLPVectorizer.cpp so that it can deal with some tuple operations arising from Julia code. Being fairly new to LLVM, I could use some advice, particularly from those familiar with the internals of SLPVectorizer.
The motivation can be found in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The following IR implements this nested loop:
for (int i = start ; i < end ; ++i )
for (int p = 0 ; p < 4 ; ++p )
a[i*4+p] = b[i*4+p] + c[i*4+p];
define void @main(i64 %arg0, i64 %arg1, i1 %arg2, i64 %arg3, float*
noalias %arg4, float* noalias %arg5, float* noalias %arg6) {
entrypoint:
br i1 %arg2, label %L0, label %L1
L0:
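(Aside, not from the original message: the indices i*4+p sweep the contiguous range [start*4, end*4), so the nest is equivalent to the flat loop below, which is why one would expect it to vectorize into plain wide loads, adds, and stores.)

// Equivalent flat form of the nested loop above (an illustrative sketch only).
void add_flat(float *a, const float *b, const float *c, long start, long end) {
  for (long k = start * 4; k < end * 4; ++k)
    a[k] = b[k] + c[k];
}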
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The loop vectorizer relies on cleanup passes to be run after it:
from Transforms/IPO/PassManagerBuilder.cpp:
// Add the various vectorization passes and relevant cleanup passes for
// them since we are no longer in the middle of the main scalar pipeline.
MPM.add(createLoopVectorizePass(DisableUnrollLoops));
MPM.add(createInstructionCombiningPass());
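(Aside, not from the original reply: a rough sketch of the vectorize-then-cleanup sequence being described, written against the 3.x-era legacy pass manager; the pass-creation functions are real, but header locations have moved between LLVM versions and the exact list and order in PassManagerBuilder.cpp may differ.)

// Sketch only; the real PassManagerBuilder may add a different set of cleanup passes.
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/Scalar.h"    // createEarlyCSEPass, createInstructionCombiningPass, ...
#include "llvm/Transforms/Vectorize.h" // createLoopVectorizePass

void addVectorizeWithCleanup(llvm::legacy::PassManagerBase &MPM,
                             bool DisableUnrollLoops) {
  MPM.add(llvm::createLoopVectorizePass(DisableUnrollLoops));
  // Cleanup: fold away leftover insertelement/extractelement, shufflevector,
  // and bitcast instructions introduced by vectorization.
  MPM.add(llvm::createEarlyCSEPass());
  MPM.add(llvm::createInstructionCombiningPass());
  MPM.add(llvm::createCFGSimplificationPass());
}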
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
The instcombine pass cleans up a lot.
Any idea why there are still shufflevector, insertelement, *and* bitcast
(!!) etc. instructions left? The original loop is so clean, a textbook
example, I'd say. There is no need to shuffle anything. At least I don't
see it.
Frank
vector.ph: ; preds = %L5
%broadcast.splatinsert1 = insertelement <4 x
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
On 4 February 2013 18:25, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> For cases where this approach breaks really badly we could consider adding
> a specialized api or parameters (like the type of a user/use). But we
> should do so only as a last resort and backed by actual code that would
> benefit from doing so.
>
Very sensible, more or less what I had in
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
Yes, you need the latest ToT version of llvm, or you run:
-loop-vectorize -earlycse -instcombine -simplifycfg
The bitcast is essentially a noop to satisfy the type system.
This is how your example looks for me:
vector.body: ; preds = %vector.body, %vector.ph
%index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
%.lhs = shl i64 %6, 2
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi all,
My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do).
I would propose estimating the "worst case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative
2015 Jan 14
2
[LLVMdev] Bug in InsertElement constant propagation?
Hi,
When I run opt on the following LLVM IR:
define i32 @foo() {
bb0:
%0 = bitcast i32 2139171423 to float
%1 = insertelement <1 x float> undef, float %0, i32 0
%2 = extractelement <1 x float> %1, i32 0
%3 = bitcast float %2 to i32
ret i32 %3
}
->
It generates:
define i32 @foo() {
bb0:
ret i32 2143365727
}
While tracking the value I see that the floating point value
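(Aside, not from the thread text: the two i32 constants differ only in bit 22 of the IEEE-754 single-precision encoding, the quiet-NaN bit, which is consistent with the signaling-NaN payload being quieted when the constant is round-tripped through a host float, as the follow-up messages suggest. A quick check:)

#include <cstdio>

int main() {
  const unsigned Before = 2139171423u; // 0x7F812A5F: exponent all ones, top mantissa
                                       // bit clear -> signaling NaN payload
  const unsigned After  = 2143365727u; // 0x7FC12A5F: same payload with the quiet bit set
  std::printf("before = 0x%08X\n", Before);
  std::printf("after  = 0x%08X\n", After);
  std::printf("diff   = 0x%08X (the single-precision quiet-NaN bit)\n", After - Before);
  return 0;
}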
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Arnold Schwaighofer" <aschwaighofer at apple.com>
> Cc: "LLVM Dev" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
> Sent: Monday, February 4, 2013 1:38:03 PM
>
2013 Feb 04
2
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi folks,
I've been thinking about how to implement some of the costs, and there are a lot
of instructions whose cost depends on other instructions around them. Casts are
one obvious case, since arithmetic and memory instructions can sometimes
cast values for free.
The cost model receives Opcodes, which lose the info on the history of the
values being vectorized, and I thought we could pass the whole
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
The loop vectorizer does not estimate the cost of vectorization by looking at the IR you list below. It does not vectorize and then run the CostAnalysis pass. It estimates the cost itself before it even performs the vectorization.
The way it works is that it looks at all the scalar instructions and asks: what is the cost if I execute this scalar instruction as a vector instruction? Therefore, it
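(Aside: a toy sketch of the estimate described above; ScalarInst and queryVectorCost below are illustrative stand-ins, not LLVM's actual cost-model API.)

#include <vector>

// A toy stand-in for "one scalar instruction in the loop body".
struct ScalarInst {
  unsigned Opcode;   // e.g. add, mul, load, store
  unsigned BitWidth; // element type width in bits
};

// Hypothetical per-instruction query: "what would this instruction cost if it
// were executed as a VF-wide vector instruction?"  A real target hook would
// answer from its instruction tables; here it is just a stub.
unsigned queryVectorCost(const ScalarInst &I, unsigned VF) {
  (void)VF; // a real hook would also consult the vectorization factor
  // Pretend every operation maps to one vector instruction, with an extra
  // extend/truncate when narrow elements are widened.
  return 1 + (I.BitWidth < 32 ? 1 : 0);
}

// The estimate described in the message: sum the vectorized cost of every
// scalar instruction in the loop body before performing any transformation.
unsigned expectedVectorCost(const std::vector<ScalarInst> &LoopBody, unsigned VF) {
  unsigned Cost = 0;
  for (const ScalarInst &I : LoopBody)
    Cost += queryVectorCost(I, VF);
  return Cost;
}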
2015 Jan 14
2
[LLVMdev] Bug in InsertElement constant propagation?
Ha, here is what I was missing. Thanks Jon. It still seems to me that the transformation of the LLVM IR is invalid, is that right? I assume we shouldn't be converting APFloat to float, in order to avoid such problems?
-----Original Message-----
From: Jonathan Roelofs [mailto:jonathan at codesourcery.com]
Sent: Wednesday, January 14, 2015 9:39 AM
To: Raoux, Thomas F; LLVM Developers Mailing List
2017 Mar 14
3
llvm-stress crash
Hi,
Using llvm-stress, I got a crash after Post-RA pseudo expansion, with the
machine verifier enabled.
A 128 bit register
%vreg233:subreg_l32<def,read-undef> = LLCRMux %vreg119;
GR128Bit:%vreg233 GRX32Bit:%vreg119
gets spilled:
%vreg265:subreg_l32<def,read-undef> = LLCRMux %vreg119;
GR128Bit:%vreg265 GRX32Bit:%vreg119
ST128 %vreg265, <fi#10>, 0, %noreg;
2007 Jul 20
0
[LLVMdev] Seg faulting on vector ops
Hi Chuck!
On Jul 20, 2007, at 11:36 AM, Chuck Rose III wrote:
> Hola LLVMers,
>
>
>
> I’m looking to make use of the vectorization primitives in the
> Intel chip with the code we generate from LLVM and so I’ve started
> experimenting with it. What is the state of the machine code
> generated for vectors? In my tinkering, I seem to be getting some
> wonky
2014 Jul 22
2
[LLVMdev] InsertElementInst and ExtractElementInst
Hello,
I am creating a <3 x i32> vector in LLVM IR. I then insert 3 values into it,
and later on I try to load one value back from the vector. The insertion
seems to work; however, when I try to load a specific value from the vector,
it seems that it does not work.
This is the part of my IR:
%"ins or1" = insertelement <3 x i32> undef, i32 %38, i32 0
%"ins and2"
2007 Jul 20
5
[LLVMdev] Seg faulting on vector ops
Hola LLVMers,
I'm looking to make use of the vectorization primitives in the Intel
chip with the code we generate from LLVM and so I've started
experimenting with it. What is the state of the machine code generated
for vectors? In my tinkering, I seem to be getting some wonky machine
instructions, but I'm most likely just doing something wrong and I'm
hoping you can set me in
2012 Feb 28
1
[LLVMdev] How to vectorize a vector type cast?
Since Clang does not seem to allow type casts between vector types, such as uchar4 to float4, it seems necessary to write them as element-by-element conversions, such as
typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));
float4 to_float4(uchar4 in)
{
float4 out = {in.x, in.y, in.z, in.w};
return out;
}
Running
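(Aside, not from the original message: newer Clang releases, postdating this 2012 post, provide __builtin_convertvector, which expresses the whole-vector conversion directly and should lower to a single vector uitofp rather than four scalar conversions.)

// Whole-vector conversion using Clang's ext_vector_type extension and the
// (newer) __builtin_convertvector builtin.
typedef float float4 __attribute__((ext_vector_type(4)));
typedef unsigned char uchar4 __attribute__((ext_vector_type(4)));

float4 to_float4_builtin(uchar4 in) {
  return __builtin_convertvector(in, float4);
}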
2015 Jan 15
2
[LLVMdev] Bug in InsertElement constant propagation?
I don't see a way to create a ConstantDataVector from a Constant or from an APFloat though. Did I overlook that?
Is the solution to add a new get function to ConstantDataVector to allow that? Any hint on what would be the right fix otherwise?
Thomas
-----Original Message-----
From: Jonathan Roelofs [mailto:jonathan at codesourcery.com]
Sent: Wednesday, January 14, 2015 10:30 AM
To: Raoux, Thomas
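(Aside, not from the thread: one way to build a vector constant from APFloat values without round-tripping through a host float is to wrap each element in a ConstantFP and hand the list to ConstantVector::get, which LLVM may fold to a ConstantDataVector internally; this is a sketch, not necessarily the fix that was ultimately adopted.)

#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

// Build a splat vector constant directly from an APFloat, preserving the
// exact bit pattern (no conversion through the host's float type).
Constant *makeSplatFromAPFloat(LLVMContext &Ctx, const APFloat &Val,
                               unsigned NumElts) {
  SmallVector<Constant *, 4> Elts(NumElts, ConstantFP::get(Ctx, Val));
  return ConstantVector::get(Elts);
}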
2020 Jan 11
2
[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors
Thanks so much for your feedback Simon.
I am not sure that what I am proposing here is at odds with what you're
referring to (here and in the PR you linked). The key difference AFAICT is
that the pattern I am referring to is probably more aptly described as
"reducing scalarization" than as "vectorization". The reason I say that is
that the inputs are vectors and the output
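(Aside, not from the RFC: an illustration in C++ vector-extension terms of the pattern under discussion; the proposed DAG combine aims to turn the scalarized form back into the single vector operation.)

typedef float float4 __attribute__((ext_vector_type(4)));

// Scalarized form: extract each lane, operate on scalars, rebuild the vector
// (extractelts feeding a build_vector at the DAG level).
float4 add_scalarized(float4 a, float4 b) {
  float4 r;
  r.x = a.x + b.x;
  r.y = a.y + b.y;
  r.z = a.z + b.z;
  r.w = a.w + b.w;
  return r;
}

// What the combine aims to recover: one vector add on the input vectors.
float4 add_vector(float4 a, float4 b) {
  return a + b;
}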
2012 Oct 24
3
[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.
Hi,
I don't know if my LLVM IR code is faulty, or if I spotted a bug in the RegisterCoalescing pass, so I'm posting my issue on the ML. Shader and print-before-all dump are given below.
The interesting part is the vreg6/vreg48 reduction: before RegCoalescing, the machine code is:
// BEFORE LOOP
... Some COPYs....
400B    %vreg47<def> = COPY %vreg2<kill>; R600_Reg32:%vreg47,%vreg2