thr3ads.net - search: "4xfloat"

Displaying 17 results from an estimated 17 matches for "4xfloat".

[LLVMdev] convert llvm ir to selection Dag

2010 Oct 04

Hi, Thanks for your reply. I have gone through the tutorial on how to write an LLVM pass, but I am not able to figure out how I should proceed for this pass. Can you please tell me a starting point for this? Let's say I have an add function: define <4xfloat> add(<4xfloat> %in1, <4xfloat> %in2) { entry: %0 = fadd <4xfloat> %in1, %in2 ret <4xfloat> %0 } Thanks & Regards, Pachauri ------- Original Message ------- Sender : Eli Friedman<eli.friedman at gmail.com> Date : Oct 02, 2010 00:51 (GMT+09:0...

[LLVMdev] vectors of pointers (why not?)

2011 Feb 08

I'm writing a compiler where I'd like to be able to (sometimes) represent lvalues of vectors (e.g. <4xfloat>) with vectors of pointers (e.g. <4xfloat *>). In this case, I'd like to be generating these vectors of pointers early on and later converting operations on them to a series of individual loads/stores with a later pass. (Note, though, that ISAs with "gather"/"scatter"...

[LLVMdev] How to partition registers into different RegisterClass?

2005 Jul 26

2005/7/26, Chris Lattner <sabre at nondot.org>: > Tzu-Chien Chiu wrote: > > The same problem exists when there are two types of constant registers, > > floating point and integer, and each is declared 'packed' ([4xfloat] > > and [4xint]). The instruction selector doesn't know which instruction > > it should produce because the newly defined MVT type 'packed' is > > always used for all operands (registers), even if it's actually a > > [4xfloat] or [4xint]. > > It might...

[LLVMdev] How to partition registers into different RegisterClass?

2005 Jul 25

...'. With two 'packed' operands, the instruction selector doesn't know whether an ADDgg, ADDgi, or ADDgc should be generated (BuildMI() function). The same problem exists when there are two types of constant registers, floating point and integer, and each is declared 'packed' ([4xfloat] and [4xint]). The instruction selector doesn't know which instruction it should produce because the newly defined MVT type 'packed' is always used for all operands (registers), even if it's actually a [4xfloat] or [4xint]. 2005/7/24, Chris Lattner <sabre at nondot.org>: >...

[LLVMdev] vectors of pointers (why not?)

2011 Feb 09

On 02/08/2011 05:12 PM, Matt Pharr wrote: > I'm writing a compiler where I'd like to be able to (sometimes) represent lvalues of vectors (e.g. <4xfloat>) with vectors of pointers (e.g. <4xfloat *>). In this case, I'd like to be generating these vectors of pointers early on and later converting operations on them to a series of individual loads/stores with a later pass. (Note, though, that ISAs with "gather"/"scatter"...

[LLVMdev] How to partition registers into different RegisterClass?

2005 Jul 26

...put or input register if possible. To allow this coalescing to happen, implement the TargetInstrInfo::isMoveInstr virtual method for your target. > The same problem exists when there are two types of constant registers, > floating point and integer, and each is declared 'packed' ([4xfloat] > and [4xint]). The instruction selector doesn't know which instruction > it should produce because the newly defined MVT type 'packed' is > always used for all operands (registers), even if it's actually a > [4xfloat] or [4xint]. It might make sense to add two MVT enu...

Vector evolution?

2020 Sep 01

On Tue, Sep 1, 2020 at 5:10 PM Florian Hahn <florian_hahn at apple.com> wrote: > The loop vectorizer does not really handle loops that already operate on vectors, so that is why the loop using v4f32 does not get widened. > > Arguably the user explicitly asked for 4xfloat vectors in the v4f32 version, so that is what gets generated. In my case I have tons of legacy code written for SSE2 and if the compiler can make a better and correct version of it, why not? > (Those kinds of issues are better to discuss on https://bugs.llvm.org/ IMO, because it is easier to k...

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

2005 Jul 27

Each register is a 4-component (namely, r, g, b, a) vector register. They are actually defined as llvm packed [4xfloat]. The instruction: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz Explanation: '.a' is a writemask; only the specified component will be updated. '.xxyy' and '.zzzz' are swizzle masks, which specify the component permutation, similar to the Intel SSE permutation instruction SHUFPD...
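
The writemask and swizzle semantics described in this snippet can be sketched in plain C++. This is only an illustrative model, not code from the thread: the `Reg` type and every function name are hypothetical, and the meanings assumed for the modifiers (`_bias` subtracts 0.5, `_x2` doubles, `_sat` clamps to [0, 1]) follow the usual Direct3D pixel-shader conventions rather than anything stated in the post.

```cpp
#include <algorithm>
#include <array>

// Hypothetical model of one 4-component register: index 0..3 = x/r, y/g, z/b, w/a.
using Reg = std::array<float, 4>;

// Swizzle: pick source components by index.
// {0, 0, 1, 1} models ".xxyy"; {2, 2, 2, 2} models ".zzzz".
Reg swizzle(const Reg& src, const std::array<int, 4>& sel) {
    return {src[sel[0]], src[sel[1]], src[sel[2]], src[sel[3]]};
}

// Assumed source modifier "_bias": subtract 0.5 from every component.
Reg bias(const Reg& r) {
    return {r[0] - 0.5f, r[1] - 0.5f, r[2] - 0.5f, r[3] - 0.5f};
}

// Assumed source modifier "_x2": double every component.
Reg x2(const Reg& r) {
    return {r[0] * 2.0f, r[1] * 2.0f, r[2] * 2.0f, r[3] * 2.0f};
}

// add_sat with a writemask: only components whose mask entry is true are
// written; each written component is clamped to [0, 1].
void add_sat(Reg& dst, const std::array<bool, 4>& mask,
             const Reg& a, const Reg& b) {
    for (int i = 0; i < 4; ++i) {
        if (mask[i]) {
            dst[i] = std::min(1.0f, std::max(0.0f, a[i] + b[i]));
        }
    }
}

// Models: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz
Reg run_example_instruction(Reg r0, const Reg& r1, const Reg& r3) {
    add_sat(r0, {false, false, false, true},
            swizzle(bias(r1), {0, 0, 1, 1}),
            swizzle(x2(r3), {2, 2, 2, 2}));
    return r0;
}
```

With the `.a` writemask, only the fourth component of `r0` changes; under these assumptions it receives the saturated sum of `r1.y - 0.5` and `r3.z * 2`, and the other three components keep their previous values.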

[LLVMdev] Error with instruction selection

2010 May 31

...1st call to a temporary register. The physical return register is then overwritten in the next call. (This is visible when calling "llc -view-isel-dags -view-sched-dags". The first graph is OK, the second is not.) The problem goes away if I: -have the getPtr return anything else than <4xfloat>* or <4xi32>* (e.g. <4xfloat> or float* work just fine) -do not load from or store to the pointer - e.g. just returning the pointer works. -target any other processor than CellSPU (ok, some backends assert on this code, and the PIC assembly I didn't understand :) ) Any explanatio...

[LLVMdev] LLVMdev Digest, Vol 80, Issue 13

2011 Feb 15

...ust suggesting that from the perspective of the LLVM IR there doesn't seem to be a necessary semantic difference between arrays and vectors. Arrays provide a superset of the functionality available for vectors. I would be happy if the code generator used 4x32bit vectors for basic math on [4xfloat] arrays, and fell back to something less efficient if the user decided to dynamically index it. However, maybe this is more work for the code generator than is currently feasible. On 02/14/2011 06:44 PM, Villmow, Micah wrote: > Andrew, > This is one area of LLVM that maps very nicely to...

[LLVMdev] How to define complicated instruction in TableGen (Direct3D shader instruction)

2005 Jul 29

...; with permutation) it should be the role of the pattern instruction selector to recognise the shuffle+add combination and emit a single instruction. m. Tzu-Chien Chiu wrote: > Each register is a 4-component (namely, r, g, b, a) vector register. > They are actually defined as llvm packed [4xfloat]. > > The instruction: > > add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz > > Explanation: > > '.a' is a writemask. only the specified component will be updated > > '.xxyy' and '.zzzz' are swizzle masks, specify the component > permutation, si...

[LLVMdev] Questions on the llvm 'vector' types and resulting SIMD instructions

2014 Sep 03

...ar code, what is the function signature, as it would appear to be called from a C function, on a machine without __m128? * What happens to vector types of length not equal to the machine's SIMD length? If I defined a <3 x float>, would it always generate scalar code, or would it pad to a 4xfloat and generate SSE instructions? Or is it not even allowed? Thanks, and apologies if I've missed the documentation where all this is spelled out. -- Larry Gritz lg at larrygritz.com

[LLVMdev] LLVMdev Digest, Vol 80, Issue 13

2011 Feb 14

Andrew, This is one area of LLVM that maps very nicely to our GPU architecture. A vector is a native data type on these architectures. For example, on AMD's GPUs, the native type is 4x32bit vector with sub-components. Each of the individual 32bits can be indexed separately, but not dynamically. This is a big difference from an array of 32bit values. So there are cases where the meaning of the

[LLVMdev] How to partition registers into different RegisterClass?

2005 Jul 23

On Sat, 23 Jul 2005, Tzu-Chien Chiu wrote: > 2005/7/23, Chris Lattner <sabre at nondot.org>: >> What does a 'read only' register mean? Is it a constant (e.g. returns >> 1.0)? Otherwise, how can it be a useful value? > > Yes, it's a constant register. > > Because the instruction cannot contain an immediate value, a constant > value may be stored in

[LLVMdev] How to partition registers into different RegisterClass?

2005 Jul 23

2005/7/23, Chris Lattner <sabre at nondot.org>: > > What does a 'read only' register mean? Is it a constant (e.g. returns > 1.0)? Otherwise, how can it be a useful value? Yes, it's a constant register. Because the instruction cannot contain an immediate value, a constant value may be stored in a constant register, and it's defined _before_ the program starts by

Vector evolution?

2020 Sep 01

Hi, Please consider the following loop: using v4f32 = float __attribute__((__vector_size__(16))); void fct6(v4f32 *x) { #pragma clang loop vectorize(enable) for (int i = 0; i < 256; ++i) x[i] = 7 * x[i]; } After compiling it with: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize
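
For comparison, here is a minimal sketch of a scalar rewrite of the same computation. It is an assumption for illustration, not code from the thread: the name `fct6_scalar` and the flat `float*` layout are hypothetical. The loop in the snippet covers 256 elements of a 16-byte vector type, which corresponds to 1024 floats here. Expressing the work over scalar elements gives the loop vectorizer something it is designed to widen; whether it actually does so depends on the compiler version, target, and flags.

```cpp
#include <cstddef>

// Hypothetical scalar counterpart of the loop in the snippet: multiply each
// float by 7 in place. The element count is a parameter instead of the fixed
// 256 four-float vectors used in the original.
void fct6_scalar(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = 7.0f * x[i];
    }
}
```

Calling `fct6_scalar(data, 1024)` on the same memory touches the same 1024 floats as the vector-typed loop, so the `-Rpass=loop-vectorize` remarks for the two versions can be compared directly.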

[LLVMdev] Finding Merge nodes in CFG (ambika@cse.iitb.ac.in)

2010 May 31

...register. The physical return register is then overwritten in the next > call. (This is visible when calling "llc -view-isel-dags > -view-sched-dags". The first graph is OK, the second is not.) > > The problem goes away if I: > -have the getPtr return anything else than <4xfloat>* or <4xi32>* (e.g. > <4xfloat> or float* work just fine) > -do not load from or store to the pointer - e.g. just returning the > pointer works. > -target any other processor than CellSPU (ok, some backends assert on > this code, and the PIC assembly I didn't unders...
