thr3ads.net - search: "v8i16"

Displaying 20 results from an estimated 51 matches for "v8i16".

2020 Jan 03

Legalizing vector types

Hi all, I am working on a target that has support for v4i16 vectors, and no support for v4i8 / v8i8 / v8i16. v4i8 is promoted to v4i16, which is nice. v8i16 is split to 2 x v4i16, which is nice as well. Now v8i8 is scalarized, which is not so nice. Ideally I would like v8i8 to be first promoted to v8i16, then split to 2 x v4i16 (or split to 2 x v4i8, then promoted to 2 x v4i16). Is there a way to achieve that? I tr...
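The "promote, then split" order asked about in this snippet can be modelled in plain C. This is only a scalar sketch of the intended data movement, not LLVM legalizer code: the struct `v4i16_model` and the function `promote_then_split_v8i8` are hypothetical names, and sign extension is an assumption chosen for the promotion step (a real legalizer may leave the promoted high bits unspecified).

```c
#include <stdint.h>

/* Hypothetical stand-in for one legal v4i16 register. */
typedef struct {
    int16_t lane[4];
} v4i16_model;

/* Model of v8i8 -> v8i16 -> 2 x v4i16:
   every i8 lane is widened to i16 (assumed here: sign extension),
   then lanes 0..3 go to the low half and lanes 4..7 to the high half. */
void promote_then_split_v8i8(const int8_t src[8],
                             v4i16_model *lo, v4i16_model *hi) {
    for (int i = 0; i < 4; ++i) {
        lo->lane[i] = (int16_t)src[i];
        hi->lane[i] = (int16_t)src[i + 4];
    }
}
```

Splitting first and promoting second gives the same lanes in this model, which is why the poster treats the two orders as equivalent alternatives.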

2011 Jan 29

[LLVMdev] Possible CellSPU Bug?

I'm working on enhancing TableGen's type checking and it triggered with a problem in CellSPU's specification: XSHWv4i32: (set VECREG:v8i16:$rDest, (sext:v8i16 VECREG:v4i32:$rSrc)) It's complaining that v4i32 is not smaller than v8i16, which is true in the sense of vector bit size, and true in the sense of vector element size. To me, a sign extension from i32 to i16 makes no sense. From the .td file, it looks as if src and d...

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

...s a dot product reduction loop multiplying sign extended 16-bit values to produce a 32-bit accumulated result. The x86 backend is currently not able to optimize it as well as gcc and icc. The IR we are getting from the loop vectorizer has several v8i32 adds and muls inside the loop. These are fed by v8i16 loads and sexts from v8i16 to v8i32. The x86 backend recognizes that these are addition reductions of multiplication so we use the vpmaddwd instruction which calculates 32-bit products from 16-bit inputs and does a horizontal add of adjacent pairs. A vpmaddwd given two v8i16 inputs will produce a v...
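For readers who have not seen the thread, the kind of loop described in this snippet, and the per-lane behaviour of a 128-bit vpmaddwd, can be sketched in scalar C. The function names `dot_product_i16` and `pmaddwd_model` are hypothetical, and the code is a model for illustration rather than the source discussed in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar form of the reduction: each 16-bit input is sign extended to
   32 bits, multiplied, and added to a 32-bit accumulator.
   Assumption: the inputs are small enough that the sum fits in int32_t. */
int32_t dot_product_i16(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

/* Scalar model of one vpmaddwd on two v8i16 inputs: eight 16x16 -> 32-bit
   products, then each adjacent pair of products is added, giving four
   32-bit lanes. The pair sum is done in unsigned arithmetic to avoid
   signed-overflow undefined behaviour when all four inputs of a pair
   are -32768. */
void pmaddwd_model(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int lane = 0; lane < 4; ++lane) {
        int32_t even = (int32_t)a[2 * lane] * (int32_t)b[2 * lane];
        int32_t odd = (int32_t)a[2 * lane + 1] * (int32_t)b[2 * lane + 1];
        out[lane] = (int32_t)((uint32_t)even + (uint32_t)odd);
    }
}
```

Summing the four output lanes of `pmaddwd_model` gives the same value as `dot_product_i16` over the same eight elements, which is the property that lets the instruction stand in for the widened multiply-and-add.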

2011 Oct 20

[LLVMdev] Lowering to MMX

...a mix of MMX intrinsics and v4i16 operations, so it ping-pongs back and forth between MMX and SSE2 instructions in the generated code. To get more optimal code, I see three options, and I was wondering if someone could share some advice on which approach you think will work best: 1) I could use v8i16 or v4i32 instead of v4i16, but then the SSE register pressure would be significantly increased. I already use v4f32 operations intensively so having the MMX registers available for 64-bit integer vector operations helps performance quite considerably on the register deprived x86 architecture. T...

2017 Feb 17

Vector trunc code generation difference between llvm-3.9 and 4.0

Correction in the C snippet: typedef signed short v8i16_t __attribute__((ext_vector_type(8))); v8i16_t foo (v8i16_t a, int n) { return a >> n; } Best regards Saurabh On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movidius.com> wrote: > Hello, > > We are investigating a difference in code generation for vect...

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

...ction loop > multiplying sign extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc > and icc. The IR we are getting from the loop vectorizer has several v8i32 > adds and muls inside the loop. These are fed by v8i16 loads and sexts from > v8i16 to v8i32. The x86 backend recognizes that these are addition > reductions of multiplication so we use the vpmaddwd instruction which > calculates 32-bit products from 16-bit inputs and does a horizontal add of > adjacent pairs. A vpmaddwd given two v8i16 inp...

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

...pying sign extended 16-bit values to produce a 32-bit >> accumulated result. The x86 backend is currently not able to optimize >> it as well as gcc and icc. The IR we are getting from the loop >> vectorizer has several v8i32 adds and muls inside the loop. These are >> fed by v8i16 loads and sexts from v8i16 to v8i32. The x86 backend >> recognizes that these are addition reductions of multiplication so we >> use the vpmaddwd instruction which calculates 32-bit products from >> 16-bit inputs and does a horizontal add of adjacent pairs. A vpmaddwd >> giv...

2018 Jul 24

[LoopVectorizer] Improving the performance of dot product reduction loop

...> multiplying sign extended 16-bit values to produce a 32-bit accumulated >> result. The x86 backend is currently not able to optimize it as well as gcc >> and icc. The IR we are getting from the loop vectorizer has several v8i32 >> adds and muls inside the loop. These are fed by v8i16 loads and sexts from >> v8i16 to v8i32. The x86 backend recognizes that these are addition >> reductions of multiplication so we use the vpmaddwd instruction which >> calculates 32-bit products from 16-bit inputs and does a horizontal add of >> adjacent pairs. A vpmaddwd giv...

2011 Oct 25

[LLVMdev] Lowering to MMX

...cs and v4i16 operations, so it ping-pongs > back and forth between MMX and SSE2 instructions in the generated code. > > To get more optimal code, I see three options, and I was wondering if > someone could share some advice on which approach you think will work best: > 1) I could use v8i16 or v4i32 instead of v4i16, but then the SSE > register pressure would be significantly increased. I already use v4f32 > operations intensively so having the MMX registers available for 64-bit > integer vector operations helps performance quite considerably on the > register deprived x86...

2010 Jul 06

[LLVMdev] Question on the use of TableGen

...I'm trying to create a new backend for a processor, and I start with modifying the existing backends like MIPS and Microblaze. I have a problem when I try to add a register class in the Target's register description, it looks like this: def IGPRegs : RegisterClass<"MBlaze", [v8i16], 128, [PR0, PR1, PR2, PR3]>; // PR0 - PR3 are registers defined before I want to have a new integer register file for a different type, e.g. v8i16. But then I got errors when running tblgen. Here is the error I got when modifying the MBlaze backend: BSLLI: (set CPURegs:i32:$dst, (shl:...

2011 Jan 31

[LLVMdev] Possible CellSPU Bug?

David Greene wrote: > class XSHWVecInst<ValueType in_vectype, ValueType out_vectype>: > def v4i32: XSHWVecInst<v4i32, v8i16>; > Is this pattern as intended, or did I find a real problem? Looks like a bug to me. xshw (extend signed half-word(16bits) to word(32bits)) takes a v8i16 and produces a v4i32. This has likely gone unnoticed as there is only one type of vector register class (i.e. VECREG) that is used for...

2017 Feb 18

Vector trunc code generation difference between llvm-3.9 and 4.0

...e may have won the bug lottery by exposing all of front-, > middle-, back-end bugs. :) > > > > On Fri, Feb 17, 2017 at 9:38 AM, Saurabh Verma via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Correction in the C snippet: >> >> typedef signed short v8i16_t __attribute__((ext_vector_type(8))); >> >> v8i16_t foo (v8i16_t a, int n) >> { >> return a >> n; >> } >> >> Best regards >> Saurabh >> >> >> >> On 17 February 2017 at 16:21, Saurabh Verma <saurabh.verma at movid...

2014 Dec 02

[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?

...s braindead testcase: %0 = load <8 x i8>* %src, align 1 %1 = zext <8 x i8> %0 to <8 x i16> turning into: pmovzxbw (%rsi), %xmm0 pand <0xff,0xff,...>, %xmm0, %xmm0 v8i8 isn't legal, so the load became an anyext load from v8i8 to v8i16, with the pand masking out the unwanted/zero bits. In that example, if you declare zextloads from v8i8 legal, and add the simple corresponding pattern, the pand isn't generated anymore, as expected. So, unless I'm missing something, shouldn't we declare them legal? Insights much appre...
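The relationship between the anyext load plus pand and a true zextload, as described in this snippet, can be illustrated with a scalar C model. All three function names are hypothetical, and the `garbage` parameter is an assumption used to stand in for the unspecified high bits an any-extending load may produce.

```c
#include <stdint.h>

/* zext <8 x i8> to <8 x i16>: the upper byte of every lane is zero. */
void zext_v8i8_to_v8i16(const uint8_t src[8], uint16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (uint16_t)src[i];
}

/* anyext: only the low byte of each lane is defined. The caller-supplied
   garbage value models whatever ends up in the high byte. */
void anyext_v8i8_to_v8i16(const uint8_t src[8], uint16_t garbage,
                          uint16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (uint16_t)((garbage & 0xff00u) | src[i]);
}

/* The pand step: clear the high byte of every 16-bit lane. */
void and_mask_00ff(uint16_t v[8]) {
    for (int i = 0; i < 8; ++i)
        v[i] &= 0x00ffu;
}
```

In this model, `anyext_v8i8_to_v8i16` followed by `and_mask_00ff` always equals `zext_v8i8_to_v8i16`. Since pmovzxbw already zero-extends, the mask is redundant in the quoted output, which is the point the poster makes.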

2008 Oct 26

[LLVMdev] Turning on LegalizeTypes by default

...galize case, legalize just custom lowers the build_vector-of-i16 right away. In the legalize types case, legalize types modifies the DAG quite a bit to eliminate the i16 operations, and the PPC backend doesn't match on what it expects. > Two solutions: > decide that in fact > v8i16 = BUILD_VECTOR(i32, i32, ..., i32) > is legal and modify LegalizeTypes to take advantage of this Are you saying that the build_vector would have 8 i32 inputs, or only 4? If 8, I really don't like it because it breaks a lot of invariants. If 4 then it should be the same as the current...

2011 Oct 25

[LLVMdev] Lowering to MMX

...and v4i16 operations, so it ping-pongs > back and forth between MMX and SSE2 instructions in the generated code. > > To get more optimal code, I see three options, and I was wondering if > someone could share some advice on which approach you think will work best: > 1) I could use v8i16 or v4i32 instead of v4i16, but then the SSE > register pressure would be significantly increased. I already use v4f32 > operations intensively so having the MMX registers available for 64-bit > integer vector operations helps performance quite considerably on the > register deprived...

2017 Mar 08

Vector trunc code generation difference between llvm-3.9 and 4.0

...back-end bugs. :) >>> >>> >>> >>> On Fri, Feb 17, 2017 at 9:38 AM, Saurabh Verma via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Correction in the C snippet: >>>> >>>> typedef signed short v8i16_t __attribute__((ext_vector_type(8))); >>>> >>>> v8i16_t foo (v8i16_t a, int n) >>>> { >>>> return a >> n; >>>> } >>>> >>>> Best regards >>>> Saurabh >>>> >>>> >>...

2008 Oct 26

[LLVMdev] Turning on LegalizeTypes by default

...rands doesn't match the vector element type (this is supposed to be illegal), while LegalizeTypes correctly constructs a BUILD_VECTOR where the operand type is equal to the vector element type, but then the PPC custom code doesn't handle this optimally. Two solutions: decide that in fact v8i16 = BUILD_VECTOR(i32, i32, ..., i32) is legal and modify LegalizeTypes to take advantage of this; or improve the PPC so it better handles the code coming out of LegalizeTypes. (b) llvm-gcc builds with languages Ada, C, C++, Fortran, Objc and Obj-c++ on x86-32-linux, and bootstraps with languages C,...

2011 Jan 31

[LLVMdev] Possible CellSPU Bug?

Kalle Raiskila <kalle.raiskila at nokia.com> writes: > Looks like a bug to me. xshw (extend signed half-word(16bits) to > word(32bits)) takes a v8i16 and produces a v4i32. This has likely gone > unnoticed as there is only one type of vector register class (i.e. > VECREG) that is used for all vectors. > > Nice catch :) Are there more of these? I don't know. I stopped implementing the stricter typechecking when I saw this. I wan...

2018 Jan 22

X86 new registers not being allocated

Hi all, I have a bunch of new registers set up in X86RegisterInfo.td, the important part being def PR128 : RegisterClass<"X86", [i128], 128, (sequence "POI%u", 0, 7)>; def VR128 : RegisterClass<"X86", [v4f32, v2f64, v16i8, v8i16, v4i32, v2i64], 128, (add PR128, FR32)>; I have an entry in X86ISelLowering.cpp: addRegisterClass(MVT::i128, &X86::PR128RegClass); and in findRepresentativeClass(): case MVT::i128: RRC = &X86::PR128RegClass; But even though my nodes have MVT::i128 valu...

2006 May 14

[LLVMdev] TableGen: RegisterClass question

...l allowed ValueTypes? This is useful for targets where multiple types of the same size can be held in the same registers. For example, if the target has a unified register file for Int/FP, it could have [i64, f64]. In practice, this is most useful for vector types, the X86 backend has [v16i8, v8i16, v4i32, v2i64, v4f32, v2f64]. -Chris -- http://nondot.org/sabre/ http://llvm.org/
