thr3ads.net - search: "v32i8"

2009 Nov 18

1

[LLVMdev] TableGen Type Contradiction

Can anyone puzzle out what tblgen is trying to tell me here? VR256:v32i8:$src MD0.VMOVDQA_256mr: (st:isVoid VR256:v32i8:$src, addr:iPTR: $dst)<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_alignedstore>> /ptmp/dag/universal_build/debug/DEFAULT/llvm/tblgen: In MD0.VMOVDQA_256mr: Type inference contradiction found in...

VSelect Instruction Error

2017 Sep 21

1

VSelect Instruction Error

Hello, I am getting this error. What instruction is required to be implemented? LLVM ERROR: Cannot select: t22: v32i32 = vselect t724, t11, t16 t724: v32i32,ch = load<LD128[FixedStack1]> t723, FrameIndex:i64<1>, undef:i64 t659: i64 = FrameIndex<1> t10: i64 = undef t11: v32i32,ch = load<LD128[%sunkaddr45](align=4)(tbaa=<0x481f1e8>)> t0, t8, undef:i64

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

2

MCRegisterClass mandatory vs preferred alignment?

...ecified in tablegen. From Target.td: > > class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], > 256, (sequence "YMM%u", 0, 15)>; > def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], > 256, (sequence "YMM%u", 0, 31)>; > >...

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

3

MCRegisterClass mandatory vs preferred alignment?

Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is? Additionally, where are these alignments actually defined? I

[LLVMdev] AVX spill alignment

2011 Aug 25

2

[LLVMdev] AVX spill alignment

Hey guys, Are spills/reloads of AVX registers using aligned stores/loads? I can't seem to find the code that aligns the stack slots to 32-bytes. Could someone point me in the right direction? Thanks, Cameron

[LLVMdev] AVX spill alignment

2011 Sep 01

0

[LLVMdev] AVX spill alignment

...pills/reloads of AVX registers using aligned stores/loads? Yes. > I can't > seem to find the code that aligns the stack slots to 32-bytes. Could > someone point me in the right direction? The register class has 256-bit spill alignment: def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], 256, (sequence "YMM%u", 0, 15)> { let SubRegClasses = [(FR32 sub_ss), (FR64 sub_sd), (VR128 sub_xmm)]; } /jakob

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

3

[LoopVectorizer] Improving the performance of dot product reduction loop

...wd instruction. Additionally, future x86 CPUs will be gaining an instruction that can do VPMADDWD and VPADDD in one instruction, but that width mismatch makes that instruction difficult to utilize. In order for the backend to handle this better it would be great if we could have something like two v32i8 loads, two shufflevectors to extract the even elements and the odd elements to create four v16i8 pieces. Sign extend each of those pieces. Multiply the two even pieces and the two odd pieces separately, sum those results with a v8i32 add. Then another v8i32 add to accumulate the previous loop iterat...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

4

[LoopVectorizer] Improving the performance of dot product reduction loop

...ally, future x86 CPUs will be gaining an instruction > that can do VPMADDWD and VPADDD in one instruction, but that width mismatch > makes that instruction difficult to utilize. > > In order for the backend to handle this better it would be great if we > could have something like two v32i8 loads, two shufflevectors to extract > the even elements and the odd elements to create four v16i8 pieces. > > > Why v*i8 loads? I thought that we have 16-bit and 32-bit types here? > Oops that should have been v16i16. Mixed up my 256-bit types. > > Sign extend each of those...
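
As a reading aid for the lowering proposed in this thread, the sketch below models one loop iteration in scalar C++. It assumes the corrected v16i16 load type from the reply above, so the even and odd pieces have 8 lanes each rather than the 16 stated in the original message. The names V16i16, V8i32, extractAndSignExtend and dotProductStep are hypothetical and are not LLVM APIs; wrapping 32-bit addition is also an assumption of the sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the 256-bit vector types named in the thread.
using V16i16 = std::array<std::int16_t, 16>;
using V8i32 = std::array<std::int32_t, 8>;

// Models one "shufflevector + sign extend" step: pick the even (Offset = 0)
// or odd (Offset = 1) lanes of a v16i16 and widen each one to i32.
V8i32 extractAndSignExtend(const V16i16 &V, std::size_t Offset) {
  V8i32 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = static_cast<std::int32_t>(V[2 * I + Offset]);
  return Out;
}

// One iteration of the reduction: multiply the even pieces and the odd
// pieces separately, add the two products, then add the accumulator.
V8i32 dotProductStep(const V16i16 &A, const V16i16 &B, const V8i32 &Acc) {
  const V8i32 AEven = extractAndSignExtend(A, 0);
  const V8i32 AOdd = extractAndSignExtend(A, 1);
  const V8i32 BEven = extractAndSignExtend(B, 0);
  const V8i32 BOdd = extractAndSignExtend(B, 1);

  V8i32 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I) {
    // Each product of two sign-extended i16 values fits in an i32.
    const std::int32_t EvenProd = AEven[I] * BEven[I];
    const std::int32_t OddProd = AOdd[I] * BOdd[I];
    // The additions are done in unsigned arithmetic so that they wrap
    // instead of being undefined behaviour (assumption: wrapping adds).
    const std::uint32_t Sum = static_cast<std::uint32_t>(EvenProd) +
                              static_cast<std::uint32_t>(OddProd) +
                              static_cast<std::uint32_t>(Acc[I]);
    Out[I] = static_cast<std::int32_t>(Sum);
  }
  return Out;
}
```

The first two additions in the loop correspond to the pairwise multiply-add that VPMADDWD performs, and the final addition corresponds to the VPADDD accumulate mentioned in the message.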

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

2

[LoopVectorizer] Improving the performance of dot product reduction loop

...s will be gaining an instruction that can do VPMADDWD >> and VPADDD in one instruction, but that width mismatch makes that >> instruction difficult to utilize. >> >> In order for the backend to handle this better it would be great if >> we could have something like two v32i8 loads, two shufflevectors to >> extract the even elements and the odd elements to create four v16i8 >> pieces. > > Why v*i8 loads? I thought that we have 16-bit and 32-bit types here? > >> Sign extend each of those pieces. Multiply the two even pieces and >> the two...

[LLVMdev] Register Class assignment for integer and pointer types

2013 Jun 24

2

[LLVMdev] Register Class assignment for integer and pointer types

...and writes can be 128 or 256 bits and the largest integer type we support is i64. Luckily, since image reads and writes aren't something you can do in a 'normal' C or C++ program we can get away with using special intrinsics that return the fat pointers, which we model using v16i8 and v32i8 types. However, as we improve support for OpenCL and other compute oriented programming languages, we may need to start using 'real' pointers. > Our problem is perhaps a bit different from yours, as our pointers must be loaded and manipulated via special instructions, they can not u...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

4

[LoopVectorizer] Improving the performance of dot product reduction loop

...s will be gaining an instruction >> that can do VPMADDWD and VPADDD in one instruction, but that width mismatch >> makes that instruction difficult to utilize. >> >> In order for the backend to handle this better it would be great if we >> could have something like two v32i8 loads, two shufflevectors to extract >> the even elements and the odd elements to create four v16i8 pieces. >> >> >> Why v*i8 loads? I thought that we have 16-bit and 32-bit types here? >> > > Oops that should have been v16i16. Mixed up my 256-bit types. > >...

Cost model is missing in InstCombiner

2016 Aug 18

2

Cost model is missing in InstCombiner

----- Original Message ----- > From: "Mehdi Amini via llvm-dev" <llvm-dev at lists.llvm.org> > To: "Shixiong Xu" <shixiong at cadence.com> > Cc: llvm-dev at lists.llvm.org > Sent: Thursday, August 18, 2016 11:05:35 AM > Subject: Re: [llvm-dev] Cost model is missing in InstCombiner > +David M. > > On Aug 17, 2016, at 3:48 AM, Shixiong Xu

[LLVMdev] AVX spill alignment

2011 Sep 01

1

[LLVMdev] AVX spill alignment

...d stores/loads? > > Yes. > >> I can't >> seem to find the code that aligns the stack slots to 32-bytes. Could >> someone point me in the right direction? > > The register class has 256-bit spill alignment: > > def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], > 256, (sequence "YMM%u", 0, 15)> { > let SubRegClasses = [(FR32 sub_ss), (FR64 sub_sd), (VR128 sub_xmm)]; > } > > /jakob > > -------------- next part -------------- An HTML attachment was scrubbed......

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

2

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

...do while loop is important, it just increments the type, starting at MVT::v4i8 until it hits a legal type. This seems broken to me. Here is what TOT LLVM has for its MVT list: v4i8 = 14, // 4 x i8 v8i8 = 15, // 8 x i8 v16i8 = 16, // 16 x i8 v32i8 = 17, // 32 x i8 v2i16 = 18, // 2 x i16 v4i16 = 19, // 4 x i16 v8i16 = 20, // 8 x i16 v16i16 = 21, // 16 x i16 v2i32 = 22, // 2 x i32 So, for my platform with the 'and' I promote all i8...

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

2014 Sep 20

2

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

After adding some serious ninja-ry to the new shuffle lowering... On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com> wrote: > 2. none_useless_shuflle none > Instead of using a single move to materialize a zero extended constant > into a vector register, we explicitly zeroed a vector register and use a > shuffle. > ... this test case is fixed,

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

2

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

...ements > the type, starting at MVT::v4i8 until it hits a legal type. > > This seems broken to me. > Here is what TOT LLVM has for its MVT list: > v4i8 = 14, // 4 x i8 > v8i8 = 15, // 8 x i8 > v16i8 = 16, // 16 x i8 > v32i8 = 17, // 32 x i8 > v2i16 = 18, // 2 x i16 > v4i16 = 19, // 4 x i16 > v8i16 = 20, // 8 x i16 > v16i16 = 21, // 16 x i16 > v2i32 = 22, // 2 x i32 > > So, for my platform with...

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

0

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

...ments > the type, starting at MVT::v4i8 until it hits a legal type. > > This seems broken to me. > Here is what TOT LLVM has for its MVT list: > v4i8 = 14, // 4 x i8 > v8i8 = 15, // 8 x i8 > v16i8 = 16, // 16 x i8 > v32i8 = 17, // 32 x i8 > v2i16 = 18, // 2 x i16 > v4i16 = 19, // 4 x i16 > v8i16 = 20, // 8 x i16 > v16i16 = 21, // 16 x i16 > v2i32 = 22, // 2 x i32 > > So, for my platform with...

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

2

[LoopVectorizer] Improving the performance of dot product reduction loop

...that can do VPMADDWD and VPADDD in one > instruction, but that width mismatch makes that instruction > difficult to utilize. > > > > In order for the backend to handle this better it would be > great if we could have something like two v32i8 loads, two > shufflevectors to extract the even elements and the odd > elements to create four v16i8 pieces. > > > Why v*i8 loads? I thought that we have 16-bit and 32-bit types here? > > > Sign extend each of those pieces. Multiply the two even p...

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

0

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

I don't know what your target architecture looks like, but I suspect that <4 x i8> should not be legalized to <1 x i32>. I think that what you are seeing is that <4 x i8> is first split into <2 x i8>, and later promoted to <2 x i32>. At the moment different targets can only affect type-legalization by declaring different legal types. A number of us discussed the
