thr3ads.net - similar to: "Question about VectorLegalizer::ExpandStore() with v4i1"

Displaying 20 results from an estimated 200 matches similar to: "Question about VectorLegalizer::ExpandStore() with v4i1"

Question about VectorLegalizer::ExpandStore() with v4i1

2016 Jun 28

Hi All,
Can someone comment on the question below and say whether it is wrong or not, please?
2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>:
> Hi All,
>
> I have a problem with VectorLegalizer::ExpandStore() with v4i1.
>
> Let's see an example.
>
> * LLVM IR
> store <4 x i1> %edgeMask_for.body1314, <4 x i1>* %27
>
> * SelectionDAG before vector

Question about VectorLegalizer::ExpandStore() with v4i1

2016 Jun 28

On Tue, Jun 28, 2016 at 2:45 AM, jingu kang via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi All,
>
> Can someone comment on the question below and say whether it is wrong or not, please?
>
> 2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>:
>> Hi All,
>>
>> I have a problem with VectorLegalizer::ExpandStore() with v4i1.
>>
>> Let's

Question about VectorLegalizer::ExpandStore() with v4i1

2016 Jun 29

Rob, Ahmed, and Jingu,
[I'm sorry if my point of view is too x86-centric.]
>> the tricky part about fixing it is the need to settle on a memory layout for these vectors
>> (packed vs byte per i1; packed would be compatible with AVX512, I think).
I agree with Ahmed here, in principle. It's actually more than that, since vector compare in AVX2 and below produces the same
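
The two candidate memory layouts mentioned in the quoted text (packed vs byte per i1) can be illustrated with a small sketch. This is plain Python, not LLVM code. The function names are hypothetical, and the bit order in the packed form (element 0 in the least significant bit) is an assumption made for illustration only, since the thread does not settle on one.

```python
def store_byte_per_i1(mask):
    """Byte-per-i1 layout: every element is widened to one full byte (0 or 1)."""
    return bytes(1 if bit else 0 for bit in mask)


def store_packed(mask):
    """Packed layout: element i goes to bit i of the output.

    LSB-first ordering is an assumption of this sketch.
    """
    out = bytearray((len(mask) + 7) // 8)  # round up to whole bytes
    for i, bit in enumerate(mask):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


v4i1 = [True, False, True, True]
byte_layout = store_byte_per_i1(v4i1)   # occupies 4 bytes
packed_layout = store_packed(v4i1)      # occupies 1 byte
```

The same four-element mask takes four bytes in one layout and one byte in the other, which is why a store and a later load have to agree on the layout.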

[LLVMdev] Eliminating copies between overlapping register classes

2012 Feb 22

Hi,
I have two register classes A and B, where A contains a subset of the registers in B:
A = [R0, R1, R2, ... R128]
B = [R0, R1, R2, ... R128, T0, T1, T2, ... T128]
I am using the Greedy Register Allocator, and I would expect the register allocator to eliminate this copy:
%vreg0<def> = COPY %vreg1; B:%vreg0 A:%vreg1
but instead I end up with
%R0<def> = COPY %R1
Is there
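
One way to picture the relationship between the two classes is to model each register class as a plain set of register names. The sketch below is only a toy model under that assumption. It does not use or reproduce any LLVM API, and `reg_range` and `common_subclass` are hypothetical helper names. It shows that when A is a subset of B, the registers usable by both sides of the copy are exactly those in A.

```python
def reg_range(prefix, last):
    """Build the set {prefix0, prefix1, ..., prefix<last>}."""
    return {f"{prefix}{i}" for i in range(last + 1)}


# Register classes from the post, modeled as sets of register names.
A = reg_range("R", 128)
B = reg_range("R", 128) | reg_range("T", 128)


def common_subclass(x, y):
    """Registers legal in both classes, or None if the classes are disjoint."""
    common = x & y
    return common or None


shared = common_subclass(A, B)  # equals A, because A is a subset of B
```

In this model, constraining both virtual registers to `shared` would let them live in the same physical register. Whether the allocator actually does so depends on details that the truncated snippet does not show.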

[LLVMdev] Eliminating copies between overlapping register classes

2012 Feb 23

On Wed, Feb 22, 2012 at 07:00:55PM -0800, Jakob Stoklund Olesen wrote:
>
> On Feb 22, 2012, at 12:01 PM, Tom Stellard wrote:
>
> > Hi,
> >
> > I have two register classes A and B, where A contains a subset of the
> > registers in B:
> >
> > A = [R0, R1, R2, ... R128]
> >
> > B = [R0, R1, R2, ... R128,
> > T0, T1, T2, ... T128]

[LLVMdev] Eliminating copies between overlapping register classes

2012 Feb 23

On Feb 22, 2012, at 12:01 PM, Tom Stellard wrote:
> Hi,
>
> I have two register classes A and B, where A contains a subset of the
> registers in B:
>
> A = [R0, R1, R2, ... R128]
>
> B = [R0, R1, R2, ... R128,
> T0, T1, T2, ... T128]
>
> I am using the Greedy Register Allocator, and I would expect the register
> allocator to eliminate this copy:
>

[LLVMdev] Boolean floats and v4i1

2012 Jun 25

Hi Hal,
Why do you say that the type v4i64 is broken? You can specify that this type has no legal operations and the codegen will lower ("legalize") them to something that works on your platform.
Nadav
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
Sent: Monday, June 25, 2012 06:28
To: LLVM Developers

[LLVMdev] Boolean floats and v4i1

2012 Jun 25

Hello, I'm working on support for the SIMD instruction set on our new BG/Q supercomputer. This instruction set is v4f64 (with the exception of some int <-> fp conversions, floating-point only). The vectorized comparisons, logical operations and selects also exclusively use floating-point inputs. For those inputs that are logically vectors of booleans the system uses the following

[LLVMdev] Boolean floats and v4i1

2012 Jun 25

You could set the AND operation action to custom. The problem is that you would have no way of knowing if the type 'v4i64' originated from v4i1 or v4i64. And I don't think that you can use SimplifyDemandedBits (to discover if only the high bit is set) during the legalizer because the DAG is in a strange state, but I could be mistaken on this one. Okay, here is another idea.

LLVM issuse:AArch64 TargetParser

2016 May 05

Hi everyone, I'm an engineer on Linaro's LLVM team, coming from Spreadtrum. I am new to LLVM. I'm now writing a Target Parser for AArch64, so that option parsing for AArch64 CPU, arch, and FPU can be consolidated in one place. In the TargetParser, we assume "aarch64" and "arm64" are synonyms of armv8a (as they are only for armv8a, people usually do

[LLVMdev] Boolean floats and v4i1

2012 Jun 25

On Mon, 25 Jun 2012 05:45:57 +0000 "Rotem, Nadav" <nadav.rotem at intel.com> wrote:
> Hi Hal,
>
> Why do you say that the type v4i64 is broken? You can specify that this
> type has no legal operations and the codegen will lower ("legalize")
> them to something that works on your platform.
For example, the AND operation is really only an AND operation

[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs

2019 Dec 09

Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case:
define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
.entry:
  %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
  %t44 = bitcast <3 x i32> %t43 to <3 x float>
  %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef,

Instruction selection problems due to SelectionDAGBuilder

2016 Aug 02

Hello. I'm having problems at instruction selection with my back end with the following basic block, due to a vector add with an immediate constant vector (obtained by vectorizing a simple C program doing a vector sum map):
vector.ph: ; preds = %vector.memcheck50
  %.splatinsert = insertelement <8 x i64> undef, i64 %i.07.unr, i32 0

LLVM issuse:AArch64 TargetParser

2016 May 18

Hi,
A64 versus A32/T32 code generation is controlled by the -target option, which I don’t believe is under discussion here.
James
On 18 May 2016, at 13:17, Bruce Hoult <bruce at hoult.org> wrote:
Note that armv8a modifies the A32 and T32 instruction sets, and is therefore an important -march option for 32 bit code. Therefore armv8a can not be used to imply

TargetRegisterInfo::getCommonSubClass bug, perhaps.

2019 Aug 27

Hi,
ABCRegister.td:
def SGPR32 : RegisterClass<"ABC", [i32], 16, (add S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15)>;
def SFGPR32 : RegisterClass<"ABC", [f32], 16, (add S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15)>;
===== Instruction selection ends:
...
t8: i32 = ADDrr t37, t32

Specify special cases of delay slots in the back end

2017 Feb 11

Hello. Hal, the problem I have is that it doesn't advance at the next available instruction - it always gets the same store. This might be because I did not specify in a file like [Target]Schedule.td the functional units, processor and instruction itineraries. Regarding the Stalls argument to my method [Target]DispatchGroupSBHazardRecognizer::getHazardType() I always get the

TableGen - Help to implement a form of gather/scatter operations for Mips MSA

2016 Dec 12

Hello. I wanted to let you know that I fixed the bug from the previous email. The main reason for the bug was that I thought the SDNode masked_gather returns only 1 value, but it returns 2 (hence, I guess, the difficult-to-follow error reported earlier: "Assertion `New->getNumTypes() == 1"). masked_gather returns 2 values because: // SDTypeProfile -

RFC: New intrinsics masked.expandload and masked.compressstore

2016 Sep 19

Hi all, the AVX-512 ISA introduces new vector instructions VCOMPRESS and VEXPAND in order to allow vectorization of the following loops with two specific types of cross-iteration dependencies:
Compress:
for (int i=0; i<N; ++i)
  if (t[i])
    *A++ = expr;
Expand:
for (i=0; i<N; ++i)
  if (t[i])
    X[i] = *A++;
  else
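
The semantics of the two loops can be sketched in plain Python. This is an illustration only, not the proposed intrinsics themselves, and `compress_store` and `expand_load` are hypothetical names. The else branch of the Expand loop is cut off in the snippet, so the `passthru` argument here is an assumption, modeled on the passthru operand that the `llvm.masked.expandload` intrinsic takes.

```python
def compress_store(values, mask):
    """Compress: write the selected lanes contiguously, like `if (t[i]) *A++ = expr`."""
    return [v for v, m in zip(values, mask) if m]


def expand_load(memory, mask, passthru):
    """Expand: read consecutive elements from memory into the selected lanes.

    Lanes whose mask bit is clear take the matching passthru element
    (an assumption of this sketch; the original else branch is truncated).
    """
    source = iter(memory)
    # next(source) is only evaluated for lanes whose mask bit is set.
    return [next(source) if m else p for m, p in zip(mask, passthru)]


mask = [True, False, True, True]
stored = compress_store([10, 20, 30, 40], mask)    # three contiguous elements
reloaded = expand_load(stored, mask, [0, 0, 0, 0])
```

Compressing and then expanding with the same mask restores the selected lanes to their original positions, while the unselected lane takes the passthru value.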

Optimizing Compare instruction selection

2019 Jun 02

Hi Eli, Thank you very much for your response. In fact, I had already tried the X86 approach before, i.e., explicitly using the status register. This is the approach that appeals more to me. I left it parked because it also produced some problems (but I left it commented out). So I have now revived the code, and it works fine in most cases, but there’s a particular case that causes LLVM to stop

RFC: New intrinsics masked.expandload and masked.compressstore

2016 Sep 25

|
|Hi Elena,
|
|Technically speaking, this seems straightforward.
|
|I wonder, however, how target-independent this is in a practical
|sense; will there be an efficient lowering when targeting any other
|ISA? I don't want to get into the territory where, because the
|vectorizer is supposed to be architecture independent, we need to
|add target-independent intrinsics for all
