Displaying 20 results from an estimated 200 matches similar to: "Question about VectorLegalizer::ExpandStore() with v4i1"
2016 Jun 28
0
Question about VectorLegalizer::ExpandStore() with v4i1
Hi All,
Could someone please comment on whether the question below is wrong?
2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>:
> Hi All,
>
> I have a problem with VectorLegalizer::ExpandStore() with v4i1.
>
> Let's see an example.
>
> * LLVM IR
> store <4 x i1> %edgeMask_for.body1314, <4 x i1>* %27
>
> * SelectionDAG before vector
2016 Jun 28
2
Question about VectorLegalizer::ExpandStore() with v4i1
On Tue, Jun 28, 2016 at 2:45 AM, jingu kang via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi All,
>
> Could someone please comment on whether the question below is wrong?
>
> 2016-06-25 7:52 GMT+01:00 jingu kang <jaykang10 at gmail.com>:
>> Hi All,
>>
>> I have a problem with VectorLegalizer::ExpandStore() with v4i1.
>>
>> Let's
2016 Jun 29
0
Question about VectorLegalizer::ExpandStore() with v4i1
Rob, Ahmed, and Jingu,
[I'm sorry if my point of view is too x86 centric.]
>>the tricky part about fixing it is the need to settle on a memory layout for these vectors
>> (packed vs byte per i1; packed would be compatible with AVX512, I think).
I agree with Ahmed here, in principle. It's actually more than that, since vector compare
in AVX2 and below produces the same
2012 Feb 22
2
[LLVMdev] Eliminating copies between overlapping register classes
Hi,
I have two register classes A and B, where A contains a subset of the
registers in B:
A = [R0, R1, R2, ... R128]
B = [R0, R1, R2, ... R128,
T0, T1, T2, ... T128]
I am using the Greedy Register Allocator, and I would expect the register
allocator to eliminate this copy:
%vreg0<def> = COPY %vreg1; B:%vreg0 A:%vreg1
but instead I end up with
%R0<def> = COPY %R1
Is there
2012 Feb 23
1
[LLVMdev] Eliminating copies between overlapping register classes
On Wed, Feb 22, 2012 at 07:00:55PM -0800, Jakob Stoklund Olesen wrote:
>
> On Feb 22, 2012, at 12:01 PM, Tom Stellard wrote:
>
> > Hi,
> >
> > I have two register classes A and B, where A contains a subset of the
> > registers in B:
> >
> > A = [R0, R1, R2, ... R128]
> >
> > B = [R0, R1, R2, ... R128,
> > T0, T1, T2, ... T128]
2012 Feb 23
0
[LLVMdev] Eliminating copies between overlapping register classes
On Feb 22, 2012, at 12:01 PM, Tom Stellard wrote:
> Hi,
>
> I have two register classes A and B, where A contains a subset of the
> registers in B:
>
> A = [R0, R1, R2, ... R128]
>
> B = [R0, R1, R2, ... R128,
> T0, T1, T2, ... T128]
>
> I am using the Greedy Register Allocator, and I would expect the register
> allocator to eliminate this copy:
>
2012 Jun 25
0
[LLVMdev] Boolean floats and v4i1
Hi Hal,
Why do you say that the type v4i64 is broken? You can specify that this type has no legal operations, and the codegen will lower ("legalize") them to something that works on your platform.
Nadav
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
Sent: Monday, June 25, 2012 06:28
To: LLVM Developers
2012 Jun 25
3
[LLVMdev] Boolean floats and v4i1
Hello,
I'm working on support for the SIMD instruction set on our new BG/Q
supercomputer. This instruction set is v4f64 (with the exception of
some int <-> fp conversions, floating-point only). The vectorized
comparisons, logical operations and selects also exclusively use
floating-point inputs. For those inputs that are logically vectors of
booleans the system uses the following
2012 Jun 25
0
[LLVMdev] Boolean floats and v4i1
You could set the AND operation action to custom. The problem is that you would have no way of knowing if the type 'v4i64' originated from v4i1 or v4i64. And I don't think that you can use SimplifyDemandedBits (to discover if only the high bit is set) during the legalizer because the DAG is in a strange state, but I could be mistaken on this one.
Okay, here is another idea.
2016 May 05
4
LLVM issuse:AArch64 TargetParser
Hi everyone,
I'm an engineer on Linaro's LLVM team, coming from Spreadtrum. I am
new to LLVM. I'm now writing a Target Parser for AArch64, so that option
parsing for AArch64 CPU, arch, and FPU can be gathered in one place.
In the TargetParser, we assume "aarch64" and "arm64" are synonyms of
armv8a (as they are only for armv8a, people usually do
2012 Jun 25
2
[LLVMdev] Boolean floats and v4i1
On Mon, 25 Jun 2012 05:45:57 +0000
"Rotem, Nadav" <nadav.rotem at intel.com> wrote:
> Hi Hal,
>
> Why do you say that the type v4i64 is broken? You can specify that this
> type has no legal operations and the codegen will lower ("legalize")
> them to something that works on your platform.
For example, the AND operation is really only an AND operation
2019 Dec 09
2
[PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay,
I'm looking at some missed optimizations caused by D70246. Here's a test case:
define <4 x float> @f(i32 %t32, <4 x float>* %t24) {
.entry:
%t43 = insertelement <3 x i32> undef, i32 %t32, i32 2
%t44 = bitcast <3 x i32> %t43 to <3 x float>
%t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32>
<i32 0, i32 undef,
2016 Aug 02
2
Instruction selection problems due to SelectionDAGBuilder
Hello.
I'm having problems at instruction selection in my back end with the following
basic block, due to a vector add with an immediate constant vector (obtained by vectorizing a
simple C program doing a vector sum map):
vector.ph: ; preds = %vector.memcheck50
%.splatinsert = insertelement <8 x i64> undef, i64 %i.07.unr, i32 0
2016 May 18
2
LLVM issuse:AArch64 TargetParser
Hi,
A64 versus A32/T32 code generation is controlled by the -target option which I don’t believe is under discussion here.
James
On 18 May 2016, at 13:17, Bruce Hoult <bruce at hoult.org> wrote:
Note that armv8a modifies the A32 and T32 instruction sets, and is therefore an important -march option for 32-bit code. Therefore armv8a cannot be used to imply
2019 Aug 27
2
TargetRegisterInfo::getCommonSubClass bug, perhaps.
Hi,
ABCRegister.td :
def SGPR32 : RegisterClass<"ABC", [i32], 16, (add
S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11,
S12, S13, S14, S15
)>;
def SFGPR32 : RegisterClass<"ABC", [f32], 16, (add
S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11,
S12, S13, S14, S15
)>;
===== Instruction selection ends:
...
t8: i32 = ADDrr t37, t32
2017 Feb 11
2
Specify special cases of delay slots in the back end
Hello.
Hal, the problem I have is that it doesn't advance to the next available instruction
- it always gets the same store. This might be because I did not specify in a file like
[Target]Schedule.td the functional units, processor and instruction itineraries.
Regarding the Stalls argument to my method
[Target]DispatchGroupSBHazardRecognizer::getHazardType() I always get the
2016 Dec 12
0
TableGen - Help to implement a form of gather/scatter operations for Mips MSA
Hello.
I wanted to let you know that I fixed the bug from the previous email.
The main cause of the bug was that I thought the SDNode masked_gather
returned only 1 value, but it returns 2 (hence, I guess, the hard-to-follow
error reported earlier: "Assertion `New->getNumTypes() == 1").
masked_gather returns 2 values because:
// SDTypeProfile -
2016 Sep 19
2
RFC: New intrinsics masked.expandload and masked.compressstore
Hi all,
The AVX-512 ISA introduces new vector instructions, VCOMPRESS and VEXPAND, to allow vectorization of the following loops with two specific types of cross-iteration dependencies:
Compress:
for (int i=0; i<N; ++i)
if (t[i])
*A++ = expr;
Expand:
for (i=0; i<N; ++i)
if (t[i])
X[i] = *A++;
else
2019 Jun 02
2
Optimizing Compare instruction selection
Hi Eli,
Thank you very much for your response.
In fact, I had already tried the X86 approach before, i.e., explicitly using the status register. This is the approach that appeals to me more. I had parked it because it also caused some problems (I left it commented out). I have now revived the code, and it works fine in most cases, but there’s a particular case that causes LLVM to stop
2016 Sep 25
5
RFC: New intrinsics masked.expandload and masked.compressstore
|
|Hi Elena,
|
|Technically speaking, this seems straightforward.
|
|I wonder, however, how target-independent this is in a practical
|sense; will there be an efficient lowering when targeting any other
|ISA? I don't want to get into the territory where, because the
|vectorizer is supposed to be architecture independent, we need to
|add target-independent intrinsics for all