thr3ads.net - similar to: "[LLVMdev] AVX spill alignment"

Displaying 20 results from an estimated 500 matches similar to: "[LLVMdev] AVX spill alignment"

2011 Sep 01

[LLVMdev] AVX spill alignment

On Aug 25, 2011, at 4:17 PM, Cameron McInally wrote: > Hey guys, > > Are spills/reloads of AVX registers using aligned stores/loads? Yes. > I can't > seem to find the code that aligns the stack slots to 32-bytes. Could > someone point me in the right direction? The register class has 256-bit spill alignment: def VR256 : RegisterClass<"X86", [v32i8, v16i16,

[LLVMdev] AVX spill alignment

2011 Sep 01

[LLVMdev] AVX spill alignment

Ah, thanks. That seems easy enough. Sorry to be pedantic, but does that snippet also handle cases where the frame pointer, %rbp, needs to be 32-byte aligned when dynamic allocas are present? I've looked at the ABI, but I don't see any guarantees about 32-byte frame alignment for AVX. That can be trouble when spill slots are based off of the frame pointer, not the stack pointer. Please

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

On 08/31/2015 03:59 PM, Matthias Braun wrote: > Looks to me like the alignment is specified in tablegen. From Target.td: > > class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8,

MCRegisterClass mandatory vs preferred alignment?

2015 Aug 31

MCRegisterClass mandatory vs preferred alignment?

Looking around today, it appears that TargetRegisterClass and MCRegisterClass only includes a single alignment. This is documented as being the minimum legal alignment, but it appears to often be greater than this in practice. For instance, on x86 the alignment of %ymm0 is listed as 32, not 1. Does anyone know why this is? Additionally, where are these alignments actually defined? I

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

All, I've been trying to simplify the way LLVM models sub-register relationships a bit, and the X86 sub_ss and sub_sd sub-register indices are getting in the way. I want to get rid of them. These sub-registers are special, they are only mentioned here: let CompositeIndices = [(sub_ss), (sub_sd)] in { def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>; def

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

Jakob Stoklund Olesen <jolesen at apple.com> writes: > These sub-registers are special, they are only mentioned here: > > let CompositeIndices = [(sub_ss), (sub_sd)] in { > def XMM0: Register<"xmm0">, DwarfRegNum<[17, 21, 21]>; > def XMM1: Register<"xmm1">, DwarfRegNum<[18, 22, 22]>; > ... I'm confused. Below you

VSelect Instruction Error

2017 Sep 21

VSelect Instruction Error

Hello, I am getting this error. What instruction is required to be implemented? LLVM ERROR: Cannot select: t22: v32i32 = vselect t724, t11, t16 t724: v32i32,ch = load<LD128[FixedStack1]> t723, FrameIndex:i64<1>, undef:i64 t659: i64 = FrameIndex<1> t10: i64 = undef t11: v32i32,ch = load<LD128[%sunkaddr45](align=4)(tbaa=<0x481f1e8>)> t0, t8, undef:i64

[LLVMdev] TableGen Type Contradiction

2009 Nov 18

[LLVMdev] TableGen Type Contradiction

Can anyone puzzle out what tblgen is trying to tell me here? VR256:v32i8:$src MD0.VMOVDQA_256mr: (st:isVoid VR256:v32i8:$src, addr:iPTR: $dst)<<P:Predicate_unindexedstore>><<P:Predicate_store>><<P:Predicate_alignedstore>> /ptmp/dag/universal_build/debug/DEFAULT/llvm/tblgen: In MD0.VMOVDQA_256mr: Type inference contradiction found in node! I don't see any

[LLVMdev] CompositeIndices

2012 Mar 31

[LLVMdev] CompositeIndices

Does anyone know exactly what ComposerIndices in Target.td is all about? I see just one place where it's used in X86 but it's not clear from the comments in Target.td and it's one usage, exactly what this feature is about. Tia. Reed

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

On Jul 26, 2012, at 9:43 AM, dag at cray.com wrote: > Jakob Stoklund Olesen <jolesen at apple.com> writes: > >> As far as I can tell, all sub-register operations involving sub_ss and >> sub_sd can simply be replaced with COPY_TO_REGCLASS: >> >> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)), >> (VMOVSDrr VR128:$src1,

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

Jakob Stoklund Olesen <jolesen at apple.com> writes: >> What happens if the result of the above pattern using COPY_TO_REGCLASS >> is spilled? Will we get a 64-bit store or a 128-bit store? > > This behavior isn't affected by the change. FR64 registers are spilled > with 64-bit stores, and VR128 registers are spilled with 128-bit > stores. > > When the

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

~Craig On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 05:22 PM, Craig Topper wrote: > > Hello all, > > This code https://godbolt.org/g/tTyxpf is a dot product reduction loop > multipying sign extended 16-bit values to produce a 32-bit accumulated > result. The x86 backend is currently not able to optimize it as well as gcc

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 24

[LoopVectorizer] Improving the performance of dot product reduction loop

On Tue, Jul 24, 2018 at 6:10 AM Hal Finkel <hfinkel at anl.gov> wrote: > > On 07/23/2018 06:37 PM, Craig Topper wrote: > > > ~Craig > > > On Mon, Jul 23, 2018 at 4:24 PM Hal Finkel <hfinkel at anl.gov> wrote: > >> >> On 07/23/2018 05:22 PM, Craig Topper wrote: >> >> Hello all, >> >> This code https://godbolt.org/g/tTyxpf

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

On Jul 26, 2012, at 10:28 AM, dag at cray.com wrote: > Jakob Stoklund Olesen <jolesen at apple.com> writes: > >>> What happens if the result of the above pattern using COPY_TO_REGCLASS >>> is spilled? Will we get a 64-bit store or a 128-bit store? >> >> This behavior isn't affected by the change. FR64 registers are spilled >> with 64-bit

[LoopVectorizer] Improving the performance of dot product reduction loop

2018 Jul 23

[LoopVectorizer] Improving the performance of dot product reduction loop

Hello all, This code https://godbolt.org/g/tTyxpf is a dot product reduction loop multipying sign extended 16-bit values to produce a 32-bit accumulated result. The x86 backend is currently not able to optimize it as well as gcc and icc. The IR we are getting from the loop vectorizer has several v8i32 adds and muls inside the loop. These are fed by v8i16 loads and sexts from v8i16 to v8i32. The

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

Hrmm.... PromoteVectorOp doesn't seem to follow this at all. http://llvm.org/svn/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp SDValue VectorLegalizer::PromoteVectorOp(SDValue Op) { // Vector "promotion" is basically just bitcasting and doing the operation // in a different type. For example, x86 promotes ISD::AND on v2i32 to // v1i64. EVT VT =

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

Jakob Stoklund Olesen <jolesen at apple.com> writes: >> If the 128-bit register is not ever used as a 128-bit register, >> shouldn't the coalescer pick the 64- or 32-bit register? > > That optimization is not currently implemented for sub-registers. For > example, if you create a GR64 virtual register and only ever use the > sub_32bit sub-register, it would be

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

v4i8 itself is a legal type, just not on the 'AND' operation. So there seems to be multiple problems here. 1) PromoteVectorOp doesn't handle the case where the types are not the same size, this occurs because #2 2) getTypeToPromoteTo doesn't actual check to see if the type it should promote to makes any sense. 3) PromoteVectorOp also doesn't handle the case where

[LLVMdev] x86 Vector Shuffle Patterns

2010 Aug 04

[LLVMdev] x86 Vector Shuffle Patterns

I have a few questions about the new vector shuffle matching code in the x86 .td files. It's a big improvement over the old system and provides the context that code generation for AVX needs. This is great! I'm asking because I'm having some trouble converting some AVX patterns over to the new system. I'm getting this error from tblgen: VyPERM2F128PDirrmi: (set:isVoid

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

2012 Jul 30

[LLVMdev] Vector promotion broken for <2 x [i8|i16]>

Notice that PromoteVectorOp is called after the type legalization legalized all of the types in the program. It legalizes the *operations*, not the types. So, you should only see legal types (Legal types are types that fit into your registers). So, if your target has v2i32, I suspect that v4i8 is an illegal because it has a different size. -----Original Message----- From: Villmow, Micah

similar to: [LLVMdev] AVX spill alignment