thr3ads.net - search: "vr128"

Displaying 20 results from an estimated 99 matches for "vr128".

Did you mean: v128

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

2011 Sep 22

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

...AVX 128-bit versions too? I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great). > 4) Your tablegen modifications are totally fine, for the intrinsics just do: > > let Predicates = [HasSSE3] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (HADDPSrr VR128:$src1, VR128:$src2)>; > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (HADDPSrm VR128:$src1, addr:$src2)>; > ... > > and > > let Predicates = [HasAVX] in { > def : Pat<...

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

...gt; XMM0 They are supposed to represent the 32-bit and 64-bit low parts of the xmm registers, but since we don't define explicit registers for those sub-registers, we are left with idempotent sub-register indexes. We have three different register classes for the xmm registers: FR32, FR64, and VR128. The sub_ss and sub_sd indexes used to play a role in selecting the right register class, but not any longer. That is all derived from the instruction descriptions now. As far as I can tell, all sub-register operations involving sub_ss and sub_sd can simply be replaced with COPY_TO_REGCLASS: de...

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

...onfused. Below you note that they are used in patterns, so they are certainly mentioned more than just in the code above. > As far as I can tell, all sub-register operations involving sub_ss and > sub_sd can simply be replaced with COPY_TO_REGCLASS: > > def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)), > (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2), > sub_sd))>; > > Becomes: > > def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)), > (VMOVSDrr VR128:$src1, (...

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

..., at 9:43 AM, dag at cray.com wrote: > Jakob Stoklund Olesen <jolesen at apple.com> writes: > >> As far as I can tell, all sub-register operations involving sub_ss and >> sub_sd can simply be replaced with COPY_TO_REGCLASS: >> >> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)), >> (VMOVSDrr VR128:$src1, (EXTRACT_SUBREG (v4i32 VR128:$src2), >> sub_sd))>; >> >> Becomes: >> >> def : Pat<(v4i32 (X86Movsd VR128:$src1, VR128:$src2)), >>...

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

2011 Sep 21

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

...rizontal.ll to sse3-haddsub.ll 3) Can you duplicate the testcase file to something like avx-haddsub.ll, and check for the AVX 128-bit versions too? 4) Your tablegen modifications are totally fine, for the intrinsics just do: let Predicates = [HasSSE3] in { def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), (HADDPSrr VR128:$src1, VR128:$src2)>; def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), (HADDPSrm VR128:$src1, addr:$src2)>; ... and let Predicates = [HasAVX] in { def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR...

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

2011 Sep 22

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

...AVX 128-bit versions too? I added the avx checks to the same file (in which case calling it sse3-haddsub.ll is not so great). > 4) Your tablegen modifications are totally fine, for the intrinsics just do: > > let Predicates = [HasSSE3] in { > def : Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), VR128:$src2), > (HADDPSrr VR128:$src1, VR128:$src2)>; def : > Pat<(int_x86_sse3_hadd_ps (v4f32 VR128:$src1), (memop addr:$src2)), > (HADDPSrm VR128:$src1, addr:$src2)>; ... > > and > > let Predicates = [HasAVX] in { > def : Pat<(int...

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

2011 Sep 21

[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits

This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating point additions and subtractions of appropriate vector shuffles. To do this I introduced new x86 FHADD and FHSUB opcodes. These need to be wired up somehow in the .td file to the appropriate instructions. Since I have no idea how tablegen works I just hacked it in horribly. It works, but breaks support for the hadd

[LLVMdev] Reducing .td redundancy

2009 Mar 24

[LLVMdev] Reducing .td redundancy

...ins FR32:$src1, FR32: $src2), !strconcat(OpcodeStr, "ps"\t{$src2, $dst|$dst, $src2}"), [(set FR32:$dst, (!SOME_CONCAT("x86f", OpNode) FR32: $src1, FR32:$src2))]>; // Vector operation def PSrr : PSI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128: $src2), !strconcat(OpcodeStr, "ps"\t{$src2, $dst|$dst, $src2}"), [(set VR128:$dst, v2i64 (OpNode VR128:$src1, VR128: $src2))]>; // Bitconverted vector operation def PSrm : PSI<opc, MRMSrcMem,...

[LLVMdev] question on table gen TIED_TO constraint

2012 Jul 09

[LLVMdev] question on table gen TIED_TO constraint

I need to implement an instruction which has 2 read-write registers, so I added let Constraints = "$src1 = $dst, $mask = $mask_wb" in { ... def rm : AVX28I<opc, MRMSrcMem, (outs VR128:$dst, VR128:$mask_wb), (ins VR128:$src1, v128mem:$src2, VR128:$mask), ... } There is a problem since MRMSrcMem assumes the 2nd physical operand is a memory operand. See the section about MRMSrcMem in RecognizableInstr::emitInstructionSpecifier. And the above gives us $dst, $mask_wb, $sr...

[LLVMdev] Patterns with Multiple Stores

2008 Nov 17

[LLVMdev] Patterns with Multiple Stores

I want to write a pattern that looks something like this: def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, (MOVSDmr addr:$dst, FR64:$src))), imm:3) So I want to convert an unaligned vector store to a scalar store, a shuffle and a scalar store. There are several question I have: - Is the imm:3 syntax...

[LLVMdev] Bogus X86-64 Patterns

2007 Dec 12

[LLVMdev] Bogus X86-64 Patterns

Tracking down a problem with one of our benchmark codes, we've discovered that some of the patterns in X86InstrX86-64.td are wrong. Specifically: def MOV64toPQIrm : RPDI<0x6E, MRMSrcMem, (outs VR128:$dst), (ins i64mem:$src), "mov{d|q}\t{$src, $dst|$dst, $src}", [(set VR128:$dst, (v2i64 (scalar_to_vector (loadi64 addr:$src))))]>; def MOVPQIto64mr : RPDI<0x7E, MRMDestMem, (outs), (ins i64mem:$dst, VR128:...

[LLVMdev] Patterns with Multiple Stores

2008 Nov 17

[LLVMdev] Patterns with Multiple Stores

On Monday 17 November 2008 14:28, David Greene wrote: > I want to write a pattern that looks something like this: > > def : Pat<(unalignedstore (v2f64 VR128:$src), addr:$dst), > (MOVSDmr ADD64ri8(addr:$dst, imm:8), ( SHUFPDrri (VR128:$src, > (MOVSDmr addr:$dst, FR64:$src))), imm:3) > > So I want to convert an unaligned vector store to a scalar store, a shuffle > and a scalar store. I got a little further with this:...

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

2012 Jul 26

[LLVMdev] X86 sub_ss and sub_sd sub-register indexes

...lt;jolesen at apple.com> writes: >> What happens if the result of the above pattern using COPY_TO_REGCLASS >> is spilled? Will we get a 64-bit store or a 128-bit store? > > This behavior isn't affected by the change. FR64 registers are spilled > with 64-bit stores, and VR128 registers are spilled with 128-bit > stores. > > When the register coalescer removes a copy between VR128 and FR64 > registers, it chooses the larger spill size for the result. This is > the same for sub-register copies and full register copies. So if I understand this correctly, a...

[LLVMdev] x86 Vector Shuffle Patterns

2010 Aug 04

[LLVMdev] x86 Vector Shuffle Patterns

...$src1, node:$src2), [{ return X86::isVPERM2F128Mask(cast<ShuffleVectorSDNode>(N)); }], SHUFFLE_get_vperm2f128_imm>; I don't understand completely how the new system all works. Take a simple SHUFPS match: def SHUFPSrri : PSIi8<0xC6, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2, i8imm:$src3), "shufps\t{$src3, $src2, $dst|$dst, $src2, $src3}", [(set VR128:$dst, (v4f32 (shufp:$src3 VR128:$src1, VR128:$src2)))]>; "...

[LLVMdev] RFC: AVX Pattern Specification [LONG]

2009 Apr 30

[LLVMdev] RFC: AVX Pattern Specification [LONG]

...(ins FR32:$src1, f32mem:$src2), !strconcat(OpcodeStr, "ss\t{$src2, $dst|$dst, $src2}"), [(set FR32:$dst, (OpNode FR32:$src1, (load addr:$src2)))]>; // Vector operation, reg+reg. def PSrr : PSI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, VR128:$src2), !strconcat(OpcodeStr, "ps\t{$src2, $dst|$dst, $src2}"), [(set VR128:$dst, (v4f32 (OpNode VR128:$src1, VR128:$src2)))]> { let isComm...

[LLVMdev] Reducing .td redundancy

2009 Mar 24

[LLVMdev] Reducing .td redundancy

...!strconcat(OpcodeStr, "ps"\t{$src2, $dst|$dst, > $src2}"), > [(set FR32:$dst, (!SOME_CONCAT("x86f", OpNode) > FR32: > $src1, FR32:$src2))]>; > > // Vector operation > def PSrr : PSI<opc, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src1, > VR128: > $src2), > !strconcat(OpcodeStr, "ps"\t{$src2, $dst|$dst, > $src2}"), > [(set VR128:$dst, v2i64 (OpNode VR128:$src1, > VR128: > $src2))]>; > > // Bitconverted vector oper...

[LLVMdev] Pat<> & tlbgen

2009 Nov 03

[LLVMdev] Pat<> & tlbgen

Can someone explain the magic behind the Pat<> construct and tblgen. >From X86InstrSSE.td: def : Pat<(v4f32 (vector_shuffle VR128:$src, (undef), MOVDDUP_shuffle_mask)), (MOVLHPSrr VR128:$src, VR128:$src)>, Requires<[HasSSE1]>; Where's the code in tblgen to emit the matching code for this? I'm trying to extend it so that Pat<> can be used as a general subclass for AVX: class Base<dag pat,...

[LLVMdev] Register class intersection

2009 Apr 28

[LLVMdev] Register class intersection

...isters - it also holds information about spill size and alignment. Value types are no longer interesting once the selection DAG has been destroyed. X86 has the weird examples as usual: Classes RFP32, RFP64, and RFP80 are identical (FP0-6) except for the spill size. The same goes for FR64 and VR128 (XMM0-15). The coalescer will join these classes as follows: RFP32 + RFP64 -> RFP64 FR64 + VR128 -> VR128 This seems perfectly reasonable - choose the larger spill size and avoid losing data. TableGen thinks these classes are unrelated - it currently defines register subclasses as fol...

[LLVMdev] question on table gen TIED_TO constraint

2012 Jul 10

[LLVMdev] question on table gen TIED_TO constraint

On Jul 9, 2012, at 4:15 PM, Manman Ren <mren at apple.com> wrote: > > I need to implement an instruction which has 2 read-write registers, so I added > let Constraints = "$src1 = $dst, $mask = $mask_wb" in { > ... > def rm : AVX28I<opc, MRMSrcMem, (outs VR128:$dst, VR128:$mask_wb), > (ins VR128:$src1, v128mem:$src2, VR128:$mask), > ... > } > There is a problem since MRMSrcMem assumes the 2nd physical operand is a memory operand. > See the section about MRMSrcMem in RecognizableInstr::emitInstructionSpecifier. Can this be fixed...

[LLVMdev] question on table gen TIED_TO constraint

2012 Jul 10

[LLVMdev] question on table gen TIED_TO constraint

...at 4:15 PM, Manman Ren <mren at apple.com> wrote: > >> >> I need to implement an instruction which has 2 read-write registers, so I added >> let Constraints = "$src1 = $dst, $mask = $mask_wb" in { >> ... >> def rm : AVX28I<opc, MRMSrcMem, (outs VR128:$dst, VR128:$mask_wb), >> (ins VR128:$src1, v128mem:$src2, VR128:$mask), >> ... >> } >> There is a problem since MRMSrcMem assumes the 2nd physical operand is a memory operand. >> See the section about MRMSrcMem in RecognizableInstr::emitInstructionSpecifier....

search for: vr128