search for: ymm

Displaying 20 results from an estimated 92 matches for "ymm".

2012 Jan 09
2
[LLVMdev] Calling conventions for YMM registers on AVX
I'll explain what we see in the code. 1. The caller saves XMM registers across the call if needed (according to DEFS definition). YMMs are not in the set, so caller does not take care. 2. The callee preserves XMMs but works with YMMs and clobbers them. 3. So after the call, the upper part of YMM is gone. - Elena -----Original Message----- From: Bruno Cardoso Lopes [mailto:bruno.cardoso at gmail.com] Sent: Monday, January 09,...
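The three steps described in this message can be illustrated with a toy model. This is only a sketch, not LLVM code: it assumes a YMM register can be modeled as eight float lanes whose low four lanes are the XMM alias, and the function name `callee_preserving_xmm_only` is hypothetical.

```python
# Toy model (not LLVM code): a YMM register is 8 float lanes,
# and its XMM alias is lanes 0-3 (the low 128 bits).
LANES_YMM = 8
LANES_XMM = 4


def callee_preserving_xmm_only(regs, name):
    """Hypothetical callee that spills only the XMM half but uses the full YMM."""
    saved_low = regs[name][:LANES_XMM]      # 16-byte spill of the XMM alias
    regs[name] = [0.0] * LANES_YMM          # callee overwrites all 256 bits
    regs[name][:LANES_XMM] = saved_low      # 16-byte reload before returning


# The caller keeps a live 256-bit value in ymm6 across the call.
regs = {"ymm6": [float(i) for i in range(1, 9)]}
before = list(regs["ymm6"])

callee_preserving_xmm_only(regs, "ymm6")
after = regs["ymm6"]

low_preserved = after[:LANES_XMM] == before[:LANES_XMM]    # True
high_preserved = after[LANES_XMM:] == before[LANES_XMM:]   # False: upper half is gone
```

In this model the low half survives the call and the upper half does not, which is the behavior the message reports.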
2012 Jan 08
2
[LLVMdev] Calling conventions for YMM registers on AVX
Hi, What are the calling conventions for YMM? According to documents I saw till now, the YMMs are scratch and not saved in callee. This is also the default behavior of the Intel Compiler. In X86InstrControl.td the YMMs are not in "defs" set of call. - Elena --------------------------------------------------------------------- In...
2012 Jan 09
0
[LLVMdev] Calling conventions for YMM registers on AVX
Hi, > What are the calling conventions for YMM? According to documents I saw till now, the YMMs are scratch and not saved in callee. > This is also the default behavior of the Intel Compiler. x86_64 Non-windows targets use the rules defined in the x86_64 abi! > In X86InstrControl.td the YMMs are not in "defs" set of call. The...
2012 Jan 09
0
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > I'll explain what we see in the code. > 1. The caller saves XMM registers across the call if needed (according to DEFS definition). > YMMs are not in the set, so caller does not take care. This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. Are you saying that only the xmm part of a ymm register gets spilled before a call? > 2. The callee preserves X...
2012 Jan 09
3
[LLVMdev] Calling conventions for YMM registers on AVX
On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote: > > On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote: > >> I'll explain what we see in the code. >> 1. The caller saves XMM registers across the call if needed (according to DEFS definition). >> YMMs are not in the set, so caller does not take care. > > This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered. > > Are you saying that only the xmm part of a ymm register gets spilled before a call? > >...
2012 Jan 10
0
[LLVMdev] Calling conventions for YMM registers on AVX
...test: # @test # BB#0: # %entry pushq %rbp movq %rsp, %rbp subq $64, %rsp vmovaps %xmm7, -32(%rbp) # 16-byte Spill vmovaps %xmm6, -16(%rbp) # 16-byte Spill vmovaps %ymm3, %ymm6 vmovaps %ymm2, %ymm7 vaddps %ymm7, %ymm0, %ymm0 vaddps %ymm6, %ymm1, %ymm1 callq foo vsubps %ymm7, %ymm0, %ymm0 vsubps %ymm6, %ymm1, %ymm1 vmovaps -16(%rbp), %xmm6 # 16-byte Reload vmovaps -32(%rbp), %xmm7 #...
2011 Nov 30
0
[PATCH 2/4] x86/emulator: add emulation of SIMD FP moves
..._pfx ) \ + (dst) = sse_prefix[(vex_pfx) - 1]; \ +} while (0) + union vex { uint8_t raw[2]; struct { @@ -3850,6 +3860,76 @@ x86_emulate( case 0x19 ... 0x1f: /* nop (amd-defined) */ break; + case 0x2b: /* {,v}movntp{s,d} xmm,m128 */ + /* vmovntp{s,d} ymm,m256 */ + fail_if(ea.type != OP_MEM); + /* fall through */ + case 0x28: /* {,v}movap{s,d} xmm/m128,xmm */ + /* vmovap{s,d} ymm/m256,ymm */ + case 0x29: /* {,v}movap{s,d} xmm,xmm/m128 */ + /* vmovap{s,d} ymm,ymm/m256 */ + fail_if(vex.pfx & V...
2013 Apr 09
1
[LLVMdev] inefficient code generation for 128-bit->256-bit typecast intrinsics
Hello, LLVM generates two additional instructions for 128->256 bit typecasts (e.g. _mm256_castsi128_si256()) to clear out the upper 128 bits of YMM register corresponding to source XMM register. vxorps xmm2,xmm2,xmm2 vinsertf128 ymm0,ymm2,xmm0,0x0 Most of the industry-standard C/C++ compilers (GCC, Intel's compiler, Visual Studio compiler) don't generate any extra moves for 128-bit->256-bit typecast intrinsics. None of...
2009 Apr 30
2
[LLVMdev] RFC: AVX Feature Specification
...e first one. In some ways AVX is "just another" SSE level. Having AVX implies you have SSE1-SSE4.2. However AVX is very different from SSE and there are a number of sub-features which may or may not be available on various implementations. So right now I've done this: def FeatureYMM : SubtargetFeature<"ymm", "X86YMM", "true", // Cray "Enable YMM state">; def FeatureVEX : SubtargetFeature<"vex", "X86VEX", "true", // Cray...
2012 Mar 01
3
[LLVMdev] Stack alignment on X86 AVX seems incorrect
Hi Elena, You're correct. LLVM does not align the stack to 32-bytes for AVX and unaligned moves should be used for YMM spills. I wrote some code to align the stack to 32-bytes when AVX spills are present; it does break the x86-64 ABI though. If upstream would be interested in this code, I can arrange with my employer to send a patch to the mailing list. -Cameron On Mar 1, 2012, at 4:09 PM, <llvmdev-request at...
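The alignment issue in this message comes down to simple address arithmetic. The sketch below assumes a stack pointer that is 16-byte aligned, as the x86-64 ABI guarantees at a call site; the address value and the helper names `align_down` and `is_aligned` are hypothetical.

```python
def align_down(addr: int, alignment: int) -> int:
    """Round addr down to a multiple of alignment (which must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return addr & ~(alignment - 1)


def is_aligned(addr: int, alignment: int) -> bool:
    """Return True if addr is a multiple of alignment."""
    return addr % alignment == 0


# Hypothetical stack pointer: 16-byte aligned, but not 32-byte aligned.
sp = 0x7FFFFFFFE3D0

ok_for_xmm_spill = is_aligned(sp, 16)   # True: aligned 16-byte access is fine
ok_for_ymm_spill = is_aligned(sp, 32)   # False: an aligned 32-byte access would fault

# Realigning the frame rounds the stack pointer down to a 32-byte boundary.
realigned_sp = align_down(sp, 32)       # 0x7FFFFFFFE3C0
```

A slot that is only 16-byte aligned is why the message says unaligned moves should be used for YMM spills unless the frame is realigned to 32 bytes.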
2013 Dec 11
2
[LLVMdev] AVX code gen
Hello - I found this post on the llvm blog: http://blog.llvm.org/2012/12/new-loop-vectorizer.html which makes me think that clang / llvm are capable of generating AVX with packed instructions as well as utilizing the full width of the YMM registers… I have an environment where icc generates these instructions (vmulps %ymm1, %ymm3, %ymm2 for example) but I can not get clang/llvm to generate such instructions (using the 3.3 release or either 3.4 rc1 or 3.4 rc2). I am new to clang / llvm so I may not be invoking the tools correctly b...
2015 Aug 31
2
MCRegisterClass mandatory vs preferred alignment?
...alueType> regTypes, int alignment, > dag regList, RegAltNameIndex idx = NoRegAltName> > > X86RegisterInfo.td: > > def VR256 : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], > 256, (sequence "YMM%u", 0, 15)>; > def VR256X : RegisterClass<"X86", [v32i8, v16i16, v8i32, v4i64, v8f32, v4f64], > 256, (sequence "YMM%u", 0, 31)>; > > Seems to be 256bits/32bytes. Yeah, don't know how I missed that. :) > > I don'...
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
I'm working on some of the AVX insert/extract instructions. They're stupid. They do not operate on ymm registers, meaning we have to use VINSERTF128/VEXTRACTF128 and then do the real operation. Anyway, I'm looking at how INSERTPS and friends work and noticed that there are special SelectionDAG nodes for them and corresponding TableGen dag operators (X86insrtps, for example). What's the rea...
2009 Apr 30
0
[LLVMdev] RFC: AVX Feature Specification
...ther" SSE level. Having AVX implies > you have > SSE1-SSE4.2. However AVX is very different from SSE and there are a > number > of sub-features which may or may not be available on various > implementations. > > So right now I've done this: > > def FeatureYMM : SubtargetFeature<"ymm", "X86YMM", "true", // Cray > "Enable YMM state">; > def FeatureVEX : SubtargetFeature<"vex", "X86VEX", "true", // Cray >...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...will honour those, and front-ends should create them correctly. --renato On 1 July 2015 at 19:06, Frank Winter <fwinter at jlab.org> wrote: > I realized that the function parameters had no alignment attributes on them. > However, even adding an alignment suitable for aligned loads on YMM, i.e. 32 > bytes, didn't convince the vectorizer to use [8 x float]. > > define void @main(i64 %lo, i64 %hi, float* noalias align 32 %arg0, float* > noalias align 32 %arg1, float* noalias align 32 %arg2) { > ... > > results still in code using only [4 x float]. > > Th...
2009 Dec 02
2
[LLVMdev] More AVX Advice Needed
On Wednesday 02 December 2009 16:51, Eli Friedman wrote: > On Wed, Dec 2, 2009 at 2:44 PM, David Greene <dag at cray.com> wrote: > > I'm working on some of the AVX insert/extract instructions.  They're > > stupid.  They do not operate on ymm registers, meaning we have to > > use VINSERTF128/VEXTRACTF128 and then do the real operation. > > > > Anyway, I'm looking at how INSERTPS and friends work and noticed that > > there are special SelectionDAG nodes for them and corresponding TableGen > > dag operato...
2013 Nov 19
6
[PATCH 2/5] X86 architecture instruction set extension definiation
...rocessor's extended state */ void xstate_init(bool_t bsp) { - u32 eax, ebx, ecx, edx, min_size; + u32 eax, ebx, ecx, edx; u64 feature_mask; if ( boot_cpu_data.cpuid_level < XSTATE_CPUID ) @@ -269,12 +269,6 @@ void xstate_init(bool_t bsp) BUG_ON((eax & XSTATE_YMM) && !(eax & XSTATE_SSE)); feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK; - /* FP/SSE, XSAVE.HEADER, YMM */ - min_size = XSTATE_AREA_MIN_SIZE; - if ( eax & XSTATE_YMM ) - min_size += XSTATE_YMM_SIZE; - BUG_ON(ecx < min_size); - /*...
2015 Jul 01
3
[LLVMdev] SLP vectorizer on AVX feature
...e). Did you see a different result on Haswell? Thanks, Nadav > On Jul 1, 2015, at 11:06 AM, Frank Winter <fwinter at jlab.org> wrote: > > I realized that the function parameters had no alignment attributes on them. However, even adding an alignment suitable for aligned loads on YMM, i.e. 32 bytes, didn't convince the vectorizer to use [8 x float]. > > define void @main(i64 %lo, i64 %hi, float* noalias align 32 %arg0, float* noalias align 32 %arg1, float* noalias align 32 %arg2) { > ... > > results still in code using only [4 x float]. > > Thanks, ...
2020 Jul 16
2
LLVM 11 and trunk selecting 4 wide instead of 8 wide loop vectorization for AVX-enabled target
...; 11 now that the branch has been cut, and have noticed an apparent loop vectorization codegen regression for X86 with AVX or AVX2 enabled. The following IR example is vectorized to 4 wide with LLVM 11 and trunk whereas in LLVM 10 it (correctly as per what we want) vectorized it 8 wide matching the ymm registers. ; ModuleID = '../test.ll' source_filename = "main" target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc-coff" %"Burst.Compiler.IL.Tests.VectorsMaths/FloatPointer.0...
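The "4 wide" versus "8 wide" figures in this message follow from dividing the register width by the element width. A minimal sketch of that arithmetic, assuming 32-bit float elements; the helper name `lanes` is hypothetical and is not an LLVM API.

```python
XMM_BITS = 128    # SSE register width
YMM_BITS = 256    # AVX register width
FLOAT_BITS = 32   # single-precision element width


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


xmm_float_lanes = lanes(XMM_BITS, FLOAT_BITS)   # 4
ymm_float_lanes = lanes(YMM_BITS, FLOAT_BITS)   # 8
```

Under this arithmetic, a loop over floats vectorized 4 wide fills only an XMM register, while 8 wide matches the full YMM width the poster expected.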
2013 Dec 12
0
[LLVMdev] AVX code gen
...t %rbp, -16 movq %rsp, %rbp Ltmp4: .cfi_def_cfa_register %rbp xorl %eax, %eax .align 4, 0x90 LBB0_1: ## %vector.body ## =>This Inner Loop Header: Depth=1 vmovups (%rdx,%rax,4), %ymm0 vmulps (%rsi,%rax,4), %ymm0, %ymm0 vaddps (%rdi,%rax,4), %ymm0, %ymm0 vmovups %ymm0, (%rdi,%rax,4) addq $8, %rax cmpq $256, %rax ## imm = 0x100 jne LBB0_1 ## BB#2: ## %for.end popq %r...