search for: zmm

Displaying 20 results from an estimated 27 matches for "zmm".

2020 Sep 05
2
Possible AVX512 codegen bug in LLVM 10.0.1?
Hey LLVMDev, Perhaps I'm missing something, but I think I've stumbled across a codegen bug in LLVM 10.0.1 related to AVX512. I've attached a small LLVM IR testcase and generated x86_64 assembly file that shows the bug. The test case is small, but not quite minimal, mostly because of driver code included in the test case so one can compile and run the program. The program does a
2013 Jul 12
2
[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
...[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions > > Dear all, > > I'm interested to analyse what could be done with current LLVM trunk to deliver basic Intel MIC support. Let's say, for basic level we'd want just scalar code execution, no threading, no zmm vectors. > Attached verbose in text, but functionally very simple patch copy-pastes x86 and x86_64 backends into 32-bit and 64-bit K1OM. In the end of the message you can find how simple LLVM-generated programs could be compiled & executed on MIC device, using this patch. > > Could you...
2013 Jul 11
2
[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
Dear all, I'm interested to analyse what could be done with current LLVM trunk to deliver basic Intel MIC support. Let's say, for basic level we'd want just scalar code execution, no threading, no zmm vectors. Attached verbose in text, but functionally very simple patch copy-pastes x86 and x86_64 backends into 32-bit and 64-bit K1OM. In the end of the message you can find how simple LLVM-generated programs could be compiled & executed on MIC device, using this patch. Could you please help f...
2013 Nov 19
6
[PATCH 2/5] X86 architecture instruction set extension definiation
...@@ unsigned int xstate_ctxt_size(u64 xcr0) return ebx; } +static bool_t valid_xcr0(u64 xcr0) +{ + if ( !(xcr0 & XSTATE_FP) ) + return 0; + + if ( (xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE) ) + return 0; + + if ( xcr0 & (XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM) ) + { + if ( !(xcr0 & XSTATE_YMM) ) + return 0; + + if ( ~xcr0 & (XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM) ) + return 0; + } + + return !(xcr0 & XSTATE_BNDREGS) == !(xcr0 & XSTATE_BNDCSR); +} + int validate_xstate(u64...
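The XCR0 check quoted in this snippet can be sketched as a standalone C function. This is a minimal sketch, not the Xen source. The XSTATE_* values are defined locally here as an assumption, following the architectural XCR0 layout (x87 in bit 0, SSE in bit 1, AVX in bit 2, MPX BNDREGS and BNDCSR in bits 3 and 4, opmask and the two ZMM components in bits 5 to 7). `bool` and `uint64_t` stand in for the patch's `bool_t` and `u64`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed local definitions mirroring the architectural XCR0 bit layout. */
#define XSTATE_FP      (1ULL << 0)  /* x87 state */
#define XSTATE_SSE     (1ULL << 1)  /* XMM registers */
#define XSTATE_YMM     (1ULL << 2)  /* upper halves of YMM registers */
#define XSTATE_BNDREGS (1ULL << 3)  /* MPX bound registers */
#define XSTATE_BNDCSR  (1ULL << 4)  /* MPX config/status */
#define XSTATE_OPMASK  (1ULL << 5)  /* AVX-512 k0-k7 */
#define XSTATE_ZMM     (1ULL << 6)  /* upper halves of ZMM0-15 */
#define XSTATE_HI_ZMM  (1ULL << 7)  /* ZMM16-31 */

/* Returns true when the requested XCR0 value is self-consistent. */
static bool valid_xcr0(uint64_t xcr0)
{
    const uint64_t avx512 = XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM;

    /* x87 state must always be enabled. */
    if (!(xcr0 & XSTATE_FP))
        return false;

    /* YMM state depends on SSE state. */
    if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
        return false;

    if (xcr0 & avx512) {
        /* Any AVX-512 component requires YMM state. */
        if (!(xcr0 & XSTATE_YMM))
            return false;

        /* The three AVX-512 components must be enabled together. */
        if (~xcr0 & avx512)
            return false;
    }

    /* The two MPX components must be both set or both clear. */
    return !(xcr0 & XSTATE_BNDREGS) == !(xcr0 & XSTATE_BNDCSR);
}
```

With these definitions, 0xE7 (x87, SSE, AVX and all three AVX-512 components) passes the check, while 0x27 (opmask without the ZMM components) is rejected.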
2017 Jun 21
2
AVX 512 Assembly Code Generation issues
When I generate code with 72 loop iterations, the compiler generates code using AVX512 zmm operations 4 times (16x4=64), and the remaining 8 iterations are handled by routine mov operations with the EAX register. Wouldn't it be better if it used ymm for the remaining 8 iterations, as it does when the iteration count is between 8 and 15? Same for xmm and so on. Please correct me if I am wrong. Than...
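The arithmetic behind this question can be sketched in C. This is only an illustration of the decomposition the poster proposes, not a description of what LLVM's loop vectorizer does. It assumes 32-bit elements (16 per zmm, 8 per ymm, 4 per xmm, matching the 16x4=64 figure in the post). The names `tail_plan` and `plan_loop` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical breakdown of a loop trip count into vector widths. */
struct tail_plan {
    size_t zmm_ops;    /* 16 elements each */
    size_t ymm_ops;    /* 8 elements each */
    size_t xmm_ops;    /* 4 elements each */
    size_t scalar_ops; /* 1 element each */
};

/* Greedily cover trip_count with the widest vectors first. */
static struct tail_plan plan_loop(size_t trip_count)
{
    struct tail_plan p;

    p.zmm_ops = trip_count / 16;
    trip_count %= 16;

    p.ymm_ops = trip_count / 8;
    trip_count %= 8;

    p.xmm_ops = trip_count / 4;
    trip_count %= 4;

    p.scalar_ops = trip_count;
    return p;
}
```

For a trip count of 72 this yields 4 zmm operations plus 1 ymm operation and no scalar remainder, which is the code shape the poster is asking for in place of 8 scalar iterations.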
2013 Jul 15
0
[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
...supported. KNC is scalar ISA (Knights Corner supports a subset of the Intel 64 Architecture instructions) + 512-bit vectors + masks 3) then does MIC calling convention permit generation of programs that use only 32-bit x86 ISA? In other words, in common case, does calling convention require use of zmm registers? Please check what ICC does. X87 registers are supported. - Elena -----Original Message----- From: Dmitry Mikushin [mailto:dmitry at kernelgen.org] Sent: Friday, July 12, 2013 23:44 To: Demikhovsky, Elena Cc: LLVM Developers Mailing List Subject: Re: [LLVMdev] LLVM x86 backend for In...
2013 Jul 12
0
[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
...ing List Subject: [LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions Dear all, I'm interested to analyse what could be done with current LLVM trunk to deliver basic Intel MIC support. Let's say, for basic level we'd want just scalar code execution, no threading, no zmm vectors. Attached verbose in text, but functionally very simple patch copy-pastes x86 and x86_64 backends into 32-bit and 64-bit K1OM. In the end of the message you can find how simple LLVM-generated programs could be compiled & executed on MIC device, using this patch. Could you please help f...
2013 Jul 15
1
[LLVMdev] LLVM x86 backend for Intel MIC : trying it out and questions
...6-bit vectors are not supported. KNC is scalar ISA (Knights Corner supports a subset of the Intel 64 Architecture instructions) + 512-bit vectors + masks Of course, 512-bit, that was my typo, sorry. > Please check what ICC does. X87 registers are supported. Checked. Unfortunately ICC does use zmm in scalar 64-bit programs, which requires new ABI in LLVM. - D. [1] http://www.old.inf.usi.ch/file/pub/75/tech_report2013.pdf
2010 Oct 21
2
1 way audio asterisk 1.6
Hi, I wonder if anyone could shed some light on SIP NAT. I'm having a friken headache with SIP NAT 1 way audio. Client - NAT - NAT - Server. Client can hear users from server side but server can't hear client. I've tried every possible setting: externip set, localip set, NAT= yes / route, directmedia yes/no. I've checked the SIP headers in debug mode and it's using the external address in
2013 Nov 25
0
[PATCH 2/4 V2] X86: enable support for new ISA extensions
...bool_t valid_xcr0(u64 xcr0) +{ + /* FP must be unconditionally set. */ + if ( !(xcr0 & XSTATE_FP) ) + return 0; + + /* YMM depends on SSE. */ + if ( (xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE) ) + return 0; + + if ( xcr0 & (XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM) ) + { + /* OPMASK, ZMM, and HI_ZMM require YMM. */ + if ( !(xcr0 & XSTATE_YMM) ) + return 0; + + /* OPMASK, ZMM, and HI_ZMM must be the same. */ + if ( ~xcr0 & (XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM) ) + return 0; +...
2013 Dec 02
0
[LLVMdev] JIT on Intel KNC
...same difficulties. KNC (dubbed k1om in compiler utils) is a 64-bit device with 8087-compatible scalar arithmetics and non-standard vector arithmetics. More specifically, the widely used variant of 64-bit ABI implemented by LLVM involves xmm registers, while KNC does not have xmm-s and instead has zmm-s (512-bit wide). This makes standard 64-bit binaries you're trying to compile partially incompatible with KNC. So, even if you'd succeed to compile them somehow, they may fail to run (illegal instruction). There are two possible solutions I know of. First one - use KNC in 32-bit mode. KN...
2013 Dec 02
2
[LLVMdev] JIT on Intel KNC
Hi, in the past few weeks we were able to confirm that the LLVM's JIT compiler can be used for our research project. This was confirmed for x86-64 architecture (with very good performance results by the way). Now, one of our real target architecture is the Intel Xeon Phi (KNC) accelerator in a native execution model. When cross-compiling LLVM (3.4 RC1) for Xeon Phi with CMake following
2016 Jun 26
2
FLAC__SSE_OS change
Dave Yeo wrote: > Doesn't SSE support imply SSE2+ support? Not for the CPU. Just because a CPU supports SSE does not mean it is guaranteed to support SSE2+. For OS support, I'm not sure. Didn't later versions of SSE add new registers? > I have a '96 install of an OS, it has been upgraded until end of life, > and it handles SSE4+ instructions fine even though the
2016 Jun 26
0
FLAC__SSE_OS change
...MM0-8 on i386 and XMM0-15 on amd64, respectively. Register width was increased to 256 bit with the introduction of AVX (YMM0-8 on i386 and YMM0-15 on amd64, respectively) but their number was not changed until AVX-512, which again increased register width to 512 bit, and increased the number of the ZMM registers to 32. But if I am not mistaken, we don't have AVX-512 code in libFLAC yet :-) In any case, the disable-SSE matter is still important. People are still using flac on x86 machines without SSE, for instance AMD Geode CPUs seem to live forever. Riggs
2010 Oct 20
1
SIP 401
Hi, I am trying to get 2 accounts from voipblaster to talk to each other. Calls within the voipblaster network are free. If I configure two SIP clients with the two accounts it works fine, however with Asterisk I am getting SIP 401. In my sip.conf file, under general: register = user:password at sip.voipblaster.com then I have a sip peer: [FreeCall](default) type= friend context= incoming
2018 Aug 06
2
[PATCH] D50328: [X86][SSE] Combine (some) target shuffles with multiple uses
[NOTE: Removed Phab and reviewers] > ================ > Comment at: test/CodeGen/X86/2012-01-12-extract-sv.ll:12 > +; CHECK-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0],xmm2[1,2,3] > +; CHECK-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,0,0,0] > ; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0 > ---------------- > greened wrote: >> Can we make this test less brittle by
2016 Jun 13
2
Loop vectorizer Queires
Hello, I have a few issues in vectorizing loops using Clang 3.8. Will it be ok if I share some of my findings and queries here? Meanwhile, can I please know if LLVM supports autovectorized MIC instructions for Xeon Phi? If so, could you please tell me the flags to use? I am Jumana, a masters student in Embedded Systems working as a graduate research intern with Intel. For my thesis, I am working
2018 Aug 08
2
[PATCH] D50328: [X86][SSE] Combine (some) target shuffles with multiple uses
...es where we are doing anything so brittle > (although many bug tests could be testing for that "perfect storm", > you can only reduce so far) - we have instructions that can only use > certain registers (e.g. DIV/MUL, PBLENDV's implicit use of xmm0 or > EVEX's upper 15 zmm registers) and ensuring that we handle that > efficiently can be more than a regalloc only issue. Isn't the very change under discussion an example of this? Lots of tests have changed hunks that needn't be there. That's brittle. ISA register assignment requirements should, again,...
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
I assume the compiler knows that you only have 2 input values that you just added together 1000 times. Despite the fact that you stored to a[i] and b[i] here, nothing reads them other than the addition in the same loop iteration. So the compiler easily removed the a and b arrays. Same with 'c', it's not read outside the loop so it doesn't need to exist. So the compiler turned your
2016 Sep 14
5
[PATCH 1/2] filearch: Add RISC-V architecture.
--- src/filearch.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/src/filearch.c b/src/filearch.c index 5985b73..cbc8372 100644 --- a/src/filearch.c +++ b/src/filearch.c @@ -56,14 +56,16 @@ cleanup_magic_t_free (void *ptr) # endif COMPILE_REGEXP (re_file_elf, - "ELF.*(MSB|LSB).*(?:executable|shared object|relocatable),