search for: hasavx2

Displaying 15 results from an estimated 15 matches for "hasavx2".

Did you mean: hasavx
2013 Sep 12
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...core-avx2" platform on Haswell i7 CPUs when running -march=native. Currently it detects it as generic x86_64. lib/Support/Host.cpp: * Haswell is detected for CPUID Family 6 Model 60 * Similar to Ivy and Sandy Bridge we check for AVX2 since some Haswell Pentiums are SSE4.x only * I have marked HasAVX2 as "volatile", since otherwise it gets magically zeroed (by optimizer?) when compiling clang with latest clang build from trunk lib/Target/X86/X86Subtarget.cpp: * Also enabling X86::FeatureFastUAMem for Haswell Regards, -- Adam Strzelecki | nanoant.com | twitter.com/nanoant -----------...
2013 Nov 22
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...lib/Support/Host.cpp +++ b/lib/Support/Host.cpp @@ -138,6 +138,8 @@ std::string sys::getHostCPUName() { // switch, then we have full AVX support. const unsigned AVXBits = (1 << 27) | (1 << 28); bool HasAVX = ((ECX & AVXBits) == AVXBits) && OSHasAVXSupport(); + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && + (EBX & 0x20); GetX86CpuIDAndInfo(0x80000001, &EAX, &EBX, &ECX, &EDX); bool Em64T = (EDX >> 29) & 0x1; @@ -258,6 +260,12 @@ std::string sys::getHost...
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > * I have marked HasAVX2 as "volatile", since otherwise it gets > magically zeroed (by optimizer?) when compiling clang with latest > clang build from trunk That's far more worrying to me than not being able to detect Haswell. I can't reproduce the problem here at the moment: both debug and release...
2013 Sep 12
3
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...elease builds give identical assembly for Host.cpp. OK. I know the reason you cannot reproduce it, before posting the patch I've decided to check for AVX before checking AVX2, just not to cpuid AVX2 when we don't have AVX1 anyway. So the problem exists with following predicate: (1) bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); However it is working absolutely fine if I add "volatile": (2) volatile bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) &&...
2013 Nov 22
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && > + (EBX & 0x20); I don't think this guarantees %ecx is 0, does it? Wasn't that the entire reason the original code went wrong? Cheers. Tim.
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > OK. I know the reason you cannot reproduce it, before posting > the patch I've decided to check for AVX before checking AVX2, > just not to cpuid AVX2 when we don't have AVX1 anyway. I suspect it was also incompetence on my part. Given the differences I'm seeing now I can't believe there'd be *no* difference in my tests if I'd done them properly.
2013 Nov 23
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...port for XSAVE, XRESTORE and AVX, and XGETBV @@ -138,15 +215,12 @@ std::string sys::getHostCPUName() { // switch, then we have full AVX support. const unsigned AVXBits = (1 << 27) | (1 << 28); bool HasAVX = ((ECX & AVXBits) == AVXBits) && OSHasAVXSupport(); + bool HasAVX2 = HasAVX && MaxLeaf >= 0x7 && + !GetX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX) && + (EBX & 0x20); GetX86CpuIDAndInfo(0x80000001, &EAX, &EBX, &ECX, &EDX); bool Em64T = (EDX >> 29) &amp...
2013 Nov 23
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
I agree with Tim, you need to implement a GetCpuIDAndInfoEx function in Host.cpp and pass the correct value to ecx. Also you need to verify that 7 is a valid leaf because an invalid leaf is defined to return the highest supported leaf on that processor. So if a processor supports say leaf 6 and not leaf 7, then an access leaf 7 will return the data from leaf 6 causing unrelated bits to be
2016 Feb 26
2
how to force llvm generate gather intrinsic
If I'm understanding correctly, you're saying that vgather* is slow on all of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will not generate it for any of those machines. Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() && !hasAVX512()". It could break for some hypothetical future processor that manages to implement it properly. The AVX2 spec includes gather; whether it's slow or fast is an implementation detail. We need a feature bit / cost model entry somewhere to signify this, so we're no...
2013 Nov 22
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
>> + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && >> + (EBX & 0x20); > > I don't think this guarantees %ecx is 0, does it? Wasn't that the > entire reason the original code went wrong? I don’t remember really...
2017 Sep 21
1
VSelect Instruction Error
Hello, I am getting this error. What instruction is required to be implemented? LLVM ERROR: Cannot select: t22: v32i32 = vselect t724, t11, t16 t724: v32i32,ch = load<LD128[FixedStack1]> t723, FrameIndex:i64<1>, undef:i64 t659: i64 = FrameIndex<1> t10: i64 = undef t11: v32i32,ch = load<LD128[%sunkaddr45](align=4)(tbaa=<0x481f1e8>)> t0, t8, undef:i64
2016 Feb 26
0
how to force llvm generate gather intrinsic
...I'm understanding correctly, you're saying that vgather* is slow on all > of Excavator, Haswell, Broadwell, and Skylake (client). Therefore, we will > not generate it for any of those machines. > > Even if that's true, we should not define "gatherIsSlow()" as "hasAVX2() > && !hasAVX512()". It could break for some hypothetical future processor > that manages to implement it properly. The AVX2 spec includes gather; > whether it's slow or fast is an implementation detail. We need a feature > bit / cost model entry somewhere to signify t...
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks Nadav for the info. It clears my query :) Yes its an integer ADD, and since AVX2 supports 256 bits integer arithmetic, so its cost is less than AVX1. One query though - shouldn't then the cost of integer ADD/SUB/MUL (which would be 1) be explicitly specified in AVX2 cost table? Because right now this entry is missing and cost of these operations are taken from BaseTTI (which is
2016 Feb 26
0
how to force llvm generate gather intrinsic
No. Gather operation is slow on AVX2 processors. - Elena From: zhi chen [mailto:zchenhn at gmail.com] Sent: Thursday, February 25, 2016 20:48 To: Sanjay Patel <spatel at rotateright.com> Cc: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] how to force
2016 Feb 25
2
how to force llvm generate gather intrinsic
It seems that http://reviews.llvm.org/D15690 only implemented gather/scatter for AVX-512, but not for AVX/AVX2. Is there any plan to enable gather for AVX/2? Thanks. Best, Zhi On Thu, Feb 25, 2016 at 8:28 AM, Sanjay Patel <spatel at rotateright.com> wrote: > I don't think gather has been enabled for AVX2 as of r261875. > Masked load/store were enabled for AVX with: >