Displaying 20 results from an estimated 28 matches for "hasavx".

2014 Sep 19
2
[LLVMdev] predicates vs. requirements [TableGen, X86InstrInfo.td]
...; But this does: > > * let Predicates = [OptForSize] in* { > > def : Pat<(v2f64 (X86VBroadcast (loadf64 addr:$src))), (VMOVDDUPrm addr:$src)>; > > } > > > > I see both forms used on some patterns like this: > > * let Predicates = [HasAVX] *in { > > def : Pat<(X86Movddup (loadv2f64 addr:$src)), > > (VMOVDDUPrm addr:$src)>, *Requires<[HasAVX]>*; > > } > > > > Is a predicate different than a requirement or is this a bug? > > > > There are existing patterns that...
2014 Mar 13
3
[LLVMdev] Possible bug in getCallPreservedMask for CallingConv::Intel_OCL_BI
...eservedMask function. http://llvm.org/docs/doxygen/html/X86RegisterInfo_8cpp_source.html case CallingConv::Intel_OCL_BI: { if (IsWin64 && HasAVX512) return CSR_Win64_Intel_OCL_BI_AVX512_RegMask; if (Is64Bit && HasAVX512) return CSR_64_Intel_OCL_BI_AVX512_RegMask; if (IsWin64 && HasAVX) return CSR_Win64_Intel_OCL_BI_AVX_RegMask; if (Is64Bit && HasAVX) return CSR...
2014 Sep 18
3
[LLVMdev] predicates vs. requirements [TableGen, X86InstrInfo.td]
...DDUPrm addr:$src)>, *Requires<[OptForSize**]>*; But this does: * let Predicates = [OptForSize] in* { def : Pat<(v2f64 (X86VBroadcast (loadf64 addr:$src))), (VMOVDDUPrm addr:$src)>; } I see both forms used on some patterns like this: * let Predicates = [HasAVX] *in { def : Pat<(X86Movddup (loadv2f64 addr:$src)), (VMOVDDUPrm addr:$src)>, *Requires<[HasAVX]>*; } Is a predicate different than a requirement or is this a bug? There are existing patterns that specify 'OptForSize' with "Requires", but they are...
2011 Sep 22
3
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
Hi Bruno, > Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > Subtarget->hasAVX()" and check it for the first condi...
2013 Nov 22
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...lib/Support/Host.cpp b/lib/Support/Host.cpp index 380df6b..2235456 100644 --- a/lib/Support/Host.cpp +++ b/lib/Support/Host.cpp @@ -138,6 +138,8 @@ std::string sys::getHostCPUName() { // switch, then we have full AVX support. const unsigned AVXBits = (1 << 27) | (1 << 28); bool HasAVX = ((ECX & AVXBits) == AVXBits) && OSHasAVXSupport(); + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && + (EBX & 0x20); GetX86CpuIDAndInfo(0x80000001, &EAX, &EBX, &ECX, &EDX); bool Em6...
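The AVX test in this hunk can be sketched on its own, roughly as follows. This is an illustration under stated assumptions, not the code in Host.cpp: `hasUsableAVX` and its `osSavesYmmState` parameter are hypothetical names, the ECX value is assumed to come from CPUID leaf 1, and the boolean stands in for whatever `OSHasAVXSupport()` reports.

```cpp
#include <cstdint>

// CPUID leaf 1, ECX register: bit 27 is OSXSAVE, bit 28 is AVX.
// Same mask as AVXBits in the quoted hunk.
constexpr std::uint32_t kAVXBits = (1u << 27) | (1u << 28);

// Hypothetical helper (not an LLVM function): AVX is treated as usable
// only when both CPUID bits are set AND the OS reports that it saves
// the YMM register state.
constexpr bool hasUsableAVX(std::uint32_t leaf1Ecx, bool osSavesYmmState) {
  return (leaf1Ecx & kAVXBits) == kAVXBits && osSavesYmmState;
}
```

Comparing the masked value against the full mask, rather than testing it for non-zero, is what makes both bits mandatory.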
2013 Sep 12
3
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...elease builds give identical assembly for Host.cpp. OK. I know the reason you cannot reproduce it, before posting the patch I've decided to check for AVX before checking AVX2, just not to cpuid AVX2 when we don't have AVX1 anyway. So the problem exists with following predicate: (1) bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); However it is working absolutely fine if I add "volatile": (2) volatile bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) &&...
2011 Sep 21
0
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
...in the hope that someone will > explain > how I should be doing the tablegen bits. This is awesome :D Some comments: + // Try to synthesize horizontal adds from adds of shuffles. + if (((Subtarget->hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) || + (Subtarget->hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) && + isHorizontalBinOp(LHS, RHS, true)) 1) You probably want to do something like: "bool HasHorizontalArith = Subtarget->hasSSE3() || Subtarget->hasAVX()" and check it for the first condition, because when AVX is o...
2011 Sep 22
0
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
...atch to synthesize x86 hadd instructions; need help with the tablegen bits Hi Bruno, > Some comments: > > + // Try to synthesize horizontal adds from adds of shuffles. > + if (((Subtarget->hasSSE3()&& (VT == MVT::v4f32 || VT == MVT::v2f64)) || > + (Subtarget->hasAVX()&& (VT == MVT::v8f32 || VT == MVT::v4f64)))&& > + isHorizontalBinOp(LHS, RHS, true)) > > 1) You probably want to do something like: > > "bool HasHorizontalArith = Subtarget->hasSSE3() || > Subtarget->hasAVX()" and check it for the first condi...
2011 Sep 21
2
[LLVMdev] Patch to synthesize x86 hadd instructions; need help with the tablegen bits
This patch synthesizes haddps/haddpd/hsubps/hsubpd instructions from floating point additions and subtractions of appropriate vector shuffles. To do this I introduced new x86 FHADD and FHSUB opcodes. These need to be wired up somehow in the .td file to the appropriate instructions. Since I have no idea how tablegen works I just hacked it in horribly. It works, but breaks support for the hadd
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...rt. Given the differences I'm seeing now I can't believe there'd be *no* difference in my tests if I'd done them properly. Anyway, thanks very much for the information. Hopefully that'll let me track things down. > Also attaching patch that removes volatile, but leaves > HasAVX check that makes the code run fine here. Would you mind me taking a day or so to investigate what's going on here properly? Introducing a volatile to work around a bug in Clang itself just seems perverse to me. (And we shouldn't let a CodeGen bug dictate how we implement our functions eith...
2013 Sep 12
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > * I have marked HasAVX2 as "volatile", since otherwise it gets > magically zeroed (by optimizer?) when compiling clang with latest > clang build from trunk That's far more worrying to me than not being able to detect Haswell. I can't reproduce the problem here at the moment: both debug and releas...
2013 Sep 12
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...core-avx2" platform on Haswell i7 CPUs when running -march=native. Currently it detects it as generic x86_64. lib/Support/Host.cpp: * Haswell is detected for CPUID Family 6 Model 60 * Similar to Ivy and Sandy Bridge we check for AVX2 since some Haswell Pentiums are SSE4.x only * I have marked HasAVX2 as "volatile", since otherwise it gets magically zeroed (by optimizer?) when compiling clang with latest clang build from trunk lib/Target/X86/X86Subtarget.cpp: * Also enabling X86::FeatureFastUAMem for Haswell Regards, -- Adam Strzelecki | nanoant.com | twitter.com/nanoant ----------...
2013 Nov 23
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
...dword ptr [esi],eax + mov esi,rEBX + mov dword ptr [esi],ebx + mov esi,rECX + mov dword ptr [esi],ecx + mov esi,rEDX + mov dword ptr [esi],edx + } + return false; + #else + return true; + #endif +#else + return true; +#endif +} + static bool OSHasAVXSupport() { #if defined(__GNUC__) // Check xgetbv; this uses a .byte sequence instead of the instruction @@ -131,6 +200,14 @@ std::string sys::getHostCPUName() { unsigned Model = 0; DetectX86FamilyModel(EAX, Family, Model); + union { + unsigned u[3]; + char c[12]; + } text;...
2013 Jan 07
4
[LLVMdev] instruction scheduling issue
...TargetMachine.cpp: bool X86PassConfig::addPreEmitPass() { bool ShouldPrint = false; if (getOptLevel() != CodeGenOpt::None && getX86Subtarget().hasSSE2()) { addPass(createExecutionDependencyFixPass(&X86::VR128RegClass)); ShouldPrint = true; } if (getX86Subtarget().hasAVX() && UseVZeroUpper) { addPass(createX86IssueVZeroUpperPass()); ShouldPrint = true; } return ShouldPrint; } -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
2013 Nov 23
2
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
I agree with Tim, you need to implement a GetCpuIDAndInfoEx function in Host.cpp and pass the correct value to ecx. Also you need to verify that 7 is a valid leaf because an invalid leaf is defined to return the highest supported leaf on that processor. So if a processor supports, say, leaf 6 and not leaf 7, then an access to leaf 7 will return the data from leaf 6 causing unrelated bits to be
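The two checks described in this message (validate leaf 7 first, then query it with ECX set to 0) might look roughly like the sketch below. Everything named here is hypothetical: `CpuidRegs`, `CpuidFn` and `detectAVX2` are illustrative, not the `GetCpuIDAndInfoEx` interface discussed in the thread, and the CPUID query is injected as a callback only so the logic can be exercised without real hardware.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical register bundle returned by one CPUID query.
struct CpuidRegs {
  std::uint32_t eax, ebx, ecx, edx;
};

// Hypothetical query hook: takes (leaf, subleaf), where subleaf is the
// value loaded into ECX before the CPUID instruction executes.
using CpuidFn = std::function<CpuidRegs(std::uint32_t, std::uint32_t)>;

// Sketch of AVX2 detection, assuming the caller already knows whether
// AVX itself is usable.
bool detectAVX2(const CpuidFn &cpuid, bool hasAVX) {
  if (!hasAVX)
    return false;
  // Leaf 0 reports the highest supported basic leaf in EAX. If leaf 7
  // is not supported, its output must not be interpreted.
  const std::uint32_t maxLeaf = cpuid(0, 0).eax;
  if (maxLeaf < 7)
    return false;
  // Leaf 7 must be queried with subleaf (ECX) = 0; AVX2 is EBX bit 5,
  // i.e. the 0x20 mask used in the patch.
  return (cpuid(7, 0).ebx & (1u << 5)) != 0;
}
```

On a real x86 build the callback would wrap the compiler's CPUID intrinsic; that wiring is left out here because it is compiler-specific.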
2013 Nov 22
0
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam, > + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && > + (EBX & 0x20); I don't think this guarantees %ecx is 0, does it? Wasn't that the entire reason the original code went wrong? Cheers. Tim.
2011 Feb 26
0
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
...8, f128mem, "unpcklps\t{$src2, $src1, $dst|$dst, $src1, $src2}", SSEPackedSingle>, VEX_4V; "New-style": def : Pat<(v4f32 (X86Unpcklps VR128:$src1, (memopv4f32 addr:$src2))), (VUNPCKLPSrm VR128:$src1, addr:$src2)>, Requires<[HasAVX]>; I think these are basically the same pattern. What's the purpose of these special operators and patterns? -Dave
2015 May 04
3
[LLVMdev] AVX2 Cost Table in X86TargetTransformInfo
Thanks Nadav for the info. It clears my query :) Yes, it's an integer ADD, and since AVX2 supports 256-bit integer arithmetic, its cost is less than with AVX1. One query though: shouldn't the cost of integer ADD/SUB/MUL (which would be 1) then be explicitly specified in the AVX2 cost table? Right now this entry is missing and the cost of these operations is taken from BaseTTI (which is
2011 Feb 25
2
[LLVMdev] X86 LowerVECTOR_SHUFFLE Question
In ToT, LowerVECTOR_SHUFFLE for x86 has this code: if (X86::isUNPCKLMask(SVOp)) getTargetShuffleNode(getUNPCKLOpcode(VT), dl, VT, V1, V2, DAG); why would this not be: if (X86::isUNPCKLMask(SVOp)) return SVOp; I'm trying to add support for VUNPCKL and am getting into trouble because the existing code ends up creating: VUNPCKLPS load load which is badness come selection
2013 Jan 07
2
[LLVMdev] instruction scheduling issue
On 1/7/2013 1:53 PM, Sergei Larin wrote: > > Also, how much performance are you willing to sacrifice to do what you > do? Maybe turning off scheduling all together is an acceptable solution? Or insert the calls after scheduling. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation