Adam Strzelecki
2013-Sep-12 16:28 UTC
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
> That's far more worrying to me than not being able to detect Haswell. > I can't reproduce the problem here at the moment: both debug and > release builds give identical assembly for Host.cpp.OK. I know the reason you cannot reproduce it, before posting the patch I've decided to check for AVX before checking AVX2, just not to cpuid AVX2 when we don't have AVX1 anyway. So the problem exists with following predicate: (1) bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); However it is working absolutely fine if I add "volatile": (2) volatile bool HasAVX2 = !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); Or add extra check for HasAVX: (3) bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); Attaching object files related to all of these cases below. Also attaching patch that removes volatile, but leaves HasAVX check that makes the code run fine here. My platform is Ubuntu 12.04 LTS 64-bit & i7-4470k. llvm[1]: Compiling Host.cpp for Release build if clang++ -I/home/ono/Documents/llvm/include -I/home/ono/Documents/llvm/lib/Support -I/tmp/llvm.q35LGwjHmR/include -I/tmp/llvm.q35LGwjHmR/lib/Support -DNDEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -O3 -fomit-frame-pointer -fvisibility-inlines-hidden -fno-exceptions -fno-rtti -fPIC -Woverloaded-virtual -Wcast-qual -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcovered-switch-default -Wno-uninitialized -Wno-missing-field-initializers -c -MMD -MP -MF "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.d.tmp" -MT "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.o" -MT "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.d" /home/ono/Documents/llvm/lib/Support/Host.cpp -o /tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.o ; \ then /bin/mv -f "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.d.tmp" "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.d"; else /bin/rm "/tmp/llvm.q35LGwjHmR/lib/Support/Release/Host.d.tmp"; exit 1; fi Regards, -- Adam Strzelecki | nanoant.com | twitter.com/nanoant -------------- next part -------------- A non-text attachment was scrubbed... Name: Host-objects.tbz2 Type: application/octet-stream Size: 6366 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130912/6c851eb1/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: llvm-detect-haswell-r2.patch Type: application/octet-stream Size: 1720 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130912/6c851eb1/attachment-0001.obj>
Tim Northover
2013-Sep-12 19:58 UTC
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
Hi Adam,> OK. I know the reason you cannot reproduce it, before posting > the patch I've decided to check for AVX before checking AVX2, > just not to cpuid AVX2 when we don't have AVX1 anyway.I suspect it was also incompetence on my part. Given the differences I'm seeing now I can't believe there'd be *no* difference in my tests if I'd done them properly. Anyway, thanks very much for the information. Hopefully that'll let me track things down.> Also attaching patch that removes volatile, but leaves > HasAVX check that makes the code run fine here.Would you mind me taking a day or so to investigate what's going on here properly? Introducing a volatile to work around a bug in Clang itself just seems perverse to me. (And we shouldn't let a CodeGen bug dictate how we implement our functions either). I promise I'll do the review of your code after that. Cheers. Tim.
Adam Strzelecki
2013-Sep-12 21:12 UTC
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
> Anyway, thanks very much for the information. Hopefully that'll let me > track things down.Let me know if you need some more information or dumps.> Would you mind me taking a day or so to investigate what's going on > here properly? Introducing a volatile to work around a bug in Clang > itself just seems perverse to me. (And we shouldn't let a CodeGen bug > dictate how we implement our functions either).Looking at the assembly there's something wrong with SI that is not getting saved anywhere after CPUID and 0x20 bit test before it gets overwritten by LEA. 332: mov eax,0x7 337: mov rsi,rbx 33a: cpuid 33c: xchg rsi,rbx 33f: and esi,0x20 342: shr esi,0x5 345: lea rbp,[rip+0x0] # 34c <llvm::sys::getHostCPUName()+0xbc> 34c: lea r12,[rip+0x0] # 353 <llvm::sys::getHostCPUName()+0xc3> 353: cmove rbp,r12 357: lea rdi,[rsp+0x188] 35f: lea rsi,[rip+0x0] # 366 <llvm::sys::getHostCPUName()+0xd6> In both other cases (2) & (3) SI is saved into stack region.> I promise I'll do the review of your code after that.Thanks. Regards, -- Adam
Adam Strzelecki
2013-Nov-22 11:33 UTC
[LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
> I promise I'll do the review of your code after that.Tim, I don’t want to push too much. But since there’s 3.4 release on the horizon, maybe you could find a moment review this patch. Especially Haswell is all there since few months. Cheers, -- Adam --- lib/Support/Host.cpp | 8 ++++++++ lib/Target/X86/X86Subtarget.cpp | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/lib/Support/Host.cpp b/lib/Support/Host.cpp index 380df6b..2235456 100644 --- a/lib/Support/Host.cpp +++ b/lib/Support/Host.cpp @@ -138,6 +138,8 @@ std::string sys::getHostCPUName() { // switch, then we have full AVX support. const unsigned AVXBits = (1 << 27) | (1 << 28); bool HasAVX = ((ECX & AVXBits) == AVXBits) && OSHasAVXSupport(); + bool HasAVX2 = HasAVX && !GetX86CpuIDAndInfo(0x7, &EAX, &EBX, &ECX, &EDX) && + (EBX & 0x20); GetX86CpuIDAndInfo(0x80000001, &EAX, &EBX, &ECX, &EDX); bool Em64T = (EDX >> 29) & 0x1; @@ -258,6 +260,12 @@ std::string sys::getHostCPUName() { // versions instead of the i7 versions). return HasAVX ? "core-avx-i" : "corei7"; + // Haswell: + case 60: + // Not all Haswell processors support AVX too (such as the Pentium + // versions instead of the i7 versions). + return HasAVX2 ? "core-avx2" : "corei7"; + case 28: // Most 45 nm Intel Atom processors case 38: // 45 nm Atom Lincroft case 39: // 32 nm Atom Medfield diff --git a/lib/Target/X86/X86Subtarget.cpp b/lib/Target/X86/X86Subtarget.cpp index fa04c38..d221316 100644 --- a/lib/Target/X86/X86Subtarget.cpp +++ b/lib/Target/X86/X86Subtarget.cpp @@ -285,7 +285,8 @@ void X86Subtarget::AutoDetectSubtargetFeatures() { (Family == 6 && Model == 0x2F) || // Westmere: Westmere-EX (Family == 6 && Model == 0x2A) || // SandyBridge (Family == 6 && Model == 0x2D) || // SandyBridge: SandyBridge-E* - (Family == 6 && Model == 0x3A))) {// IvyBridge + (Family == 6 && Model == 0x3A) || // IvyBridge + (Family == 6 && Model == 0x3C))) {// Haswell IsUAMemFast = true; ToggleFeature(X86::FeatureFastUAMem); } -- 1.8.3.4 (Apple Git-47)
Possibly Parallel Threads
- [LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
- [LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
- [LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
- [LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)
- [LLVMdev] [PATCH] Detect Haswell subarchitecture (i.e. using -march=native)