search for: lzcnt

Displaying 20 results from an estimated 26 matches for "lzcnt".

2011 Nov 30
0
[PATCH 3/4] x86/emulator: properly handle lzcnt and tzcnt
...n order to avoid running into problems on newer CPUs. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1058,6 +1058,9 @@ static bool_t vcpu_has( return rc == X86EMUL_OKAY; } +#define vcpu_has_lzcnt() vcpu_has(0x80000001, ECX, 5, ctxt, ops) +#define vcpu_has_bmi1() vcpu_has(0x00000007, EBX, 3, ctxt, ops) + #define vcpu_must_have(leaf, reg, bit) \ generate_exception_if(!vcpu_has(leaf, reg, bit, ctxt, ops), EXC_UD, -1) #define vcpu_must_have_mmx() vcpu_must_have(0x00000001, EDX, 23) @...
2017 Sep 30
2
invalid code generated on Windows x86_64 using skylake-specific features
...res: +sse2,+cx16,-tbm,-avx512ifma,-avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 It successfully creates a binary, but the binary when run crashes with: Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. Th...
2017 Jan 23
2
Early legalization pass ? Doing early legalization in an existing pass ?
...#39;t supported by the backend would benefit from having the optimizer pass on them. I noticed some example trying to optimize various pieces of code over the past weeks. One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR and NSF are undefined on 0, and only recent CPU have the LZCNT and TZCNT instructions that are properly defined for 0. The backend insert code with a branch that checks for 0 and use bsf/bsr or just use a constant. But if we are to branch anyway, and one path of the branch set the value as a constant, there are some obvious optimization which can be done, sta...
2017 Oct 01
1
invalid code generated on Windows x86_64 using skylake-specific features
...fma,- > avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, > +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,- > lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,- > avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4. > 1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ > ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 > > > It successfully creates a binary, but the binary when run crashes with: > > Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access > violation reading...
2017 Jan 24
3
Early legalization pass ? Doing early legalization in an existing pass ?
...d would benefit from having the optimizer pass on them. I noticed > some example trying to optimize various pieces of code over the past weeks. > > > > One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR > and NSF are undefined on 0, and only recent CPU have the LZCNT and TZCNT > instructions that are properly defined for 0. The backend insert code with > a branch that checks for 0 and use bsf/bsr or just use a constant. > > > > But if we are to branch anyway, and one path of the branch set the value > as a constant, there are some obvious o...
2016 Jun 29
2
avx512 JIT backend generates wrong code on <4 x float>
...ne assembler is wrong. When I execute the exploit program on an Intel KNL the following output is produced: CPU name = knl -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file "module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul, at function adjmul: .cfi_startproc leaq (%rdi,%...
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
...t;="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="knl" "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{!0} !0 = !{!"clang version 4.0.0 (tags/RELEASE_400/fina...
2013 Feb 26
2
[LLVMdev] Question about intrinsic function llvm.objectsize
...64) #1 declare void @bar1(i8*) #2 declare void @bar2(i8*) #2 declare i32 @bar3(i8*) #2 attributes #0 = { nounwind ssp uwtable "target-cpu"="core2" "target-features"="-sse4a,-avx2,-xop,-fma4,-bmi2,-3dnow,-3dnowa,-pclmul,+sse,-avx,-sse41,+ssse3,+mmx,-rtm,-sse42,-lzcnt,-f16c,-popcnt,-bmi,-aes,-fma,-rdrand,+sse2,+sse3" } attributes #1 = { nounwind "target-cpu"="core2" "target-features"="-sse4a,-avx2,-xop,-fma4,-bmi2,-3dnow,-3dnowa,-pclmul,+sse,-avx,-sse41,+ssse3,+mmx,-rtm,-sse42,-lzcnt,-f16c,-popcnt,-bmi,-aes,-fma,-rdrand,+s...
2016 Jun 29
0
avx512 JIT backend generates wrong code on <4 x float>
...When I execute the exploit program on an Intel KNL the following > output > is produced: > > CPU name = knl > -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, > Assembly: > .text > .file "module_KFxOBX_i4_after.ll" > .globl adjmul > .align 16, 0x90 > .type adjmul, at function > adjmul: >...
2016 Jun 30
1
avx512 JIT backend generates wrong code on <4 x float>
...exploit program on an Intel KNL the following >> output >> is produced: >> >> CPU name = knl >> -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, >> Assembly: >> .text >> .file "module_KFxOBX_i4_after.ll" >> .globl adjmul >> .align 16, 0x90 >> .type adjmul, a...
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
...-math"="false" >> "stack-protector-buffer-size"="8" "target-cpu"="knl" >> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+ >> rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >> "unsafe-fp-math"="false" "use-soft-float"="false" } >> >> !llvm.ident = !{!0} >> >> !0...
2017 Oct 03
2
invalid code generated on Windows x86_64 using skylake-specific features
...w,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, >>> +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-p >>> ku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsav >>> e,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+ >>> sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+ >>> f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,- >>> sha,+adx,-avx512pf,+sse3 >>> >>> >>> It successfully creates a binary, but the binary when run crashes with: >>> >>> Unhandled exception at 0x00007FF7C...
2020 Jul 10
12
New x86-64 micro-architecture levels
...that it probably can be dropped, unless the benefits from using VEX encoding are truly significant. For AVX and some of the following features, it is assumed that the run-time selection takes full support coverage (from silicon to the kernel) into account. * Level C AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, plus everything in level B. This is close to what glibc currently calls "haswell". * Level D AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL, plus everything in level C. This is the AVX-512 level implemented by Xeon Scalable Processors, not the Xeon Phi variant. glibc (or an...
2017 Aug 17
4
unable to emit vectorized code in LLVM IR
...t;>>> "stack-protector-buffer-size"="8" "target-cpu"="knl" >>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>>> >>>> !llvm.ide...
2020 Jul 15
0
New x86-64 micro-architecture levels
...LATFORM at the moment) and of course we can generate SIGILL for unsupported instructions. We currently don't intercept /proc/cpuinfo (but could). I think it is important to be precise here, because in the past this has sometimes caused confusion. For example for how to check correctly for avx, lzcnt, or fma[4] support. Thanks, Mark P.S. I don't particular like the numbered names, but well, bike-shed...
2013 May 25
0
[PATCH 1/2] Fix mistyped variable name
...ith input 0 */ #if defined(__INTEL_COMPILER) - return _bit_scan_reverse(n) ^ 31U; + return _bit_scan_reverse(v) ^ 31U; #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) /* This will translate either to (bsr ^ 31U), clz , ctlz, cntlz, lzcnt depending on * -march= setting or to a software rutine in exotic machines. */ -- 1.7.10.4 --=_vA9B0g0mKp5QbmIrcmD4lw5 Content-Type: text/x-diff; charset=us-ascii; name=0002-bitwriter.c-Add-missing-extern-declaration.patch Content-Disposition: attachment; filename=0002-bitwriter.c-Add-missing...
2013 Aug 16
1
PATCH for bitmath.h: 1 typo, 1 warning
...\include\private\bitmath.h 2013-08-14 10:20:51.484053700 +0400 @@ -78,12 +78,12 @@ return _bit_scan_reverse(v) ^ 31U; #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) /* This will translate either to (bsr ^ 31U), clz , ctlz, cntlz, lzcnt depending on - * -march= setting or to a software rutine in exotic machines. */ + * -march= setting or to a software routine in exotic machines. */ return __builtin_clz(v); #elif defined(_MSC_VER) && (_MSC_VER >= 1400) - FLAC__uint32 idx; + unsigned long idx; _BitScan...
2012 May 04
3
Git branch with compiling fixes for win32
El 03/05/12 12:19, Miroslav Lichvar escribi?: > Hi Josh, > > nice to see you here again. > > On Wed, Apr 25, 2012 at 04:26:05PM -0700, Josh Coalson wrote: >> (Jumping in again, maybe at the wrong point since this doesn't seem >> to involve encoding, but here goes.) >> >> Miroslav's patches have always been high-quality for sure. But >>
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
...ck-protector-buffer-size"="8" "target-cpu"="knl" >>>>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>>>>> &g...
2017 Aug 17
2
unable to emit vectorized code in LLVM IR
..."="8" "target-cpu"="knl" >>>>>>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>>>>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>>>>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>>>>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>&gt...