thr3ads.net - search: "lzcnt"

Displaying 20 results from an estimated 26 matches for "lzcnt".

[PATCH 3/4] x86/emulator: properly handle lzcnt and tzcnt

2011 Nov 30

...n order to avoid running into problems on newer CPUs. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/x86_emulate/x86_emulate.c +++ b/xen/arch/x86/x86_emulate/x86_emulate.c @@ -1058,6 +1058,9 @@ static bool_t vcpu_has( return rc == X86EMUL_OKAY; } +#define vcpu_has_lzcnt() vcpu_has(0x80000001, ECX, 5, ctxt, ops) +#define vcpu_has_bmi1() vcpu_has(0x00000007, EBX, 3, ctxt, ops) + #define vcpu_must_have(leaf, reg, bit) \ generate_exception_if(!vcpu_has(leaf, reg, bit, ctxt, ops), EXC_UD, -1) #define vcpu_must_have_mmx() vcpu_must_have(0x00000001, EDX, 23) @...
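The two macros in this patch test CPUID leaf 0x80000001, ECX bit 5 (LZCNT) and leaf 0x00000007, EBX bit 3 (BMI1) for the emulated vCPU. As a rough illustration of the same bit tests, the sketch below queries the host CPU instead. It assumes a GCC or Clang toolchain that provides <cpuid.h>; the function names cpu_has_lzcnt and cpu_has_bmi1 are hypothetical and are not part of Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header (toolchain assumption) */
#endif

/* Hypothetical helper: true if the host CPU reports LZCNT
 * (CPUID leaf 0x80000001, ECX bit 5). Returns false on non-x86 builds. */
static bool cpu_has_lzcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is not available. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> 5) & 1u;
#else
    return false;
#endif
}

/* Hypothetical helper: true if the host CPU reports BMI1
 * (CPUID leaf 7, subleaf 0, EBX bit 3). Returns false on non-x86 builds. */
static bool cpu_has_bmi1(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7 exists only if the highest basic leaf is at least 7. */
    if (__get_cpuid_max(0, NULL) < 7)
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return (ebx >> 3) & 1u;
#else
    return false;
#endif
}
```

On a CPU without LZCNT the same opcode bytes decode as BSR, which returns a different result, so a check of this kind has to happen before the instruction is relied upon.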

invalid code generated on Windows x86_64 using skylake-specific features

2017 Sep 30

...res: +sse2,+cx16,-tbm,-avx512ifma,-avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 It successfully creates a binary, but the binary when run crashes with: Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. Th...

Early legalization pass ? Doing early legalization in an existing pass ?

2017 Jan 23

...'t supported by the backend would benefit from having the optimizer pass on them. I noticed some examples while trying to optimize various pieces of code over the past weeks. One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR and BSF are undefined on 0, and only recent CPUs have the LZCNT and TZCNT instructions that are properly defined for 0. The backend inserts code with a branch that checks for 0 and uses bsf/bsr or just uses a constant. But if we are to branch anyway, and one path of the branch sets the value as a constant, there are some obvious optimizations which can be done, sta...
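The branch-on-zero pattern described in this message can be sketched at the source level as follows. This is only an illustration, not the code LLVM emits: it assumes a GCC or Clang compiler (for __builtin_clz and __builtin_ctz, which are undefined for a zero argument) and a 32-bit unsigned int. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Leading-zero count that is defined for every input: returns 32 for 0.
 * The explicit zero check plays the role of the branch mentioned above;
 * the nonzero path may compile to BSR (or LZCNT when the target has it). */
static unsigned clz32_defined(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Trailing-zero count that is defined for every input: returns 32 for 0.
 * The nonzero path may compile to BSF (or TZCNT when the target has it). */
static unsigned ctz32_defined(uint32_t x)
{
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}
```

When the target has LZCNT and TZCNT, a compiler is free to drop the branch, because those instructions already return the operand width for a zero input.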

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 01

...fma,- > avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, > +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,- > lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,- > avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4. > 1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ > ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3 > > > It successfully creates a binary, but the binary when run crashes with: > > Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access > violation reading...

Early legalization pass ? Doing early legalization in an existing pass ?

2017 Jan 24

...d would benefit from having the optimizer pass on them. I noticed > some examples while trying to optimize various pieces of code over the past weeks. > > > > One offender is the cttz/ctlz intrinsic when defined on 0. On X86, BSR > and BSF are undefined on 0, and only recent CPUs have the LZCNT and TZCNT > instructions that are properly defined for 0. The backend inserts code with > a branch that checks for 0 and uses bsf/bsr or just uses a constant. > > > > But if we are to branch anyway, and one path of the branch sets the value > as a constant, there are some obvious o...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

...ne assembler is wrong. When I execute the exploit program on an Intel KNL the following output is produced: CPU name = knl -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, Assembly: .text .file "module_KFxOBX_i4_after.ll" .globl adjmul .align 16, 0x90 .type adjmul, @function adjmul: .cfi_startproc leaq (%rdi,%...

unable to emit vectorized code in LLVM IR

2017 Aug 17

..."="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="knl" "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er,+avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fxsr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.ident = !{!0} !0 = !{!"clang version 4.0.0 (tags/RELEASE_400/fina...

[LLVMdev] Question about intrinsic function llvm.objectsize

2013 Feb 26

...64) #1 declare void @bar1(i8*) #2 declare void @bar2(i8*) #2 declare i32 @bar3(i8*) #2 attributes #0 = { nounwind ssp uwtable "target-cpu"="core2" "target-features"="-sse4a,-avx2,-xop,-fma4,-bmi2,-3dnow,-3dnowa,-pclmul,+sse,-avx,-sse41,+ssse3,+mmx,-rtm,-sse42,-lzcnt,-f16c,-popcnt,-bmi,-aes,-fma,-rdrand,+sse2,+sse3" } attributes #1 = { nounwind "target-cpu"="core2" "target-features"="-sse4a,-avx2,-xop,-fma4,-bmi2,-3dnow,-3dnowa,-pclmul,+sse,-avx,-sse41,+ssse3,+mmx,-rtm,-sse42,-lzcnt,-f16c,-popcnt,-bmi,-aes,-fma,-rdrand,+s...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 29

...When I execute the exploit program on an Intel KNL the following > output > is produced: > > CPU name = knl > -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, > Assembly: > .text > .file "module_KFxOBX_i4_after.ll" > .globl adjmul > .align 16, 0x90 > .type adjmul, @function > adjmul: >...

avx512 JIT backend generates wrong code on <4 x float>

2016 Jun 30

...exploit program on an Intel KNL the following >> output >> is produced: >> >> CPU name = knl >> -sse4a,-avx512bw,cx16,-tbm,xsave,-fma4,-avx512vl,prfchw,bmi2,adx,-xsavec,fsgsbase,avx,avx512cd,avx512pf,-rtm,popcnt,fma,bmi,aes,rdrnd,-xsaves,sse4.1,sse4.2,avx2,avx512er,sse,lzcnt,pclmul,avx512f,f16c,ssse3,mmx,-pku,cmov,-xop,rdseed,movbe,-hle,xsaveopt,-sha,sse2,sse3,-avx512dq, >> Assembly: >> .text >> .file "module_KFxOBX_i4_after.ll" >> .globl adjmul >> .align 16, 0x90 >> .type adjmul, a...

unable to emit vectorized code in LLVM IR

2017 Aug 17

...-math"="false" >> "stack-protector-buffer-size"="8" "target-cpu"="knl" >> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+ >> rdseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >> "unsafe-fp-math"="false" "use-soft-float"="false" } >> >> !llvm.ident = !{!0} >> >> !0...

invalid code generated on Windows x86_64 using skylake-specific features

2017 Oct 03

...w,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes, >>> +xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-p >>> ku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsav >>> e,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+ >>> sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+ >>> f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,- >>> sha,+adx,-avx512pf,+sse3 >>> >>> >>> It successfully creates a binary, but the binary when run crashes with: >>> >>> Unhandled exception at 0x00007FF7C...

New x86-64 micro-architecture levels

2020 Jul 10

...that it probably can be dropped, unless the benefits from using VEX encoding are truly significant. For AVX and some of the following features, it is assumed that the run-time selection takes full support coverage (from silicon to the kernel) into account. * Level C AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, plus everything in level B. This is close to what glibc currently calls "haswell". * Level D AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL, plus everything in level C. This is the AVX-512 level implemented by Xeon Scalable Processors, not the Xeon Phi variant. glibc (or an...

unable to emit vectorized code in LLVM IR

2017 Aug 17

...>>>> "stack-protector-buffer-size"="8" "target-cpu"="knl" >>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>>> >>>> !llvm.ide...

New x86-64 micro-architecture levels

2020 Jul 15

...LATFORM at the moment) and of course we can generate SIGILL for unsupported instructions. We currently don't intercept /proc/cpuinfo (but could). I think it is important to be precise here, because in the past this has sometimes caused confusion. For example, for how to check correctly for avx, lzcnt, or fma[4] support. Thanks, Mark P.S. I don't particularly like the numbered names, but well, bike-shed...

[PATCH 1/2] Fix mistyped variable name

2013 May 25

...ith input 0 */ #if defined(__INTEL_COMPILER) - return _bit_scan_reverse(n) ^ 31U; + return _bit_scan_reverse(v) ^ 31U; #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) /* This will translate either to (bsr ^ 31U), clz , ctlz, cntlz, lzcnt depending on * -march= setting or to a software rutine in exotic machines. */ -- 1.7.10.4 --=_vA9B0g0mKp5QbmIrcmD4lw5 Content-Type: text/x-diff; charset=us-ascii; name=0002-bitwriter.c-Add-missing-extern-declaration.patch Content-Disposition: attachment; filename=0002-bitwriter.c-Add-missing...

PATCH for bitmath.h: 1 typo, 1 warning

2013 Aug 16

...\include\private\bitmath.h 2013-08-14 10:20:51.484053700 +0400 @@ -78,12 +78,12 @@ return _bit_scan_reverse(v) ^ 31U; #elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) /* This will translate either to (bsr ^ 31U), clz , ctlz, cntlz, lzcnt depending on - * -march= setting or to a software rutine in exotic machines. */ + * -march= setting or to a software routine in exotic machines. */ return __builtin_clz(v); #elif defined(_MSC_VER) && (_MSC_VER >= 1400) - FLAC__uint32 idx; + unsigned long idx; _BitScan...
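The comment in this patch relies on the identity that, for a nonzero 32-bit value, the leading-zero count equals the index of the highest set bit XOR 31 (since 31 - idx == idx ^ 31 for idx in 0..31). The sketch below shows a portable "software routine" version of that relationship. It is an illustration only, not the FLAC code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit, i.e. the value BSR would produce.
 * The caller must pass a nonzero value. */
static unsigned highest_set_bit(uint32_t v)
{
    unsigned idx = 0;

    while (v >>= 1)   /* shift until only the top bit has been consumed */
        idx++;
    return idx;
}

/* Leading-zero count derived from the bit index: (bsr ^ 31U).
 * Like the hardware BSR path, this is only meaningful for v != 0. */
static unsigned clz_from_bsr(uint32_t v)
{
    return highest_set_bit(v) ^ 31U;
}
```

For example, 0x00010000 has its highest set bit at index 16, and 16 ^ 31 gives 15 leading zeros.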

Git branch with compiling fixes for win32

2012 May 04

On 03/05/12 12:19, Miroslav Lichvar wrote: > Hi Josh, > > nice to see you here again. > > On Wed, Apr 25, 2012 at 04:26:05PM -0700, Josh Coalson wrote: >> (Jumping in again, maybe at the wrong point since this doesn't seem >> to involve encoding, but here goes.) >> >> Miroslav's patches have always been high-quality for sure. But >>

unable to emit vectorized code in LLVM IR

2017 Aug 17

...ck-protector-buffer-size"="8" "target-cpu"="knl" >>>>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>>>>> >...

unable to emit vectorized code in LLVM IR

2017 Aug 17

..."="8" "target-cpu"="knl" >>>>>>>> "target-features"="+adx,+aes,+avx,+avx2,+avx512cd,+avx512er, >>>>>>>> +avx512f,+avx512pf,+bmi,+bmi2,+cx16,+f16c,+fma,+fsgsbase,+fx >>>>>>>> sr,+lzcnt,+mmx,+movbe,+pclmul,+popcnt,+prefetchwt1,+rdrnd,+r >>>>>>>> dseed,+rtm,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt" >>>>>>>> "unsafe-fp-math"="false" "use-soft-float"="false" } >>>...
