Stefan Kanthak via llvm-dev
2018-Dec-04 19:47 UTC
[llvm-dev] __parityti2(), __paritydi2() and __paritysi2() vs. __builtin_parity
Hi @ll,

compiler-rt/lib/builtins/parityti2.c
compiler-rt/lib/builtins/paritydi2.c
compiler-rt/lib/builtins/paritysi2.c

implement the parity function as a matryoshka, with each wider function calling the next narrower one:

si_int
__paritysi2(si_int a)
{
    su_int x = (su_int)a;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x6996 >> (x & 0xF)) & 1; // see optimisation below!
}

si_int
__paritydi2(di_int a)
{
    dwords x;
    x.all = a;
    return __paritysi2(x.s.high ^ x.s.low);
}

si_int
__parityti2(ti_int a)
{
    twords x;
    x.all = a;
    return __paritydi2(x.s.high ^ x.s.low);
}

Questions:
~~~~~~~~~~

1. Are these functions still needed, given that __builtin_parity is
   available?

2. Will the optimiser inline the internal function calls (as part of
   LTO)?

If NOT, they should be inlined manually!

JFTR: if the 3 functions are part of a single source file (compilation
unit), the compiler does inline them!

Yes, parity is seldom used, so this optimisation may not seem necessary.
Manually inlined, the two wider functions become:

si_int
__paritydi2(di_int a)
{
    su_int x = (su_int)a;
    x ^= (du_int)a >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x69966996 >> x) & 1;
}

si_int
__parityti2(ti_int a)
{
    du_int x = (du_int)a;
    x ^= (tu_int)a >> 64;
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (0x69966996 >> x) & 1;
}

CAVEAT: the last right shift is undefined behaviour whenever x >= 32,
because the upper bits of x are not masked off. The optimisation shown
here only works on CPUs which perform shifts modulo the word size, and
only because 0x69966996 holds the 16-bit table 0x6996 twice, so bit 4 of
the shift count does not change the result!

stay tuned
Stefan Kanthak
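For comparison, a minimal UB-free sketch of the manually inlined versions, assuming C99 <stdint.h> types in place of compiler-rt's si_int/di_int/ti_int typedefs and the GCC/Clang unsigned __int128 extension for the 128-bit case. The names parity32, parity64 and parity128 are hypothetical, not compiler-rt's API. Each XOR fold preserves parity, since parity(hi:lo) == parity(hi ^ lo), and bit n of 0x6996 is the parity of the 4-bit value n. The only change from the inlined code in the message is masking the shift count with x & 0xF, which keeps it below 16 on every target at the cost of one AND.

```c
#include <stdint.h>

/* Hypothetical names: 32/64/128-bit counterparts of
   __paritysi2/__paritydi2/__parityti2, inlined by hand. */

/* Bit n of 0x6996 is the parity of the nibble n (0 <= n <= 15). */
#define PARITY_NIBBLE_TABLE 0x6996u

int parity32(uint32_t x)
{
    x ^= x >> 16;   /* parity(hi:lo) == parity(hi ^ lo) */
    x ^= x >> 8;
    x ^= x >> 4;    /* low nibble now holds the parity of all 32 bits */
    /* masking keeps the shift count in 0..15, so no UB on any CPU */
    return (int)((PARITY_NIBBLE_TABLE >> (x & 0xFu)) & 1u);
}

int parity64(uint64_t a)
{
    uint32_t x = (uint32_t)a ^ (uint32_t)(a >> 32);  /* fold 64 -> 32 */
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (int)((PARITY_NIBBLE_TABLE >> (x & 0xFu)) & 1u);
}

/* Requires the GCC/Clang extension unsigned __int128. */
int parity128(unsigned __int128 a)
{
    uint64_t x = (uint64_t)a ^ (uint64_t)(a >> 64);  /* fold 128 -> 64 */
    x ^= x >> 32;
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    return (int)((PARITY_NIBBLE_TABLE >> (x & 0xFu)) & 1u);
}
```

Each function can be cross-checked against the builtins, e.g. parity64(v) == __builtin_parityll(v) for any 64-bit v.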
Craig Topper via llvm-dev
2018-Dec-04 20:28 UTC
[llvm-dev] __parityti2(), __paritydi2() and __paritysi2() vs. __builtin_parity
I don't believe clang/llvm ever emits a call to the parity library routines
today. But they might still be needed so we can say that we match the libgcc
interface. gcc doesn't use them on x86 either, as far as I know, and I don't
know whether its x86 libgcc even provides them. But it might be easier for
compiler-rt to be a superset of libgcc rather than trying to track exactly
what they have on each target.

Not sure about the inlining question.

~Craig
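Following Craig's answer to the first question, code that merely needs parity can call the builtins directly instead of the library routines. This is a minimal sketch with hypothetical wrapper names, assuming GCC or Clang (both provide __builtin_parity and __builtin_parityll) and the unsigned __int128 extension. The 128-bit case reuses the same hi ^ lo fold as __parityti2 rather than assuming a 128-bit parity builtin exists.

```c
#include <stdint.h>

/* Hypothetical wrapper names. __builtin_parity (unsigned int) and
   __builtin_parityll (unsigned long long) are GCC/Clang builtins; per
   Craig's reply, clang/llvm does not emit calls to the __parity*2
   library routines for them. */
int builtin_parity32(uint32_t a)
{
    return __builtin_parity(a);
}

int builtin_parity64(uint64_t a)
{
    return __builtin_parityll(a);
}

/* 128 bits: fold the halves first, as __parityti2 does.
   Requires the GCC/Clang extension unsigned __int128. */
int builtin_parity128(unsigned __int128 a)
{
    return __builtin_parityll((uint64_t)a ^ (uint64_t)(a >> 64));
}
```

Whether a particular target expands these inline or falls back to a library routine is up to the compiler; Craig's reply only speaks for clang/llvm.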