More or less detailed explanation of this patch. 1. The first parameter of _BitScanReverse() and _BitScanReverse64() is a pointer to unsigned long (4-byte int). However _BitScanReverse64() is called with a pointer to FLAC__uint64 (8-byte int). IMHO it's a bug and this patch changes the type of idx variable inside FLAC__bitmath_ilog2_wide() from FLAC__uint64 to unsigned long. The type of idx variable inside FLAC__bitmath_ilog2() was also changed (from FLAC__uint32 to unsigned long), for symmetry. 2. FLAC__clz_uint32() returns the result of BitScanReverse() XOR 31. FLAC__bitmath_ilog2() returns 31-FLAC__clz_uint32() == 31-(BSR^31). As 0<=BSR<=31 so 31-(BSR^31) == BSR. But I don't think that compilers are so smart that can optimize this expression. So it is better to change FLAC__bitmath_ilog2() so it simply calls BitScanReverse() when possible. 3. FLAC__bitmath_ilog2_wide() cannot be compiled with MSVC: it has 'if (v == 0)...' statement followed by a variable (either idx or DEBRUIJN_IDX64) definition. But all calls of this function have FLAC__ASSERT statements that ensure that its argument is greater than 0. So this 'if()' can be removed. 4. Am I right that FLAC__bitmath_ilog2_wide() and FLAC__bitmath_ilog2() return the same value if their argument is less than 2^32? Then FLAC__bitmath_ilog2_wide() works correctly only when compiled with GCC: it should return idx, not idx^63U. Also the result of de Bruijn sequences is off by 1. -------------- next part -------------- A non-text attachment was scrubbed... Name: bitmath.patch Type: application/octet-stream Size: 2625 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20130904/9457509c/attachment.obj
A newer version of bitmath patch. It also replaces 'de Bruijn sequences' code with calls to 32-bit intrinsics. -------------- next part -------------- A non-text attachment was scrubbed... Name: bitmath_v2.patch Type: application/octet-stream Size: 2821 bytes Desc: not available Url : http://lists.xiph.org/pipermail/flac-dev/attachments/20130907/770e3b17/attachment.obj
lvqcl wrote:> More or less detailed explanation of this patch.I've applied part of that patch.> 1. The first parameter of _BitScanReverse() and _BitScanReverse64() is > a pointer to unsigned long (4-byte int).On windows, yes, unsigned long is 4 bytes for both 32 and 64 bit versions. This is not true for most Unices.> However _BitScanReverse64() is called with a pointer to FLAC__uint64 > (8-byte int). IMHO it's a bugI would not call that a bug, just a result of trying to write cross platform code and relying on functions that are not specified by any standard.> 2. FLAC__clz_uint32() returns the result of BitScanReverse() XOR 31. > FLAC__bitmath_ilog2() returns 31-FLAC__clz_uint32() == 31-(BSR^31). > > As 0<=BSR<=31 so 31-(BSR^31) == BSR. But I don't think that compilers > are so smart that can optimize this expression. So it is better to change > FLAC__bitmath_ilog2() so it simply calls BitScanReverse() when possible.I want to look at this more carefully. Its probably worth writing a unit test for it.> 3. FLAC__bitmath_ilog2_wide() cannot be compiled with MSVC: it has > 'if (v == 0)...' statement followed by a variable (either idx or > DEBRUIJN_IDX64) definition. But all calls of this function have > FLAC__ASSERT statements that ensure that its argument is greater > than 0. So this 'if()' can be removed.I'd actuall prefer to remove those ASSERTS, because they get bypassed for the production compiles anyway.> 4. Am I right that FLAC__bitmath_ilog2_wide() and FLAC__bitmath_ilog2() > return the same value if their argument is less than 2^32? Then > FLAC__bitmath_ilog2_wide() works correctly only when compiled with GCC: > it should return idx, not idx^63U. Also the result of de Bruijn sequences > is off by 1.This also needs more investigation. Your patch changes the output of this function for any given input. Erik -- ---------------------------------------------------------------------- Erik de Castro Lopo http://www.mega-nerd.com/
Erik de Castro Lopo <mle+la at mega-nerd.com> wrote:>> However _BitScanReverse64() is called with a pointer to FLAC__uint64 >> (8-byte int). IMHO it's a bug > > I would not call that a bug, just a result of trying to write cross > platform code and relying on functions that are not specified by > any standard.Well, the calls of _BitScanReverse/_BitScanReverse64 are inside #ifdef _MSC_VER, so it's MS-specific code, not cross-platform. _BitScanReverse64 writes 4 least significant bytes of idx, the 4 most significant bytes are not initialized and contain some garbage. FLAC__bitmath_ilog2_wide() returns unsigned. If sizeof(unsigned)==4 then these 4 most significant bytes are discarded and the result is correct. But if sizeof(unsigned)==8 then FLAC__bitmath_ilog2_wide() will usually return incorrect value. But as I said it is MS-only code, so sizeof(unsigned) is always 4.>> 4. Am I right that FLAC__bitmath_ilog2_wide() and FLAC__bitmath_ilog2() >> return the same value if their argument is less than 2^32? Then >> FLAC__bitmath_ilog2_wide() works correctly only when compiled with GCC: >> it should return idx, not idx^63U. Also the result of de Bruijn sequences >> is off by 1. > > This also needs more investigation. Your patch changes the output > of this function for any given input.There is a comment "An example of what FLAC__bitmath_ilog2() computes". One can see that FLAC__bitmath_ilog2() == BitScanReverse() == 31 - CountLeadingZeroes(). And it seems that FLAC__bitmath_ilog2_wide() == BitScanReverse64() === 63 - CountLeadingZeroesLongLong(). About de Bruijn: Ulrich Klauer posted a link to http://ccodearchive.net/info/ilog.html From this page: "This can also be thought of as the location of the highest set bit, with counting starting from one (so that 0 returns 0, 1 returns 1, and 2**31 returns 32)". For FLAC__bitmath_ilog2()/ilog2_wide() counting starts from zero. --------------------------------------- And by the way: FLAC__bitmath_ilog2_wide() is for integer-only encoder. IMHO this encoder is for platforms with very slow or absent FPU, and the inclusion of MS-specific code to this function looks a bit bizarre. I measured a speed of various implementations of FLAC__bitmath_ilog2_wide() on my Core2 CPU before I realized that it's pointless.
Erik de Castro Lopo <mle+la at mega-nerd.com> wrote:>> 3. FLAC__bitmath_ilog2_wide() cannot be compiled with MSVC: it has >> 'if (v == 0)...' statement followed by a variable (either idx or >> DEBRUIJN_IDX64) definition. But all calls of this function have >> FLAC__ASSERT statements that ensure that its argument is greater >> than 0. So this 'if()' can be removed. > > I'd actuall prefer to remove those ASSERTS, because they get bypassed > for the production compiles anyway.I thought that these ASSERTs mean that the encoding algorithm ensures: the argument of FLAC__bitmath_ilog2_wide() is always > 0. In this case there's no need to compare it with 0 inside the function.