This patch adds two files, fast_math.[ch]. These files include a few
faster versions of libm routines that don't provide all the semantics
and accuracy of the IEEE versions (but are still pretty darn accurate).
There are many situations in which full precision is not necessary, and
this patch tries to take advantage of that.
I don't know whether this patch will produce acceptable results.
Everything sounds fine to me, but I haven't done exhaustive testing
(and I'm not the world's most critical listener).
Here is the blow-by-blow description of the patch:
- Add fast_math.[ch]
  - fast_rsqrt_est: fast reciprocal square root estimate (i.e.,
    1.0/sqrt(x))
  - fast_sqrt_est: fast square root estimate
  - fast_log_est: fast natural log estimate
  - Adds an IEEE structure for floats
  - Adds VORBIS_EXACT_FLOAT_RESULTS. If this is defined, the
    estimations are turned off.
- Modify os.h
  - Remove _V_IFDEFJAIL_H_. The entire file is already in an _OS_H
    ifdef, so this shouldn't be necessary.
  - Added normalized processor ifdef support (VORBIS_X86 and
    VORBIS_PPC so far). These control whether processor-specific
    optimizations are allowed. Defining C_ONLY turns off all
    processor-specific code.
  - Added VORBIS_BIG_ENDIAN and VORBIS_LITTLE_ENDIAN since
    the other various macros for this are not portable
  - Switched a few tests to use VORBIS_BIG/LITTLE_ENDIAN
- Modify scales.h
  - Import fast_math.h and use fast_log_est instead of log() for
    todB() and todB_nn()
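For readers without the attachments, here is a minimal sketch of what an
endian-aware IEEE structure for floats can look like. This is not the
code from fast_math.h. The type name sketch_ieee_float, the helper
sketch_unbiased_exponent, and the fallback detection through
__BYTE_ORDER__ (a GCC/Clang predefine) are assumptions made for
illustration; only the VORBIS_BIG_ENDIAN and VORBIS_LITTLE_ENDIAN names
come from the description above. Bit-field layout is
implementation-defined, so the sketch relies on the usual GCC-style
convention that field order follows byte order.

```c
#include <assert.h>

/* Assumption: if neither macro is set, fall back to compiler predefines. */
#if !defined(VORBIS_BIG_ENDIAN) && !defined(VORBIS_LITTLE_ENDIAN)
#  if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
      (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#    define VORBIS_BIG_ENDIAN 1
#  else
#    define VORBIS_LITTLE_ENDIAN 1
#  endif
#endif

/* Hypothetical layout of a 32-bit IEEE 754 float as bit-fields. */
typedef union {
    float f;
    struct {
#ifdef VORBIS_BIG_ENDIAN
        unsigned int sign     : 1;
        unsigned int exponent : 8;   /* biased by 127 */
        unsigned int mantissa : 23;
#else
        unsigned int mantissa : 23;
        unsigned int exponent : 8;   /* biased by 127 */
        unsigned int sign     : 1;
#endif
    } bits;
} sketch_ieee_float;

/* Return n such that |x| = 2^n * fraction, fraction in [1, 2).
 * Only meaningful for normal (non-zero, non-denormal, finite) floats. */
static int sketch_unbiased_exponent(float x)
{
    sketch_ieee_float v;
    assert(sizeof(float) == 4);      /* sketch assumes 32-bit floats */
    v.f = x;
    return (int)v.bits.exponent - 127;
}
```

With this layout, sketch_unbiased_exponent(8.0f) yields 3, since
8 = 2^3 * 1.0.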
The sqrt approximation functions are only applicable on PPC right now
and this patch does not include the code to actually use them (since I
wanted to provide some testing numbers that would better reflect likely
changes in x86 performance).
The log approximation breaks the float into its IEEE structure. This
allows us to represent the value as 2^n*fraction and compute the log
as:
    log(2^n*fraction)
      = log(2^n) + log(fraction)
      = n*log(2) + log(fraction)
The log(fraction) estimate is done via a few terms of a series expansion
for log that converges better with smaller values. The log estimate is
roughly 8.5 times faster than log() on my PPC. YMMV on x86 (I'd like to
hear about the performance change this introduces).
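The decomposition above can be sketched in portable C as follows. This
is an illustration under stated assumptions, not the code from
fast_math.c: the name sketch_log_est is hypothetical, and the series is
my own pick, log(f) = 2*(t + t^3/3 + t^5/5 + t^7/7 + ...) with
t = (f-1)/(f+1), chosen because t stays below 1/3 for f in [1, 2). The
series and the number of terms actually used in the patch may differ.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a natural log estimate for normal positive floats. */
static float sketch_log_est(float x)
{
    uint32_t bits;
    int n;
    float f, t, t2, series;

    /* Zero, negatives, denormals, infinities and NaN are not handled. */
    assert(x >= FLT_MIN && x <= FLT_MAX);

    memcpy(&bits, &x, sizeof bits);

    /* n = unbiased exponent, so that x = 2^n * f with f in [1, 2). */
    n = (int)((bits >> 23) & 0xFFu) - 127;

    /* Keep the mantissa and force the exponent field to 127 (2^0). */
    bits = (bits & 0x007FFFFFu) | 0x3F800000u;
    memcpy(&f, &bits, sizeof f);

    /* Four terms of 2*atanh(t); t is in [0, 1/3). */
    t = (f - 1.0f) / (f + 1.0f);
    t2 = t * t;
    series = 2.0f * t *
             (1.0f + t2 * (1.0f / 3.0f + t2 * (1.0f / 5.0f + t2 * (1.0f / 7.0f))));

    /* log(x) = n*log(2) + log(f) */
    return (float)n * 0.69314718f + series;
}
```

The first dropped term is 2*t^9/9, which is about 1e-5 at worst, so
removing a term trades a little more accuracy for speed in the way
described under "Still needed".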
I haven't tested this on x86, but it should work fine (I tested some
similar code). Please let me know if this does anything weird.
On my PPC box, my test case encoded about 8-9% faster with this change
(the PPC specific sqrt change was not enabled for this test).
Still needed are:
- Someone with a better ear than mine needs to test that these don't
produce artifacts.
- The fast log estimate could have a term or two removed to increase
speed further if no artifacts are found.
Here are the new files and the patch for existing files.
<Attachment missing><Attachment missing><Attachment missing>
Please let me know if there are any problems with this patch that I
can correct.
Thanks!
-tim