Hey guys,
I just released my new MacOSX-based OpenAL implementation...part of it
is an Ogg Vorbis decoder based on the 1.0 reference libraries. I spent
some time optimizing them and found that many of the hotspots in
libvorbis are perfect candidates for vectorization, so I wrote Altivec
versions of them.
The end result? Decoding of a .ogg file is between 30 and 50% faster on
a Mac with an Altivec unit than with the stock reference libraries...which
doesn't suck. Decoding is still a little faster even without Altivec due
to some other optimizations that don't involve vectorization.
I'm not putting together a patch (because, honestly, my changes aren't
pretty), but if it's worth anything to anyone, the optimized libraries
are here:
http://cvs.icculus.org/horde/chora/cvs.php/osx/AL_EXT_vorbis?rt=al_osx
(or, to check it out from CVS:
cvs -z9 -d:pserver:anonymous@cvs.icculus.org:/cvs/cvsroot co al_osx
...password is "anonymous").
libvorbis was a great candidate for Altivec because it does a ton of
math on floating point numbers that almost always seem to align to 16
byte offsets. Similar results are probably possible on x86 chips with
the SSE instruction set (MMX, 3DNow, etc. too?).
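To illustrate the kind of loop this applies to, here is a minimal sketch (not code from my tree or from libvorbis) that multiplies a buffer by a window four floats at a time. The function name window_multiply is hypothetical, and the sketch assumes all three pointers are 16-byte aligned, which is what vec_ld and vec_st expect. Without Altivec it falls back to the plain scalar loop.

```c
#include <stddef.h>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* Hypothetical helper: out[i] = in[i] * window[i].
 * Assumption: out, in and window are all 16-byte aligned. */
static void window_multiply(float *out, const float *in,
                            const float *window, size_t n)
{
    size_t i = 0;
#if defined(__ALTIVEC__)
    /* Altivec has no plain float multiply, so use multiply-add with zero. */
    const vector float zero = (vector float) vec_splat_u32(0);
    for (; i + 4 <= n; i += 4) {
        vector float a = vec_ld(0, in + i);      /* load 4 aligned floats */
        vector float w = vec_ld(0, window + i);
        vec_st(vec_madd(a, w, zero), 0, out + i);
    }
#endif
    /* Scalar path: the whole buffer without Altivec, the tail with it. */
    for (; i < n; i++)
        out[i] = in[i] * window[i];
}
```

The same shape should map onto SSE by swapping the three vector calls for their x86 equivalents.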
There are one or two good optimization wins that resulted from code
changes that have nothing to do with vectorization, too (moving branches
and invariant code out of loops, forcing things into registers, etc).
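As a rough illustration of that kind of change (a made-up example, not a function from libvorbis), compare a loop that re-tests a flag and recomputes a constant on every pass with one that does both once up front. The names scale_naive and scale_hoisted are hypothetical, and "register" is only a hint that the compiler is free to ignore.

```c
#include <stddef.h>

/* Before: the branch and the division are evaluated on every iteration. */
static void scale_naive(float *buf, size_t n, float num, float den, int negate)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (negate)
            buf[i] = -buf[i] * (num / den);
        else
            buf[i] = buf[i] * (num / den);
    }
}

/* After: the invariant and the branch are hoisted out of the loop. */
static void scale_hoisted(float *buf, size_t n, float num, float den, int negate)
{
    register float scale = num / den;  /* computed once, hinted into a register */
    size_t i;
    if (negate)
        scale = -scale;                /* branch taken once, not n times */
    for (i = 0; i < n; i++)
        buf[i] *= scale;
}
```

A good optimizer may do some of this on its own, but doing it by hand removes the guesswork.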
Overall, some .ogg files seem to spend a lot of time in
vorbis_lsp_to_curve(). I assume these .oggs come from an older version of
the Vorbis spec, and such files tend to eat more CPU. There are
some wins here from inlining the lookup table functions and using the
frsqrte opcode instead of the invsqrt lookup table (memory access is a
huge bottleneck on the Mac, so recalculating things is frequently faster
than using a lookup table), but these files still lean towards the 30%
end of the range. Other .oggs (newer version?) seem to skip this
function altogether and spend a lot of time in mdct.c, where most of
the vectorization occurs; these files lean towards a 50% speedup.
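For anyone unfamiliar with the compute-instead-of-lookup idea, here is a portable sketch of it. frsqrte only exists on PowerPC, so rsqrt_estimate below is a stand-in that uses the well-known integer bit trick to get a rough 1/sqrt(x); the Newton-Raphson step after it is the usual way to tighten a rough estimate and is an assumption on my part, not necessarily what the optimized code does. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware reciprocal-square-root estimate (such as
 * frsqrte on PowerPC). Assumes x is a positive, normal float. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);   /* classic magic-constant guess */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step on y ~= 1/sqrt(x): y' = y * (1.5 - 0.5*x*y*y).
 * No table is read, so nothing has to come in from memory. */
static float rsqrt_refined(float x)
{
    float y = rsqrt_estimate(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```

The trade is a few extra arithmetic instructions in exchange for not touching memory, which is the direction that paid off on the Mac.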
Anyhow, if someone wanted to get these changes into a mainline
libvorbis, they should:
- diff libogg and libvorbis from that CVS.
- Make sure the build system #defines MACOSX=1 and gcc is invoked with
the -faltivec command line option (-O3, -ffast-math, and -falign-loops=16
are huge helps, too).
- Change the _al_has_vector_unit() define in misc.h to point to a static
variable in libvorbis and set that variable in a convenient place during
initialization. Currently this variable exists in my AL implementation
and not libvorbis. The code to detect an Altivec unit in MacOSX looks
like this:
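    #include <CoreServices/CoreServices.h>

    /* Ask Gestalt for the PowerPC feature bits. */
    long cpufeature = 0;
    OSErr err = Gestalt(gestaltPowerPCProcessorFeatures, &cpufeature);
    if (err == noErr) {
        /* gestaltPowerPCHasVectorInstructions is a bit index, not a mask. */
        if ((1 << gestaltPowerPCHasVectorInstructions) & cpufeature)
            VectorUnitDetected = 1;
    }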
Using this code will need "-framework Carbon" on the gcc and ld
command lines.
Non-Mac platforms should already have _al_has_vector_unit() #defined to
be (0), and the *_vectorized functions are inlined stubs, so branches
and functions should be optimized out...but ideally, they should get
filled in with SSE/whatever code.
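The arrangement described above can be sketched roughly like this. It is a simplified illustration rather than the actual contents of misc.h: vorbis_vector_unit_detected, scale and scale_vectorized are hypothetical names, and only _al_has_vector_unit() and MACOSX come from the real code.

```c
#include <stddef.h>

#if defined(MACOSX) && MACOSX
/* Hypothetical variable, set once during initialization by the
 * Gestalt check. */
extern int vorbis_vector_unit_detected;
#define _al_has_vector_unit() (vorbis_vector_unit_detected)
/* The real Altivec version would be defined in another file. */
void scale_vectorized(float *buf, size_t n, float s);
#else
/* Constant zero: the compiler can drop the vector branch entirely. */
#define _al_has_vector_unit() (0)
static inline void scale_vectorized(float *buf, size_t n, float s)
{
    (void) buf; (void) n; (void) s;   /* empty stub, never reached */
}
#endif

/* Hypothetical caller: use the vector path when available,
 * otherwise the plain C loop. */
static void scale(float *buf, size_t n, float s)
{
    size_t i;
    if (_al_has_vector_unit()) {
        scale_vectorized(buf, n, s);
        return;
    }
    for (i = 0; i < n; i++)
        buf[i] *= s;
}
```

Filling in SSE support would mean adding another branch to the #if with a real scale_vectorized and a runtime check behind the macro.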
Theoretically, besides the OSX-specific API for detecting the vector
unit, the Altivec code should work on PowerPC Linux and other PPC-based
OSes that use gcc (well, Apple's Altivec extensions...IBM's compiler and
maybe CodeWarrior handle them, too).
If this is useful to anyone, feel free to grab it from my CVS...I don't
plan to touch the code anymore unless something is really broken, but
I'll answer any questions people have about it.
--ryan.
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.