Timothy J. Wood
2000-Nov-26 19:42 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
Using the PPC frsqrte and fres instructions, I got the percentage time take in
the smoothing code in _vp_compute_mask() down from 13.64% to 1.88% of the
running time.
In my local copy of Vorbis I have a fast_math.[hc] in vorbis/lib and have a
_fast_sqrt() inline in fast_math.h. If anyone else wants to try it out, it
follows.
I can currently encode my test file (the first 15 seconds of Supervixen by
Garbage) in about 11.1 seconds. Faster than real time, at least :)
-tim
#if defined(__ppc__)
// Vanilla PPC code, but since PPC has a reciprocal square root estimate
instruction,
// runs *much* faster than calling sqrt(). We'll use two Newton-Raphson
// refinement steps to get bunch more precision in the 1/sqrt() value for very
little cost.
// We'll then invert using the PPC reciprocal estimate instruction and also
refine that
// (although we should only need a single refinement step there).
// This is about 8.8 times faster than sqrt() and according to my testing (not
exhaustive)
// it returns fairly accurate results (error below 1.0e-5 up to 100000.0 in 0.1
increments).
// In practice in Vorbis, it seems to do even better (the biggest error I saw
was down in
// the 1e-10 range).
static inline float _fast_sqrt(float x)
{
const float half = 0.5;
const float one = 1.0;
float B, y0, y1;
// This'll NaN if it hits frsqrte
if (x == 0.0)
return x;
B = x;
#ifdef __GNUC__
asm("frsqrte %0,%1" : "=f" (y0) : "f" (B));
#else
y0 = __frsqrte(B);
#endif
/* First refinement step */
y1 = y0 + half*y0*(one - B*y0*y0);
/* Second refinement step */
y0 = y1;
y1 = y0 + half*y0*(one - B*y0*y0);
/* Now that we have a good estimate of 1/sqrt(x) we need to invert it. We
will also approximate this. */
B = y1;
#ifdef __GNUC__
asm("fres %0,%1" : "=f" (y0) : "f" (B));
#else
y0 = __fres(B);
#endif
/* Do a Newton-Rhapson refinement step on the reciprocal estimate */
y1 = y0 + y0*(1.0 - B*y0);
return y1;
}
#else
static inline double _fast_sqrt(double x)
{
return sqrt(x);
}
#endif
-tim
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.
David Riley
2000-Nov-26 20:57 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
"Timothy J. Wood" wrote:> > #if defined(__ppc__) > // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, > // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson > // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost.This should also do nicely with 3dnow, which has similar instructions. I dunno about SSE, I don't know the instruction set or have a processor with which to test it. If no one else wants to work on this, I'll try in a few days (once a new CPU comes for my Linux box, which is down ATM). --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Segher Boessenkool
2000-Nov-27 07:27 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
> // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, > // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson > // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost. > // We'll then invert using the PPC reciprocal estimate instruction and also refine that > // (although we should only need a single refinement step there).Nonononono! If you got y = 1/sqrt(x), then "y *= x;" will give you y = sqrt(x), easier, cheaper & better :-)> if (x == 0.0)There are separate +0.0 and -0.0, so you better watch out.> /* Second refinement step */ > y0 = y1; > y1 = y0 + half*y0*(one - B*y0*y0);Why the copying? Ciao, Segher --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 07:52 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
OK, here is the sqrt(x) = x * 1/sqrt(x) version.
It gets about the same accuracy (the reciprocal estimate instruction gives
really good accuracy for most cases), but is 12.4x faster than sqrt instead of
only 8.8x.
This doesn't end up having a lot of effect on the eventual performance
since the percentage time taken here was so low already, but this is obviously
better :)
-tim
#if defined(__ppc__)
// Vanilla PPC code, but since PPC has a reciprocal square root estimate
instruction,
// runs *much* faster than calling sqrt(). We'll use two Newton-Raphson
// refinement steps to get bunch more precision in the 1/sqrt() value for very
little cost.
// We'll then multiply 1/sqrt times the original value to get the sqrt.
// This is about 12.4 times faster than sqrt() and according to my testing (not
exhaustive)
// it returns fairly accurate results (error below 1.0e-5 up to 100000.0 in 0.1
increments).
static inline float _fast_sqrt(float x)
{
const float half = 0.5;
const float one = 1.0;
float B, y0, y1;
// This'll NaN if it hits frsqrte. Handle both +0.0 and -0.0
if (fabs(x) == 0.0)
return x;
B = x;
#ifdef __GNUC__
asm("frsqrte %0,%1" : "=f" (y0) : "f" (B));
#else
y0 = __frsqrte(B);
#endif
/* First refinement step */
y1 = y0 + half*y0*(one - B*y0*y0);
/* Second refinement step -- copy the output of the last step to the input
of this step */
y0 = y1;
y1 = y0 + half*y0*(one - B*y0*y0);
/* Get sqrt(x) from x * 1/sqrt(x) */
return x * y1;
}
#else
static inline double _fast_sqrt(double x)
{
return sqrt(x);
}
#endif
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'vorbis-dev-request@xiph.org'
containing only the word 'unsubscribe' in the body. No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 07:56 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
>Nonononono! If you got y = 1/sqrt(x), then "y *= x;" will give you y = sqrt(x), >easier, cheaper & better :-)Duh. I'll put that in. In my defense, I'd been up for over 24 hours when writing that. I should have slept first, obviously :)>> if (x == 0.0)>There are separate +0.0 and -0.0, so you better watch out.if (fabs(x) == 0.0), then I guess.>> /* Second refinement step */ >> y0 = y1; >> y1 = y0 + half*y0*(one - B*y0*y0); > >Why the copying?The refinement expression takes y0 as an input and produces y1. The copying assignes the output of the last refinement to the input of the next. Sure, I could have rewritten the expression to avoid this, but that would be less clear, I think (and the compiler should do this anyway). -tim --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 08:20 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
<HR NOSHADE> <UL> <LI>text/enriched attachment: stored </UL> -------------- next part -------------- A non-text attachment was scrubbed... Name: part Type: application/octet-stream Size: 2573 bytes Desc: not available Url : http://lists.xiph.org/pipermail/vorbis-dev/attachments/20001127/a985ff0c/part-0001.obj