Timothy J. Wood
2000-Nov-26 19:42 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
Using the PPC frsqrte and fres instructions, I got the percentage time take in the smoothing code in _vp_compute_mask() down from 13.64% to 1.88% of the running time. In my local copy of Vorbis I have a fast_math.[hc] in vorbis/lib and have a _fast_sqrt() inline in fast_math.h. If anyone else wants to try it out, it follows. I can currently encode my test file (the first 15 seconds of Supervixen by Garbage) in about 11.1 seconds. Faster than real time, at least :) -tim #if defined(__ppc__) // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost. // We'll then invert using the PPC reciprocal estimate instruction and also refine that // (although we should only need a single refinement step there). // This is about 8.8 times faster than sqrt() and according to my testing (not exhaustive) // it returns fairly accurate results (error below 1.0e-5 up to 100000.0 in 0.1 increments). // In practice in Vorbis, it seems to do even better (the biggest error I saw was down in // the 1e-10 range). static inline float _fast_sqrt(float x) { const float half = 0.5; const float one = 1.0; float B, y0, y1; // This'll NaN if it hits frsqrte if (x == 0.0) return x; B = x; #ifdef __GNUC__ asm("frsqrte %0,%1" : "=f" (y0) : "f" (B)); #else y0 = __frsqrte(B); #endif /* First refinement step */ y1 = y0 + half*y0*(one - B*y0*y0); /* Second refinement step */ y0 = y1; y1 = y0 + half*y0*(one - B*y0*y0); /* Now that we have a good estimate of 1/sqrt(x) we need to invert it. We will also approximate this. */ B = y1; #ifdef __GNUC__ asm("fres %0,%1" : "=f" (y0) : "f" (B)); #else y0 = __fres(B); #endif /* Do a Newton-Rhapson refinement step on the reciprocal estimate */ y1 = y0 + y0*(1.0 - B*y0); return y1; } #else static inline double _fast_sqrt(double x) { return sqrt(x); } #endif -tim --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
David Riley
2000-Nov-26 20:57 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
"Timothy J. Wood" wrote:> > #if defined(__ppc__) > // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, > // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson > // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost.This should also do nicely with 3dnow, which has similar instructions. I dunno about SSE, I don't know the instruction set or have a processor with which to test it. If no one else wants to work on this, I'll try in a few days (once a new CPU comes for my Linux box, which is down ATM). --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Segher Boessenkool
2000-Nov-27 07:27 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
> // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, > // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson > // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost. > // We'll then invert using the PPC reciprocal estimate instruction and also refine that > // (although we should only need a single refinement step there).Nonononono! If you got y = 1/sqrt(x), then "y *= x;" will give you y = sqrt(x), easier, cheaper & better :-)> if (x == 0.0)There are separate +0.0 and -0.0, so you better watch out.> /* Second refinement step */ > y0 = y1; > y1 = y0 + half*y0*(one - B*y0*y0);Why the copying? Ciao, Segher --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 07:52 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
OK, here is the sqrt(x) = x * 1/sqrt(x) version. It gets about the same accuracy (the reciprocal estimate instruction gives really good accuracy for most cases), but is 12.4x faster than sqrt instead of only 8.8x. This doesn't end up having a lot of effect on the eventual performance since the percentage time taken here was so low already, but this is obviously better :) -tim #if defined(__ppc__) // Vanilla PPC code, but since PPC has a reciprocal square root estimate instruction, // runs *much* faster than calling sqrt(). We'll use two Newton-Raphson // refinement steps to get bunch more precision in the 1/sqrt() value for very little cost. // We'll then multiply 1/sqrt times the original value to get the sqrt. // This is about 12.4 times faster than sqrt() and according to my testing (not exhaustive) // it returns fairly accurate results (error below 1.0e-5 up to 100000.0 in 0.1 increments). static inline float _fast_sqrt(float x) { const float half = 0.5; const float one = 1.0; float B, y0, y1; // This'll NaN if it hits frsqrte. Handle both +0.0 and -0.0 if (fabs(x) == 0.0) return x; B = x; #ifdef __GNUC__ asm("frsqrte %0,%1" : "=f" (y0) : "f" (B)); #else y0 = __frsqrte(B); #endif /* First refinement step */ y1 = y0 + half*y0*(one - B*y0*y0); /* Second refinement step -- copy the output of the last step to the input of this step */ y0 = y1; y1 = y0 + half*y0*(one - B*y0*y0); /* Get sqrt(x) from x * 1/sqrt(x) */ return x * y1; } #else static inline double _fast_sqrt(double x) { return sqrt(x); } #endif --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 07:56 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
>Nonononono! If you got y = 1/sqrt(x), then "y *= x;" will give you y = sqrt(x), >easier, cheaper & better :-)Duh. I'll put that in. In my defense, I'd been up for over 24 hours when writing that. I should have slept first, obviously :)>> if (x == 0.0)>There are separate +0.0 and -0.0, so you better watch out.if (fabs(x) == 0.0), then I guess.>> /* Second refinement step */ >> y0 = y1; >> y1 = y0 + half*y0*(one - B*y0*y0); > >Why the copying?The refinement expression takes y0 as an input and produces y1. The copying assignes the output of the last refinement to the input of the next. Sure, I could have rewritten the expression to avoid this, but that would be less clear, I think (and the compiler should do this anyway). -tim --- >8 ---- List archives: http://www.xiph.org/archives/ Ogg project homepage: http://www.xiph.org/ogg/ To unsubscribe from this list, send a message to 'vorbis-dev-request@xiph.org' containing only the word 'unsubscribe' in the body. No subject is needed. Unsubscribe messages sent to the list will be ignored/filtered.
Timothy J. Wood
2000-Nov-27 08:20 UTC
[vorbis-dev] Another good optimization (for PPC only, though)
<HR NOSHADE> <UL> <LI>text/enriched attachment: stored </UL> -------------- next part -------------- A non-text attachment was scrubbed... Name: part Type: application/octet-stream Size: 2573 bytes Desc: not available Url : http://lists.xiph.org/pipermail/vorbis-dev/attachments/20001127/a985ff0c/part-0001.obj