<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"> </head> <body bgcolor="#ffffff" text="#000000"> Hi,<br> <br> I have been looking into optimising the CELT decoder for speed to make it acceptable for use in games, we would need it to be at least twice as fast as it currently is for this. I was hoping to be able to crunch some things down with some SIMD but there doesn't seem to be any good candidates for that.<br> <br> Some profiling has shown the ec_ilog and decode_pulses would be good functions to optimise, though they seem quite minimal already. <br> <br> Do you guys have any tips on how I might make things faster with the CELT decoder?<br> <br> I am currently on 0.6.1, do the newer versions have any significant performance updates to the decoder? (I'd prefer not to update until 1.0 if I don't have to as we have made some changes to get things working on the ps3).<br> <br> <div class="moz-signature">-- <br> <span style="font-size: 10pt; font-family: 'Arial','sans-serif'; color: rgb(102, 102, 102);"><font color="#000000"><b>Chen-Po Sun</b></font> | Programmer <br> Firelight Technologies Pty Ltd. <br> FMOD Sound System | <a class="moz-txt-link-abbreviated" href="http://www.fmod.org">www.fmod.org</a> <br> PH: <font color="#000000">+61 3 96635947</font> Fax: <font color="#000000">+61 3 96635951</font> <br> </span> </div> </body> </html>
On 2010-02-12 02:03, Chen-Po Sun wrote:
> I have been looking into optimising the CELT decoder for speed to make
> it acceptable for use in games, we would need it to be at least twice as
> fast as it currently is for this. I was hoping to be able to crunch some
> things down with some SIMD but there doesn't seem to be any good
> candidates for that.

First, if your chip has a fast FPU, make sure the code is compiled as float (the default). Also, note that if you encode without the pitch predictor (e.g. using complexity 1), decoding is faster too.

> Some profiling has shown the ec_ilog and decode_pulses would be good
> functions to optimise, though they seem quite minimal already.

On most CPUs, there's actually an instruction that computes ec_ilog(). For example, on x86 that would be the BSR (bit scan reverse) or LZCNT instruction, and on ARM or PowerPC a count-leading-zeros instruction (CLZ, cntlzw), where ilog2(x)=31-clz(x) or something like that. Most chips have that instruction in one form or another.

As for decode_pulses, there are many tradeoffs that can be made in that function. For example, you can make it faster by using more memory -- or in some cases just by tuning the current tradeoffs. The first thing to do would be to check what the actual bottleneck in that function is. Is it the number of arithmetic operations, or just the fact that it branches a lot?

> I am currently on 0.6.1, do the newer versions have any significant
> performance updates to the decoder? (I'd prefer not to update until 1.0
> if I don't have to as we have made some changes to get things working on
> the ps3).

I think the main change in 0.7.x would be the stereo quality. If you're using stereo, it's probably a good idea to upgrade. You may see a bit of change in complexity, but I would expect it to be small (can't tell in which direction).

Cheers,

Jean-Marc
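To make the count-leading-zeros suggestion concrete, here is a minimal sketch in C. It is not CELT's own code: the names ilog_portable and ilog_fast are hypothetical, and it assumes the function is meant to return the number of bits needed to represent its argument (0 for an input of 0), which is one more than the floor-log2 convention in the formula above. It also assumes a GCC- or Clang-compatible compiler for the __builtin_clz intrinsic and falls back to the loop otherwise.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Portable reference (hypothetical name): number of bits needed to
   represent v, i.e. 0 for v == 0 and floor(log2(v)) + 1 otherwise. */
int ilog_portable(uint32_t v)
{
    int bits = 0;
    while (v != 0) {
        v >>= 1;
        bits++;
    }
    return bits;
}

/* Fast path (hypothetical name) built on the compiler's
   count-leading-zeros intrinsic where one is available. */
int ilog_fast(uint32_t v)
{
#if defined(__GNUC__) || defined(__clang__)
    /* This sketch assumes unsigned int is at least 32 bits wide. */
    assert(sizeof(unsigned int) * CHAR_BIT >= 32);
    /* __builtin_clz has an undefined result for 0, so handle it first. */
    if (v == 0)
        return 0;
    return (int)(sizeof(unsigned int) * CHAR_BIT) - __builtin_clz((unsigned int)v);
#else
    /* No known intrinsic: use the portable loop. */
    return ilog_portable(v);
#endif
}
```

On targets where the compiler maps the intrinsic to a single instruction, the fast path avoids the loop entirely; whether that is a measurable win for the decoder as a whole would need to be confirmed by profiling.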
On Fri, Feb 12, 2010 at 2:03 AM, Chen-Po Sun <chenpo at fmod.org> wrote:
> Hi,
>
> I have been looking into optimising the CELT decoder for speed to make it
> acceptable for use in games, we would need it to be at least twice as fast
> as it currently is for this. I was hoping to be able to crunch some things
> down with some SIMD but there doesn't seem to be any good candidates for
> that.

Profiling 0.7.1's decoder here, I see the IMDCT (kiss_fft.c and mdct.c) at the top of the profile with 23.93% of the estimated cycle count. It can be SIMD-ized, especially for the short block case. Because of the reduced overlap window in CELT, some operations could probably also be eliminated.

Doing that alone wouldn't get you to your 2x, but if you are using only a single frame size, it's probably the easiest target for improvement, especially as optimized FFT-like algorithms are a fairly widely studied subject.
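As a rough illustration of what SIMD-izing part of an inverse transform can look like, here is a sketch in C of a windowed overlap-add loop. It is a simplified stand-in, not the code in mdct.c: the function names and the operation out[i] = prev[i] + window[i] * cur[i] are assumptions chosen for illustration. SSE intrinsics are used only because they are widely available; on the PS3 the same pattern would have to be written with that platform's vector intrinsics instead.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Scalar reference (hypothetical): out[i] = prev[i] + window[i] * cur[i]. */
void overlap_add_scalar(float *out, const float *prev, const float *cur,
                        const float *window, size_t n)
{
    assert(n == 0 || (out && prev && cur && window));
    for (size_t i = 0; i < n; i++)
        out[i] = prev[i] + window[i] * cur[i];
}

/* Same operation, four floats per iteration when SSE is available.
   Unaligned loads and stores are used so callers need no special alignment. */
void overlap_add_simd(float *out, const float *prev, const float *cur,
                      const float *window, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (out && prev && cur && window));
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 w = _mm_loadu_ps(window + i);
        __m128 c = _mm_loadu_ps(cur + i);
        __m128 p = _mm_loadu_ps(prev + i);
        _mm_storeu_ps(out + i, _mm_add_ps(p, _mm_mul_ps(w, c)));
    }
#endif
    /* Scalar tail; also the whole loop when SSE is not available. */
    for (; i < n; i++)
        out[i] = prev[i] + window[i] * cur[i];
}
```

Keeping the scalar version alongside the vector one makes it easy to check that both produce the same output before measuring any speed difference. With a fixed frame size the tail loop could be dropped, and aligned buffers would allow aligned loads.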