I'm trying to decide between a TI 6416 or DM642 (fixed point) and a 6713 (floating point) platform. The application is a 32-channel speech encoder (CBR only, 8 kHz, 8 kbps). To get a feel for the computational load, I am running 1 second (50 frames) of voice through the encoder.

My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get below 720 MHz / 32 channels = 22.5M cycles per channel. I did a little work on inner_prod() and normalize16(), and I'm confident I can get 32 channels by optimizing 5 or 6 functions. I expect these numbers to translate over to the DM642.

6416 profile:

Symbol Name                    Count  cycle.Total:Incl.  cycle.Total:Excl.
compute_weighted_codebook        200            4511420            4511420
iir_mem2                         599            3338308            3338308
filter_mem2                      799            2323655            2323655
compute_impulse_response         200            1800518            1800518
pitch_gain_search_3tap           199            4726604            1744952
open_loop_nbest_pitch            199            4204121            1641016
vq_nbest                         800            1626252            1626252
lpc_to_lsp                        50            1612650            1558133
nb_encode                         50           27412845            1179551
fir_mem2                          50            1097300            1097300
inner_prod                     27469            1072299            1072299
split_cb_search_shape_sign_N1    200            7310588            1007711
normalize16                      597             303378             303378

A lower-cost option would be to use a floating-point 6713. I thought that a 300 MHz floating-point part would come out even or ahead in an encoding comparison. Instead of the 300M/32 = 9.375M cycles per channel that I need, I see 71.5M cycles per channel!!!

6713 profile:

Symbol Name                    Count  cycle.Total:Incl.  cycle.Total:Excl.
compute_weighted_codebook        200            8709029            8709029
filter_mem2                      799            8322224            8322224
inner_prod                     27469            5911396            5911396
vq_nbest                         800            5465094            5465094
iir_mem2                         599            5378906            5378906
split_cb_search_shape_sign_N1    200           18106210            3694787
compute_impulse_response         200            3084502            3084502
open_loop_nbest_pitch            199           18400309            2817913
pitch_gain_search_3tap           199            7002859            2696353
_spx_autocorr                     50            2211100            2211100
lsp_to_lpc                       450            2076854            2076854
nb_encode                         50           71523682            1938067
fir_mem2                          50            1777450            1777450
cheb_poly_eva                   9634            1564172            1564172
lsp_weight_quant                 100            1032600            1032600

Does this make sense? I'm generating floating-point code, using the optimizer, etc...
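The budget arithmetic can be captured in a few lines of C. This is a hypothetical sketch: the helper names are made up, and like the post it assumes the profiled run is one second of audio for a single channel, so the per-channel budget is just the clock rate divided by the channel count.

```c
/* Hypothetical helpers for the cycle-budget arithmetic. Assumes the profiled
   run covers exactly one second of audio for one channel, so the per-channel
   budget is clock rate / channel count (no margin for I/O or other work). */

/* Cycles available per channel per second of audio. */
static double budget_per_channel(double clock_hz, int channels)
{
    return clock_hz / (double)channels;
}

/* Cycles still to be removed (a negative result means there is headroom). */
static double cycles_over_budget(double measured, double clock_hz, int channels)
{
    return measured - budget_per_channel(clock_hz, channels);
}

/* Measured load as a multiple of the budget (> 1.0 means too slow). */
static double load_ratio(double measured, double clock_hz, int channels)
{
    return measured / budget_per_channel(clock_hz, channels);
}

/* Total exclusive cycles of the functions chosen for optimization. */
static double sum_cycles(const double *cycles, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += cycles[i];
    return total;
}
```

With the 6416 numbers, cycles_over_budget(27412845, 720e6, 32) is 4,912,845 cycles. The six largest exclusive counts in that profile (compute_weighted_codebook through open_loop_nbest_pitch) add up to 15,359,869 cycles, so those functions alone would have to get about 32% cheaper to close the gap. For the 6713, load_ratio(71523682, 300e6, 32) is about 7.6.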
Has anyone posted DM642, C64xx or C67xx benchmarks?
> To get a feel for the computational load, I am running 1 second (50 frames)
> of voice through the encoder.

You might want to use a bit more, just so you don't see the initialization complexity at all.

> My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get
> below 720 MHz / 32 channels = 22.5M cycles per channel. I did a little work on
> inner_prod() and normalize16(), and I'm confident I can get 32 channels by
> optimizing 5 or 6 functions. I expect these numbers to translate over to the
> DM642.

Have you tried defining PRECISION16? That should reduce the computation cost.

> A lower-cost option would be to use a floating-point 6713. I thought that a
> 300 MHz floating-point part would come out even or ahead in an encoding
> comparison. Instead of the 300M/32 = 9.375M cycles per channel that I need, I
> see 71.5M cycles per channel!!!

That's definitely strange. Normally, if your chip takes the same time to do a float op as it takes to do an int op, then the float version should be faster. That's because, in the fixed-point version, some of the float ops get replaced by several int ops.

> Does this make sense?
> I'm generating floating-point code, using the optimizer, etc...

Are you sure the compiler isn't using float emulation or something like that?

> Has anyone posted DM642, C64xx or C67xx benchmarks?

I'm not aware of any.

Jean-Marc
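To make the "several int ops" point concrete, here is a hypothetical sketch (the function names are made up; the Q15 helper is only loosely modeled on Speex's MULT16_16_Q15 macro). In a fixed-point build, multiplying two Q15 values takes a widening multiply plus a shift back to Q15; in a float build it is one multiply. The last two functions illustrate one possible "something like that", offered as an assumption rather than a diagnosis: in C, an unsuffixed constant such as 0.5 has type double, so x * 0.5 is a double-precision operation, and on the C67x double-precision arithmetic costs more cycles than single precision. A float build compiled for a C64x target, which has no floating-point hardware, would also have to emulate every float op, so the target setting is worth checking too.

```c
#include <stdint.h>

/* Hypothetical Q15 x Q15 multiply: a 16x16->32 multiply followed by a
   shift. No saturation: -32768 * -32768 overflows int16_t. */
static int16_t mult_q15(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b; /* full-precision Q30 product */
    return (int16_t)(prod >> 15);           /* shift back to Q15 */
}

/* The same operation in a floating-point build: a single multiply. */
static float mult_float(float a, float b)
{
    return a * b;
}

/* 0.5 is a double constant, so x is converted to double and, as written,
   the multiply is done in double precision before converting back. */
static float half_double(float x)
{
    return x * 0.5;
}

/* 0.5f keeps the whole expression in single precision. */
static float half_single(float x)
{
    return x * 0.5f;
}
```

Whether this explains the 71.5M figure is not established here; inspecting the generated assembly of the hot functions (filter_mem2, inner_prod, etc.) for double-precision instructions or run-time library calls would confirm or rule it out.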
I started my project using the Code Composer Studio speex_C64_test.pjt in speex 1.1.11.1. To build using floating point, I created a new project with the same files and modified ti\config.h to #undef FIXED_POINT. Is there a better way to configure a floating-point processor?

I have a few TI-specific optimizations that could go into the next release. What's the procedure for submitting code?

I've been working with this code for about a week now. I'm still trying to understand it all, but I'm particularly impressed by the float vs. fixed flexibility of the code.

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow@ieee.org

-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin@usherbrooke.ca]
Sent: Thursday, January 19, 2006 1:00 AM
To: Jerry Trantow
Cc: speex-dev@xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance