I started my project using the CodeComposerStudio speex_C64_test.pjt in speex 1.1.11.1. To build using floating point, I created a new project with the same files and modified ti\config.h to #undef FIXED_POINT. Is there a better way to configure a floating point processor?

I have a few TI-specific optimizations that could go into the next release. What's the procedure for submitting code?

I've been working with this code for about a week now. I'm still trying to understand it all, but I'm particularly impressed by the float vs fixed flexibility of the code.

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow@ieee.org

-----Original Message-----
From: Jean-Marc Valin [mailto:jean-marc.valin@usherbrooke.ca]
Sent: Thursday, January 19, 2006 1:00 AM
To: Jerry Trantow
Cc: speex-dev@xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance

> To get a feel for the computational load, I am running 1 second (50 frames)
> of voice through the encoder.

You might want to use a bit more just so you don't see the initialization complexity at all.

> My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get
> below 720 MHz/32 channels = 22.5M cycles per channel. I did a little work on
> inner_prod() and normalize16() and I'm confident I can get 32 channels by
> optimizing 5 or 6 functions. I expect these numbers to translate over to the
> DM642.

Have you tried defining PRECISION16? That should reduce the computation cost.

> A lower cost option would be to use a floating point 6713. I thought that a
> 300 MHz floating point would come out even or ahead in an encoding
> comparison. Instead of the 300M/32=9.3M cycles per channel that I need, I
> see 71.5M cycles per channel!!!

That's definitely strange. Normally, if your chip takes the same time to do a float op as it takes to do an int op, then the float version should be faster. That's because some of the float ops get replaced by several int ops.

> Does this make sense?
> I'm generating floating point code, using the optimizer, etc...

Are you sure the compiler isn't using float emulation or something like that?

> Has anyone posted DM642, C64xx or C67xx benchmarks?

I'm not aware of any.

Jean-Marc
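For anyone following the arithmetic in the exchange above, here is a minimal sketch of the per-channel cycle budget. It assumes the channels share the core evenly and that nothing else (I/O, OS overhead) takes cycles; the function names are made up for illustration and are not part of Speex.

```c
/* Hypothetical helpers, not part of Speex or any TI library. */

/* Cycles available to each channel per second of audio, assuming the
 * channels split the core evenly and nothing else consumes cycles. */
double cycles_per_channel(double clock_hz, int channels)
{
    return clock_hz / channels;
}

/* Measured load as a fraction of the per-channel budget.
 * A result above 1.0 means the channel count does not fit. */
double budget_ratio(double measured_cycles, double clock_hz, int channels)
{
    return measured_cycles / cycles_per_channel(clock_hz, channels);
}
```

With the figures from the thread, cycles_per_channel(720e6, 32) gives 22.5M, and cycles_per_channel(300e6, 32) gives 9.375M, which the thread rounds to 9.3M. The measured 27.4M on the 6416 is then roughly 1.2 times the budget, and the 71.5M on the 6713 is roughly 7.6 times the budget.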
Jerry,

I think that just removing the FIXED_POINT define should be sufficient, though you might want to turn off MANUAL_ALLOC, because I am not sure if the memory usage is identical for the fixed point build, and the constants in config.h are set for the fixed point build.

Are you testing on the simulator, or on an eval board? It does not look like the 6713 has enough memory to hold Speex (64K vs. 1024K for the 6416), and your performance could suffer badly running from external memory. I would be very surprised if you can get below 9.3 MIPS/channel for floating point, since Jean-Marc has posted raw MIPS numbers somewhere around 10 MIPS for the algorithm itself. And that is discounting the memory issues. With the C6416 you can fit the code and data for 32 channels in internal memory.

If you want to post (or send me) your .pjt and .cmd files for the 6713 build, I can take a look at it in the simulator (I am using the 6415 now, though I do not need as many channels). But if you are experienced enough in this area to be doing 6416 optimizations, I am probably not telling you anything that you don't already know. That is not a task for the faint of heart.

Jim Crichton

----- Original Message -----
From: "Jerry Trantow" <jtrantow@ieee.org>
To: "'Jean-Marc Valin'" <jean-marc.valin@usherbrooke.ca>
Cc: <speex-dev@xiph.org>
Sent: Thursday, January 19, 2006 10:40 AM
Subject: RE: [Speex-dev] TI 6xxx platform performance

_______________________________________________
Speex-dev mailing list
Speex-dev@xiph.org
http://lists.xiph.org/mailman/listinfo/speex-dev
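As a rough illustration of the build switches discussed above, here is a sketch of how a single define can flip a codebase between fixed and floating point, with allocation going through calloc() when manual allocation is off. This is not the contents of ti\config.h or any other Speex header. FIXED_POINT and MANUAL_ALLOC are the names from the thread; everything prefixed demo_ or DEMO_ is hypothetical.

```c
/* Illustrative only: not copied from ti\config.h or any Speex header. */
#include <stdlib.h>

/* #define FIXED_POINT */   /* left undefined here: floating-point build */
/* #define MANUAL_ALLOC */  /* left undefined here: buffers come from calloc() */

#ifdef FIXED_POINT
typedef short demo_word_t;                       /* hypothetical 16-bit sample type */
/* Q15 multiply: widen, multiply, shift back down */
#define DEMO_MULT(a, b) ((demo_word_t)(((long)(a) * (long)(b)) >> 15))
#else
typedef float demo_word_t;                       /* same name, float underneath */
#define DEMO_MULT(a, b) ((a) * (b))
#endif

/* Allocate a zeroed buffer of n samples from the heap (the non-manual path). */
demo_word_t *demo_alloc(size_t n)
{
    return (demo_word_t *)calloc(n, sizeof(demo_word_t));
}
```

The point of the pattern is that the algorithm code only ever mentions the typedef and the macro, so switching builds means touching the defines and nothing else. Heap and buffer-size constants tuned for one build may still need revisiting for the other, which is the concern Jim raises about the constants in config.h.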
The majority of a Speex encoder app does fit in a 6713. The 6713 has 8K of L1 and another 256K of memory, up to 64K of which can be configured as L2 cache (16, 32, 48, or 64K). One level of TI's website seems to incorrectly indicate only 64K of L2.

I turned off MANUAL_ALLOC and have it allocating internal memory using calloc(). I did change the L2 cache to 2 way (32K) and adjusted the heap size to 12K to get it to fit. I put a wavefile and the .cinit, .const up in the SDRAM.

name      origin    length    used      attr  fill
--------  --------  --------  --------  ----  --------
IRAMB     00000000  00000400  00000000  RWIX
IRAMP     00000400  00028c00  00028218  RWIX
IRAM      00029000  0000f000  00007fdc  RWIX
CACHE_L2  00038000  00008000  00000000  RWIX
SDRAM     80000000  00800000  000181b7  RWIX

I'm currently using the simulator and the SDRAM doesn't seem to be a factor. I put some test vectors up into SDRAM and when I call DSPF_sp_dotprod() I get the cycle count I expect: O(N/2)+25. The 32K L2 cache will help any SDRAM access.

I initially had a compiler option wrong, which was minimizing size instead of maximizing speed, but I'm still at 44 MIPS for a single channel.

I saw the 10 MFLOPS number in the documentation. At first glance, a 300 MHz 67 looks underpowered, but a quick profile shows 50% of the cycles are concentrated in just a few functions. The 67xx is designed to execute these functions and I figured I could get a factor of two out of these DSP functions. That would bring me under the 9.3 MFLOPS requirement.

The DSP functions are performing as I expect:

SP AutoCorrelation: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
SP FIR Filter:      (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8
SP Inner product:   nx/2 + 25

But unless the whole algorithm gets down near 10 MIPS, I'm going to have to go to the 64xx fixed point.

Jerry J. Trantow
Applied Signal Processing, Inc.
jtrantow@ieee.org

-----Original Message-----
From: Jim Crichton [mailto:jim.crichton@comcast.net]
Sent: Thursday, January 19, 2006 10:33 AM
To: Jerry Trantow
Cc: speex-dev@xiph.org
Subject: Re: [Speex-dev] TI 6xxx platform performance
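The three cycle formulas Jerry lists for the single-precision DSP functions are easy to evaluate for a given frame size. The sketch below transcribes them into C. It assumes integer division matches the intent of the formulas (exact when nx and nr are even), and the function names are hypothetical rather than taken from TI's library.

```c
/* Hypothetical cycle estimators transcribed from the formulas in the thread.
 * Integer division is assumed to match the formulas for even nx and nr. */

/* SP inner product: nx/2 + 25 */
long sp_dotprod_cycles(long nx)
{
    return nx / 2 + 25;
}

/* SP FIR filter: (4*floor((nh-1)/2) + 14) * ceil(nr/4) + 8 */
long sp_fir_cycles(long nh, long nr)
{
    long half_taps = (nh - 1) / 2;      /* floor((nh-1)/2) for nh >= 1 */
    long out_groups = (nr + 3) / 4;     /* ceil(nr/4) for nr >= 0 */
    return (4 * half_taps + 14) * out_groups + 8;
}

/* SP autocorrelation: (nx/2)*nr + (nr/2)*5 + 10 - (nr*nr)/4 + nr */
long sp_autocor_cycles(long nx, long nr)
{
    return (nx / 2) * nr + (nr / 2) * 5 + 10 - (nr * nr) / 4 + nr;
}
```

As a usage example, a 160-sample inner product comes to 105 cycles, a 10-tap FIR producing 160 outputs comes to 1208 cycles, and a 10-lag autocorrelation over 160 samples comes to 820 cycles. Multiplying per-frame totals by the 50 frames per second mentioned earlier in the thread gives a per-second figure to compare against the per-channel budget.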