Michael Shatz
2007-Jun-21 03:51 UTC
[Speex-dev] Blackfin inline assembler and VisualDSP++ toolchain
From: Jim Crichton [mailto:jim.crichton@comcast.net]
Sent: Tuesday, June 19, 2007 10:47 PM

>For TI DSPs, I used a private memory array rather than the C stack, and a
>debug patch in stack_alloc.h to measure the scratch usage:
>
>#if 1
>extern char *spxGlobalScratchFree;
>#define ALLOC(var, size, type) (var = PUSH(stack, size, type), (spxGlobalScratchFree)=((stack)>(spxGlobalScratchFree))?(stack):(spxGlobalScratchFree))
>#else
>#define ALLOC(var, size, type) var = PUSH(stack, size, type)
>#endif
>
>I initialized the global scratch pointer to the beginning of the scratch
>area in the encoder init, and the debug macro keeps track of the max usage.
>It may be too late in your work for this to be of any help.

I measured one mode at one complexity setting (15kbps, complexity=1). If you
could publish data for other modes/complexities it would be appreciated.

>>>> On the code size things are less rosy.
>>>> The wideband indeed goes away with DISABLE_WIDEBAND but that's about all.
>>>> Due to extensive use of function pointers very little unused stuff
>>>> beyond wideband goes away when unused.
>>>
>>>Unless you NULL those pointers you don't need. Also, if you only use one
>>>rate, there are tables you can get rid of as well. All the tables
>>>represent about 10kB of ROM size, but you can probably reduce that to
>>>2-3 kB if you only use a single narrowband mode.
>>
>> Nullifying the pointers means that I don't treat the code as a black box.
>> Which means that if I upgrade to the next version of the library I'd have
>> to reapply the patches.
>
>For those of us working on very memory constrained platforms, I don't think
>that it will ever be a black box, because that would require having ENABLE
>defines for every rate and feature, so one could build up just what is
>needed. That would be really messy.

I'd guess Jean-Marc would never agree to ENABLE defines because it would
complicate life for the non-memory-constrained majority.
But we could convince him to add DISABLE defines. I don't agree that it has
to be messy. There are many heavily configurable open-source projects. Look
at eCos as just one example. By comparison, disabling individual modes in
speex would be an order of magnitude simpler.

>You did not respond to the point about single data rate. If you are doing
>this, then you can get rid of most of the tables if you fix up the
>references in modes.c. It would be nice to have a README.code-reduction
>file that collected some of the advice that hits the list from time to time.

Yes, the final application is very likely to use a single data rate. And yes,
README.code-reduction is an excellent idea.

>>>> For starters, I would like DISABLE_VBR analogous to DISABLE_WIDEBAND.
>>>> After that, it's probably possible to put under conditional compilation
>>>> the stuff that is used only in vocoder modes. It seems that modes 3 to 7
>>>> are too similar to each other to save a significant amount of code by
>>>> eliminating some of them, but I have a feeling that a generic mechanism
>>>> for picking only those modes needed (either through conditional
>>>> compilation or maybe even with a configuration perl script) would be
>>>> simpler than a specific DISABLE_VOCODER.
>>>
>>>The problem is that there are *lots* of things like that and having an
>>>option for everything would make the code a bit ugly. But they aren't
>>>that hard to debug. If you don't know if a function is useful, remove it
>>>and see what happens. If it succeeds in encoding one file, it will work
>>>all the time.
>>
>> VBR is by far the biggest thing after WIDEBAND that the users are likely
>> to never need or never want. And taking it out efficiently requires the
>> widest knowledge of the internal functioning of the library. I think
>> DISABLE_VBR is a good candidate for the official release.
>
>I removed vbr.c and ifdefed the references in nb_celp.c (in 8 or so places).
>This is not too messy, and I could send a patch for this if Jean-Marc is
>agreeable.

I suggest sending a patch first. Jean-Marc always has the opportunity to
reject it.

>>>Plus 16k 24-bit words is already 48 kB and I'm sure Speex can fit into
>>>smaller than that.
>>
>> First, I am not sure that board had the full 16K words. I said 16K because
>> that's the maximal size allowed by the ADSP-2111 architecture.
>> Second, the code density of the Blackfin family is far superior to that of
>> the ADI 21xx.
>> Third, I believe you that a 48 KB speex on Blackfin is possible, but right
>> now my code is bigger.
>
>With VBR and all modes but one stripped, my text+const size for the TI C55
>is about 48 KB for a standalone build. It was about 58 KB before. The
>remaining source files are:
>
>libspeex\bits.c
>libspeex\cb_search.c
>libspeex\exc_10_32_table.c
>libspeex\filters.c
>libspeex\gain_table_lbr.c
>libspeex\lpc.c
>libspeex\lsp.c
>libspeex\lsp_tables_nb.c
>libspeex\ltp.c
>libspeex\math_approx.c
>libspeex\misc.c
>libspeex\modes.c
>libspeex\nb_celp.c
>libspeex\quant_lsp.c
>libspeex\speex.c
>libspeex\speex_callbacks.c
>libspeex\vq.c
>libspeex\window.c
>ti\testenc-TI-C5x.c
>
>My platform has 256KB of internal RAM, so this was fine for me. It does
>suggest that it might be very hard for you to squeeze this in. Maybe some
>Blackfin users can chime in with their memory/MIPs results.

Yes, in theory C55 and Blackfin have comparable code density, which suggests
that 32KB of code is out of reach.

>>>>> IIRC, gcc alone (no asm) was using something in the order of 100 MIPS
>>>>> (back when it couldn't do hardware loops, MACs, cond. moves, ...), so
>>>>> as you can see, there's a fair bit of difference. So yes, with assembly
>>>>> working, VDSP++ should be able to achieve better than 20 MIPS.
>>>>>
>>>>> Jean-Marc
>>>>
>>>> Not sure we are talking about the same mode.
>>>
>>>This was with the 15 kbps mode used at complexity 1.
>>>
>>> Jean-Marc
>>
>> Yes, that's the mode that I measured, with no VBR.
>> Does the 100 MIPS figure reflect the situation before or after
>> David Rowe's improvements?
>
>I see around 26 MIPs for a TI C55x DSP for Quality 3 (8kbps), complexity 1,
>and about 33 MIPs on a TI C64xx, with no assembly optimizations, using TI's
>build tools. That is consistent with your 15kbps result.
>
>- Jim

Pretty embarrassing result for a VLIW ):
Yet another case of the difference between practice and theory.
For 8kbps, complexity=1, I measured 25 MIPs for the encoder + 4 MIPs for the
decoder.
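
As a rough illustration of the scratch-measurement idea in Jim's quoted patch
(a global pointer that only ever moves up to the deepest point any allocation
has reached), here is a minimal standalone C sketch. It is not the Speex code:
scratch_alloc(), scratch_high_water, SCRATCH_BYTES, inner(), outer() and
demo_peak() are hypothetical names made up for this example, the allocator
ignores alignment, and it assumes the scratch pointer is handed to each
function by value so that a callee's allocations vanish when it returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCRATCH_BYTES 1024                 /* hypothetical scratch size */

static char  scratch_area[SCRATCH_BYTES];  /* private array instead of the C stack */
static char *scratch_high_water;           /* plays the role of spxGlobalScratchFree */

/* Reserve `bytes` from the caller's local scratch pointer and remember the
 * deepest point reached so far. Alignment is ignored in this sketch. */
static void *scratch_alloc(char **stack, size_t bytes)
{
    char *start = *stack;
    assert((size_t)(scratch_area + SCRATCH_BYTES - start) >= bytes);
    *stack = start + bytes;
    if (*stack > scratch_high_water)
        scratch_high_water = *stack;
    return start;
}

/* `stack` is passed by value, so this allocation is released on return. */
static void inner(char *stack)
{
    void *tmp = scratch_alloc(&stack, 80);
    memset(tmp, 0, 80);
}

static void outer(char *stack)
{
    void *buf = scratch_alloc(&stack, 64);
    memset(buf, 0, 64);
    inner(stack);
    inner(stack);   /* reuses the same region, so the peak does not grow */
}

/* Peak usage in bytes since the last reset of scratch_high_water. */
static size_t scratch_peak_bytes(void)
{
    return (size_t)(scratch_high_water - scratch_area);
}

/* Reset the high-water mark (as one would in the encoder init), run the
 * workload once, and report the peak. */
size_t demo_peak(void)
{
    scratch_high_water = scratch_area;
    outer(scratch_area);
    return scratch_peak_bytes();
}
```

Calling demo_peak() from a test harness gives the peak for one pass through
the workload; to compare modes or complexity settings, one would reset the
mark and rerun with each configuration.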