>> There is a bit of work remaining to get the memory usage down for a
>> multichannel application. There have been some good posts over the
>> last couple of months about reducing memory usage.
>
> I think 1.1.8 incorporates all memory reductions proposed. Let me know
> otherwise.

For the persistent storage, the only change that I have made is to
MAX_CHARS_PER_FRAME, which is set to 2000 in bits.c. I changed bits.c to
set this value only if it was not already defined, and then put my own,
much smaller value in config.h.

For the scratch stack, I replace the fixed values in nb_encoder_init and
nb_decoder_init with constants that I defined in config.h. Jamey Hicks'
original C5x patch had some test code in stack_alloc.h to detect working
stack overflow. Maybe something similar could be done to measure the peak
stack usage, enabled by a debug switch. Then, for a space-critical
application, it would be easy to measure the stack requirement for a given
operating mode, and set the size (manually) accordingly.

>> Also, to nominally comply with the TI XDAIS algorithm standard, it is
>> necessary to extract all of the memory allocation from the code,
>> organize it into blocks, and provide a table to the application host
>> with the size and scratch/persistent nature of each block. The host
>> then does the memory allocating, and provides the pointers back to the
>> application.
>
> I'm not familiar with XDAIS, but I would think you could just overload
> the speex_alloc() and speex_free() functions, right?

According to this standard, an allocate call is made to an algorithm, and
the algorithm fills in a table of required blocks (size, alignment, and
scratch/persistent type). The system allocates these blocks, and calls the
algorithm init function with the same memory table, now including the base
addresses. Now, these addresses have to get into Speex somehow. Since I
did not want to change the API, I have resorted to the kludge of declaring
global variables, which I initialize based on the allocated memory blocks.
My alloc routines then look at the global variables, similar to the way
calloc works.

This does not solve the problem of distinguishing persistent and scratch
storage. To do this, I added a speex_alloc_scratch routine, which uses a
different memory block than speex_alloc. This does force a change to
nb_encoder_init, etc. At the moment, the code looks like this:

    #if defined(VAR_ARRAYS) || defined (USE_ALLOCA)
       st = (EncState*)speex_alloc(sizeof(EncState));
       if (!st)
          return NULL;
       st->stack = NULL;
    #elif defined(SCRATCH_ALLOC)
       st = (EncState*)speex_alloc(sizeof(EncState));
       if (!st)
          return NULL;
       st->stack = (char*)speex_alloc_scratch(SPEEXENC_SCRATCH_STACK_SIZE);
    #else
       st = (EncState*)speex_alloc(sizeof(EncState)+8000*sizeof(spx_sig_t));
       if (!st)
          return NULL;
       st->stack = ((char*)st) + sizeof(EncState);
    #endif

Note that I also moved the "if (!st)" check to before st->stack is set,
since a write to a bad location would occur otherwise.

>> Question 1: Is there anything wrong with using a 32-bit float for
>> spx_word64_t (other than MIPS)? This type is used only in two places
>> in ltp.c.
>
> No problem replacing with a float. The reason for the 64 bits is not the
> precision but only the range. A 40-bit accumulator would work too.
> Eventually, this could probably be made to fit in a 32-bit int, but I
> haven't done that yet.

The C55x uses a 40-bit long long (as Stuart Cording pointed out), so this
should be fine here.

>> 3. And, of course, the internal stack memory allocations in
>> nb_encoder_init and nb_decoder_init had to be cut down to fit within
>> the available data memory space. It would be useful to parameterize
>> the working stack allocation size for those folks who cannot use the
>> new VAR_ARRAYS and USE_ALLOCA stuff.
>
> Would a compile-time option be OK (so I don't need to change the API)?
> If so, I'll put that on the TODO list.

I am using a compile option, as shown above.

>> With this change, the codec ran, but the encoded data is garbage.
>> Eventually I realized that because the char size on the C5x is 16
>> bits, the fread and fwrite routines are using only the least
>> significant 8 bits of each word. A little packing and unpacking
>> later, the encoder/decoder loop was producing intelligible sound.
>> However, there are some anomalies. Using the sample file
>> male.wav, the output has a positive step at 0.1 sec (rapid ramp from 0
>> to ~20000 sample value, with decay back to zero by time 0.112 sec),
>> another positive step at 2.940 sec (amplitude about 3000, decaying in
>> 12 ms again), and a rail-to-rail impulse at 4.600 sec (also decaying
>> within a few msec). This is a simulator, so there are no "real world"
>> effects at play. The C6x simulation does not show the artifacts. The
>> encoded bits are the same for the first frame, but then they diverge.
>
> That's odd, definitely worth investigating.

Stuart Cording's change to replace the math macros with inline functions
cures the problem. I will continue to look at this.

- Jim Crichton
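
The packing and unpacking mentioned in the quoted text above might look roughly like the following sketch. It assumes a target where every char occupies a full 16-bit word and file I/O transfers only the low 8 bits of each word, and it assumes little-endian byte order in the file (as in WAV sample data). The names unpack_words and pack_words are hypothetical; they are not part of Speex or of the C5x patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split each 16-bit sample into two words holding one octet each
   (low octet first), so that an fwrite() which emits only the low
   8 bits of every word produces a little-endian byte stream.
   out must have room for 2*n words. */
void unpack_words(const uint16_t *in, size_t n, uint16_t *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[2 * i]     = in[i] & 0xFF;
        out[2 * i + 1] = (in[i] >> 8) & 0xFF;
    }
}

/* Inverse operation: combine pairs of one-octet words (as filled in
   by fread() on such a target) back into 16-bit samples.
   in must hold 2*n words. */
void pack_words(const uint16_t *in, size_t n, uint16_t *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[i] = (uint16_t)((in[2 * i] & 0xFF) |
                            ((in[2 * i + 1] & 0xFF) << 8));
    }
}
```

On a host with 8-bit chars the same functions can be used to check the round trip before moving to the simulator.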
Jean-Marc Valin
2005-May-25 14:25 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
> For the persistent storage, the only change that I have made is to
> MAX_CHARS_PER_FRAME, which is set to 2000 in bits.c. I changed bits.c to
> set this value only if it was not already defined, and then put my own, much
> smaller value in config.h.

Yeah, I think I'll add an option like that.

> For the scratch stack, I replace the fixed values in nb_encoder_init and
> nb_decoder_init with constants that I defined in config.h. Jamey Hicks'
> original C5x patch had some test code in stack_alloc.h to detect working
> stack overflow. Maybe something similar could be done to measure the peak
> stack usage, enabled by a debug switch. Then, for a space-critical
> application, it would be easy to measure the stack requirement for a given
> operating mode, and set the size (manually) accordingly.

I'm not sure how you could detect that without carrying the stack limit
everywhere, though. What I would suggest instead is just to fill the stack
with 0xdeadbeef and then check up to what address it has been overwritten.

> According to this standard, an allocate call is made to an algorithm, and
> the algorithm fills in a table of required blocks (size, alignment, and
> scratch/persistent type). The system allocates these blocks, and calls the
> algorithm init function, with the same memory table, now including the base
> addresses. Now, these addresses have to get into Speex somehow. Since I
> did not want to change the API, I have resorted to the kludge of declaring
> global variables, which I initialize based on the allocated memory blocks.
> My alloc routines then look at the global variables, similar to the way
> calloc works.

What I would suggest is to have a c55-specific function that allocates
enough memory for everything and then overload speex_alloc to return
pointers to that area (in a way similar to the way my pseudo-stack works).

> This does not solve the problem of distinguishing persistent and scratch
> storage. To do this, I added a speex_alloc_scratch routine, which uses a
> different memory block than speex_alloc. This does force a change to
> nb_encoder_init, etc. At the moment, the code looks like this:

I'm not sure how your environment defines scratch space. What's the
difference?

> Note that I also moved the "if (!st)" check to before st->stack is set, since
> a write to a bad location would occur otherwise.

Makes sense.

	Jean-Marc

--
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
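
The stack-painting suggestion above could be sketched as follows. This assumes the scratch stack is a flat array of 32-bit words that grows upward from its base; the function names stack_paint and stack_peak_words are hypothetical and do not exist in Speex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAINT_WORD 0xDEADBEEFu

/* Fill the whole scratch area with a known pattern before first use. */
void stack_paint(uint32_t *base, size_t nwords)
{
    size_t i;
    assert(base != NULL);
    for (i = 0; i < nwords; i++)
        base[i] = PAINT_WORD;
}

/* Return the number of words from the base up to and including the
   highest word that no longer holds the pattern. A word that was
   written with a value equal to the pattern looks untouched, so the
   result is a lower bound on the true peak. */
size_t stack_peak_words(const uint32_t *base, size_t nwords)
{
    size_t i = nwords;
    assert(base != NULL);
    while (i > 0 && base[i - 1] == PAINT_WORD)
        i--;
    return i;
}
```

The idea would be to paint once after allocation, run the codec in the operating mode of interest, and then read the peak. The scan happens only when the measurement is taken, so the encode and decode paths are not slowed down.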
Jean-Marc Valin
2005-May-25 23:52 UTC
[Speex-dev] Speex on TI C6x, Problem with TI C5x Patch
> For the persistent storage, the only change that I have made is to
> MAX_CHARS_PER_FRAME, which is set to 2000 in bits.c. I changed bits.c to
> set this value only if it was not already defined, and then put my own, much
> smaller value in config.h.

Actually, I just remembered that you don't even need to redefine
MAX_CHARS_PER_FRAME. All you have to do is use the speex_bits_init_buffer()
call, which allows you to explicitly tell Speex where you want the data
copied.

> For the scratch stack, I replace the fixed values in nb_encoder_init and
> nb_decoder_init with constants that I defined in config.h. Jamey Hicks'
> original C5x patch had some test code in stack_alloc.h to detect working
> stack overflow. Maybe something similar could be done to measure the peak
> stack usage, enabled by a debug switch. Then, for a space-critical
> application, it would be easy to measure the stack requirement for a given
> operating mode, and set the size (manually) accordingly.

Regarding the stack, I have just added macros that allow you to set its
size: NB_ENC_STACK, NB_DEC_STACK, SB_ENC_STACK, SB_DEC_STACK.

	Jean-Marc

--
Jean-Marc Valin <Jean-Marc.Valin@USherbrooke.ca>
Université de Sherbrooke
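
A compile-time override of this kind is commonly built on a guarded default, the same pattern described earlier in the thread for MAX_CHARS_PER_FRAME. The sketch below only illustrates that pattern. The macro names are the ones listed above, but the numeric values, the assumption that the sizes are in bytes, and the two accessor functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Application side, normally in config.h or on the compiler command
   line. The value is hypothetical. */
#define NB_ENC_STACK 4000

/* Library side: supply a default only when the build did not set one.
   These defaults are placeholders, not the values used by Speex. */
#ifndef NB_ENC_STACK
#define NB_ENC_STACK 32000
#endif
#ifndef NB_DEC_STACK
#define NB_DEC_STACK 16000
#endif

/* Hypothetical accessors, used here only to make the effective
   values observable. */
size_t nb_enc_stack_bytes(void) { return NB_ENC_STACK; }
size_t nb_dec_stack_bytes(void) { return NB_DEC_STACK; }
```

With this arrangement the encoder size comes from the application's definition and the decoder size falls back to the library default, without any change to the API.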
>> For the scratch stack, I replace the fixed values in nb_encoder_init and
>> nb_decoder_init with constants that I defined in config.h. Jamey Hicks'
>> original C5x patch had some test code in stack_alloc.h to detect working
>> stack overflow. Maybe something similar could be done to measure the
>> peak stack usage, enabled by a debug switch. Then, for a space-critical
>> application, it would be easy to measure the stack requirement for a
>> given operating mode, and set the size (manually) accordingly.
>
> I'm not sure how you could detect that without carrying the stack limit
> everywhere though. What I would suggest instead is just to fill the
> stack with 0xdeadbeef and then look up to what address you have it
> overwritten.

I was talking about adding a peak-detect side effect to the ALLOC macro.
But, of course, painting the stack is a lot simpler, and can be done with
no performance penalty.

>> According to this standard, an allocate call is made to an algorithm, and
>> the algorithm fills in a table of required blocks (size, alignment, and
>> scratch/persistent type). The system allocates these blocks, and calls
>> the algorithm init function, with the same memory table, now including
>> the base addresses. Now, these addresses have to get into Speex somehow.
>> Since I did not want to change the API, I have resorted to the kludge of
>> declaring global variables, which I initialize based on the allocated
>> memory blocks. My alloc routines then look at the global variables,
>> similar to the way calloc works.
>
> What I would suggest is to have a c55-specific function that allocates
> enough memory for everything and then overload speex_alloc to return
> pointers to that area (in a way similar to the way my pseudo-stack
> works).

Yes, that is what I am doing. The global variables to which I referred are
pointers to the next free location and to the end (for overflow detection,
debug only) of the pre-allocated block. Realloc and Free do not do
anything, because the whole block gets freed outside of Speex when an
instance (for multichannel) of the algorithm is destroyed.

>> This does not solve the problem of distinguishing persistent and scratch
>> storage. To do this, I added a speex_alloc_scratch routine, which uses a
>> different memory block than speex_alloc. This does force a change to
>> nb_encoder_init, etc. At the moment, the code looks like this:
>
> I'm not sure how your environment defines scratch space. What's the
> difference?

Scratch space does not need to be preserved between calls. In an
environment with multiple algorithms, or multiple instances of an
algorithm, this space can be reused, as long as there is no preemption
(each algorithm runs to completion). On a DSP like the C64x (up to 1000
MIPS now), memory can be more of a limiting factor than MIPS in
determining the number of channels which can be supported. An alternative
is to fall back to the C stack, but that makes debugging more difficult,
and it is less efficient if there are other algorithms present which do
allocate scratch blocks in the XDAIS way (because that scratch space could
be used for Speex).

- Jim Crichton
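
The allocation scheme described in this message (host-supplied blocks, global next-free and end pointers, a separate scratch pool, and a free routine that does nothing) could be sketched roughly as below. The names speex_alloc, speex_alloc_scratch and speex_free come from the thread, but their int-sized signatures, the pool_t type, speex_pools_init, the rounding rule, and returning NULL when a block is exhausted are all assumptions made for illustration. This is not the code from the C5x port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One host-supplied block: next free location and end of block. */
typedef struct {
    char *next;
    char *end;
} pool_t;

static pool_t persistent_pool;  /* must survive between calls */
static pool_t scratch_pool;     /* may be reused by other algorithms */

/* Called from the algorithm init function with the base addresses
   that the host filled into the memory table. */
void speex_pools_init(void *persist, size_t persist_size,
                      void *scratch, size_t scratch_size)
{
    persistent_pool.next = (char *)persist;
    persistent_pool.end  = (char *)persist + persist_size;
    scratch_pool.next    = (char *)scratch;
    scratch_pool.end     = (char *)scratch + scratch_size;
}

static void *pool_alloc(pool_t *p, int size)
{
    size_t n;
    char *ret;
    assert(size >= 0);
    /* Round up to a multiple of sizeof(long) so successive blocks stay
       aligned (assumes the base address itself is aligned). */
    n = ((size_t)size + sizeof(long) - 1) / sizeof(long) * sizeof(long);
    if (n > (size_t)(p->end - p->next))
        return NULL;               /* block exhausted */
    ret = p->next;
    p->next += n;
    memset(ret, 0, n);             /* calloc-like: hand back zeroed memory */
    return ret;
}

void *speex_alloc(int size)         { return pool_alloc(&persistent_pool, size); }
void *speex_alloc_scratch(int size) { return pool_alloc(&scratch_pool, size); }

/* Nothing to do: the host releases the whole block when the
   algorithm instance is destroyed. */
void speex_free(void *ptr) { (void)ptr; }
```

Because the pools are global, a multichannel host would have to call speex_pools_init with the blocks for one instance immediately before creating that instance, which is the price of leaving the Speex API unchanged.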