Andrew Lentvorski
2011-Apr-18 19:52 UTC
[CELT-dev] CELT grabbing 100KB of memory right off the top
Is there a particular reason why CELT grabs 100KB of stack immediately? Is that really required or can that be trimmed down some/a lot?

-a
Timothy B. Terriberry
2011-Apr-18 19:57 UTC
[CELT-dev] CELT grabbing 100KB of memory right off the top
> Is there a particular reason why CELT grabs 100KB of stack immediately?

You'd be much better off using C99 vararrays, or alloca, if either of those is available, rather than the global stack (which is purely there as a fallback).

> Is that really required or can that be trimmed down some/a lot?

I don't think anywhere near 100 kB is required. The exact maximum depends on the mode/frame size, but we haven't gone through and measured the worst case lately. Even that could probably be substantially reduced: for example, the MDCT uses a lot of temporary buffers, when it could be made to operate entirely in-place, if necessary.
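To make the difference between the two allocation styles concrete, here is a minimal sketch. It is not CELT's actual code: the names `scratch_alloc`, `scratch_top`, `SCRATCH_BYTES`, and both `sum_squares_*` functions are hypothetical, and the 4096-byte arena size is an arbitrary assumption. The first function takes its temporary from the real call stack via a C99 variable-length array; the second carves it out of a fixed global arena that is reserved up front whether or not it is ever fully used, which is roughly what a "global stack" fallback implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical global pseudostack: a fixed arena reserved at load time.
 * Backed by doubles so that allocations are suitably aligned for long. */
#define SCRATCH_BYTES 4096
static double scratch_storage[SCRATCH_BYTES / sizeof(double)];
static size_t scratch_top = 0; /* bump pointer, in bytes */

/* Bump-allocate from the arena, rounding the size up to a multiple of
 * sizeof(double) so the next allocation stays aligned. */
static void *scratch_alloc(size_t bytes) {
    size_t aligned = (bytes + sizeof(double) - 1) / sizeof(double) * sizeof(double);
    void *p;
    assert(aligned <= SCRATCH_BYTES - scratch_top); /* arena overflow check */
    p = (unsigned char *)scratch_storage + scratch_top;
    scratch_top += aligned;
    return p;
}

/* Option 1: C99 variable-length array. The temporary lives on the call
 * stack and only for the duration of this call. Requires n > 0. */
static long sum_squares_vla(const short *x, int n) {
    long tmp[n];
    long acc = 0;
    int i;
    for (i = 0; i < n; i++) tmp[i] = (long)x[i] * x[i];
    for (i = 0; i < n; i++) acc += tmp[i];
    return acc;
}

/* Option 2: pseudostack fallback. Save the bump pointer, allocate the
 * temporary from the global arena, then restore the pointer on exit. */
static long sum_squares_pseudostack(const short *x, int n) {
    size_t mark = scratch_top;
    long *tmp = scratch_alloc((size_t)n * sizeof *tmp);
    long acc = 0;
    int i;
    for (i = 0; i < n; i++) tmp[i] = (long)x[i] * x[i];
    for (i = 0; i < n; i++) acc += tmp[i];
    scratch_top = mark; /* "pop" everything allocated in this call */
    return acc;
}
```

With the VLA version the memory cost tracks the actual frame size; with the arena version the full arena is paid for regardless, so its size has to be set for the worst case.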
Gregory Maxwell
2011-Apr-18 19:59 UTC
[CELT-dev] CELT grabbing 100KB of memory right off the top
On Mon, Apr 18, 2011 at 3:52 PM, Andrew Lentvorski <bsder at allcaps.org> wrote:
> Is there a particular reason why CELT grabs 100KB of stack immediately?
> Is that really required or can that be trimmed down some/a lot?

No, it's not a requirement. The actual memory usage should be much lower than that. In any case, this only happens if libcelt is compiled in pseudostack mode. If you instead compile for use with C99 var arrays (or alloca) then it doesn't do that.

I also think that there are a number of places where the peak stack usage is much more than strictly required; the MDCT comes to mind.
Jean-Marc Valin
2011-Apr-19 10:17 UTC
[CELT-dev] CELT grabbing 100KB of memory right off the top
On 11-04-19 01:35 AM, Andrew Lentvorski wrote:
>> Lowering the "complexity setting"
>> CELT_SET_COMPLEXITY should definitely help at the expense of lowering
>> quality. I suggest trying complexity 4, which disables the pitch search
>> for the prefilter.
>
> I see no visible change moving from 5 to 4. Even moving the whole way to
> 0 only gains about 10% (3.46ms per loop vs. 3.14ms per loop)

Right, because the post-filter must be disabled already. So I recommend keeping the default and not touching anything for that one (the quality drop is probably not worth the lower CPU).

>>> I already did a quick profile of the fixed point encoder and it wasn't
>>> doing anything obviously stupid. There were 4 hotspots at 15%, 8%, 7%,
>>> and 5% respectively. The 15% involved ilog.
>>
>> ilog is an easy one. Most platforms have a hardware instruction to
>> compute that, so just using it should make that 15% go away.
>
> Yeah, that was straightforward to buy almost 10%.

Good :-)

Jean-Marc
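
For readers wondering what "just using the hardware instruction" looks like, here is a minimal sketch, not the code from the thread. It assumes ilog means "number of bits needed to represent v" (so 0 maps to 0 and 1 maps to 1), and it assumes a GCC- or Clang-compatible compiler for the fast path; the function names `ilog_portable`, `ilog_fast`, and `ilog_self_check` are hypothetical. The `__builtin_clz` builtin typically compiles to a single count-leading-zeros instruction where the target has one.

```c
#include <assert.h>
#include <limits.h>

/* Portable fallback: count how many right shifts it takes to reach zero. */
static int ilog_portable(unsigned int v) {
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Fast path: width of unsigned int minus the number of leading zero bits.
 * __builtin_clz(0) is undefined, so zero is handled separately. */
static int ilog_fast(unsigned int v) {
#if defined(__GNUC__) || defined(__clang__)
    if (v == 0) return 0;
    return (int)(sizeof(unsigned int) * CHAR_BIT) - __builtin_clz(v);
#else
    return ilog_portable(v);
#endif
}

/* Cross-check the two versions over a range of inputs. */
static void ilog_self_check(void) {
    unsigned int v;
    for (v = 0; v < 70000u; v++) {
        assert(ilog_fast(v) == ilog_portable(v));
    }
    assert(ilog_fast(UINT_MAX) == ilog_portable(UINT_MAX));
}
```

The loop version costs one iteration per significant bit, which adds up when it sits in an inner loop of the encoder; the builtin version is constant time.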