> Build 11387 produces the same result as my modified build 11343. Because
of
> compiler limitations in the TI tools, I did have to make modifications to
> pseudofloat.h (separating return of float values) and nb_celp.c (adding
> braces around a variable declaration in the middle of code). I have
> attached a patch. You might prefer to do the nb_celp.c change in a
> different way.
Funny thing is I applied exactly the same patch to nb_celp.c a few hours
ago!
> Also in the pseudofloat.h is the change to EXTRACT16 which I mentioned
> earlier in the thread. The change that I made (casting as unsigned to
> prevent MSB extension) solved my problem, but maybe it would be safer to
use
> EXTEND32 here. That is, after all, the way all of the other 16 bit issues
> get solved.
I'll need to have a closer look on that one.
> Outside of what is in the patch, I ifdefed out the unused spx_fft_float and
> spx_ifft_float in fftwrap.c (I described this earlier, it prevents the need
> to include smallft.c) and commented out the memory.h and malloc.h includes
> in kiss_fft.h (the TI compiler does not have these files).
True, forgot about that one.
> When the dust
> settles, I will send a patch with additions to the TI directory for the
echo
> canceler example, including instructions on these last changes. Also, I
> would like to modify the memory allocation to follow the same structure as
> the encoder/decoder, including the ability to override the malloc function.
What do you mean here?
> Finally, there is a lingering issue. The C55 and C64 builds cancel the
echo
> similarly, but they do not produce the same results. The files match for
> the first 0x1000 bytes (2048 samples, or 16 frames). We recently sorted
out
> something like this in the encoder, but I am not sure if the expected
> results are the same here. Do you expect bit-exact results for all
> fixed-point builds? I can look into this some more, but first I need to
> figure out if I can fit the echo canceler into my application at all.
Assuming the FFT tables are exactly the same (that's the only place
using float), then I would definitely expect bit-exact results.
Jean-Marc
> - Jim
>
> ----- Original Message -----
> From: "Jean-Marc Valin" <Jean-Marc.Valin@USherbrooke.ca>
> To: "Jim Crichton" <jim.crichton@comcast.net>
> Cc: <speex-dev@xiph.org>
> Sent: Tuesday, May 09, 2006 7:36 AM
> Subject: Re: [Speex-dev] Speex echo canceller on TI C55 DSP
>
>
> > I built and ran the same test on the TI C64 simulator, and the echo
was
> > canceled nicely (about 10:1 reduction in the peak amplitude during the
> > second of two brief speech bursts). So, my problem must again be
related
> > to
> > the 16-bit processing on the C5X DSPs.
>
> Good. At least we've narrowed it down a bit.
>
> > Also, the line where it is hanging is:
> > st->power_1[i] > >
FLOAT_SHL(FLOAT_DIV32_FLOAT(MULT16_32_Q15(M_1,r),FLOAT_MUL32U(e,st->power[i]+10)),WEIGHT_SHIFT+16);
>
> Actually, I just found a 16-bit bug in FLOAT_DIV32_FLOAT. Could you
> update svn and let me know if it works?
>
> > and it is e that is in the denominator, not r (sorry for the
confusion).
> > I
> > can now run the simulations side-by-side and look for differences.
>
> Can e really go to 0? I don't see how.
>
> Jean-Marc
>
> > - Jim
> >
> > Le lundi 08 mai 2006 ? 20:05 -0400, Jim Crichton a ?crit :
> > > > I've just been made aware of these problems (look for
the thread
> > > > "speex
> > > > echo cancellation limitations"). It's on my
short-term TODO list.
> > >
> > > I saw the other thread, my problems happened in different (but
similar)
> > > routines.
> > >
> > > >> If fftwrap.c, I ifdefed out the spx_fft_float and
spx_ifft_float
> > > >> routines,
> > > >> because there were not used and required smallft.c
(which is not so
> > > >> small
> > > >> at
> > > >> all) to be added to the build.
> > > >
> > > > Right, need to cleanup that part...
> > > >
> > > >> With these changes, the link was successful, using
testecho.c with
> > > >> some
> > > >> modifications for the C55 environment. The code and
data memory
> > > >> requirements were a lot more than I had hoped
(>20kbytes of dynamic
> > > >> data
> > > >> memory for block size=128, tail length = 1024), and I
will probably
> > > >> not
> > > >> be
> > > >> able to fit it in the production build without some
trimming.
> > > >
> > > > Yes, there may be a bit of memory reduction possible here.
Of course,
> > > > decreasing the tail length is also a rather easy way.
> > > >
> > > >> When I run the build, it goes into an infinite loop in
FLOAT_DIV32
> > > >> (mdf.c
> > > >> line 660), which occurs because adapt_rate is < 0,
which happens when
> > > >> FLOAT_EXTRACT16 gets the input {0x7ff0, 0xfffb}. The
rounding is
> > > >> causing
> > > >> the result to go negative. I worked around this by
changing
> > > >
> > > > I think that was mentioned in the previous thread...
> > > >
> > > >> return (a.m+(1<<(-a.e-1)))>>-a.e;
> > > >> to
> > > >> return (((spx_uint16_t)
a.m)+(1<<(-a.e-1)))>>-a.e;
> > > >
> > > > Is that sufficient to remove all the overflows at this
place?
> > >
> > > The rounding takes the value to exactly 0x8000, and it is
followed by a
> > > right shift, so you just need to avoid the sign extension.
> > >
> > > >> I have not had time to trace this, but it looks like a
similar
> > > >> problem,
> > > >> where the result of MULT16_32_Q15(M_1,r) is negative,
and
> > > >> FLOAT_DIV32_FLOAT
> > > >> bombs. Maybe the best thing to do next is to instrument
the routines
> > > >> in
> > > >> pseudofloat.h which have loops, but I will not get to
that for a day
> > > >> or
> > > >> two.
> > > >
> > > > Yeah, r is never supposed to be negative and the float
routines assume
> > > > that.
> > >
> > > No, it was a divide by zero, as explained in my second post. I
will try
> > > a
> > > build on the C6x DSP to see if this is a 16 vs. 32-bit problem.
I sent
> > > the
> > > test files off-list.
> > >
> > > >> 1. speex_echo_state_init takes about 20M instructions,
which is a
> > > >> little
> > > >> frightening,
> > > >
> > > > That's the fft initialization that calls a lot of float
cos()
> > > > functions.
> > > > If you have a fixed version of cos() you can use it there,
otherwise a
> > > > fixed table (for a certain size) would work.
> > > >
> > > >> and the calls to speex_echo_cancel take about 630K
instructions
> > > >> for 128 samples. Given your recent experience with the
fixed point
> > > >> canceller, does this sound rational? The MIPs for the
canceller are
> > > >> similar
> > > >> to the MIPs for the encoder running 8kbps, complexity 1.
> > > >
> > > > The order of magnitude seems right. It may be possible to
reduce that
> > > > a
> > > > bit, though. If you have an optimized FFT, you could replace
kiss_fft
> > > > with it and get a big improvement right there.
> > >
> > > Yeah, but then I have to try to actually understand the
algorithm. I am
> > > not
> > > sure that those brain cells are still alive.
> > >
> > > >> 2. The testecho example uses a frame length and tail
size that are
> > > >> powers
> > > >> of two (128, 1024). Are there any implications to using
sizes which
> > > >> are
> > > >> not
> > > >> powers of two? It would be most convenient to use the
encoder frame
> > > >> size
> > > >> (160), and some multiple of that for the tail size. How
does the
> > > >> frame
> > > >> size
> > > >> affect performance (I understand that the tail length
determines what
> > > >> echo
> > > >> signals are cancelable)?
> > > >
> > > > Non powers of two will be a bit slower because of the FFT,
but that's
> > > > all. I made sure the echo canceller works with 160,
precisely because
> > > > it's the frame size used by Speex. Note that I don't
recommend using
> > > > frames more than 20 ms long (at any sampling rate).
> > > >
> > > >> 3. Do you have any suggestions for code/data memory
reduction for
> > > >> the
> > > >> canceller, other than to make the tail length no longer
than
> > > >> necessary
> > > >> (this
> > > >> is a line echo canceller for a local phone, so I should
be able to
> > > >> keep
> > > >> it
> > > >> to 40ms). I was surprised by the size of the FFT code,
but I guess
> > > >> that
> > > >> it
> > > >> is doing much more than the radix2 version in the TI
library.
> > > >
> > > > The FFT code has more than just the radix two, so you can
save there.
> > > > It
> > > > wasn't meant to be an optimized FFT, so if TI supplies
you with one,
> > > > it's probably a good idea to use it (that's what
fft_wrap is for).
> > > > Also,
> > > > given that the memory use is almost directly proportional to
the tail
> > > > length, reducing that one to 40 ms will make a huge
difference.
> > >
> > > Thanks for the advice.
> > >
> > > - Jim
> > >
> > >
> > >
> >
> >
> >
> _______________________________________________
> Speex-dev mailing list
> Speex-dev@xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev