Hi,
I'm quite surprised that doing a*b*c is faster than doing MULT16_16(a,
MULT16_16_16(b, c)) on ARM. Probably your compiler doesn't realise that
it can ignore pretty much all the casts. Are you using some crappy MS
compiler by any chance (I think gcc is usually smart enough for that)?
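To make the point about the casts concrete, here is a minimal sketch. The
CAST_* macros are hypothetical stand-ins written in the style of a generic
fixed-point header (they are not copied from Speex), and the typedefs assume
a 16-bit short and a 32-bit int. Casts applied to operands that are already
16-bit change nothing, and for small values the cast-heavy form and plain
a*b*c agree:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typedefs for this sketch (not taken from the Speex headers). */
typedef int16_t spx_word16_t;
typedef int32_t spx_word32_t;

/* Hypothetical cast-heavy macros, for illustration only. */
#define CAST_MULT16_16_16(a,b) (((spx_word16_t)(a)) * ((spx_word16_t)(b)))
#define CAST_MULT16_16(a,b) \
   (((spx_word32_t)(spx_word16_t)(a)) * ((spx_word32_t)(spx_word16_t)(b)))

/* One term of the sum, written with the cast-heavy macros. */
static inline spx_word32_t term_with_casts(spx_word16_t a, spx_word16_t b,
                                           spx_word16_t c)
{
   return CAST_MULT16_16(CAST_MULT16_16_16(a, b), c);
}

/* The same term as a plain product; operands are promoted to int. */
static inline spx_word32_t term_plain(spx_word16_t a, spx_word16_t b,
                                      spx_word16_t c)
{
   return a * b * c;
}

/* Both forms agree while the intermediate product a*b fits in 16 bits. */
static inline void check_small_operands(void)
{
   assert(term_with_casts(3, 5, 7) == 105);
   assert(term_plain(3, 5, 7) == 105);
   assert(term_with_casts(-4, 6, 10) == term_plain(-4, 6, 10));
}
```

Note that in this sketch the cast on the intermediate product is a real
narrowing step, so the two forms are only guaranteed to match when that
product fits in 16 bits.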
In any case, the workaround would be to override MULT16_16 and
MULT16_16_16, so just do:
#define MULT16_16(a,b) ((a)*(b))
#define MULT16_16_16(a,b) ((a)*(b))
I think that should solve the problem. Let me know whether that works
(and doesn't have undesirable side effects), and what compiler you're
using.
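One way to check for side effects is to run the overridden macros against a
plain int version on the same data. This is only a sketch: the typedefs and
the plain ADD32/SUB32 definitions are assumptions made so the example
compiles on its own (they are not the real Speex headers), and
compute_pitch_error_int, check_pitch_error_versions and the sample values
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typedefs for this sketch (not taken from the Speex headers). */
typedef int16_t spx_word16_t;
typedef int32_t spx_word32_t;

/* The two overrides suggested above. */
#define MULT16_16(a,b)    ((a)*(b))
#define MULT16_16_16(a,b) ((a)*(b))
/* Assumed plain definitions so the example is self-contained. */
#define ADD32(a,b) ((a)+(b))
#define SUB32(a,b) ((a)-(b))

static inline spx_word32_t compute_pitch_error(const spx_word16_t *C,
      const spx_word16_t *g, spx_word16_t pitch_control)
{
   spx_word32_t sum = 0;
   sum = ADD32(sum,MULT16_16(MULT16_16_16(g[0],pitch_control),C[0]));
   sum = ADD32(sum,MULT16_16(MULT16_16_16(g[1],pitch_control),C[1]));
   sum = ADD32(sum,MULT16_16(MULT16_16_16(g[2],pitch_control),C[2]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[1]),C[3]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[1]),C[4]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[0]),C[5]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[0]),C[6]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[1],g[1]),C[7]));
   sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[2]),C[8]));
   return sum;
}

/* Hypothetical reference: the same arithmetic written directly on int. */
static inline spx_word32_t compute_pitch_error_int(const int *C,
      const int *g, int pitch_control)
{
   spx_word32_t sum = 0;
   sum += g[0] * pitch_control * C[0];
   sum += g[1] * pitch_control * C[1];
   sum += g[2] * pitch_control * C[2];
   sum -= g[0] * g[1] * C[3];
   sum -= g[2] * g[1] * C[4];
   sum -= g[2] * g[0] * C[5];
   sum -= g[0] * g[0] * C[6];
   sum -= g[1] * g[1] * C[7];
   sum -= g[2] * g[2] * C[8];
   return sum;
}

/* Compares both versions on made-up sample data. */
static inline void check_pitch_error_versions(void)
{
   const spx_word16_t C16[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
   const spx_word16_t g16[3] = {10, 20, 30};
   int C32[9], g32[3], i;
   for (i = 0; i < 9; i++) C32[i] = C16[i];
   for (i = 0; i < 3; i++) g32[i] = g16[i];
   assert(compute_pitch_error(C16, g16, 4) ==
          compute_pitch_error_int(C32, g32, 4));
}
```

The thing to watch for is the intermediate product: with the override,
something like g[0]*pitch_control is kept at full int width before the
second multiply. If the original macros narrow that value to 16 bits, the
results can differ whenever it falls outside the 16-bit range, so it is
worth running a comparison like this on real encoder data too.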
Jean-Marc
Eliso wrote:
> Hello
>
> I'm testing Speex on an embedded board using an ARM7 (Atmel). The ARM7
> doesn't have floating point, so I'm using FIXED_POINT. Unfortunately the
> encoding speed is about 5 times slower than necessary for real time.
>
> ARM7 is slow for 16/8-bit operations.
>
> The sequence:
>
> static inline spx_word32_t compute_pitch_error(spx_word16_t *C,
>       spx_word16_t *g, spx_word16_t pitch_control)
> {
>    spx_word32_t sum = 0;
>    sum = ADD32(sum,MULT16_16(MULT16_16_16(g[0],pitch_control),C[0]));
>    sum = ADD32(sum,MULT16_16(MULT16_16_16(g[1],pitch_control),C[1]));
>    sum = ADD32(sum,MULT16_16(MULT16_16_16(g[2],pitch_control),C[2]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[1]),C[3]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[1]),C[4]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[0]),C[5]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[0]),C[6]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[1],g[1]),C[7]));
>    sum = SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[2]),C[8]));
>    return sum;
> }
>
> is about 30 times slower than the similar operation using 32 bits (int) below.
>
> static inline long compute_pitch_errorL(int *C, int *g, int pitch_control)
> {
>    spx_word32_t sum=0;
>    sum+=g[0] * pitch_control * C[0]; // ADD32(sum,MULT16_16(MULT16_16_16(g[0],pitch_control),C[0]));
>    sum+=g[1] * pitch_control * C[1]; // ADD32(sum,MULT16_16(MULT16_16_16(g[1],pitch_control),C[1]));
>    sum+=g[2] * pitch_control * C[2]; // ADD32(sum,MULT16_16(MULT16_16_16(g[2],pitch_control),C[2]));
>    sum-=g[0] * g[1] * C[3]; // SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[1]),C[3]));
>    sum-=g[2] * g[1] * C[4]; // SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[1]),C[4]));
>    sum-=g[2] * g[0] * C[5]; // SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[0]),C[5]));
>    sum-=g[0] * g[0] * C[6]; // SUB32(sum,MULT16_16(MULT16_16_16(g[0],g[0]),C[6]));
>    sum-=g[1] * g[1] * C[7]; // SUB32(sum,MULT16_16(MULT16_16_16(g[1],g[1]),C[7]));
>    sum-=g[2] * g[2] * C[8]; // SUB32(sum,MULT16_16(MULT16_16_16(g[2],g[2]),C[8]));
>    return sum;
> }
>
> Not using 16 bits seems to be a possible solution. I'd like to know if
> there is an option to run it this way, or if the algorithm relies on
> 16-bit operations and cannot easily be converted to 32 bits.
>
> Best regards
>
> Eliso Cavalli
>
> Planeta Informatica Ltda.
> Rua Roxo Moreira, 1178,
> Campinas/SP/BRASIL.
> CEP 13083-591.
> phone: +55 19 32897755
> fax: +55 19 32491717
> <mailto:eliso@planeta.inf.br>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Speex-dev mailing list
> Speex-dev@xiph.org
> http://lists.xiph.org/mailman/listinfo/speex-dev