Hello!

I've downloaded speex lib 1.1.11.1 and I am trying to port it to the Blackfin processor, using VisualDSP++ 4.0.

If I compile the sources with floating point, everything is OK. When I compile with FIXED_POINT defined, everything is OK too, and the code runs about two times faster. But when I define BFIN_ASM, I get several compile errors in the Blackfin assembler functions. Here they are:

1) In the inline assembler functions (Blackfin asm) there is a syntax error in every loop. Maybe I am wrong, but I can't understand what is written there (and neither can the compiler). Example:

    __asm__ __volatile__
    (
    "I0 = %0;\n\t"
    "I1 = %1;\n\t"
    "L0 = 0;\n\t"
    "L1 = 0;\n\t"
    "LOOP tupdate%= LC0 = %3;\n\t"
                 ^//here is the problem
    "LOOP_BEGIN tupdate%=;\n\t"
                       ^//here is the same problem
    "R0.L = W[I0] || R1.L = W[I1++];\n\t"
    "R1 = (A1 = R1.L*%2.L) (IS);\n\t"
    "R1 >>>= 11;\n\t"
    "R0.L = R0.L - R1.L;\n\t"
    "W[I0++] = R0.L;\n\t"
    "LOOP_END tupdate%=;\n\t"
                     ^//and here
    :
    : "a" (t), "a" (r), "d" (g), "a" (len)
    : "R0", "R1", "A1", "I0", "I1", "L0", "L1"
    );

So I removed the '%=' signs from the loops and these problems were solved.

2) Some errors in arithmetic operations. Example:

    "R0.L = R0.L - R1.L;\n\t"      //compiler error
    //Need to be written (s)-saturate or (ns)-not saturate after operation
    "R0.L = R0.L - R1.L(ns);\n\t"

This is only one example, but I've got a lot of them.

3) When I try to correct the previous errors, the compiler refuses to compile some functions, with the message:

    gnu asm requires too much Preg registers.

So, did you compile your library in VisualDSP++? Maybe it's some problem with versions. Maybe I am wrong and I don't understand something? Please clarify these questions.

Thank you.

--
demon mailto:demonb@mail.ru
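For readers comparing the two forms of the subtraction in point 2, here is a plain C sketch of what the quoted loop appears to compute, read directly from the asm above: t[i] is reduced by (r[i]*g) >> 11, with the 16-bit result either wrapping (ns) or saturating (s). The function name tupdate_ref, the helper names and the saturate flag are made up for this illustration and are not part of Speex.

```c
#include <assert.h>
#include <stdint.h>

/* Floor-style arithmetic shift right by 11, written so that it does not
   rely on how the compiler shifts negative values ("R1 >>>= 11"). */
static int32_t asr11(int32_t x)
{
    return x >= 0 ? x >> 11 : -((-x + 2047) >> 11);
}

/* Keep the low 16 bits and reinterpret them as signed (wrap-around). */
static int16_t wrap16(int32_t x)
{
    uint16_t u = (uint16_t)x;
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Clamp to the signed 16-bit range (saturation). */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Hypothetical reference for the quoted loop; saturate != 0 models "(s)",
   saturate == 0 models "(ns)". */
void tupdate_ref(int16_t *t, const int16_t *r, int16_t g, int len, int saturate)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        int32_t prod = (int32_t)r[i] * g;     /* R1 = (A1 = R1.L * g.L) (IS) */
        int16_t step = wrap16(asr11(prod));   /* R1 >>>= 11, then use R1.L   */
        int32_t diff = (int32_t)t[i] - step;  /* R0.L - R1.L                 */
        t[i] = saturate ? sat16(diff) : wrap16(diff);
    }
}
```

The two variants only differ when the difference leaves the 16-bit range. For example, with t[0] = -32768, r[0] = 2048 and g = 1, the step is 1; the wrapping version stores 32767 and the saturating version stores -32768.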
> I am trying to port speex lib to the Blackfin processor.
> I am using VisualDSP++ 4.0.

I've never used VisualDSP++ 4.0. All the development on Blackfin has been done with gcc, which may explain some of the problems with the inline asm. Does VisualDSP++ support a syntax close to what gcc uses (with constraints), or is it more like the MS compilers?

> If I compile the sources with floating point, everything is OK.
> When I compile with FIXED_POINT defined, everything is OK too, and
> the code runs about two times faster.

Strange that it's *only* 2x faster...

> But when I define BFIN_ASM, I get several compile errors
> in the Blackfin assembler functions.
> Here they are:
> 1) In the inline assembler functions (Blackfin asm) there is a
> syntax error in every loop. Maybe I am wrong, but I can't understand
> what is written there (and neither can the compiler).
> Example:
>
> __asm__ __volatile__
> (
> "I0 = %0;\n\t"
> "I1 = %1;\n\t"
> "L0 = 0;\n\t"
> "L1 = 0;\n\t"
> "LOOP tupdate%= LC0 = %3;\n\t"
>              ^//here is the problem
> "LOOP_BEGIN tupdate%=;\n\t"
>                    ^//here is the same problem
> "R0.L = W[I0] || R1.L = W[I1++];\n\t"
> "R1 = (A1 = R1.L*%2.L) (IS);\n\t"
> "R1 >>>= 11;\n\t"
> "R0.L = R0.L - R1.L;\n\t"
> "W[I0++] = R0.L;\n\t"
> "LOOP_END tupdate%=;\n\t"
>                  ^//and here
> :
> : "a" (t), "a" (r), "d" (g), "a" (len)
> : "R0", "R1", "A1", "I0", "I1", "L0", "L1"
> );
>
> So I removed the '%=' signs from the loops and these problems were
> solved.

In the gcc syntax, %= gets replaced by something different (a number unique to that asm block) every time the asm block is emitted. That prevents redefined symbols when the asm block is part of an inline function.

> 2) Some errors in arithmetic operations.
> Example:
> "R0.L = R0.L - R1.L;\n\t"//compiler error
> //Need to be written (s)-saturate or (ns)-not saturate after operation
> "R0.L = R0.L - R1.L(ns);\n\t"
> This is only one example, but I've got a lot of them.

I wasn't aware that the flag was required (gnu as doesn't require it). The whole fixed-point code assumes no saturation is required, but it probably doesn't hurt to use (s) either.

> 3) When I try to correct the previous errors, the compiler refuses
> to compile some functions, with the message:
> gnu asm requires too much Preg registers.

Not sure what that means. Any idea?

> So, did you compile your library in VisualDSP++?

No, only gcc.

> Maybe it's some problem with versions. Maybe I am wrong and I
> don't understand something?
> Please clarify these questions.

Probably a compiler issue. If you find a workaround, please let me know.

Jean-Marc
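To illustrate the point about %= with something that can be built on an ordinary gcc (or clang) host, here is a minimal sketch. The helper bump() and the label name are invented for the example and have nothing to do with Speex; the asm body emits only a label, so no target-specific instructions are involved. It says nothing about how VisualDSP++ treats %=, only what gcc does with it.

```c
#include <assert.h>

/* Hypothetical helper, forced inline so that its asm block is emitted
   once per call site. The extended-asm form (with colons) is needed:
   %= is only substituted in extended asm. */
static inline __attribute__((always_inline)) int bump(int x)
{
    /* gcc replaces %= with a number unique to this instance of the asm,
       so each inlined copy defines a different label. */
    __asm__ __volatile__ ("speex_demo_label_%=:\n\t" : : : "memory");
    return x + 1;
}

/* Two inlined copies of bump() end up in the same function. */
int bump_twice(int x)
{
    assert(x < 1000000);   /* keep the toy example far from overflow */
    return bump(bump(x));
}
```

If the %= is dropped from the label, both inlined copies define the same symbol and the assembler is expected to reject the file with a duplicate-symbol error. That is the situation the suffix is there to avoid, so removing it is only safe where each asm block is emitted exactly once per object file.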
I'm trying to make a design decision between a TI 6416 or DM642 (fixed point) and a 6713 (floating point) platform. The application is a 32-channel speech encoder (CBR only, 8 kHz, 8 kbps).

To get a feel for the computational load, I am running 1 second (50 frames) of voice through the encoder. My profile of the 6416 indicates I'm at 27.4M cycles/channel. I need to get below 720 MHz/32 channels = 22.5M cycles per channel. I did a little work on inner_prod() and normalize16(), and I'm confident I can get 32 channels by optimizing 5 or 6 functions. I expect these numbers to translate over to the DM642.

Symbol Name                     Count   cycle.Total:Incl.   cycle.Total:Excl.
compute_weighted_codebook         200             4511420             4511420
iir_mem2                          599             3338308             3338308
filter_mem2                       799             2323655             2323655
compute_impulse_response          200             1800518             1800518
pitch_gain_search_3tap            199             4726604             1744952
open_loop_nbest_pitch             199             4204121             1641016
vq_nbest                          800             1626252             1626252
lpc_to_lsp                         50             1612650             1558133
nb_encode                          50            27412845             1179551
fir_mem2                           50             1097300             1097300
inner_prod                      27469             1072299             1072299
split_cb_search_shape_sign_N1     200             7310588             1007711
normalize16                       597              303378              303378

A lower-cost option would be to use a floating-point 6713. I thought that a 300 MHz floating-point part would come out even or ahead in an encoding comparison. Instead of the 300M/32 = 9.4M cycles per channel that I need, I see 71.5M cycles per channel!!!

Symbol Name                     Count   cycle.Total:Incl.   cycle.Total:Excl.
compute_weighted_codebook         200             8709029             8709029
filter_mem2                       799             8322224             8322224
inner_prod                      27469             5911396             5911396
vq_nbest                          800             5465094             5465094
iir_mem2                          599             5378906             5378906
split_cb_search_shape_sign_N1     200            18106210             3694787
compute_impulse_response          200             3084502             3084502
open_loop_nbest_pitch             199            18400309             2817913
pitch_gain_search_3tap            199             7002859             2696353
_spx_autocorr                      50             2211100             2211100
lsp_to_lpc                        450             2076854             2076854
nb_encode                          50            71523682             1938067
fir_mem2                           50             1777450             1777450
cheb_poly_eva                    9634             1564172             1564172
lsp_weight_quant                  100             1032600             1032600

Does this make sense? I'm generating floating-point code, using the optimizer, etc...
Has anyone posted DM642, C64xx or C67xx benchmarks?
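The budget arithmetic in this message can be written down as a small C sketch. Because the profile covers exactly 1 second of audio, the measured cycle count for nb_encode is also the cycles-per-second cost of one channel. The function names are invented for the illustration, and the sketch assumes every channel costs the same and that nothing else (I/O, scheduling, decoding) competes for cycles.

```c
#include <assert.h>

/* Cycles available to each channel per second of audio. */
double budget_per_channel(double clock_hz, int channels)
{
    assert(channels > 0);
    return clock_hz / channels;
}

/* Measured cost divided by the budget; a value above 1.0 means the
   encoder does not fit at this clock rate and channel count. */
double load_ratio(double measured_cycles, double clock_hz, int channels)
{
    return measured_cycles / budget_per_channel(clock_hz, channels);
}
```

With the figures quoted above, load_ratio(27412845, 720e6, 32) comes out at about 1.22 for the 6416, so roughly an 18% reduction in cycles is needed, while load_ratio(71523682, 300e6, 32) is about 7.6 for the 6713.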