search for: a_q28

Displaying 10 results from an estimated 10 matches for "a_q28".

2017 Apr 26
2
2 patches related to silk_biquad_alt() optimization
On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the > > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = > > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it > > could possibly be faster and less rounding/shifting erro...
2017 May 15
2
2 patches related to silk_biquad_alt() optimization
..., Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com> wrote: > > On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the > > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = > > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it > > could possibly be faster and les...
2017 Apr 25
2
2 patches related to silk_biquad_alt() optimization
...ovides no improvement over C, let's > stick with C. And if you manage to write C code that has the same > performance as the Neon code, then that would also be better (both > easier to maintain and more portable). > Will do. > > > If it's allowed to skip the split of A_Q28 and replace by 32-bit > > multiplication (result is 64-bit), probably it could be faster on NEON. > > This may change the encoder results because of different order of > > adding, shifting and rounding. > > I'm not sure what you mean for that. > /* Negate A_Q28 v...
2017 May 08
0
2 patches related to silk_biquad_alt() optimization
Ping for comments. Thanks, Linfeng On Wed, Apr 26, 2017 at 2:15 PM, Linfeng Zhang <linfengz at google.com> wrote: > On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> > wrote: > >> >> > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the >> > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = >> > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it >> > could possibly be faster and less rounding/s...
2017 May 17
0
2 patches related to silk_biquad_alt() optimization
...ng Zhang <linfengz at google.com> wrote: > > > > On Tue, Apr 25, 2017 at 10:31 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > > > > > > > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the > > > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = > > > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it > > > co...
2017 Apr 25
2
2 patches related to silk_biquad_alt() optimization
...(channel) == 1, the optimization has no gain compared with C function. When stride (channel) == 2, the optimization is 1.2%-1.8% faster (1.6% at Complexity 8) compared with C function. Please let me know and I can remove the optimization of stride 1 case. If it's allowed to skip the split of A_Q28 and replace by 32-bit multiplication (result is 64-bit), probably it could be faster on NEON. This may change the encoder results because of different order of adding, shifting and rounding. Thanks, Linfeng On Wed, Apr 19, 2017 at 10:23 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: ...
2017 Apr 25
0
2 patches related to silk_biquad_alt() optimization
.... Yeah, if there's Neon code that provides no improvement over C, let's stick with C. And if you manage to write C code that has the same performance as the Neon code, then that would also be better (both easier to maintain and more portable). > If it's allowed to skip the split of A_Q28 and replace by 32-bit > multiplication (result is 64-bit), probably it could be faster on NEON. > This may change the encoder results because of different order of > adding, shifting and rounding. I'm not sure what you mean for that. Jean-Marc > Thanks, > Linfeng > >...
2017 Apr 26
0
2 patches related to silk_biquad_alt() optimization
...> channels in the same loop in C, and then additional 0.8% faster using NEON. Considering that the function isn't huge, I'm OK in principle adding some Neon to gain 0.8%. It would just be good to check that the 0.8% indeed comes from Neon as opposed to just unrolling the channels. > A_Q28 is split to 2 14-bit (or 16-bit, whatever) integers, to make the > multiplication operation within 32-bits. NEON can do 32-bit x 32-bit = > 64-bit using 'int64x2_t vmull_s32(int32x2_t a, int32x2_t b)', and it > could possibly be faster and less rounding/shifting errors than above C...
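The trade-off discussed in this thread (splitting A_Q28 into a low 14-bit part and an upper part so each multiply stays narrow, versus one 32-bit x 32-bit = 64-bit multiply) can be sketched in portable C. This is a minimal illustration, not the Opus source: the function names `smulwb`, `rshift_round`, `mul_q28_split` and `mul_q28_wide` are hypothetical, a plain `int64_t` product stands in for what `vmull_s32` would do per lane on NEON, and the code assumes |a_q28| < 2^29 and an arithmetic right shift for negative values.

```c
#include <assert.h>
#include <stdint.h>

/* (a32 * low 16 bits of b32) >> 16. Hypothetical helper; a 64-bit
   intermediate is used here only to keep the sketch short. */
static int32_t smulwb(int32_t a32, int32_t b32) {
    return (int32_t)(((int64_t)a32 * (int16_t)b32) >> 16);
}

/* Right shift with rounding to nearest (shift >= 1). */
static int32_t rshift_round(int32_t a, int shift) {
    return ((a >> (shift - 1)) + 1) >> 1;
}

/* Split approach: a_q28 = hi * 2^14 + lo, with lo in [0, 2^14).
   Each partial product is shifted separately, so the result can
   differ from the wide version by one unit in the last place. */
int32_t mul_q28_split(int32_t x, int32_t a_q28) {
    /* Assumption: the coefficient is small enough that hi fits in 16 bits. */
    assert(a_q28 > -(1 << 29) && a_q28 < (1 << 29));
    int32_t lo = a_q28 & 0x3FFF; /* lower 14 bits */
    int32_t hi = a_q28 >> 14;    /* upper part (assumes arithmetic shift) */
    return smulwb(x, hi) + rshift_round(smulwb(x, lo), 14);
}

/* Wide approach: one 32x32 -> 64-bit product, then a single shift. */
int32_t mul_q28_wide(int32_t x, int32_t a_q28) {
    return (int32_t)(((int64_t)x * a_q28) >> 30);
}
```

Because the split version shifts and rounds each partial product on its own, its output is not guaranteed to be bit-identical to the wide version, which is consistent with the remark in the thread that switching could change the encoder results.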
2017 Apr 19
4
2 patches related to silk_biquad_alt() optimization
Hi, Attached are 2 patches related to silk_biquad_alt() optimization. Please review. Thanks, Linfeng Zhang
2012 Sep 10
11
Cleanup/build improvement for opus
Hello all, after FOMS I decided to take a look at the opus library and I found that I could improve the build system a bit and clean up the code a little. Most of the changes to the code have been suggested by my two tools cowstats and missingstatic (part of the ruby-elf gem if you care). HTH, Diego