search for: jridg

2015 Nov 16

[Aarch64 00/11] Patches to enable Aarch64

...ssembly. I'll submit patches for this. The inline assembly parts of my aarch64 patch set can thus be considered withdrawn. I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious next step. On Nov 13, 2015, at 2:47 PM, John Ridges <jridges at masque.com> wrote: Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it...

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

...if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: >> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com> wrote: >> >> Hi Jonathan, >> >> I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. >> >> I think what's happening is that it's a...

2009 Jun 30

Delays estimation in Speex algorithms

Speex tells me that the decoder is always 5 ms, but it says that the encoder is 5 ms for NB, 8.9375 ms for WB, and 10.90625 ms for UWB. Is there an extra frame of delay in the encoder that isn't otherwise accounted for? John Ridges Jean-Marc Valin wrote: > Quoting John Ridges <jridges at masque.com>: > >> I also need to know the precise delays from Speex but I used the >> SPEEX_GET_LOOKAHEAD control requests to determine them (plus the >> "speex_resampler_get_output_latency" function from the resampler). The >> returned values from th...
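The millisecond figures quoted above correspond to whole sample counts once the sample rate is fixed. A minimal sketch of that arithmetic follows, assuming the usual Speex rates of 8 kHz (NB), 16 kHz (WB) and 32 kHz (UWB); the function name samples_to_ms is hypothetical, and the sample counts 40, 143 and 349 are back-computed from the quoted milliseconds rather than read from the library.

```c
/* Hypothetical helper: convert a delay in samples to milliseconds.
 * Assumed rates: NB = 8000 Hz, WB = 16000 Hz, UWB = 32000 Hz. */
double samples_to_ms(int samples, int sample_rate_hz)
{
    return 1000.0 * (double)samples / (double)sample_rate_hz;
}

/* Back-computed from the figures in the message (an inference, not an API result):
 *   samples_to_ms(40,  8000)  ->  5 ms        (NB)
 *   samples_to_ms(143, 16000) ->  8.9375 ms   (WB)
 *   samples_to_ms(349, 32000) -> 10.90625 ms  (UWB)
 */
```

In practice the sample count would come from the SPEEX_GET_LOOKAHEAD request mentioned in the thread, not from a hard-coded constant.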

2015 Nov 23

[Aarch64 v2 05/18] Add Neon intrinsics for Silk noise shape quantization.

On Nov 23, 2015, at 12:04 PM, John Ridges <jridges at masque.com> wrote: Hi Jonathan. I really, really hate to bring this up this late in the game, but I just noticed that your NEON code doesn't use any of the "high" intrinsics for ARM64, e.g. instead of: int32x4_t coef1 = vmovl_s16(vget_hig...

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

.... Anyway I'll stop talking now. I'm not saying that the inline assembly isn't faster, but I don't think it's giving you as much of a gain over C as you think. --John Ridges On 11/13/2015 9:30 AM, Jonathan Lennox wrote: >> On Nov 12, 2015, at 12:23 PM, John Ridges <jridges at masque.com> wrote: >> >> One other minor thing: I notice that in the inline assembly the result (rd) is constrained as an earlyclobber operand. What was the reason for that? > Possibly an error? Probably from modeling it on macros_armv4.h, which I guess does require earlyclo...

2009 Jul 22

A technical question about the speex preprocessor.

...omething looks odd without your values (or the doc) because hypergeom_gain() > should really approach 1 as x goes to infinity. But in the end, an > approximation is probably OK because denoising is anything but an exact science > :-) > > Jean-Marc > > Quoting John Ridges <jridges at masque.com>: > > >> By my reckoning the confluent hypergeometric functions should have the >> following values: >> >> M(-.25;1;-.5) = 1.11433 >> M(-.25;1;-1) = 1.21088 >> M(-.25;1;-1.5) = 1.29385 >> M(-.25;1;-2) = 1.36627 >> M(-.25;1;...

2009 Jul 22

A technical question about the speex preprocessor.

...table you see does not match the definition? > y = gamma(1.25)^2 * M(-.25;1;-x) / sqrt(x) > Note that the table data has an interval of .5 for the x axis. > > How far are your results from the data in the table? > > Cheers, > > Jean-Marc > > Quoting John Ridges <jridges at masque.com>: > > >> Thanks for the confirmation Jean-Marc. I kind of suspected from the >> comments that it was the confluent hypergeometric function, which I was >> trying to evaluate using Kummer's equation, namely: >> >> M(a;b;x) is the sum fro...

2015 Nov 20

[Aarch64 00/11] Patches to enable Aarch64

> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you're right. I forgot to run the vectors under qemu with my previous version...
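The macro suggested above combines a rounding right shift with a saturating narrow from 32 to 16 bits. A portable scalar sketch of that behaviour follows; it is a model for checking results on any host, not the Opus source. All model_* names are hypothetical, the shift value of 12 is an assumption about SIG_SHIFT in the fixed-point build, and the right shift of a negative value is assumed to be arithmetic (as it is on the usual GCC and Clang targets).

```c
#include <stdint.h>

#define MODEL_SIG_SHIFT 12 /* assumed value of SIG_SHIFT */

/* Scalar model of a saturating 32->16 bit narrow (what vqmovns_s32 does). */
int16_t model_sat16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Model of a rounding right shift: add half an LSB, then shift.
 * The sum is formed in 64 bits so inputs near INT32_MAX cannot overflow. */
int32_t model_pshr32(int32_t x, int shift)
{
    int64_t rounded = (int64_t)x + ((int64_t)1 << (shift - 1));
    return (int32_t)(rounded >> shift);
}

/* Model of SIG2WORD16 as proposed: round, shift, then saturate. */
int16_t model_sig2word16(int32_t x)
{
    return model_sat16(model_pshr32(x, MODEL_SIG_SHIFT));
}
```

A model like this is handy as a reference when comparing an intrinsic version against the C path, since the two must agree on every input, including the saturating ones.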

2009 Jun 30

Delays estimation in Speex algorithms

Quoting John Ridges <jridges at masque.com>: > Speex tells me that the decoder is always 5 ms, but it says that the > encoder is 5 ms for NB, 8.9375 ms for WB, and 10.90625 ms for UWB. Is > there an extra frame of delay in the encoder that isn't otherwise > accounted for? Oh, delay = frame_size + lookahe...

2015 Mar 12

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

Nit: in dual_inner_prod_sse, why not do both horizontal sums at the same time? As in: xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2), _mm_movehl_ps(xsum2, xsum1)); xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5)); _mm_store_ss(xy1, xsum1); _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1)); --John
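The four statements in the message can be wrapped in a small function to see why they work: the first add leaves partial sums of the first vector in lanes 0-1 and of the second in lanes 2-3, and the shuffle with 0xf5 then adds lane 1 into lane 0 and lane 3 into lane 2. The sketch below is an illustration only; the name dual_hsum_sketch and the array-based interface are hypothetical, and a scalar fallback with the same summation order is included for builds without SSE.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical wrapper: horizontal sums of two 4-float vectors at once. */
void dual_hsum_sketch(const float a[4], const float b[4], float *xy1, float *xy2)
{
#if defined(__SSE__)
    __m128 xsum1 = _mm_loadu_ps(a);
    __m128 xsum2 = _mm_loadu_ps(b);
    /* [a0+a2, a1+a3, b0+b2, b1+b3] */
    xsum1 = _mm_add_ps(_mm_movelh_ps(xsum1, xsum2), _mm_movehl_ps(xsum2, xsum1));
    /* lane 0 = sum of a, lane 2 = sum of b (lanes 1 and 3 are unused) */
    xsum1 = _mm_add_ps(xsum1, _mm_shuffle_ps(xsum1, xsum1, 0xf5));
    _mm_store_ss(xy1, xsum1);
    _mm_store_ss(xy2, _mm_movehl_ps(xsum1, xsum1));
#else
    /* Scalar fallback using the same pairing of terms. */
    *xy1 = (a[0] + a[2]) + (a[1] + a[3]);
    *xy2 = (b[0] + b[2]) + (b[1] + b[3]);
#endif
}
```

Doing both reductions together uses two adds in total, where reducing each accumulator separately would need two adds apiece.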

2009 Jul 07

AEC with different soundcards

...ry) in those machines, and AEC worked again. But this solution wasn't practical (user must install a special driver) so I can't say if this method will work or not on *every* machine. I think it will not work on Vista, no matter what card you use. Just my 2c. --- On Tue, 7/7/09, John Ridges <jridges at masque.com> wrote: From: John Ridges <jridges at masque.com> Subject: Re: [Speex-dev] AEC with different soundcards To: "Alexander Chemeris" <Alexander.Chemeris at sipez.com> CC: "speex-dev at xiph.org" <speex-dev at xiph.org> Date: Tuesday, 7 July...

2015 Nov 12

[Aarch64 00/11] Patches to enable Aarch64

One other minor thing: I notice that in the inline assembly the result (rd) is constrained as an earlyclobber operand. What was the reason for that?

2009 Jul 22

A technical question about the speex preprocessor.

Something looks odd without your values (or the doc) because hypergeom_gain() should really approach 1 as x goes to infinity. But in the end, an approximation is probably OK because denoising is anything but an exact science :-) Jean-Marc Quoting John Ridges <jridges at masque.com>: > By my reckoning the confluent hypergeometric functions should have the > following values: > > M(-.25;1;-.5) = 1.11433 > M(-.25;1;-1) = 1.21088 > M(-.25;1;-1.5) = 1.29385 > M(-.25;1;-2) = 1.36627 > M(-.25;1;-2.5) = 1.43038 > M(-.25;1;-3) = 1.48784...

2015 Nov 13

[Aarch64 00/11] Patches to enable Aarch64

> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com> wrote: > > Hi Jonathan, > > I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. > > I think what's happening is that it's a little unfair to comp...

2009 Jun 30

Delays estimation in Speex algorithms

JM, I also need to know the precise delays from Speex but I used the SPEEX_GET_LOOKAHEAD control requests to determine them (plus the "speex_resampler_get_output_latency" function from the resampler). The returned values from the Speex lookahead request don't seem to match with the values you gave Alexander. Am I doing this wrong? Thanks, John Ridges speex-dev-request at

2009 Jul 23

A technical question about the speex preprocessor.

...doc) because >> hypergeom_gain() >> should really approach 1 as x goes to infinity. But in the end, an >> approximation is probably OK because denoising is anything but an >> exact science >> :-) >> >> Jean-Marc >> >> Quoting John Ridges <jridges at masque.com>: >> >> >>> By my reckoning the confluent hypergeometric functions should have the >>> following values: >>> >>> M(-.25;1;-.5) = 1.11433 >>> M(-.25;1;-1) = 1.21088 >>> M(-.25;1;-1.5) = 1.29385 >>> M(-.25...

2015 Nov 10

[Aarch64 00/11] Patches to enable Aarch64

Since you're already set up for benchmarks, I would ask if you could benchmark the difference between using and not using the ARM64 inline assembly. I believe the original justification on ARMv7 for the assembly was the processor's panoply of multiply instructions and their long cycle times. It seems to me that the ARM64 processor is much more like an x86 one, where using a

2009 Jul 22

A technical question about the speex preprocessor.

Thanks for the confirmation Jean-Marc. I kind of suspected from the comments that it was the confluent hypergeometric function, which I was trying to evaluate using Kummer's equation, namely: M(a;b;x) is the sum from n=0 to infinity of (a)n*x^n / ((b)n*n!) where (a)n = a(a+1)(a+2) ... (a+n-1) But when I use Kummer's equation, I don't get the values in the "hypergeom_gain"
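The series above can be summed term by term, since each term is the previous one multiplied by (a+n)*x / ((b+n)*(n+1)). A minimal sketch follows; the function name kummer_m, the iteration cap and the stopping tolerance are all assumptions chosen for illustration, and the code is not taken from the Speex preprocessor.

```c
/* Hypothetical helper: Kummer's series for M(a;b;x),
 * sum over n >= 0 of (a)n * x^n / ((b)n * n!). */
double kummer_m(double a, double b, double x)
{
    double term = 1.0; /* the n = 0 term */
    double sum = 1.0;
    for (int n = 0; n < 500; n++) {
        /* ratio of term n+1 to term n */
        term *= (a + n) * x / ((b + n) * (n + 1));
        sum += term;
        double mag = term < 0.0 ? -term : term;
        double ref = sum < 0.0 ? -sum : sum;
        if (mag < 1e-15 * ref) break; /* further terms no longer change the sum */
    }
    return sum;
}
```

With a = -0.25 and b = 1 this gives about 1.11433 at x = -0.5 and about 1.21088 at x = -1, in line with the values quoted elsewhere in the thread. For large negative x the alternating terms grow before they shrink, so a plain series like this loses accuracy there.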

2008 Dec 01

Question about UWB

Hi all, One question that I hope someone on the list just knows the answer to without having to delve too deeply into the code: How does UWB mode divvy up the bandwidth and pack it in the bitstream? I know from the documentation that WB mode codes the first 0-4 kHz band as a Narrowband packet, and then adds on the 4-8 kHz band coded separately (so that a NB decoder can decode a WB bitstream