thr3ads.net - search: "float32

Displaying 17 results from an estimated 17 matches for "float32_t".

[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

...If there were a measurable performance advantage to the switch statement, that would change my opinion, but if there isn't, then either of the other two approaches seem better. One more nit I hadn't noticed before: > + float *xi = x; > + float *yi = y; These need to be const float32_t (in both xcorr_kernel_neon_float and xcorr_kernel_neon_float_process1). They're currently causing a ton of warning spew. float32_t appears to not be considered equivalent to float, which means you'll also need casts here: > + vst1q_f32(sum, SUMM); and here: > + vst1_lane_f32...
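The review comment above asks for `const float32_t` pointers and explicit casts around the store intrinsics. A minimal sketch of that pattern follows. It is not the patch's code: `dot4_sketch` is a hypothetical helper, and the non-ARM branch (a `float32_t` typedef plus a scalar loop) is an assumption added only so the example builds on other targets.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: stand-in typedef so the sketch compiles without NEON. */
typedef float float32_t;
#endif

/* Hypothetical helper: dot product of x and y over len samples. */
static float dot4_sketch(const float *x, const float *y, int len)
{
    /* Const-qualified float32_t views of the inputs, cast explicitly
       because some toolchains treat float32_t as distinct from float. */
    const float32_t *xi = (const float32_t *)x;
    const float32_t *yi = (const float32_t *)y;
    float sum[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float total;
    int i;

    assert(x != NULL && y != NULL && len >= 0);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    {
        float32x4_t acc = vdupq_n_f32(0.0f);
        for (i = 0; i + 4 <= len; i += 4)
            acc = vmlaq_f32(acc, vld1q_f32(xi + i), vld1q_f32(yi + i));
        /* Explicit cast on the destination pointer as well. */
        vst1q_f32((float32_t *)sum, acc);
    }
#else
    /* Scalar fallback that mirrors the four accumulator lanes. */
    for (i = 0; i + 4 <= len; i += 4) {
        int k;
        for (k = 0; k < 4; k++)
            sum[k] += xi[i + k] * yi[i + k];
    }
#endif

    /* Reduce the four lanes, then handle the remaining samples. */
    total = sum[0] + sum[1] + sum[2] + sum[3];
    for (; i < len; i++)
        total += x[i] * y[i];
    return total;
}
```

The casts are harmless where `float32_t` is a plain typedef of `float`, and they silence pointer-type warnings where it is not.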

[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)

2016 Sep 13

Should call celt_inner_prod(). --- celt/bands.c | 7 ++++--- celt/bands.h | 2 +- celt/celt_encoder.c | 6 +++--- celt/pitch.c | 2 +- src/opus_multistream_encoder.c | 2 +- 5 files changed, 10 insertions(+), 9 deletions(-) diff --git a/celt/bands.c b/celt/bands.c index bbe8a4c..1ab24aa 100644 --- a/celt/bands.c +++ b/celt/bands.c

[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library

2015 Jan 29

...>nfft); > + if (st->priv == NULL) { > + printf("Unable to ne10 alloc\n"); Absolutely no printfs in library code. > + return -1; > + } > + return 0; > +} > + > +void opus_fft_free_arm_float_neon(kiss_fft_state *st) > +{ > + ne10_fft_cfg_float32_t cfg = (ne10_fft_cfg_float32_t)st->priv; > + > + if (cfg) > + free((void *)cfg); This concerns me for several reasons: 1) We never call free() directly in libopus. It is always wrapped in the opus_free() macro to allow ports to override it (and as a debugging tool). 2) We didn...
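The two review points in this snippet (no printf in library code, and no direct free() calls) can be illustrated with a small sketch. All names here (`sketch_alloc`, `sketch_free`, `OVERRIDE_SKETCH_ALLOC`, `OVERRIDE_SKETCH_FREE`, `sketch_state`) are hypothetical and chosen for illustration; they are not libopus or NE10 identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A port can define OVERRIDE_SKETCH_ALLOC / OVERRIDE_SKETCH_FREE and
   supply its own versions; otherwise the C library is used. */
#ifndef OVERRIDE_SKETCH_ALLOC
#define sketch_alloc(n) malloc(n)
#endif
#ifndef OVERRIDE_SKETCH_FREE
#define sketch_free(p) free(p)
#endif

/* Hypothetical state object with an opaque private buffer. */
typedef struct {
    void *priv;
} sketch_state;

/* Returns 0 on success and -1 on failure. The caller decides how to
   report the error; the library itself prints nothing. */
static int sketch_state_init(sketch_state *st, size_t nbytes)
{
    assert(st != NULL);
    st->priv = sketch_alloc(nbytes);
    return st->priv == NULL ? -1 : 0;
}

/* Releases the private buffer through the overridable wrapper. */
static void sketch_state_free(sketch_state *st)
{
    assert(st != NULL);
    if (st->priv != NULL)
        sketch_free(st->priv);
    st->priv = NULL;
}
```

Routing every allocation and release through one pair of wrappers gives a port a single place to substitute its own allocator or add debugging checks.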

[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

...er > of the other two approaches seem better. OK, thanks for your explanation. I will take a closer look at this part of celt_pitch_xcorr_arm.s > > > One more nit I hadn't noticed before: > >> + float *xi = x; >> + float *yi = y; > > These need to be const float32_t (in both xcorr_kernel_neon_float and > xcorr_kernel_neon_float_process1). They're currently causing a ton of > warning spew. float32_t appears to not be considered equivalent to > float, which means you'll also need casts here: > >> + vst1q_f32(sum, SUMM); > > and...

[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library

2015 Jan 29

...of developers/maintainers of opus. Please correct me if any of information below is inaccurate from a libNE10 change request perspective. Phil, Below are 2 main things we need to change from NE10 Library side. Can you please make these changes on NE10 side? 1. Remove usage of _t in ne10_fft_cfg_float32_t and ne10_fft_cpx_float32_t -------from comment---- ne10_fft_cfg_float32_t cfg = (ne10_fft_cfg_float32_t)st->priv; + VARDECL(ne10_fft_cpx_float32_t, temp); + VARDECL(ne10_fft_cpx_float32_t, tempin); Just another note on API design... the _t suffix is reserved by POSIX, and should never be us...

[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 18

Almost there... just a few nits left. Viswanath Puttagunta wrote: > +if OPUS_ARM_NEON_INTR > +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR) > +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3 I'll repeat: I don't think you should change the optimization level here. > + /* Just unroll the rest of the loop */ I saw you decided to keep this unrolled, but you didn't actually

[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics

2015 May 15

...284 insertions(+) diff --git a/celt/arm/celt_neon_intr.c b/celt/arm/celt_neon_intr.c index 47dce15..be978a0 100644 --- a/celt/arm/celt_neon_intr.c +++ b/celt/arm/celt_neon_intr.c @@ -249,4 +249,272 @@ void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y, (const float32_t *)_y+i, (float32_t *)xcorr+i, len); } } +#else /* FIXED POINT */ + +/* + * Function: xcorr_kernel_neon_fixed + * --------------------------------- + * Computes 8 correlation values and stores them in sum[8] + */ +static void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y, +...

[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics

2015 May 08

[RFC PATCH v1 0/2] Encode optimize using libNE10

2015 Jan 20

Hello opus-dev, I've been cooking up this patchset to integrate the NE10 library into opus. The current patchset focuses on the encode use case, mainly affecting the performance of clt_mdct_forward() and opus_fft() (for float only). Glad to report the following on the encode use case (measured on my Beaglebone Black Cortex-A8 board): - Performance improvement for encode use case ~= 12.34% (Based on time -p

[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics

2014 Dec 19

On 19 December 2014 at 17:25, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > Optimize celt_pitch_xcorr function (for floating point) > using ARM NEON intrinsics for SoCs that have NEON VFP unit. > > To enable this optimization, use --enable-intrinsics > configure option. > > Compile time and runtime checks are also supported to make sure > this

RFC: SIMD math-function library

2016 Jul 13

Dear LLVM contributors, I am Naoki Shibata, an associate professor at Nara Institute of Science and Technology. I and Hal Finkel would like to jointly propose to add my vectorized math library to LLVM. The library has been available as public domain software for years, I am going to double-license the library if necessary. ******** Below is a proposal to add my vectorized math library,

[[RFC PATCH v2]: Ne10 fft fixed and previous 1/8] armv7(float): Optimize encode usecase using NE10 library

2015 May 08

...>arch_fft->is_supported = 1; + st->arch_fft->priv = (void *)ne10_fft_alloc_c2c_float32_neon(st->nfft); + if (st->arch_fft->priv == NULL) { + return -1; + } + } + return 0; +} + +void opus_fft_free_arm_float_neon(kiss_fft_state *st) +{ + ne10_fft_cfg_float32_t cfg; + + if (!st->arch_fft) + return; + + cfg = (ne10_fft_cfg_float32_t)st->arch_fft->priv; + if (cfg) + ne10_fft_destroy_c2c_float32(cfg); + opus_free(st->arch_fft); +} +#endif +void opus_fft_float_neon(const kiss_fft_state *st, + const kiss_ff...

[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series

2015 Mar 31

Hi Timothy, As I mentioned earlier [1], I have now fixed the compile issues with fixed point and am resubmitting the patch. I also have a new patch that does intrinsics optimizations for celt_pitch_xcorr targeting aarch64. You can find my latest work-in-progress branch at [2]. For reference, you can use the Ne10 pre-built libraries at [3]. Note that I am working with Phil at ARM to get my patch at [4]

[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]

2015 May 08

Hi All, As per Timothy's suggestion, disabling mdct_forward for fixed point. Only affects armv7,armv8: Extend fixed fft NE10 optimizations to mdct. Rest of patches are same as in [1]. For reference, latest wip code for opus is at [2]. Still working with NE10 team at ARM to get corner cases of mdct_forward. Will update with another patch when issue in NE10 gets fixed. Regards, Vish [1]:

[RFC V3 0/8] Ne10 fft fixed and previous

2015 May 15

Hi All, Changes from RFC v2 [1] armv7,armv8: Extend fixed fft NE10 optimizations to mdct - Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon. - So, re-enabled using fixed fft for mdct_forward which was disabled in RFCv2 armv7,armv8: Optimize fixed point fft using NE10 library - Thanks to Jonathan Lennox, fixed some build issues on iOS and some copy-paste errors Rest

[RFC PATCH v1 0/8] Ne10 fft fixed and previous

2015 Apr 28

Hello Timothy / Jean-Marc / opus-dev, This patch series is a follow-up to the work I posted in [1]. In addition to what was posted in [1], this patch series mainly integrates the fixed-point FFT implementations in the NE10 library into opus. You can view my opus wip code at [2]. Note that while I found some issues both with the NE10 library (fixed fft) and with the Linaro toolchain (armv8 intrinsics), the work

[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10

2015 Mar 18

Hi All, Since I continue to base my work on top of Jonathan's patch, and my previous Ne10 fft/ifft/mdct_forward/backward patches, I thought it would be better to just post all new patches as a patch series. Please let me know if anyone disagrees with this approach. You can see wip branch of all latest patches at https://git.linaro.org/people/viswanath.puttagunta/opus.git Branch:
