Displaying 17 results from an estimated 17 matches for "float32_t".
2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...If there were a measurable performance advantage to the switch
statement, that would change my opinion, but if there isn't, then either
of the other two approaches seem better.
One more nit I hadn't noticed before:
> + float *xi = x;
> + float *yi = y;
These need to be const float32_t (in both xcorr_kernel_neon_float and
xcorr_kernel_neon_float_process1). They're currently causing a ton of
warning spew. float32_t appears to not be considered equivalent to
float, which means you'll also need casts here:
> + vst1q_f32(sum, SUMM);
and here:
> + vst1_lane_f32...
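The const and cast remarks in the review above can be illustrated with a small sketch. It is not the code from the patch: `demo_mac4` and `DEMO_HAVE_NEON` are hypothetical names, and the `typedef float float32_t` fallback is only an assumption that lets the example build on non-ARM hosts (on ARM, `float32_t` comes from `<arm_neon.h>`). It shows pointers declared as `const float32_t *` and the explicit casts at the `float` boundary that silence the warnings.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define DEMO_HAVE_NEON 1
#else
/* Assumption: stand-in type so the sketch compiles off-target.
 * On ARM, float32_t is provided by <arm_neon.h>. */
typedef float float32_t;
#define DEMO_HAVE_NEON 0
#endif

/* Hypothetical helper (not from the patch):
 * sum[i] += x[i] * y[i] for i = 0..3. */
static void demo_mac4(const float *x, const float *y, float *sum)
{
    /* Inputs are read-only, so the NEON-typed views are const too.
     * The casts are explicit because float32_t may not be treated
     * as the same type as float by the compiler. */
    const float32_t *xi = (const float32_t *)x;
    const float32_t *yi = (const float32_t *)y;
    float32_t *si = (float32_t *)sum;

    assert(x != NULL && y != NULL && sum != NULL);

#if DEMO_HAVE_NEON
    {
        /* Load the accumulator, multiply-accumulate four lanes, store back. */
        float32x4_t acc = vld1q_f32(si);
        acc = vmlaq_f32(acc, vld1q_f32(xi), vld1q_f32(yi));
        vst1q_f32(si, acc);
    }
#else
    {
        /* Scalar fallback with the same result for each of the four lanes. */
        size_t i;
        for (i = 0; i < 4; i++)
            si[i] += xi[i] * yi[i];
    }
#endif
}
```

Keeping the casts at the top of the function means the intrinsic calls themselves stay free of casts, which is one possible way to address both the const warning and the `vst1q_f32` warning at once.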
2016 Sep 13
4
[PATCH 12/15] Replace call of celt_inner_prod_c() (step 1)
Should call celt_inner_prod().
---
celt/bands.c | 7 ++++---
celt/bands.h | 2 +-
celt/celt_encoder.c | 6 +++---
celt/pitch.c | 2 +-
src/opus_multistream_encoder.c | 2 +-
5 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/celt/bands.c b/celt/bands.c
index bbe8a4c..1ab24aa 100644
--- a/celt/bands.c
+++ b/celt/bands.c
2015 Jan 29
2
[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library
...>nfft);
> + if (st->priv == NULL) {
> + printf("Unable to ne10 alloc\n");
Absolutely no printfs in library code.
> + return -1;
> + }
> + return 0;
> +}
> +
> +void opus_fft_free_arm_float_neon(kiss_fft_state *st)
> +{
> + ne10_fft_cfg_float32_t cfg = (ne10_fft_cfg_float32_t)st->priv;
> +
> + if (cfg)
> + free((void *)cfg);
This concerns me for several reasons:
1) We never call free() directly in libopus. It is always wrapped in the
opus_free() macro to allow ports to override it (and as a debugging tool).
2) We didn...
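The two review points above (no printf in library code, and no direct free()) can be sketched as follows. This is a minimal illustration, not the libopus implementation: `demo_alloc`, `demo_free`, `demo_state` and the `OVERRIDE_DEMO_*` macros are hypothetical names chosen for this example, and the `live_blocks` counter only stands in for the kind of debugging aid a wrapper makes possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; these are not the libopus definitions. */

/* Debugging aid: number of blocks currently allocated via the wrappers. */
static size_t live_blocks = 0;

/* A port can define OVERRIDE_DEMO_ALLOC / OVERRIDE_DEMO_FREE and supply
 * its own versions; otherwise the defaults forward to the C library. */
#ifndef OVERRIDE_DEMO_ALLOC
static void *demo_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}
#endif

#ifndef OVERRIDE_DEMO_FREE
static void demo_free(void *ptr)
{
    if (ptr != NULL) {
        live_blocks--;
        free(ptr);
    }
}
#endif

typedef struct {
    void *priv; /* backend-specific data, owned by the state */
} demo_state;

/* Returns 0 on success, -1 on allocation failure.
 * The failure is reported through the return value only, with no printf. */
static int demo_state_init(demo_state *st, size_t priv_size)
{
    assert(st != NULL);
    st->priv = demo_alloc(priv_size);
    return (st->priv == NULL) ? -1 : 0;
}

/* Releases the private data through the wrapper, never via free() directly. */
static void demo_state_clear(demo_state *st)
{
    assert(st != NULL);
    demo_free(st->priv);
    st->priv = NULL;
}
```

Routing every allocation and release through one pair of functions is what lets a port swap the allocator in a single place, and it also gives a natural hook for leak checks such as the counter used here.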
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...er
> of the other two approaches seem better.
OK, thanks for your explanation. I will take a closer look at this part
of celt_pitch_xcorr_arm.s
>
>
> One more nit I hadn't noticed before:
>
>> + float *xi = x;
>> + float *yi = y;
>
> These need to be const float32_t (in both xcorr_kernel_neon_float and
> xcorr_kernel_neon_float_process1). They're currently causing a ton of
> warning spew. float32_t appears to not be considered equivalent to
> float, which means you'll also need casts here:
>
>> + vst1q_f32(sum, SUMM);
>
> and...
2015 Jan 29
0
[RFC PATCH v1 2/2] armv7(float): Optimize encode usecase using NE10 library
...of developers/maintainers of opus. Please correct me if
any of information below is inaccurate from a libNE10 change request
perspective.
Phil,
Below are 2 main things we need to change from NE10 Library side. Can
you please make these changes on NE10 side?
1. Remove usage of _t in ne10_fft_cfg_float32_t and ne10_fft_cpx_float32_t
-------from comment----
ne10_fft_cfg_float32_t cfg = (ne10_fft_cfg_float32_t)st->priv;
+ VARDECL(ne10_fft_cpx_float32_t, temp);
+ VARDECL(ne10_fft_cpx_float32_t, tempin);
Just another note on API design... the _t suffix is reserved by POSIX,
and should never be us...
2014 Dec 18
2
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Almost there... just a few nits left.
Viswanath Puttagunta wrote:
> +if OPUS_ARM_NEON_INTR
> +CELT_SOURCES += $(CELT_SOURCES_ARM_NEON_INTR)
> +OPUS_ARM_NEON_INTR_CPPFLAGS = -mfpu=neon -O3
I'll repeat: I don't think you should change the optimization level here.
> + /* Just unroll the rest of the loop */
I saw you decided to keep this unrolled, but you didn't actually
2015 May 15
0
[RFC V3 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...284 insertions(+)
diff --git a/celt/arm/celt_neon_intr.c b/celt/arm/celt_neon_intr.c
index 47dce15..be978a0 100644
--- a/celt/arm/celt_neon_intr.c
+++ b/celt/arm/celt_neon_intr.c
@@ -249,4 +249,272 @@ void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
(const float32_t *)_y+i, (float32_t *)xcorr+i, len);
}
}
+#else /* FIXED POINT */
+
+/*
+ * Function: xcorr_kernel_neon_fixed
+ * ---------------------------------
+ * Computes 8 correlation values and stores them in sum[8]
+ */
+static void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y,
+...
2015 May 08
0
[[RFC PATCH v2]: Ne10 fft fixed and previous 5/8] aarch64: celt_pitch_xcorr: Fixed point intrinsics
...284 insertions(+)
diff --git a/celt/arm/celt_neon_intr.c b/celt/arm/celt_neon_intr.c
index 47dce15..be978a0 100644
--- a/celt/arm/celt_neon_intr.c
+++ b/celt/arm/celt_neon_intr.c
@@ -249,4 +249,272 @@ void celt_pitch_xcorr_float_neon(const opus_val16 *_x, const opus_val16 *_y,
(const float32_t *)_y+i, (float32_t *)xcorr+i, len);
}
}
+#else /* FIXED POINT */
+
+/*
+ * Function: xcorr_kernel_neon_fixed
+ * ---------------------------------
+ * Computes 8 correlation values and stores them in sum[8]
+ */
+static void xcorr_kernel_neon_fixed(const int16_t *x, const int16_t *y,
+...
2015 Jan 20
6
[RFC PATCH v1 0/2] Encode optimize using libNE10
Hello opus-dev,
I've been cooking up this patchset to integrate NE10 library into opus.
The current patchset focuses on the encode use case, mainly affecting performance of
clt_mdct_forward() and opus_fft() (for float only)
Glad to report the following on Encode use case:
(Measured on my Beaglebone Black Cortex-A8 board)
- Performance improvement for encode use case ~= 12.34% (Based on time -p
2014 Dec 19
2
[PATCH v1] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
On 19 December 2014 at 17:25, Viswanath Puttagunta
<viswanath.puttagunta at linaro.org> wrote:
> Optimize celt_pitch_xcorr function (for floating point)
> using ARM NEON intrinsics for SoCs that have NEON VFP unit.
>
> To enable this optimization, use --enable-intrinsics
> configure option.
>
> Compile time and runtime checks are also supported to make sure
> this
2016 Jul 13
7
RFC: SIMD math-function library
Dear LLVM contributors,
I am Naoki Shibata, an associate professor at Nara Institute of Science
and Technology.
Hal Finkel and I would like to jointly propose adding my vectorized math
library to LLVM.
The library has been available as public domain software for years; I am
going to double-license the library if necessary.
********
Below is a proposal to add my vectorized math library,
2015 May 08
0
[[RFC PATCH v2]: Ne10 fft fixed and previous 1/8] armv7(float): Optimize encode usecase using NE10 library
...>arch_fft->is_supported = 1;
+ st->arch_fft->priv = (void *)ne10_fft_alloc_c2c_float32_neon(st->nfft);
+ if (st->arch_fft->priv == NULL) {
+ return -1;
+ }
+ }
+ return 0;
+}
+
+void opus_fft_free_arm_float_neon(kiss_fft_state *st)
+{
+ ne10_fft_cfg_float32_t cfg;
+
+ if (!st->arch_fft)
+ return;
+
+ cfg = (ne10_fft_cfg_float32_t)st->arch_fft->priv;
+ if (cfg)
+ ne10_fft_destroy_c2c_float32(cfg);
+ opus_free(st->arch_fft);
+}
+#endif
+void opus_fft_float_neon(const kiss_fft_state *st,
+ const kiss_ff...
2015 Mar 31
6
[RFC PATCH v1 0/5] aarch64: celt_pitch_xcorr: Fixed point series
Hi Timothy,
As I mentioned earlier [1], I have now fixed the compile issues
with fixed point and am resubmitting the patch.
I also have a new patch that does intrinsics optimizations
for celt_pitch_xcorr targeting aarch64.
You can find my latest work-in-progress branch at [2]
For reference, you can use the Ne10 pre-built libraries
at [3]
Note that I am working with Phil at ARM to get my patch at [4]
2015 May 08
8
[[RFC PATCH v2]: Ne10 fft fixed and previous 0/8]
Hi All,
As per Timothy's suggestion, I am disabling mdct_forward
for fixed point. This only affects
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
Rest of patches are same as in [1]
For reference, latest wip code for opus is at [2]
Still working with NE10 team at ARM to get corner cases of
mdct_forward. Will update with another patch
when issue in NE10 gets fixed.
Regards,
Vish
[1]:
2015 May 15
11
[RFC V3 0/8] Ne10 fft fixed and previous
Hi All,
Changes from RFC v2 [1]
armv7,armv8: Extend fixed fft NE10 optimizations to mdct
- Overflow issue fixed by Phil at ARM. Ne10 wip at [2]. Should be upstream soon.
- So, re-enabled using fixed fft for mdct_forward which was disabled in RFCv2
armv7,armv8: Optimize fixed point fft using NE10 library
- Thanks to Jonathan Lennox, fixed some build issues on iOS and some copy-paste errors
Rest
2015 Apr 28
10
[RFC PATCH v1 0/8] Ne10 fft fixed and previous
Hello Timothy / Jean-Marc / opus-dev,
This patch series is a follow-up to the work I posted in [1].
In addition to what was posted on [1], this patch series mainly
integrates Fixed point FFT implementations in NE10 library into opus.
You can view my opus wip code at [2].
Note that while I found some issues both with the NE10 library(fixed fft)
and with Linaro toolchain (armv8 intrinsics), the work
2015 Mar 18
5
[RFC PATCH v1 0/4] Enable aarch64 intrinsics/Ne10
Hi All,
Since I continue to base my work on top of Jonathan's patch,
and my previous Ne10 fft/ifft/mdct_forward/backward patches,
I thought it would be better to just post all new patches
as a patch series. Please let me know if anyone disagrees
with this approach.
You can see wip branch of all latest patches at
https://git.linaro.org/people/viswanath.puttagunta/opus.git
Branch: