similar to: [Aarch64 00/11] Patches to enable Aarch64

Displaying 20 results from an estimated 2000 matches similar to: "[Aarch64 00/11] Patches to enable Aarch64"

2015 Nov 10
0
[Aarch64 00/11] Patches to enable Aarch64
Good to know. Thank-you for the test. On 11/10/2015 2:37 PM, Jonathan Lennox wrote: >> On Nov 10, 2015, at 3:45 PM, John Ridges <jridges at masque.com> wrote: >> >> Since you're already set up for benchmarks, I would ask if you could >> benchmark the difference between using and not using the ARM64 inline >> assembly. I believe the original justification
2015 Nov 10
0
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 10, 2015, at 3:45 PM, John Ridges <jridges at masque.com> wrote: > > Since you're already set up for benchmarks, I would ask if you could > benchmark the difference between using and not using the ARM64 inline > assembly. I believe the original justification on ARMv7 for the assembly > was the processor's panoply of multiply instructions and their long
2015 Nov 13
2
[Aarch64 00/11] Patches to enable Aarch64
Thanks, I look forward to seeing what you find out. BTW, I was wondering if you tried replacing the SIG2WORD16 macro using the vqmovns_s32 intrinsic? I'm sure it would be faster than the C code, but in the grand scheme of things it might not make much difference. On 11/13/2015 12:15 PM, Jonathan Lennox wrote: >> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com>
2015 Mar 04
0
Patch cleaning up Opus x86 intrinsics configury
On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at vidyo.com> wrote: > > On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta > <viswanath.puttagunta at linaro.org> wrote: > > > > On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> Viswenath, >> >> My patch should be against the tip, but it's the very recent
2015 Nov 13
2
[Aarch64 00/11] Patches to enable Aarch64
Hi Jonathan, I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. I think what's happening is that it's a little unfair to compare the ARM64 inline assembly to the C code, because looking at the C macros in "fixed_generic.h" for MULT16_32_Q16 and MULT16_32_Q15 you find
2015 Mar 07
1
Patch cleaning up Opus x86 intrinsics configury
Hello Jonathan, Just FYI, I started doing review of your patch and will get back to you in few days. After review, I would like to rebase your patch (as necessary) myself and do some testing.. and re-submit. Regards, Vish On 4 March 2015 at 09:00, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: > > On 3 March 2015 at 22:17, Jonathan Lennox <jonathan at
2015 Mar 04
2
Patch cleaning up Opus x86 intrinsics configury
On Mar 3, 2015, at 11:08 PM, Viswanath Puttagunta <viswanath.puttagunta at linaro.org> wrote: On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: Viswenath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I
2015 Mar 04
0
Patch cleaning up Opus x86 intrinsics configury
On 3 March 2015 at 21:59, Jonathan Lennox <jonathan at vidyo.com> wrote: > Viswenath, > > My patch should be against the tip, but it's the very recent tip, > including some changes this past Friday (27 Feb). I mentioned in the IRC > room a problem I discovered in creating my patch, and then later improved > the fix Tim had made for the problem. Where do you get conflicts
2015 Nov 19
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: > > I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious next step. This doesn't show any appreciable speed difference in my tests, but the code is obviously better by inspection (all three of these map directly to a single Aarch64
2015 Mar 04
2
Patch cleaning up Opus x86 intrinsics configury
Viswenath, My patch should be against the tip, but it's the very recent tip, including some changes this past Friday (27 Feb). I mentioned in the IRC room a problem I discovered in creating my patch, and then later improved the fix Tim had made for the problem. Where do you get conflicts merging it to tip? In terms of merging, you posted your patch before I posted mine, so probably I should be
2015 Nov 20
2
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 19, 2015, at 5:47 PM, John Ridges <jridges at masque.com> wrote: > > Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) Yes, you're right. I forgot to run the vectors under qemu with my previous version (oh, the embarrassment!) Fixed forthcoming
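The SIG2WORD16 suggestion quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch set: the lowercase names `pshr32`, `sig2word16_c`, and `sig2word16` are hypothetical stand-ins for the Opus macros, and `SIG_SHIFT` is assumed to be 12. Only `vqmovns_s32` (the AArch64 scalar saturating-narrow intrinsic from `arm_neon.h`) is taken from the thread. The point of the correction is that the rounding shift has to happen before the saturating narrow.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Assumed value for illustration; the real constant lives in the Opus headers. */
#define SIG_SHIFT 12

/* Rounding right shift, standing in for PSHR32 (hypothetical helper name).
 * Assumes a + half does not overflow and that >> on a negative int32_t is an
 * arithmetic shift, as it is with GCC and Clang. */
static inline int32_t pshr32(int32_t a, int shift)
{
    return (a + ((1 << shift) >> 1)) >> shift;
}

/* Portable form: round first, then clamp to the int16_t range. */
static inline int16_t sig2word16_c(int32_t x)
{
    x = pshr32(x, SIG_SHIFT);
    if (x < -32768) x = -32768;
    if (x > 32767) x = 32767;
    return (int16_t)x;
}

/* On AArch64 the clamp collapses into one saturating narrow;
 * elsewhere this falls back to the portable form. */
static inline int16_t sig2word16(int32_t x)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return vqmovns_s32(pshr32(x, SIG_SHIFT));
#else
    return sig2word16_c(x);
#endif
}
```

Both paths should agree for every input that does not overflow the rounding add, which is the kind of check that running the test vectors under qemu would catch.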
2015 Nov 12
2
[Aarch64 00/11] Patches to enable Aarch64
One other minor thing: I notice that in the inline assembly the result (rd) is constrained as an earlyclobber operand. What was the reason for that?
2015 Nov 21
8
[Aarch64 v2 10/18] Clean up some intrinsics-related wording in configure.
--- configure.ac | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index f52d2c2..e1a6e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -190,7 +190,7 @@ AC_ARG_ENABLE([rtcd], [enable_rtcd=yes]) AC_ARG_ENABLE([intrinsics], - [AS_HELP_STRING([--disable-intrinsics], [Disable intrinsics optimizations for ARM(float) X86(fixed)])],, +
2015 Nov 19
0
[Aarch64 00/11] Patches to enable Aarch64
Any speedup from the intrinsics may just be swamped by the rest of the encode/decode process. But I think you really want SIG2WORD16 to be (vqmovns_s32(PSHR32((x), SIG_SHIFT))) On 11/19/2015 2:52 PM, Jonathan Lennox wrote: >> On Nov 16, 2015, at 4:42 PM, Jonathan Lennox <jonathan at vidyo.com> wrote: >> >> I haven't yet tried replacing SIG2WORD16 (or
2015 Nov 16
0
[Aarch64 00/11] Patches to enable Aarch64
I've tried adding support for OPUS_FAST_INT64 to celt/arch.h, and I've found that this is indeed comparable in speed, if not a touch faster, than my inline assembly. I'll submit patches for this. The inline assembly parts of my aarch64 patch set can thus be considered withdrawn. I haven't yet tried replacing SIG2WORD16 (or silk_ADD_SAT32/silk_SUB_SAT32) with Neon intrinsics. That's an obvious
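To make the OPUS_FAST_INT64 comparison concrete, here is a minimal sketch of the two ways a Q16 multiply of a 16-bit value by a 32-bit value can be written in C. The function names are hypothetical, and the split form is an assumption about the general shape of a 32-bit-only implementation, not a copy of the macros in "fixed_generic.h". On a 64-bit target such as AArch64, the wide form gives the compiler a single 64-bit multiply and shift to work with.

```c
#include <stdint.h>

/* 32-bit-only form (assumed shape): split b into a signed high half and an
 * unsigned low half so that no intermediate product exceeds 32 bits.
 * Relies on >> of a negative value being an arithmetic shift (GCC, Clang). */
static inline int32_t mult16_32_q16_split(int16_t a, int32_t b)
{
    int32_t hi = (int32_t)a * (b >> 16);
    int32_t lo = ((int32_t)a * (int32_t)(b & 0xffff)) >> 16;
    return hi + lo;
}

/* 64-bit form: one widening multiply followed by one shift. */
static inline int32_t mult16_32_q16_wide(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 16);
}
```

The two forms return the same value for all inputs, because the high-half product is an exact multiple of 65536 once scaled and so does not affect the rounding of the low-half term. Any speed difference therefore comes only from the instructions the compiler emits.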
2015 Nov 19
3
[PATCH 1/3] Add configure check for Aarch64-specific Neon intrinsics.
--- configure.ac | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/configure.ac b/configure.ac index 90a06c8..adcb969 100644 --- a/configure.ac +++ b/configure.ac @@ -503,6 +503,26 @@ AS_IF([test x"$enable_intrinsics" = x"yes"],[ [rtcd_support="$rtcd_support (NE10)"]) ]) + OPUS_CHECK_INTRINSICS( +
2014 Nov 25
1
[RFC PATCHv1] cover: celt_pitch_xcorr: Introduce ARM neon intrinsics
On 25 November 2014 at 10:18, Jonathan Lennox <jonathan at vidyo.com> wrote: > > On Nov 25, 2014, at 11:13 AM, Viswanath Puttagunta > <viswanath.puttagunta at linaro.org> wrote: > > On 25 November 2014 at 10:11, Viswanath Puttagunta > <viswanath.puttagunta at linaro.org> wrote: > > > On 25 November 2014 at 09:39, Jonathan Lennox <jonathan at
2015 Mar 03
0
Patch cleaning up Opus x86 intrinsics configury
Hello Jonathan, I am unable to apply your patch cleanly on tip. Timothy/opus-dev, This patch has some conflicts with my ARM patch that does fft optimizations http://lists.xiph.org/pipermail/opus/2015-March/002904.html http://lists.xiph.org/pipermail/opus/2015-March/002905.html One of us probably has to rebase depending on which patch goes into opus first. Regards, Vish On 1 March 2015 at
2015 Nov 13
0
[Aarch64 00/11] Patches to enable Aarch64
> On Nov 13, 2015, at 1:51 PM, John Ridges <jridges at masque.com> wrote: > > Hi Jonathan, > > I'm sorry to bring this up again, and I don't want to beat a dead horse, but I was very surprised by your benchmarks so I took a little closer look. > > I think what's happening is that it's a little unfair to compare the ARM64 inline assembly to the C code,
2015 May 15
0
[RFC V3 7/8] armv7, armv8: Optimize fixed point fft using NE10 library
Uses NEON optimized fixed point fft routines in NE10 library Signed-off-by: Viswanath Puttagunta <viswanath.puttagunta at linaro.org> Signed-off-by: Jonathan Lennox <jonathan at vidyo.com> --- Makefile.am | 12 +- celt/arm/arm_celt_map.c | 46 ++-- celt/arm/celt_ne10_fft.c | 98 +++++---- celt/arm/fft_arm.h |