Displaying 3 results from an estimated 3 matches for "den0018a".
2014 Dec 19
3
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Viswanath Puttagunta wrote:
> I responded to your feedback before I started on RFCv3 and took your
> silence as approval :). I guess that email got lost in your inbox sea
> somewhere, so I am re-posting the responses.
Sorry, I did see it but I guess I read it rather more quickly than I
thought. Apologies for that.
> guidance. I wouldn't know where else to put this. Without
2014 Dec 19
0
[RFC PATCH v3] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
...ault) -O2 is identical. However, if
> someone actually wants to make an unoptimized build for some reason
> (e.g., testing or debugging), you've broken that.
Thanks. After carefully re-reading the NEON Programmer's Guide
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0018a/index.html
the appropriate flag should have been "-ftree-vectorize".
I can also confirm that -O2 and -O3 produce almost the same output. What
I meant by "painful" was building without any optimization
level, which I tried during unit testing. But it looks lik...
2014 Dec 09
1
[RFC PATCH v2] armv7: celt_pitch_xcorr: Introduce ARM neon intrinsics
Viswanath Puttagunta wrote:
> + SUMM = vdupq_n_f32(0);
It kills me that there's no intrinsic for VMOV.F32 d0, #0 (or at least I
couldn't find one), so this takes two instructions instead of one.
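For readers following the thread, here is a minimal sketch of the accumulator pattern being discussed: a four-lane sum is zeroed with vdupq_n_f32(0) and then updated with multiply-accumulates. The helper name dot4 is hypothetical and is not the patch's code, len is assumed to be a multiple of 4, and the scalar fallback is included only so the sketch also builds on non-NEON hosts.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not from the patch): dot product of x and y.
 * Assumes len is a non-negative multiple of 4. */
static float dot4(const float *x, const float *y, int len)
{
    int i;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t summ = vdupq_n_f32(0);   /* zero all four lanes */
    float32x2_t s;
    for (i = 0; i < len; i += 4) {
        /* summ += x[i..i+3] * y[i..i+3], lane by lane */
        summ = vmlaq_f32(summ, vld1q_f32(x + i), vld1q_f32(y + i));
    }
    /* Horizontal add: (s0+s2, s1+s3), then the pairwise sum of those. */
    s = vadd_f32(vget_low_f32(summ), vget_high_f32(summ));
    s = vpadd_f32(s, s);
    return vget_lane_f32(s, 0);
#else
    /* Scalar fallback that mirrors the four-lane accumulation order. */
    float sum[4] = {0, 0, 0, 0};
    int j;
    for (i = 0; i < len; i += 4)
        for (j = 0; j < 4; j++)
            sum[j] += x[i + j] * y[i + j];
    return (sum[0] + sum[2]) + (sum[1] + sum[3]);
#endif
}
```

Whether the zeroing costs one instruction or two depends on the compiler and optimization level, so this only illustrates the intrinsic-level pattern, not the generated code.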
> + /* Consume 4 elements in x vector and 8 elements in y
> + * vector. However, the 8th element in y never really gets
> + * touched in this loop. So, if len == 4,