Displaying 3 results from an estimated 3 matches for "rgb2yik".
2016 Sep 01
enabling interleaved access loop vectorization
...rleaved-mem-accesses, we just need the backend to generate good code for the vector types that produces, specifically, in this case, <12 x i8>. The details are in PR29025.
The upshot of this is that for the original program (with an outer loop around it):
$ bin/clang -m32 -O2 -o ~/llvm/temp/rgb2yik.exe ~/llvm/temp/rgb2yik.c -mavx && time ~/llvm/temp/rgb2yik.exe
real 0m2.229s
user 0m2.224s
$ bin/clang -m32 -O2 -o ~/llvm/temp/rgb2yik.exe ~/llvm/temp/rgb2yik.c -mavx -mllvm -enable-interleaved-mem-accesses && time ~/llvm/temp/rgb2yik.exe
real 0m2.590s
user 0m2....
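
To make the <12 x i8> type mentioned above concrete, here is a rough C sketch of what a stride-3 interleaved access amounts to, assuming a vectorization factor of 4 (4 pixels x 3 channels = 12 bytes). The helper names deinterleave_rgb4 and luma_from_rgb are hypothetical and do not come from the thread; the luma weights are the ones from the Wikipedia YIQ page. It only illustrates the access pattern and is not the code LLVM actually emits.

```c
#include <assert.h>
#include <string.h>

/* Assumption for illustration: 4 pixels per group, 3 interleaved channels. */
enum { VF = 4, STRIDE = 3 };

/* Hypothetical helper: read one 12-byte group and split it into
 * separate r, g and b lanes, mimicking a wide load plus shuffles. */
static void deinterleave_rgb4(const unsigned char *in,
                              unsigned char r[VF],
                              unsigned char g[VF],
                              unsigned char b[VF])
{
    unsigned char wide[VF * STRIDE];
    memcpy(wide, in, sizeof wide);      /* one wide 12-byte "load" */
    for (int i = 0; i < VF; ++i) {
        r[i] = wide[STRIDE * i + 0];    /* lanes 0, 3, 6, 9  */
        g[i] = wide[STRIDE * i + 1];    /* lanes 1, 4, 7, 10 */
        b[i] = wide[STRIDE * i + 2];    /* lanes 2, 5, 8, 11 */
    }
}

/* Hypothetical driver: computes only the luma (Y) component for n pixels. */
static void luma_from_rgb(const unsigned char *in, unsigned char *y_out, int n)
{
    assert(n >= 0);
    int j = 0;
    /* Main loop: one de-interleaved group of VF pixels per iteration. */
    for (; j + VF <= n; j += VF) {
        unsigned char r[VF], g[VF], b[VF];
        deinterleave_rgb4(in + STRIDE * j, r, g, b);
        for (int i = 0; i < VF; ++i)
            y_out[j + i] =
                (unsigned char)(0.299 * r[i] + 0.587 * g[i] + 0.114 * b[i]);
    }
    /* Scalar remainder for the last n % VF pixels. */
    for (; j < n; ++j) {
        const unsigned char *p = in + STRIDE * j;
        y_out[j] = (unsigned char)(0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]);
    }
}
```

Calling luma_from_rgb(in, y, N) on a buffer of 3*N bytes walks the input in 12-byte groups. Whether this shape is profitable depends on how well the backend lowers the wide load and the shuffles, which is the concern raised in the message.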
2016 Aug 17
enabling interleaved access loop vectorization
...wrote:
> Hi Michael,
>
> Don’t quite have a full reproducer for you yet. You’re welcome to try and
> see what’s happening in 32 bit mode when enabling interleaving for the
> following, based on “https://en.wikipedia.org/wiki/YIQ#From_RGB_to_YIQ”:
>
> void rgb2yik (char * in, char * out, int N)
> {
>   int j;
>   for (j = 0; j < N; ++j) {
>     unsigned char r = *in++;
>     unsigned char g = *in++;
>     unsigned char b = *in++;
>     unsigned char y = 0.299*r + 0.587*g + 0.114*b;
>     sign...
2016 Aug 16
enabling interleaved access loop vectorization
Hi Ayal, Elena,
I'd really like to enable this by default.
As I wrote above, I didn't see any regressions in internal benchmarks, and
there doesn't seem to be anything in SPEC2006 either. I do see a
performance improvement in an internal benchmark (that is, a real
workload).
Would you be able to provide an example that gets pessimized? I have no
doubt you've seen regressions