Displaying 5 results from an estimated 5 matches for "sum20".
2018 Jul 10
9
[PATCH 0/7] PowerPC64 performance improvements
The following series adds initial vector support for PowerPC64.
On POWER9, flac --best is about 3.3x faster.
Amitay Isaacs (2):
Add m4 macro to check for C __attribute__ features
Check if compiler supports target attribute on ppc64
Anton Blanchard (5):
configure.ac: Remove SPE detection code
configure.ac: Add VSX enable/disable
configure.ac: Fix FLAC__CPU_PPC on little endian, and add
2010 Nov 23
2
[LLVMdev] Unrolling power sum calculations into constant time expressions
Hello,
I noticed that feeding 'clang -O3' with functions like:
int sum1(int x) {
  int ret = 0;
  for(int i = 0; i < x; i++)
    ret += i;
  return ret;
}
int sum2(int x) {
  int ret = 0;
  for(int i = 0; i < x; i++)
    ret += i*i;
  return ret;
}
...
int sum20(int x) {
  int ret = 0;
  for(int i = 0; i < x; i++)
    ret += i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i;
  return ret;
}
etc.
Makes LLVM unroll all those loops into constant time expressions!
A sum \sum_{i=0}^{n} i^k can be rewritten as a closed-form formula, which is a
polynomial in n of degree k+1...
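As a worked instance, these are the standard identities for k = 1 and k = 2, followed by the leading terms of the general case (Faulhaber's formula, k >= 1). They are not part of the original message. The loops above sum up to x-1, so they correspond to n = x-1.

```latex
\sum_{i=0}^{n} i = \frac{n(n+1)}{2}, \qquad
\sum_{i=0}^{n} i^2 = \frac{n(n+1)(2n+1)}{6},

\sum_{i=0}^{n} i^k = \frac{n^{k+1}}{k+1} + \frac{n^k}{2} + O\!\left(n^{k-1}\right), \quad k \ge 1.
```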
2014 Nov 11
3
[LLVMdev] supporting SAD in loop vectorizer
...x i8> %wide.load to <4 x i32>
> 2795 %9 = zext <4 x i8> %wide.load10 to <4 x i32>
> 2796 %10 = getelementptr inbounds i8* %pix2, i64 %index
> 2797 %11 = bitcast i8* %10 to <4 x i8>*
> 2798 %wide.load11 = load <4 x i8>* %11, align 1
> 2799 %.sum20 = or i64 %index, 4
> 2800 %12 = getelementptr i8* %pix2, i64 %.sum20
> 2801 %13 = bitcast i8* %12 to <4 x i8>*
> 2802 %wide.load12 = load <4 x i8>* %13, align 1
> 2803 %14 = zext <4 x i8> %wide.load11 to <4 x i32>
> 2804 %15 = zext <4 x i8> %wi...
2014 Nov 11
4
[LLVMdev] supporting SAD in loop vectorizer
....load to <4 x i32>
> > 2795 %9 = zext <4 x i8> %wide.load10 to <4 x i32>
> > 2796 %10 = getelementptr inbounds i8* %pix2, i64 %index
> > 2797 %11 = bitcast i8* %10 to <4 x i8>*
> > 2798 %wide.load11 = load <4 x i8>* %11, align 1
> > 2799 %.sum20 = or i64 %index, 4
> > 2800 %12 = getelementptr i8* %pix2, i64 %.sum20
> > 2801 %13 = bitcast i8* %12 to <4 x i8>*
> > 2802 %wide.load12 = load <4 x i8>* %13, align 1
> > 2803 %14 = zext <4 x i8> %wide.load11 to <4 x i32>
> > 2804 %15 = zext <...
2014 Nov 04
3
[LLVMdev] supporting SAD in loop vectorizer
----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Dibyendu Das" <Dibyendu.Das at amd.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tuesday, November 4, 2014 5:23:30 AM
> Subject: Re: [LLVMdev] supporting SAD in loop vectorizer
>
> On 4 November 2014 11:06, Das, Dibyendu <Dibyendu.Das at amd.com> wrote: