thr3ads.net - search: "sum20"

Displaying 5 results from an estimated 5 matches for "sum20".

[PATCH 0/7] PowerPC64 performance improvements

2018 Jul 10

The following series adds initial vector support for PowerPC64. On POWER9, flac --best is about 3.3x faster.

Amitay Isaacs (2):
  Add m4 macro to check for C __attribute__ features
  Check if compiler supports target attribute on ppc64

Anton Blanchard (5):
  configure.ac: Remove SPE detection code
  configure.ac: Add VSX enable/disable
  configure.ac: Fix FLAC__CPU_PPC on little endian, and add

[LLVMdev] Unrolling power sum calculations into constant time expressions

2010 Nov 23

Hello, I noticed that feeding 'clang -O3' with functions like:

    int sum1(int x) {
        int ret = 0;
        for(int i = 0; i < x; i++)
            ret += i;
        return ret;
    }

    int sum2(int x) {
        int ret = 0;
        for(int i = 0; i < x; i++)
            ret += i*i;
        return ret;
    }

    ...

    int sum20(int x) {
        int ret = 0;
        for(int i = 0; i < x; i++)
            ret += i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i*i;
        return ret;
    }

etc. makes LLVM unroll all those loops into constant time expressions! A sum \sum_{i=0}^{n} i^k can be derived into a formula which is a polynomial of degree k...
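To make the idea of a "constant time expression" concrete, here is a minimal sketch of the closed forms for the first two power sums. It is an illustration only and not necessarily the expression LLVM actually emits. The loops in the message run i from 0 to x-1, so the formulas below use that range. The names sum1_closed, sum2_closed and check_closed_forms are hypothetical and do not appear in the original thread.

```c
#include <assert.h>

/* Loop versions, matching the functions quoted in the message. */
int sum1_loop(int x) {
    int ret = 0;
    for (int i = 0; i < x; i++)
        ret += i;
    return ret;
}

int sum2_loop(int x) {
    int ret = 0;
    for (int i = 0; i < x; i++)
        ret += i * i;
    return ret;
}

/* Closed form for 0 + 1 + ... + (x-1) = x(x-1)/2.
 * The loop body never runs for x <= 0, so return 0 in that case. */
int sum1_closed(int x) {
    if (x <= 0)
        return 0;
    long long n = x; /* wider type for the intermediate product */
    return (int)(n * (n - 1) / 2);
}

/* Closed form for 0^2 + 1^2 + ... + (x-1)^2 = (x-1)x(2x-1)/6. */
int sum2_closed(int x) {
    if (x <= 0)
        return 0;
    long long n = x;
    return (int)((n - 1) * n * (2 * n - 1) / 6);
}

/* Hypothetical helper: compare loop and closed form for 0..limit.
 * Keep limit small enough that the sums fit in an int. */
void check_closed_forms(int limit) {
    for (int x = 0; x <= limit; x++) {
        assert(sum1_loop(x) == sum1_closed(x));
        assert(sum2_loop(x) == sum2_closed(x));
    }
}
```

Calling check_closed_forms(100) exercises both pairs on small inputs. The sketch does not try to reproduce the wraparound behaviour of the loop versions for large x, where signed overflow in the original functions would be undefined anyway.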

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 11

...x i8> %wide.load to <4 x i32>
> 2795 %9 = zext <4 x i8> %wide.load10 to <4 x i32>
> 2796 %10 = getelementptr inbounds i8* %pix2, i64 %index
> 2797 %11 = bitcast i8* %10 to <4 x i8>*
> 2798 %wide.load11 = load <4 x i8>* %11, align 1
> 2799 %.sum20 = or i64 %index, 4
> 2800 %12 = getelementptr i8* %pix2, i64 %.sum20
> 2801 %13 = bitcast i8* %12 to <4 x i8>*
> 2802 %wide.load12 = load <4 x i8>* %13, align 1
> 2803 %14 = zext <4 x i8> %wide.load11 to <4 x i32>
> 2804 %15 = zext <4 x i8> %wi...

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 11

....load to <4 x i32>
> > 2795 %9 = zext <4 x i8> %wide.load10 to <4 x i32>
> > 2796 %10 = getelementptr inbounds i8* %pix2, i64 %index
> > 2797 %11 = bitcast i8* %10 to <4 x i8>*
> > 2798 %wide.load11 = load <4 x i8>* %11, align 1
> > 2799 %.sum20 = or i64 %index, 4
> > 2800 %12 = getelementptr i8* %pix2, i64 %.sum20
> > 2801 %13 = bitcast i8* %12 to <4 x i8>*
> > 2802 %wide.load12 = load <4 x i8>* %13, align 1
> > 2803 %14 = zext <4 x i8> %wide.load11 to <4 x i32>
> > 2804 %15 = zext <...

[LLVMdev] supporting SAD in loop vectorizer

2014 Nov 04

----- Original Message -----
> From: "Renato Golin" <renato.golin at linaro.org>
> To: "Dibyendu Das" <Dibyendu.Das at amd.com>
> Cc: llvmdev at cs.uiuc.edu
> Sent: Tuesday, November 4, 2014 5:23:30 AM
> Subject: Re: [LLVMdev] supporting SAD in loop vectorizer
>
> On 4 November 2014 11:06, Das, Dibyendu <Dibyendu.Das at amd.com> wrote:
