thr3ads.net - similar to: "mdct_backward with fused muladd?"

Displaying 20 results from an estimated 500 matches similar to: "mdct_backward with fused muladd?"

[LLVMdev] Question to use inline assemble in X86

2009 Dec 29

Hi everyone, I try to add an instruction to x86. The instruction is a multiply-add instruction MULADD A, B, C; //A = A + B * C. I use the instruction by inline assemble as below int x, y, z; ..... .... x = 0; asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y), "r"(z)); ..... .... The backend does allocate registers %edx, %edi, %esi for x,y, z

[LLVMdev] Function permutation at IR bytecode level

2013 Mar 07

Hi All, I am working on writing a pass in LLVM and am interested in doing function permutation at the intermediate representation bytecode level. Say I have a C program with three functions and its corresponding IR bytecode. void findLen(char a[10]) { int tmp = strlen(a); printf("Len is : %d\n", tmp); } void muladd(int a, int b, int c) { int tmp = a + b; int tmp1 = tmp *

[LLVMdev] Question to use inline assemble in X86

2009 Dec 29

On Dec 29, 2009, at 3:09 AM, Heyu Zhu wrote: > Hi everyone, > > I try to add an instruction to x86. The instruction is a multiply-add instruction > MULADD A, B, C; //A = A + B * C. > I use the instruction by inline assemble as below > > int x, y, z; > ..... .... > x = 0; > asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y),

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

2014 Aug 07

Hi Sanjay, You are right. I tried XL and gcc 4.8.2 for PPC and I also got multiply-and-add operations. I supported my statement on what I read in the gcc man page. -ffast-math is used in clang to set fp-contract to fast (default is standard) and in gcc it activates (among others) the flag -funsafe-math-optimizations whose description includes: "Allow optimizations for floating-point

mdct.c pointer to array conversion

2002 Aug 13

Hi all, I'm attempting to convert all the pointers to arrays in the mdct_backward function so it can be partitioned off for a hardware implementation. Although this code is quite short I'm finding it a little tricky. As it stands, mdct_backward is passed values by reference, i.e. void mdct_backward(mdct_lookup *init, DATA_TYPE *in, DATA_TYPE *out) so my modified version starts void

_LOW_ACCURACY_ good enough?

2003 May 23

I spent a fair amount of time optimizing tremor for the PS2, mostly by using dual-pipe multiplies in the X[N]PRODnn and the window apply code. Then, just for kicks, I re-enabled _LOW_ACCURACY_ and lo and behold it was still substantially faster. I also got some gains out of tremor by changing the longs in cookbook and sharedbook to ogg_int32_t's like I did for vorbis. I think _LOW_ACCURACY_

[PATCH] Fix miscompile of SSE resampler

2009 Oct 26

From: Thorvald Natvig <slicer at users.sourceforge.net> Some optimizing compilers miscompile the current SSE optimizations when full optimizations are enabled. By using output value pointer instead of a return value, we can bypass this misbehaviour. --- libspeex/resample.c | 8 ++++---- libspeex/resample_sse.h | 24 ++++++++---------------- 2 files changed, 12 insertions(+), 20

Accumulating results from "for" loop in a list/array

2009 Sep 11

Dear R users, I would like to accumulate objects generated from 'for' loop to a list or array. To illustrate the problem, arbitrary data set and script is shown below, x <- data.frame(a = c(rep("n",3),rep("y",2),rep("n",3),rep("y",2)), b = c(rep("y",2),rep("n",4),rep("y",3),"n"), c =

[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI

2017 Jun 13

On 13.06.2017 at 02:05, Ilia Mirkin wrote: > On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote: >> FWIW surely on nv50 you could keep a single mad instruction for umad >> (sad maybe too?). (I'm actually wondering if the hw really can't do >> unfused float multiply+add as a single instruction but I know next to >> nothing

Hangups - SIGFPE in dsp.c

2004 Aug 18

Hi, I'm running the latest CVS HEAD version of asterisk, and I'm experiencing hangups during voice conversation. This happens quite regularly and often. The problem is in dsp.c, line 1235, where it says accum /= len; But `len', at this point, is 0, resulting in a SIGFPE. The routine ast_frame *i4l_read() in channels/chan_modem_i4l.c:411 is setting p->fr.datalen to

[LLVMdev] subregisters, def-kill

2011 May 20

If I write %reg16506<def> = INSERT_SUBREG %reg16506, %reg16445, hi16; #1 %reg16506<def> = INSERT_SUBREG %reg16506, %reg16468, lo16; #2 store %reg16506 #3 it will not coalesce, as LiveVariables: on #2: %16506 gets #2 as a kill #3: %16506 gets #3 as an additional kill LiveIntervalAnalysis:

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

> http://llvm.org/bugs/show_bug.cgi?id=17188 > http://llvm.org/bugs/show_bug.cgi?id=17211 Ah, thanks. That makes a lot more sense now. > Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still > haven't seen any explanation for how this is better though... That would be because it follows what C tells us a compiler has to do by default but provides overrides

Resampler (no api)

2008 May 03

.. And a version without the API changes. -------------- next part -------------- Index: libspeex/resample_sse.h =================================================================== --- libspeex/resample_sse.h (revision 0) +++ libspeex/resample_sse.h (revision 0) @@ -0,0 +1,128 @@ +/* Copyright (C) 2002-2008 Jean-Marc Valin + * Copyright (C) 2008 Thorvald Natvig + */ +/** + @file resample_sse.h +

Resampler, memory only variant

2008 May 03

Hi, Here's the (hopefully) final version of the resampler, now always using st->mem as the buffer area. It only allocates buffers on the stack when it's necessary to convert the output between int and float. -------------- next part -------------- Index: include/speex/speex_resampler.h =================================================================== ---

accelerating matrix multiply

2017 Jan 07

I am using R to multiply some large (30k x 30k double) matrices on a 64 core machine (xeon phi). I added some timers to src/main/array.c to see where the time is going. All of the time is being spent in the matprod function, most of that time is spent in dgemm. 15 seconds is in matprod in some code that is checking if there are NaNs. > system.time (C <- B %*% A) nancheck: wall time

Plot frame border to start at zero?

2010 Jan 20

Hello, I am creating plots of hourly precipitation and accumulated precipitation (on different axis, see attached image). I was wondering how can I have the plot frame (black border) start at zero, it looks like it is plotted less than zero? The code I use to create the png files is below: CairoPNG(PNG_file,width=1000, height=600, pointsize=14, bg="white") opar <-

SCEV and LoopStrengthReduction Formulae

2018 Apr 07

> > I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less > instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition > to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass? This seems reasonable to me so long as rbx has no other uses that

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

2012 Dec 10

Hello all, I wanted to get some feedback on this patch for ScalarEvolution. It addresses a performance problem I am seeing for simple benchmark. Starting with this C code: 01: signed char foo(void) 02: { 03: const int count = 8000; 04: signed char result = 0; 05: int j; 06: 07: for (j = 0; j < count; ++j) { 08: result += (result_t)(3); 09: } 10: 11: return result; 12: } I

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

Thanks for the explanation, Tim! gcc 4.8.1 *does* generate an fma for your code example for an x86 target that supports fma. I'd bet that the HW vendors' compilers do the same, but I don't have any of those installed at the moment to test that theory. So this is a bug in those compilers? Do you know how they justify it? I see section 6.5 "Expressions" in the C standard, and
