Displaying 20 results from an estimated 500 matches similar to: "mdct_backward with fused muladd?"
2009 Dec 29
2
[LLVMdev] Question to use inline assemble in X86
Hi everyone,
I am trying to add an instruction to x86. The instruction is a
multiply-add instruction
MULADD A, B, C; //A = A + B * C.
I use the instruction via inline assembly as below:
int x, y, z;
..... ....
x = 0;
asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y), "r"(z));
..... ....
The backend does allocate registers %edx, %edi, %esi for x,y, z
2013 Mar 07
1
[LLVMdev] Function permutation at IR bytecode level
Hi All,
I am working on writing a pass in LLVM and am interested in doing function
permutation at the intermediate representation bytecode level.
Say I have a C program with three functions and its corresponding
IR bytecode.
void findLen(char a[10])
{
int tmp = strlen(a);
printf("Len is : %d\n", tmp);
}
void muladd(int a, int b, int c)
{
int tmp = a + b;
int tmp1 = tmp *
2009 Dec 29
0
[LLVMdev] Question to use inline assemble in X86
On Dec 29, 2009, at 3:09 AM, Heyu Zhu wrote:
> Hi everyone,
>
> I am trying to add an instruction to x86. The instruction is a multiply-add instruction
> MULADD A, B, C; //A = A + B * C.
> I use the instruction via inline assembly as below:
>
> int x, y, z;
> ..... ....
> x = 0;
> asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y),
2014 Aug 07
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
Hi Sanjay,
You are right. I tried XL and gcc 4.8.2 for PPC and I also got
multiply-and-add operations.
I based my statement on what I read in the gcc man page. -ffast-math is
used in clang to set fp-contract to fast (default is standard) and in gcc
it activates (among others) the flag -funsafe-math-optimizations whose
description includes:
"Allow optimizations for floating-point
2002 Aug 13
1
mdct.c pointer to array conversion
Hi all,
I'm attempting to convert all the pointers to arrays in the mdct_backward
function so it can be partitioned off for a hardware implementation.
Although this code is quite short I'm finding it a little tricky.
As it stands, mdct_backward is passed values by reference i.e.
void mdct_backward(mdct_lookup *init, DATA_TYPE *in, DATA_TYPE *out)
So my modified version starts
void
2003 May 23
0
_LOW_ACCURACY_ good enough?
I spent a fair amount of time optimizing tremor for the PS2, mostly by using dual-pipe multiplies in the X[N]PRODnn and the window apply code. Then, just for kicks, I re-enabled _LOW_ACCURACY_ and lo and behold it was still substantially faster. I also got some gains out of tremor by changing the longs in cookbook and sharedbook to ogg_int32_t's like I did for vorbis.
I think _LOW_ACCURACY_
2009 Oct 26
1
[PATCH] Fix miscompile of SSE resampler
From: Thorvald Natvig <slicer at users.sourceforge.net>
Some optimizing compilers miscompile the current SSE optimizations when
full optimizations are enabled. By using an output value pointer instead of
a return value, we can bypass this misbehaviour.
---
libspeex/resample.c | 8 ++++----
libspeex/resample_sse.h | 24 ++++++++----------------
2 files changed, 12 insertions(+), 20
2009 Sep 11
2
Accumulating results from "for" loop in a list/array
Dear R users,
I would like to accumulate objects generated from 'for' loop to a list or
array.
To illustrate the problem, arbitrary data set and script is shown below,
x <- data.frame(a = c(rep("n",3),rep("y",2),rep("n",3),rep("y",2)), b =
c(rep("y",2),rep("n",4),rep("y",3),"n"), c =
2017 Jun 13
1
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
On 13.06.2017 at 02:05, Ilia Mirkin wrote:
> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> FWIW surely on nv50 you could keep a single mad instruction for umad
>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>> unfused float multiply+add as a single instruction but I know next to
>> nothing
2004 Aug 18
1
Hangups - SIGFPE in dsp.c
Hi,
I'm running the latest CVS HEAD version of asterisk, and I'm experiencing
hangups during voice conversation. This happens quite regularly and
often.
The problem is in dsp.c, line 1235, where it says
accum /= len;
But `len', at this point, is 0, resulting in a SIGFPE. The routine
ast_frame *i4l_read() in channels/chan_modem_i4l.c:411 is
setting p->fr.datalen to
2011 May 20
1
[LLVMdev] subregisters, def-kill
If I write
%reg16506<def> = INSERT_SUBREG %reg16506, %reg16445, hi16; #1
%reg16506<def> = INSERT_SUBREG %reg16506, %reg16468, lo16; #2
store %reg16506 #3
it will not coalesce, as
LiveVariables:
on
#2: %16506 gets #2 as a kill
#3: %16506 gets #3 as an additional kill
LiveIntervalAnalysis:
2013 Dec 18
0
[LLVMdev] LLVM ARM VMLA instruction
> http://llvm.org/bugs/show_bug.cgi?id=17188
> http://llvm.org/bugs/show_bug.cgi?id=17211
Ah, thanks. That makes a lot more sense now.
> Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still
> haven't seen any explanation for how this is better though...
That would be because it follows what C tells us a compiler has to do
by default but provides overrides
2008 May 03
2
Resampler (no api)
.. And a version without the API changes.
-------------- next part --------------
Index: libspeex/resample_sse.h
===================================================================
--- libspeex/resample_sse.h (revision 0)
+++ libspeex/resample_sse.h (revision 0)
@@ -0,0 +1,128 @@
+/* Copyright (C) 2002-2008 Jean-Marc Valin
+ * Copyright (C) 2008 Thorvald Natvig
+ */
+/**
+ @file resample_sse.h
+
2008 May 03
0
Resampler, memory only variant
Hi,
Here's the (hopefully) final version of the resampler, now always using
st->mem as the buffer area. It only allocates buffers on the stack when
it's necessary to convert the output between int and float.
-------------- next part --------------
Index: include/speex/speex_resampler.h
===================================================================
---
2017 Jan 07
2
accelerating matrix multiply
I am using R to multiply some large (30k x 30k double) matrices on a 64 core machine (xeon phi). I added some timers to src/main/array.c to see where the time is going. All of the time is being spent in the matprod function, most of that time is spent in dgemm. 15 seconds is in matprod in some code that is checking if there are NaNs.
> system.time (C <- B %*% A)
nancheck: wall time
2010 Jan 20
2
Plot frame border to start at zero?
Hello,
I am creating plots of hourly precipitation and accumulated
precipitation (on different axes, see attached image). I was wondering
how I can have the plot frame (black border) start at zero; it looks
like it is plotted below zero.
The code I use to create the png files is below:
CairoPNG(PNG_file,width=1000, height=600, pointsize=14, bg="white")
opar <-
2018 Apr 07
0
SCEV and LoopStrengthReduction Formulae
>
> I realize this is a micro-op saving a single cycle. But this reduces the instruction count, one less
> instr to decode in a potentially hot path. If this all makes sense, and seems like a reasonable addition
> to llvm, would it make sense to implement this as a supplemental LSR formula, or as a separate pass?
This seems reasonable to me so long as rbx has no other uses that
2012 Dec 10
3
[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)
Hello all,
I wanted to get some feedback on this patch for ScalarEvolution.
It addresses a performance problem I am seeing for a simple benchmark.
Starting with this C code:
signed char foo(void)
{
  const int count = 8000;
  signed char result = 0;
  int j;

  for (j = 0; j < count; ++j) {
    result += (result_t)(3);
  }

  return result;
}
I
2013 Dec 19
2
[LLVMdev] LLVM ARM VMLA instruction
Thanks for the explanation, Tim!
gcc 4.8.1 *does* generate an fma for your code example for an x86 target
that supports fma. I'd bet that the HW vendors' compilers do the same, but
I don't have any of those installed at the moment to test that theory. So
this is a bug in those compilers? Do you know how they justify it?
I see section 6.5 "Expressions" in the C standard, and
2000 Oct 23
4
More mdct questions
Sorry for starting another topic, this is actually a reply to Segher's post
on Sun Oct 22 on the 'mdct question' topic. I wasn't subscribed properly
and so I didn't get email confirmation and thus can't add to that thread.
So Segher, if the equation is indeed what you say it is, then replacing
mdct_backward with this version should work, but it doesn't.
Am I applying