Displaying 20 results from an estimated 23 matches for "muladd".
2009 Dec 29
2
[LLVMdev] Question to use inline assemble in X86
Hi everyone,
I am trying to add an instruction to x86. The instruction is a
multiply-add instruction:
MULADD A, B, C; // A = A + B * C
I use the instruction via inline assembly as below:
int x, y, z;
..... ....
x = 0;
asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y), "r"(z));
..... ....
The backend does allocate registers %edx, %edi, %esi for x, y, z
respect...
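Since stock x86 has no integer MULADD instruction, the semantics the post wants can be sketched with standard imul + add in GCC extended asm (a minimal illustration for an x86 host, not the poster's custom backend; the function name and constraints are assumptions, with "+r"(x) playing the role of the "0"(x) tie above):

#include <stdio.h>

/* Sketch only: expresses x = x + y * z with plain imul + add. */
static int muladd(int x, int y, int z)
{
    int tmp = y;
    asm("imull %2, %1\n\t"   /* tmp = tmp * z */
        "addl  %1, %0"       /* x   = x + tmp */
        : "+r"(x), "+r"(tmp)
        : "r"(z));
    return x;
}

int main(void)
{
    printf("%d\n", muladd(1, 2, 3)); /* 1 + 2*3 = 7 */
    return 0;
}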
2003 May 20
2
mdct_backward with fused muladd?
Can anybody point me at any resources that would explain how to optimize
mdct_backward for a CPU with a fused multiply-accumulate unit?
From what I understand from responses to my older postings, Tremor's
mdct_backward could be rewritten to take advantage of a muladd.
My target machine can do either two-wide 32x32 + Accum(64) -> Accum(64)
integer muladd or eight-wide 16x16 + Accum(32) -> Accum(32) integer muladd
or four-wide single-precision floating-point muladd.
The tremor code seems to be much cleaner and more portable than the stock
version for cons...
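The kernel shape such a unit accelerates can be sketched in a few lines (this is not Tremor's mdct_backward, just the access pattern the post asks about; each loop step is a single fused multiply-add via C99 fmaf, which maps to one hardware muladd):

#include <math.h>
#include <stdio.h>

/* Sketch only: a dot-product kernel where every step is one fmaf. */
static float dot_fma(const float *a, const float *b, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc = fmaf(a[i], b[i], acc); /* acc += a[i]*b[i], one rounding */
    return acc;
}

int main(void)
{
    const float a[4] = {1, 2, 3, 4};
    const float b[4] = {5, 6, 7, 8};
    printf("%.1f\n", dot_fma(a, b, 4)); /* 5 + 12 + 21 + 32 = 70.0 */
    return 0;
}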
2013 Mar 07
1
[LLVMdev] Function permutation at IR bytecode level
...ing pass in LLVM and interested in doing function
permutation at the intermediate representation (bytecode) level?
If I have, let's say, a C program with three functions and its corresponding
IR bytecode:
#include <stdio.h>
#include <string.h>
void findLen(char a[10])
{
    int tmp = (int)strlen(a);
    printf("Len is : %d\n", tmp);
}
void muladd(int a, int b, int c)
{
    int tmp = a + b;
    int tmp1 = tmp * c;
    printf("Addition is : %d Multiplication is : %d\n", tmp, tmp1);
    char d[10] = "abhd";
    findLen(d);
}
int main(void)
{
    int x = 8, y = 5, z = 3;
    muladd(x, y, z);
    return 0;
}
In its corresponding .s [IR bytecode] f...
2009 Dec 29
0
[LLVMdev] Question to use inline assemble in X86
On Dec 29, 2009, at 3:09 AM, Heyu Zhu wrote:
> Hi everyone,
>
> I try to add an instruction to x86. The instruction is a multiply-add instruction
> MULADD A, B, C; //A = A + B * C.
> I use the instruction by inline assemble as below
>
> int x, y, z;
> ..... ....
> x = 0;
> asm("MULADD %0, %1, %2":"=r"(x):"0"(x), "r"(y), "r"(z));
> ..... ....
>
> The backend does allocat...
2014 Aug 07
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
...ose
description includes:
"Allow optimizations for floating-point arithmetic that (a) assume that
arguments and results are valid and (b) may
violate IEEE or ANSI standards."
I am not a floating-point expert; for the applications I care about, usually more
precision is better, and that is what muladd provides. Given Tim's
explanation, I thought that muladd would conflict with (b) and some users
would expect the exact roundings for the mul and add. However, I find this
statement in Section 5 of the IEEE floating point standard:
"Each of the computational operations that return a numeric res...
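The single-rounding behavior under discussion can be observed directly in C, where fma() rounds a*b+c once while a separate mul and add round twice (a small sketch; it assumes the compiler does not contract the separate mul and add into an fma, as with default x86-64 SSE2 codegen):

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* a*b = 1 + 2*eps + eps^2 exactly; a separate mul rounds away the
     * eps^2 term, so mul-then-add gives 0, while fma keeps it. */
    double a = 1.0 + DBL_EPSILON;
    double b = 1.0 + DBL_EPSILON;
    double c = -(1.0 + 2.0 * DBL_EPSILON);

    double p = a * b;            /* rounds to 1 + 2*eps */
    double separate = p + c;     /* exact subtraction: 0 */
    double fused = fma(a, b, c); /* single rounding keeps eps^2 */

    printf("separate = %g\n", separate);
    printf("fused nonzero = %d\n", fused != 0.0);
    return 0;
}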
2013 Feb 25
0
[LLVMdev] Queries regarding function's arguments data type
On 2/25/13 1:44 PM, teja tamboli wrote:
> Hi all,
>
> I am working on my Master's project in security and I am trying to
> iterate over the argument list of the function. Basically I need to do
> following things :
Interesting. Just out of curiosity, can you tell us what your project
is about?
>
> 1. Check data type of each argument of the argument list of the
2013 Feb 25
2
[LLVMdev] Queries regarding function's arguments data type
Hi all,
I am working on my Master's project in security and I am trying to iterate
over the argument list of a function. Basically I need to do the following
things:
1. Check data type of each argument of the argument list of the function.
2. Based on its data type like character array or integer array, pointer,
int, char, take different action.
3. I have added following code to check its
2017 Jun 13
1
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...ler should reassociate a mul + add into a mad where possible.
> In actuality, IMAD is actually super-slow... allegedly slower than
> IMUL + IADD. Not sure why. Maxwell added a XMAD operation which is
> faster but we haven't figured out how to operate it yet. I'm not aware
> of a muladd version of fma on fermi and newer (GL 4.0). The tesla
> series does have a floating point mul+add (but no fma).
>
Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts
apparently only have a 32-bit fma, with parts supporting double precision.
The same restriction is stated for...
2009 Dec 07
2
[LLVMdev] How to use property 'isCommutable' in target description file?
Hi everyone,
I am practicing writing a target description file, using the MSP430 backend as a reference.
I add a multiply-and-add instruction as below:
let isTwoAddress = 1 in {
  def MULADD : Pseudo<(outs GR16:$dst),
                      (ins GR16:$src1, GR16:$src2, GR16:$src3),
                      "muladd\t{$dst, $src2, $src3}",
                      [(set GR16:$dst,
                            (add GR16:$src1, (mul GR16:$src2, GR16:$src3)))]>;
}
How can I tell the system that X = A*B + C == X = B*A +...
2017 Jan 07
2
accelerating matrix multiply
...ovides any benefit.
I was a little surprised that the O(n) NaN check is costly compared to the O(n**2) dgemm that follows. I think the reason is that the NaN check is single-threaded and not vectorized, and my machine can do 2048 floating point ops/cycle when you consider the cores / dual issue / 8-way SIMD / muladd, so the constant factor will be significant even for large matrices.
Would you consider deleting the breaks? I can submit a patch if that will help. Thanks.
Robert
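The vectorization trade-off described above can be sketched in a few lines (this is not R's actual check; the function name is illustrative): dropping the early break and OR-accumulating x != x gives the compiler a branch-free loop it can vectorize.

#include <math.h>
#include <stdio.h>

/* Sketch only: a branch-free NaN scan with no early exit. */
static int has_nan(const double *x, long n)
{
    int found = 0;
    for (long i = 0; i < n; i++)
        found |= (x[i] != x[i]); /* x != x is true only for NaN */
    return found;
}

int main(void)
{
    const double clean[4] = {1.0, 2.0, 3.0, 4.0};
    const double dirty[4] = {1.0, NAN, 3.0, 4.0};
    printf("%d %d\n", has_nan(clean, 4), has_nan(dirty, 4));
    return 0;
}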
2014 Jul 31
2
[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
...avoid unless specifically told we're allowed. It can be just as
> harmful to carefully written floating-point code as dropping precision
> would be.
>
> > Also, in TargetOptions.h I read:
> >
> > Standard, // Only allow fusion of 'blessed' ops (currently just
> > fmuladd)
> >
> > which made me suspect that the check against Fast in the DAGCombiner is
> > not correct.
>
> I think it's OK. In the IR there are 3 different ways to express mul +
> add:
>
> 1. fmul + fadd. This must not be fused into a single step without
> intermediate...
2017 Jun 12
3
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
This looks like the right idea to me too. It may sound a bit weird to do
that per instruction, but d3d11 does that as well. (Some d3d versions
just have a global flag basically forbidding or allowing any such fast
math optimizations in the assembly, but I'm not actually sure everybody
honors that without tessellation...)
For 1/9:
Reviewed-by: Roland Scheidegger <sroland at vmware.com>
2017 Jan 11
2
accelerating matrix multiply
...as a little surprised that the O(n) NaN check is costly compared to
> the O(n**2) dgemm that follows. I think the reason is that nan check
> is single thread and not vectorized, and my machine can do 2048
> floating point ops/cycle when you consider the cores/dual issue/8 way
> SIMD/muladd, and the constant factor will be significant for even
> large matrices.
>
> Would you consider deleting the breaks? I can submit a patch if that
> will help. Thanks.
>
> Robert
Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have...
2011 Dec 02
0
[LLVMdev] Error: Type constraint application shouldn't fail!
...(i16 (extractelt node:$a, (i32 1)))),
   (sext (i16 (extractelt node:$b, (i32 0))))))>;
def ADDMULv : InstSP<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b),
                     "muladd $a, $b, $dst",
                     [(set (i32 IntRegs:$dst),
                           (mula_pat (v2i16 IntRegs:$a), (v2i16 IntRegs:$b)))]>;
IntRegs is a register class with a type list of [i32, v2i16].
But I get the following error when the LLVM build system tries to generate
the instruction selector:
llvm[3]: Building Sparc.td DAG inst...
2017 Jun 13
0
[Mesa-dev] [RFC 0/9] Add precise/invariant semantics to TGSI
...ia hw...)
The compiler should reassociate a mul + add into a mad where possible.
In actuality, IMAD is super-slow... allegedly slower than
IMUL + IADD. Not sure why. Maxwell added an XMAD operation which is
faster, but we haven't figured out how to operate it yet. I'm not aware
of a muladd version of fma on Fermi and newer (GL 4.0). The Tesla
series does have a floating-point mul+add (but no fma).
2003 May 23
0
_LOW_ACCURACY_ good enough?
...can afford more cpu time than cache misses (or memory for that matter).
(For what it's worth, my current test case in tremor _LOW_ACCURACY_ runs in about 630M cycles; without _LOW_ACCURACY_ but with other PS2-specific optimizations, it runs in 710M cycles; vorbis with some minor floating-point-muladd optimizations and the longs changed to ogg_int32_t runs in about 800M cycles -- I suspect the huge trig lookups in mdct are killing me there).
-Dave
--- >8 ----
List archives: http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a me...
2017 Jan 10
0
accelerating matrix multiply
...I was a little surprised that the O(n) NaN check is costly
> compared to the O(n**2) dgemm that follows. I think the reason
> is that nan check is single thread and not vectorized, and my
> machine can do 2048 floating point ops/cycle when you consider
> the cores/dual issue/8 way SIMD/muladd, and the constant
> factor will be significant for even large matrices.
>
> Would you consider deleting the breaks? I can submit a patch
> if that will help. Thanks.
>
> Robert
Thank you Robert for bringing the issue up ("again", possibly).
Within R core, some have se...
2017 Jan 16
1
accelerating matrix multiply
...rised that the O(n) NaN check is costly compared
>> to the O(n**2) dgemm that follows. I think the reason is that nan
>> check is single thread and not vectorized, and my machine can do 2048
>> floating point ops/cycle when you consider the cores/dual issue/8 way
>> SIMD/muladd, and the constant factor will be significant for even
>> large matrices.
>>
>> Would you consider deleting the breaks? I can submit a patch if that
>> will help. Thanks.
>>
>> Robert
> Thank you Robert for bringing the issue up ("again", possibly)....
2017 Jan 16
0
accelerating matrix multiply
...surprised that the O(n) NaN check is costly compared to
>> the O(n**2) dgemm that follows. I think the reason is that nan check
>> is single thread and not vectorized, and my machine can do 2048
>> floating point ops/cycle when you consider the cores/dual issue/8 way
>> SIMD/muladd, and the constant factor will be significant for even
>> large matrices.
>>
>> Would you consider deleting the breaks? I can submit a patch if that
>> will help. Thanks.
>>
>> Robert
> Thank you Robert for bringing the issue up ("again", possibly).
&...
2018 Nov 10
3
[RFC] Tablegen-erated GlobalISel Combine Rules
...MYTGT_ADD %A, %B
%d:sub_hi = MYTGT_ADD %C, %D
Along the same lines, I also think that the integrated debug-info is only really practical in MIR. It's possible to shoe-horn DILocation in using a pseudo-node like so:
(set $d, (add (mul $a, $b):$mul, $c):$add) -> (set (merged_dilocation (muladd $a, $b, $c), $mul, $add))
but there isn't really a good place to modify the DIExpression used by DEBUG_VALUE.
> One reason is that it adds yet another parser, which is more maintenance burden without buying much.
The expectation is that we can make use of the existing MIR parser and replace...