thr3ads.net - search: "fmul"

Displaying 20 results from an estimated 339 matches for "fmul".

[LLVMdev] bb-vectorizer transforms only part of the block

2015 Jun 22

[LLVMdev] bb-vectorizer transforms only part of the block

...float>* %5 = bitcast float* %arg2 to <4 x float>* %6 = bitcast float* %1 to <4 x float>* %7 = load <4 x float>* %3, align 16 %8 = load <4 x float>* %4, align 16 %9 = load <4 x float>* %5, align 16 %10 = load <4 x float>* %6, align 16 %11 = fmul <4 x float> %10, %7 %12 = fmul <4 x float> %9, %8 %13 = fadd <4 x float> %12, %11 %14 = bitcast float* %2 to <4 x float>* %15 = fmul <4 x float> %10, %8 %16 = fmul <4 x float> %9, %7 %17 = fsub <4 x float> %16, %15 %18 = bitcast float*...

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

...er.loop > > entry.header.loop.end: ; preds = %cond.then.i201.i, %phi.exit138.i > %cond.i204.i = phi float [ %tmp43.i200.i, %cond.then.i201.i ], [ %tmp38.i194.i, %phi.exit138.i ] > %arrayidx82.i = getelementptr float addrspace(1)* %5, i64 %8 > %tmp85.i = fmul float %tmp63.i, %cond.i204.i > %tmp88.i = fmul float %tmp9.i, %cond.i135.i > %tmp89.i = fsub float %tmp85.i, %tmp88.i > store float %tmp89.i, float addrspace(1)* %arrayidx82.i, align 4 > %inc = add i32 %7, 1 > %exitcond = icmp eq i32 %inc, %local_size_0 > br i1 %exitcond, la...

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 18

[LLVMdev] SIMD instructions and memory alignment on X86

Are you able to send any IR for others to reproduce this issue? On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote: > Unfortunately, this doesn't appear to be the bug I'm hitting. I applied > the fix to my source and it didn't make a difference. > > Also further testing found me getting the same behavior with other SIMD > instructions.

[LLVMdev] Modifications to SLP

2015 Jul 07

[LLVMdev] Modifications to SLP

...Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load float* %4 %6 = getelementptr float* %arg2, i32 4145 %7 = load float* %6 %8 = fmul float %7, %1 %9 = fmul float %5, %3 %10 = fadd float %9, %8 %11 = fmul float %7, %3 %12 = fmul float %5, %1 %13 = fsub float %12, %11 %14 = getelementptr float* %arg3, i32 16 %15 = load float* %14 %16 = getelementptr float* %arg3, i32 4112 %17 = load float* %16 %18 = g...

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 18

[LLVMdev] SIMD instructions and memory alignment on X86

Unfortunately, this doesn't appear to be the bug I'm hitting. I applied the fix to my source and it didn't make a difference. Also further testing found me getting the same behavior with other SIMD instructions. The common factor is in each case, ECX is set to 0x7fffffff, and it's an operation using xmm ptr ecx+offset . Additionally, turning the optimization level passed to

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

[LLVMdev] SIMD instructions and memory alignment on X86

...%21, 1 %25 = extractvalue { <2 x double>, <2 x double>, <2 x double> } %21, 2 %26 = extractelement <4 x double> %22, i32 0 %27 = insertelement <2 x double> zeroinitializer, double %26, i32 0 %28 = insertelement <2 x double> %27, double %26, i32 1 %29 = fmul <2 x double> %28, <double 1.000000e+00, double 1.000000e+00> %30 = fsub <2 x double> %23, %29 %31 = fmul <2 x double> %28, zeroinitializer %32 = fsub <2 x double> %24, %31 %33 = fmul <2 x double> %28, zeroinitializer %34 = fsub <2 x double> %25, %...

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

...1 = getelementptr float* %arg7, i64 %372 %432 = load float* %431 %433 = getelementptr float* %arg7, i64 %382 %434 = load float* %433 %435 = getelementptr float* %arg7, i64 %392 %436 = load float* %435 %437 = getelementptr float* %arg7, i64 %402 %438 = load float* %437 %439 = fmul float %406, %188 %440 = fmul float %404, %190 %441 = fadd float %440, %439 %442 = fmul float %406, %190 %443 = fmul float %404, %188 %444 = fsub float %443, %442 %445 = fmul float %410, %200 %446 = fmul float %408, %202 %447 = fadd float %446, %445 %448 = fmul float %410,...

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

...i64 %372 > %432 = load float* %431 > %433 = getelementptr float* %arg7, i64 %382 > %434 = load float* %433 > %435 = getelementptr float* %arg7, i64 %392 > %436 = load float* %435 > %437 = getelementptr float* %arg7, i64 %402 > %438 = load float* %437 > %439 = fmul float %406, %188 > %440 = fmul float %404, %190 > %441 = fadd float %440, %439 > %442 = fmul float %406, %190 > %443 = fmul float %404, %188 > %444 = fsub float %443, %442 > %445 = fmul float %410, %200 > %446 = fmul float %408, %202 > %447 = fadd float %446,...

[LLVMdev] Issue with Machine Verifier and earlyclobber

2012 Jul 15

[LLVMdev] Issue with Machine Verifier and earlyclobber

...he following code with llc at -O3: target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:32:64-v128:32:128-a0:0:32-n32-S32" target triple = "armv4t-generic-generic" define float @f3(float %days) nounwind readnone { entry: %mul = fmul float %days, 0x3FEF8A6C60000000 %add = fadd float %mul, 0x40718776A0000000 %mul1 = fmul float %days, 0x3FEF8A09A0000000 %add2 = fadd float %mul1, 0x4076587740000000 %mul3 = fmul float %days, 0x3E81B35CC0000000 %sub = fsub float 0x3FFEA235C0000000, %mul3 %call = tail call float @dsin(flo...

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

...float* %431 >> %433 = getelementptr float* %arg7, i64 %382 >> %434 = load float* %433 >> %435 = getelementptr float* %arg7, i64 %392 >> %436 = load float* %435 >> %437 = getelementptr float* %arg7, i64 %402 >> %438 = load float* %437 >> %439 = fmul float %406, %188 >> %440 = fmul float %404, %190 >> %441 = fadd float %440, %439 >> %442 = fmul float %406, %190 >> %443 = fmul float %404, %188 >> %444 = fsub float %443, %442 >> %445 = fmul float %410, %200 >> %446 = fmul float %408, %202 &g...

[LLVMdev] Issue with Machine Verifier and earlyclobber

2012 Jul 15

[LLVMdev] Issue with Machine Verifier and earlyclobber

On Jul 15, 2012, at 9:20 AM, Borja Ferrer <borja.ferav at gmail.com> wrote: > Jakob, one more hint, I've placed some asserts around the code you added and noticed that the InlineSpiller::insertReload() function is not being called. > > 2012/7/14 Borja Ferrer <borja.ferav at gmail.com> > Hello Jakob, > > I'm still getting the error, I can give you any other

[LLVMdev] Fast-math flags in constant expressions

2014 Nov 27

[LLVMdev] Fast-math flags in constant expressions

Hi, I'm wondering why lib/AsmParser/LLParser handles fast-math flags in the following IR: ... %val = fmul nnan double 1.0, 1.0 ... but doesn't allow any flags if "fmul" is inside "phi": ... %val = phi double [ fmul (double 1.0, double 1.0), %cond.true ], [ fmul (double 1.0, double 1.0), %cond.false ] ... LLParser::ParseValID(...) could ca...

Fusing contract fadd/fsub with normal fmul

2017 Jun 10

Fusing contract fadd/fsub with normal fmul

Hi, On LLVM 5.0 (current trunk), fadd/fsub and fmul that are both marked with `contract` or `fast` can be merged to a fma instruction by the backend. I'm wondering about the exact semantic of this new flag as well as `fast` and in particular, would it be valid to do this when only the `fadd`/`fsub` (and not the `fmul`) is marked with `contract`...

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

...433 = getelementptr float* %arg7, i64 %382 > %434 = load float* %433 > %435 = getelementptr float* %arg7, i64 %392 > %436 = load float* %435 > %437 = getelementptr float* %arg7, i64 %402 > %438 = load float* %437 > %439 = fmul float %406, %188 > %440 = fmul float %404, %190 > %441 = fadd float %440, %439 > %442 = fmul float %406, %190 > %443 = fmul float %404, %188 > %444 = fsub float %443, %442 > %445 = fmul float %410, %200 > %44...

[PATCH] maxwell,pascal: add scheduling data to shaders

2018 Sep 08

[PATCH] maxwell,pascal: add scheduling data to shaders

...r2 0x0 0x1 t2d 0x8 ipa $r3 a[0x84] $r0 0x0 0x1 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0xf wr 0x2) (st 0x1 wr 0x0 wt 0x5) (st 0xe) ipa $r2 a[0x80] $r0 0x0 0x1 tex nodep $r0 $r2 0x0 0x0 t2d 0x8 depbar le 0x5 0x0 0x0 -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x6 wt 0x3) (st 0x1) (st 0x1) fmul ftz $r3 $r0 $r1 mov $r2 $r3 0xf mov $r1 $r3 0xf -sched (st 0x0) (st 0x0) (st 0x0) +sched (st 0x1) (st 0xf wt 0x3f) (st 0x1) mov $r0 $r3 0xf exit +nop 0x0 #endif diff --git a/src/shader/exac8nv110.fpc b/src/shader/exac8nv110.fpc index 4aa1368..ffc2bdc 100644 --- a/src/shader/exac8nv110.fpc +++...

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

2009 Jun 15

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

Hello, The LLVM IR opcodes Add, Sub, and Mul have been each split into two. Add, Sub, and Mul now only handle integer types, and three new opcodes, FAdd, FSub, and FMul now handle floating-point types. The main LLVM APIs are currently preserving backwards compatibility, transparently mapping integer opcodes to corresponding floating-point opcodes when the operands have floating-point types. This compatibility code will eventually be removed, so front-end writers...

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

...itself replicates the following block 512 times: . . { p1 = p1 * a; p2 = p2 * b; p3 = p3 * c; p4 = p4 * d; } . . Compiling with NVCC, Ocelot, and LLVM, I can confirm the interleaved instruction schedule with a four-instruction reuse distance. An excerpt follows: . . %r1500 = fmul float %r1496, %r24 ; compute %1500 %r1501 = fmul float %r1497, %r23 %r1502 = fmul float %r1498, %r22 %r1503 = fmul float %r1499, %r21 %r1504 = fmul float %r1500, %r24 ; first use of %1500 %r1505 = fmul float %r1501, %r23 %r1506 = fmul float %r1502, %r22 %r1507 = fmul float %r1503,...

[PATCH v2] nv110/exa: update sched codes

2017 Jun 07

[PATCH v2] nv110/exa: update sched codes

...r0 $r2 0x0 0x0 t2d 0x8 >> > > Out of curiosity, what didn't you add a read-dep-bar on $r2:$r3 here? Missed it, thanks for pointing it out. > > > depbar le 0x5 0x0 0x0 >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6 wt 0x3) (st 0x6) (st 0x1) >> fmul ftz $r3 $r0 $r1 >> mov $r2 $r3 0xf >> > > You can stall for only one cycle here, but the 6 cycles on fmul is needed. > > mov $r1 $r3 0xf >> -sched (st 0x0) (st 0x0) (st 0x0) >> +sched (st 0x6) (st 0xf) (st 0x0) >> mov $r0 $r3 0xf >> > > Sa...

[LLVMdev] Optimization of calls to functions without side effects (from Kaleidoscope example)

2010 Nov 15

[LLVMdev] Optimization of calls to functions without side effects (from Kaleidoscope example)

...optimizes calls to functions without side effects. Specifically, ready> extern sin(x); ready> extern cos(x); ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x); Read function definition: define double @foo(double %x) { entry: %calltmp = call double @sin(double %x) %multmp = fmul double %calltmp, %calltmp %calltmp2 = call double @cos(double %x) %multmp4 = fmul double %calltmp2, %calltmp2 %addtmp = fadd double %multmp, %multmp4 ret double %addtmp } I find that when I run the code, the calls to sin and cos aren't optimized and, instead, I...

search for: fmul