thr3ads.net - search: "fmuls"

Displaying 20 results from an estimated 339 matches for "fmuls".

Did you mean: fmul

[LLVMdev] bb-vectorizer transforms only part of the block

2015 Jun 22

[LLVMdev] bb-vectorizer transforms only part of the block

The loads, stores and float arithmetic in attached function should be completely vectorizable. The bb-vectorizer does a good job at first, but from instruction %96 on it messes up by adding unnecessary vectorshuffles. (The function was designed so that no shuffle would be needed in order to vectorize it). I tested this with llvm 3.6 with the following command:

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

On Sep 29, 2010, at 8:35 AMPDT, Ralf Karrenberg wrote: > Hello everybody, > > I have stumbled upon a test case (the attached module is a slightly > reduced version) that shows extremely reduced performance on linux > compared to windows when executed using LLVM's JIT. > > We narrowed the problem down to the actual code being generated, the > source IR on both systems

[LLVMdev] spilling & xmm register usage

2010 Sep 29

[LLVMdev] spilling & xmm register usage

Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 18

[LLVMdev] SIMD instructions and memory alignment on X86

Are you able to send any IR for others to reproduce this issue? On Wed, Jul 17, 2013 at 11:23 PM, Peter Newman <peter at uformia.com> wrote: > Unfortunately, this doesn't appear to be the bug I'm hitting. I applied > the fix to my source and it didn't make a difference. > > Also further testing found me getting the same behavior with other SIMD > instructions.

[LLVMdev] Modifications to SLP

2015 Jul 07

[LLVMdev] Modifications to SLP

Hi all! It takes the current SLP vectorizer too long to vectorize my scalar code. I am talking here about functions that have a single, huge basic block with O(10^6) instructions. Here's an example: %0 = getelementptr float* %arg1, i32 49 %1 = load float* %0 %2 = getelementptr float* %arg1, i32 4145 %3 = load float* %2 %4 = getelementptr float* %arg2, i32 49 %5 = load

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 18

[LLVMdev] SIMD instructions and memory alignment on X86

Unfortunately, this doesn't appear to be the bug I'm hitting. I applied the fix to my source and it didn't make a difference. Also further testing found me getting the same behavior with other SIMD instructions. The common factor is in each case, ECX is set to 0x7fffffff, and it's an operation using xmm ptr ecx+offset . Additionally, turning the optimization level passed to

[LLVMdev] SIMD instructions and memory alignment on X86

2013 Jul 19

[LLVMdev] SIMD instructions and memory alignment on X86

I've attached the module->dump() that our code is producing. Unfortunately this is the smallest test case I have available. This is before any optimization passes are applied. There are two separate modules in existence at the time, and there are no guarantees about the order the surrounding code calls those functions, so there may be some interaction between them? There shouldn't

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

The loop vectorizer is doing an amazing job so far. Most of the time. I just came across one function which led to unexpected behavior: On this function the loop vectorizer finds a 256 bit vector as the wides vector type for the x86-64 architecture. (!) This is strange, as it was always finding the correct size of 128 bit as the widest type. I isolated the IR of the function to check if this is

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

I looked more into this. For the previously sent IR the vector width of 256 bit is found mistakenly (and reproducibly) on this hardware: model name : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz For the same IR the loop vectorizer finds the correct vector width (128 bit) on: model name : Intel(R) Xeon(R) CPU E5630 @ 2.53GHz model name : Intel(R) Core(TM) i7 CPU M 640 @

[LLVMdev] Issue with Machine Verifier and earlyclobber

2012 Jul 15

[LLVMdev] Issue with Machine Verifier and earlyclobber

I think I'm getting a bit closer to the problem. I've found that the call to InlineSpiller::foldMemoryOperand() inside InlineSpiller::spillAroundUses() is causing the problems. As a test, I removed that call and with your yesterday's patch I'm not getting any errors at all, the code generated is the same one as with the call. This is happening when

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

Hi Frank, I'm not an Intel expert, but it seems that your Xeon E5 supports AVX, which does have 256-bit vectors. The other two only supports SSE instructions, which are only 128-bit long. cheers, --renato On 10 November 2013 06:05, Frank Winter <fwinter at jlab.org> wrote: > I looked more into this. For the previously sent IR the vector width of > 256 bit is found mistakenly

[LLVMdev] Issue with Machine Verifier and earlyclobber

2012 Jul 15

[LLVMdev] Issue with Machine Verifier and earlyclobber

On Jul 15, 2012, at 9:20 AM, Borja Ferrer <borja.ferav at gmail.com> wrote: > Jakob, one more hint, I've placed some asserts around the code you added and noticed that the InlineSpiller::insertReload() function is not being called. > > 2012/7/14 Borja Ferrer <borja.ferav at gmail.com> > Hello Jakob, > > I'm still getting the error, I can give you any other

[LLVMdev] Fast-math flags in constant expressions

2014 Nov 27

[LLVMdev] Fast-math flags in constant expressions

Hi, I'm wondering why lib/AsmParser/LLParser handles fast-math flags in the following IR: ... %val = fmul nnan double 1.0, 1.0 ... but doesn't allow any flags if "fmul" is inside "phi": ... %val = phi double [ fmul (double 1.0, double 1.0), %cond.true ], [ fmul (double 1.0, double 1.0), %cond.false ] ...

Fusing contract fadd/fsub with normal fmul

2017 Jun 10

Fusing contract fadd/fsub with normal fmul

Hi, On LLVM 5.0 (current trunk), fadd/fsub and fmul that are both marked with `contract` or `fast` can be merged to a fma instruction by the backend. I'm wondering about the exact semantic of this new flag as well as `fast` and in particular, would it be valid to do this when only the `fadd`/`fsub` (and not the `fmul`) is marked with `contract` or at least `fast`. The reasoning is that doing

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

2013 Nov 10

[LLVMdev] loop vectorizer erroneously finds 256 bit vectors

Hi Renato, you are right! There is 'avx' support: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave

[PATCH] maxwell,pascal: add scheduling data to shaders

2018 Sep 08

[PATCH] maxwell,pascal: add scheduling data to shaders

Generated with envysched. Tested by running rendercheck from piglit, running mplayer -vo xv, and staring at gnome-shell. Signed-off-by: Rhys Perry <pendingchaos02 at gmail.com> --- src/shader/exac8nv110.fp | 11 ++++---- src/shader/exac8nv110.fpc | 22 ++++++++-------- src/shader/exacanv110.fp | 11 ++++---- src/shader/exacanv110.fpc | 22 ++++++++-------- src/shader/exacmnv110.fp | 10

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

2009 Jun 15

[LLVMdev] Upcoming API change: FAdd, FSub, FMul

Hello, The LLVM IR opcodes Add, Sub, and Mul have been each split into two. Add, Sub, and Mul now only handle integer types, and three new opcodes, FAdd, FSub, and FMul now handle floating-point types. The main LLVM APIs are currently preserving backwards compatibility, transparently mapping integer opcodes to corresponding floating-point opcodes when the operands have floating-point types.

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

2010 Nov 03

[LLVMdev] LLVM x86 Code Generator discards Instruction-level Parallelism

Dear LLVMdev, I've noticed an unusual behavior of the LLVM x86 code generator (with default options) that results in nearly a 4x slow-down in floating-point throughput for my microbenchmark. I've written a compute-intensive microbenchmark to approach theoretical peak throughput of the target processor by issuing a large number of independent floating-point multiplies. The distance

[PATCH v2] nv110/exa: update sched codes

2017 Jun 07

[PATCH v2] nv110/exa: update sched codes

On Tue, Jun 6, 2017 at 7:15 AM, Samuel Pitoiset <samuel.pitoiset at gmail.com> wrote: > Nice work! > > See my comments below, and double-check if some of them can be applied to > the shaders I didn't review yet. > > I recommend you to test your work because if one sched code is wrong, you > are likely going to kill your card and reboot your box. :-) > > >

[LLVMdev] Optimization of calls to functions without side effects (from Kaleidoscope example)

2010 Nov 15

[LLVMdev] Optimization of calls to functions without side effects (from Kaleidoscope example)

In http://llvm.org/docs/tutorial/LangImpl4.html#jit there's an example that optimizes calls to functions without side effects. Specifically, ready> extern sin(x); ready> extern cos(x); ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x); Read function definition: define double @foo(double %x) { entry: %calltmp = call double @sin(double %x) %multmp = fmul double %calltmp,

search for: fmuls