thr3ads.net - search: "madd"

[LLVMdev] Question on Machine Combiner Pass

2015 Feb 04

2

[LLVMdev] Question on Machine Combiner Pass

...mbiner.cpp I see that in the function MachineCombiner::preservesCriticalPathLen we try to determine whether the new combined instruction lengthens the critical path or not. In order to do this we compute the depth and latency for the current instruction (MUL+ADD) and the alternate instruction (MADD). But we call two different sets of APIs for the current and new instructions: For the new instruction we use: unsigned NewRootDepth = getDepth(InsInstrs, InstrIdxForVirtReg, BlockTrace); unsigned NewRootLatency = getLatency(Root, NewRoot, BlockTrace); While for the current instruction we use...

AArch64 fmul/fadd fusion

2015 Sep 19

2

AArch64 fmul/fadd fusion

Hi All, Recently I was doing some AArch64 work and noticed some cases where fmuls were not getting fused with fadds. Is there any particular reason that the AArch64 machine combiner doesn't do this like it does for add/mul? I am happy to work up a patch for this, but I wanted to make sure that there wasn't a good reason for it not already being there. FWIW, I see where GCC is doing

AArch64 fmul/fadd fusion

2015 Sep 19

3

AArch64 fmul/fadd fusion

On Fri, Sep 18, 2015 at 10:34 PM, Tim Northover <t.p.northover at gmail.com> wrote: > AArch64's fmadd instruction is fused, which means it can produce a > different result to the two operations executed separately. The C and > C++ standards do not allow such changes. Sorry, sloppy language on my part. I was aware of fmadd, but I was really asking about turning sequences like: fmul s0, s0...
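The point about fused results can be illustrated with a minimal sketch. It is not AArch64 code: the fused operation is emulated with exact rational arithmetic and a single final rounding, and the function names and input values are chosen for illustration only.

```python
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b+c, rounded only once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs picked so that the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 when the multiply is done on its own.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = unfused_mul_add(a, b, c)  # rounding of the product loses the tail
fused = fused_mul_add(a, b, c)      # the tail survives the single rounding

print(unfused, fused)
```

With these inputs the two-step version returns zero while the emulated fused version keeps the small residual, which is the kind of difference that makes fusing a visible change in results.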

How to overcome 32000 subdirs limit

2010 Nov 19

1

How to overcome 32000 subdirs limit

Hi. I have an HP MSA 2012 storage system with 12 drives in it: 8 drives are 2 TB each and 4 are 1 TB each. All of them are in a RAID 1+0 array. This storage is connected to two servers which use the data stored on it, so I'm using OCFS2 on these two nodes. Today, after a long time of successful work with it, I found that it has a limit of 32000 subdirs. The trouble is I have more than

PWGL in wine, problems

2008 May 14

6

PWGL in wine, problems

Hello, I'm new on this list. First of all, thank you to all the developers of this great project! At the moment there is only an application that keeps me on both macos and windows, its name is PWGL a free environment for computer assisted composition in openGL. (http://www2.siba.fi/PWGL/) I'm running Ubuntu 8.04 and wine 0.9.59. I have to say that I also installed vcrun2005 and

[LLVMdev] Subregister liveness tracking

2013 Oct 08

0

[LLVMdev] Subregister liveness tracking

What I didn't mention in r192119 is that mthi/lo clobbers the other sub-register only if the contents of hi and lo are produced by mult or other arithmetic instructions (div, madd, etc.) It doesn't have this side-effect if it is produced by another mthi/lo. So I don't think making mthi/lo clobber the other half would work. For example, this is an illegal sequence of instructions, where instruction 3 makes $hi unpredictable: 1. mult $lo<def>, $hi<def>, $...

[LLVMdev] Register allocation limitations

2013 Nov 07

0

[LLVMdev] Register allocation limitations

Hi Nikos, You can model your requirement in the *.td using RegisterClass as def SrcRegs : RegisterClass<"Src", [i32], 4, (add R0, R2, R4, R6 )>; def DstRegs : RegisterClass<"Dst", [i32], 4, (add R1, R3, R5, R7 )>; Thanks ~Umesh On Thu, Nov 7, 2013 at 8:25 PM, Stavropoulos Nikos < n.stavropoulos at think-silicon.com> wrote: > Hi all.

[LLVMdev] Register allocation limitations

2013 Nov 07

2

[LLVMdev] Register allocation limitations

Hi all. If there is a limitation on which registers can be used together in an instruction, should I try to handle it in the register allocation pass, or somewhere else? For example, say we have to add two registers: addu rx, ry, rz. There is a limitation that the two registers being added cannot have the same value mod 4, so we can add r1, r2 but cannot add r1, r5.
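The constraint in the question can be written down as a small predicate. This is only a sketch of the rule as stated in the post, not of how a register allocator would enforce it; `can_pair` and `legal_partners` are hypothetical helper names, and an eight-register file (r0 to r7) is assumed.

```python
def can_pair(src_a: int, src_b: int, modulus: int = 4) -> bool:
    """True if two source register numbers differ modulo `modulus`."""
    return src_a % modulus != src_b % modulus


def legal_partners(src: int, num_regs: int = 8) -> list[int]:
    """Register numbers that may be used as the other source next to `src`."""
    return [r for r in range(num_regs) if can_pair(src, r)]


print(can_pair(1, 2))     # r1 and r2 differ mod 4
print(can_pair(1, 5))     # r1 and r5 collide mod 4
print(legal_partners(1))  # candidates for the second source of r1
```

Note that under this rule a register also cannot be paired with itself.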

Verifying Backend Schedule (Over)Coverage

2017 Jun 21

2

Verifying Backend Schedule (Over)Coverage

I ran into an interesting problem when helping to land a scheduler .td file that my colleague had written. The problem that came up was that a multiply/add pair was not combined into an madd, but just for our CPU. Upon digging into it, the problem turned out to be that '(instregex "^SUB" ...' was matching "SUBREG_TO_REG" and incorrectly increasing the schedule length. I removed the overly aggressive match, but I noticed that there were lots of instructions t...
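The over-matching described here is easy to reproduce with an ordinary regex engine. The sketch below uses Python's `re` as a stand-in, so it does not claim to reproduce TableGen's `instregex` matching rules exactly; the opcode list is a small illustrative sample, and the tightened pattern is a hypothetical example rather than the fix that was actually applied.

```python
import re

# Illustrative sample of opcode names; SUBREG_TO_REG is a generic pseudo
# instruction, not a subtract.
opcodes = ["SUBWrr", "SUBXri", "SUBSWrr", "SUBREG_TO_REG"]

loose = re.compile(r"^SUB")              # prefix-only pattern from the post
tight = re.compile(r"^SUBS?[WX]r[ir]$")  # hypothetical stricter pattern

loose_hits = [op for op in opcodes if loose.search(op)]
tight_hits = [op for op in opcodes if tight.search(op)]

# Names the loose pattern catches that the stricter one does not.
overmatched = [op for op in loose_hits if op not in tight_hits]

print(loose_hits)
print(tight_hits)
print(overmatched)
```

Listing the difference between the two match sets in this way is one simple check for unintended matches in a scheduling model.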

patches for xiph build setup

2004 Jun 10

4

patches for xiph build setup

Hi, I offered some time ago to do some build cleanup. Today I did so and here's my slew of patches. Basically, they - touch ogg, vorbis, vorbis-tools and theora - fix a bunch of autotools issues - uniformize the use of them across the four - fix compile/link flags - use pkgconfig if it's available to detect flags - for vorbis-tools, generate and use config.h - add -uninstalled .pc stuff

[LLVMdev] Subregister liveness tracking

2013 Oct 09

4

[LLVMdev] Subregister liveness tracking

On Oct 8, 2013, at 2:06 PM, Akira Hatanaka <ahatanak at gmail.com> wrote: > What I didn't mention in r192119 is that mthi/lo clobbers the other sub-register only if the contents of hi and lo are produced by mult or other arithmetic instructions (div, madd, etc.) It doesn't have this side-effect if it is produced by another mthi/lo. So I don't think making mthi/lo clobber the other half would work. Uh that is indeed nasty, and can’t really be expressed like that in the current RA framework I think. > > For example, this is an illegal...

[LLVMdev] r57326 malfunctions?

2008 Oct 31

1

[LLVMdev] r57326 malfunctions?

....h 2008/10/01 17:38:40 56923 +++ llvm-gcc-4.2/trunk/gcc/config/i386/darwin.h 2008/10/02 06:16:08 56946 @@ -101,6 +101,8 @@ %{!mmacosx-version-min=*: %{!miphoneos-version-min=*: %(darwin_cc1_minversion)}} \ "/* APPLE LOCAL ignore -mcpu=G4 -mcpu=G5 */"\ %<faltivec %<mno-fused-madd %<mlong-branch %<mlongcall %<mcpu=G4 %<mcpu=G5 \ + "/* APPLE LOCAL enable format security warnings */"\ + %{!Wno-format:-Wformat -Wformat-security} \ %{g: %{!fno-eliminate-unused-debug-symbols: -feliminate-unused-debug-symbols }}" /* APPLE LOCAL AltiVec */ foll...

Sink redundant spill after RA

2018 Feb 22

2

Sink redundant spill after RA

...stp x20, x19, [sp, #192] // 8-byte Folded Spill stp x29, x30, [sp, #208] // 8-byte Folded Spill ldrsw x8, [x0, #4424] sxtw x10, w2 <------------- w2 is the use of spilled value before spill. sxtw x12, w1 madd x8, x8, x10, x12 ldr x9, [x0, #8] add x9, x9, x8, lsl #2 ldrh w11, [x9] ldrh w10, [x0, #16] str x2, [sp, #120] // 8-byte Folded Spill <------------- spill !!! cmp w11, w10 b.eq .LBB2_32 // %b...

[GlobalISel] Quick Status

2017 Jan 21

12

[GlobalISel] Quick Status

...- Improve the optimization heuristic. ** InstructionSelect ** - Core logic present. - TableGen support for simple SDISel patterns (i.e., GISel reuses SDISel patterns) * What’s Left * - Teach TableGen how to reuse more complex patterns: — Patterns with combines in them (e.g., (mull (add)) => madd) — Patterns with complex patterns (e.g., SelectAddressModXR0) *** On Going Work *** - General approach: use AArch64 O0 on the LLVM test suite as a driving vehicle to guide what to support next in the various passes. - Extend TableGen support to reuse more and more SDISel patterns. - ARM port. -...

[LLVMdev] Subregister liveness tracking

2013 Oct 08

2

[LLVMdev] Subregister liveness tracking

Currently it will always spill / restore the whole vreg but only spilling the parts that are actually live would be a nice addition in the future. Looking at r192119': if "mtlo" writes to $LO and sets $HI to an unpredictable value, then it should just have an additional (dead) def operand for $hi, shouldn't it? Greetings Matthias Am 10/8/13, 11:03 AM, schrieb Akira

mixed-effects model using lmer

2007 Jun 28

0

mixed-effects model using lmer

...0.0e+00 8.1e-04 0.0e+00 0.0e+00 GEOR 0.0e+00 -9.3e-03 0.0e+00 0.0e+00 HANN 0.0e+00 0.0e+00 3.2e-10 0.0e+00 HERO -6.3e-16 0.0e+00 0.0e+00 0.0e+00 JOUG 0.0e+00 -2.3e-02 0.0e+00 0.0e+00 MADD 7.3e-16 0.0e+00 0.0e+00 0.0e+00 MOOT 0.0e+00 0.0e+00 0.0e+00 8.7e-11 NEKO 0.0e+00 -2.1e-03 0.0e+00 0.0e+00 PETE 0.0e+00 0.0e+00 0.0e+00 2.5e-09 PLEN 0.0e+00 0.0e+00 0.0e+00 -2.0e-10...

[LLVMdev] Stop MachineCSE on certain instructions

2011 Dec 21

2

[LLVMdev] Stop MachineCSE on certain instructions

Hi, Jim. In my case the target (Tilera) doesn't have a full 32-bit mult operation and to do so it has to accumulate results from three 16-bit mults, by retaining operands and the result across in the same registers. However the ISel DAG thinks it's a CSE case. Please note this is not a MAdd/MSub triad. How could I do this by defining such a sequence or the pattern in the .def file itself for the ISD::MUL? Thanks. Girish. >________________________________ > From: Jim Grosbach <grosbach at apple.com> >To: girish gulawani <girishvg at yahoo.com> >Cc: Johannes...

Sink redundant spill after RA

2018 Feb 22

2

Sink redundant spill after RA

...Spill > > stp x29, x30, [sp, #208] // 8-byte Folded Spill > > ldrsw x8, [x0, #4424] > > sxtw x10, w2 <------------- w2 is the > use of spilled value before spill. > > sxtw x12, w1 > > madd x8, x8, x10, x12 > > ldr x9, [x0, #8] > > add x9, x9, x8, lsl #2 > > ldrh w11, [x9] > > ldrh w10, [x0, #16] > > str x2, [sp, #120] // 8-byte Folded Spill > <------------- spill !!! > &...

Sink redundant spill after RA

2018 Feb 22

0

Sink redundant spill after RA

...stp x20, x19, [sp, #192] // 8-byte Folded Spill stp x29, x30, [sp, #208] // 8-byte Folded Spill ldrsw x8, [x0, #4424] sxtw x10, w2 <------------- w2 is the use of spilled value before spill. sxtw x12, w1 madd x8, x8, x10, x12 ldr x9, [x0, #8] add x9, x9, x8, lsl #2 ldrh w11, [x9] ldrh w10, [x0, #16] str x2, [sp, #120] // 8-byte Folded Spill <------------- spill !!! cmp w11, w10 b.eq .LBB2_32 // %b...
