thr3ads.net - similar to: "MachinePipeliner refactoring"

Displaying 20 results from an estimated 3000 matches similar to: "MachinePipeliner refactoring"

2019 Jul 15

MachinePipeliner refactoring

Hi James: Personally, I like the idea of refactoring and more abstraction, But unfortunately, I don't know enough about the edges cases either. BTW: the prototype is still causing quite some Asseertions in PowerPC - some nodes are not generated in correct order. Best, Jinsong Ji (纪金松), PhD. XL/LLVM on Power Compiler Development E-mail: jji at us.ibm.com From: James Molloy <james at

MachinePipeliner refactoring

2019 Jul 16

MachinePipeliner refactoring

Hi James, I also think that refactoring the code generation part is a great idea. That code is very complicated and difficult to maintain. I’ve wanted to rewrite that code for a long time, but just have never got to it. There are quite a few edge cases to handle (at least in the current code). I’ll take a deeper look at your patch. The abstractions that you mention, Stage and Block, are good

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

2020 Sep 07

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

Hi James, Having not worked on this for circa one year I've gone and refreshed my memory. We have a pretty capable implementation of swing modulo scheduling downstream, distinct from the MachinePipeliner implementation. Historically, MachinePipeliner had very tight coupling between the finding of a suitable schedule and emitting the code that adheres to that schedule. I spent quite a bit of

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

2020 Sep 09

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

Hi James, One last thing - is your target upstream? or are you working on a downstream target? Cheers, James On Tue, 8 Sep 2020 at 23:02, Nagurne, James <j-nagurne at ti.com> wrote: > I greatly appreciate you going back to gather that intel, James. It > actually helps my understanding of the whole pipeliner puzzle quite a bit! > > > > I did identify, like you, that the

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

2020 Sep 03

[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

Hi James, Adding Hendrik, who has taken over ownership of the downstream code involved. I can also add background about the rationale, of that helps? It was added to ignore induction variable update code (scalar code) that is rewritten when we unroll / peel the prolog epilog anyway. Targets like Hexagon or PPC with dedicated loop control instructions for pipelined loops don't need this, but

[EXTERNAL] Re: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

2020 Sep 02

[EXTERNAL] Re: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

Sorry to bring this thread from 3 months ago back, but I’m running into this issue too. I do see that shouldIgnore is not called in the MachinePipeliner, however, James’ comment doesn’t really resolve the issue or make the story any clearer. My summary of the comment is: “Hexagon and PPC9 do not need to ignore any instructions. However, in the case that you do, such as when the indvar update is

Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

2020 Jun 01

Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.

Hi all, I think there is a mistake in the machinepipeliner interface. In the TargetInstrInfo.h in the class PipelinerLoopInfo there is a function "bool shouldIgnoreForPipelining(const MachineInstr *MI)". The description says that if this function returns true for a given MachineInstr it will not be pipelined. However in reality it is not ignored and is being considered for

Some questions about software pipeline in LLVM 4.0.0

2017 May 25

Some questions about software pipeline in LLVM 4.0.0

Hi, I have some questions about the implementation of Software pipeline in MachinePipeliner.cpp. First, in hexagon backend, between MachinePipeliner and regalloc pass, there're some other passes like phi eliminate, two-address, register coalescing, which may change or insert intructions like 'copy' in MBB, and swp kernel loop may be destroyed by these passes. Why not put

[Pipeliner] MachinePipeliner TargetInstrInfo hooks need more information?

2019 May 10

[Pipeliner] MachinePipeliner TargetInstrInfo hooks need more information?

Hello, I'm working on integrating the MachinePipeliner.cpp pass into our VLIW backend, and so far we've managed to get it working with some nice speedups. Unlike Hexagon however, our backend doesn't generate hardware loop instructions and so all our loops are a combination of induction variables, comparisons and branches. So when it came to implementing reduceLoopCount for our

Some questions about software pipeline in LLVM 4.0.0

2017 Jun 01

Some questions about software pipeline in LLVM 4.0.0

Hi - I replied to the original sender only by mistake. Sorry about that. When we started working on the pipeliner, and added it before the scheduler, we also were concerned that the scheduler or other passes would undo the work of the pipeliner. The initial thought was that we would add information (using metadata or some other way like you've suggested) to the basic block to tell the

[RFC] Porting MachinePipeliner to AArch64+SVE

2018 Jun 08

[RFC] Porting MachinePipeliner to AArch64+SVE

Hi, I am extending LLVM for HPC applications. As one of them, I am trying to make MachinePipeliner available on AArch64 + Scalable Vector Extension environment. MachinePipeliner is currently used only by Hexagon CPU. Since it is a very portable implementation, I think that it will actually work just by adding a little code for many CPUs(See Code [2]). The current MachinePipeliner is written on

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

2020 Mar 24

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

Hello, I'd like to upstream our work over the time which the community would benefit from. This is a part of effort toward minimizing code size presented in here <https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>. In particular, this RFC is about optimizing prolog and epilog for size. *Homogeneous Prolog and Epilog for Size Optimization, D76570

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

2020 Mar 24

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

Hi Vedant, Thanks for your interest and comment. Size-optimization improves page-faults and a start-up time for a large application, which this enabling also followed. Even though I didn't see a large regression/complaint on a CPU-bound case, which is not a typical case for mobile workload, I wanted to be precautious of enabling it by default. However, as with default outlining case, I

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Jan 31

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name:

Question about Prolog/Epilog Code Insertion

2017 Jun 09

Question about Prolog/Epilog Code Insertion

Hi All, When seeing the title "Prolog/Epilog Code Insertion", I'd expect something about XXXFrameLowering.cpp (particular about emitPrologue/emitEpilogue). But the document [1] is about unwind. Is it placed at the right place/section? Thanks. [1] http://llvm.org/docs/CodeGenerator.html#prolog-epilog-code-insertion Regards, chenwj -- Wei-Ren Chen (陳韋任) Homepage:

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

I attached a new patch with small cleanup (disassembly is identical as the last patch). We have done the same internal testing as usual. Also, attached 2 failed temporary versions which try to reduce code size (just for code review reference purpose). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Thank Jean-Marc! The speedup percentages are all relative to the entire encoder. Comparing to master, this optimization patch speeds up fixed-point SILK encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,

similar to: MachinePipeliner refactoring