Displaying 20 results from an estimated 3000 matches similar to: "MachinePipeliner refactoring"
2019 Jul 15
1
MachinePipeliner refactoring
Hi James:
Personally, I like the idea of refactoring and more abstraction,
but unfortunately, I don't know enough about the edge cases either.
BTW: the prototype is still causing quite a few assertions on PowerPC -
some nodes are not generated in the correct order.
Best,
Jinsong Ji (纪金松), PhD.
XL/LLVM on Power Compiler Development
E-mail: jji at us.ibm.com
From: James Molloy <james at
2019 Jul 16
2
MachinePipeliner refactoring
Hi James,
I also think that refactoring the code generation part is a great idea. That code is very complicated and difficult to maintain. I’ve wanted to rewrite that code for a long time, but just have never got to it. There are quite a few edge cases to handle (at least in the current code). I’ll take a deeper look at your patch. The abstractions that you mention, Stage and Block, are good
2020 Sep 07
2
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Hi James,
Having not worked on this for circa one year I've gone and refreshed my
memory.
We have a pretty capable implementation of swing modulo scheduling
downstream, distinct from the MachinePipeliner implementation.
Historically, MachinePipeliner had very tight coupling between the finding
of a suitable schedule and emitting the code that adheres to that schedule.
I spent quite a bit of
2020 Sep 09
2
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Hi James,
One last thing - is your target upstream? or are you working on a
downstream target?
Cheers,
James
On Tue, 8 Sep 2020 at 23:02, Nagurne, James <j-nagurne at ti.com> wrote:
> I greatly appreciate you going back to gather that intel, James. It
> actually helps my understanding of the whole pipeliner puzzle quite a bit!
>
>
>
> I did identify, like you, that the
2020 Sep 03
1
[EXTERNAL] RE: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Hi James,
Adding Hendrik, who has taken over ownership of the downstream code
involved.
I can also add background about the rationale, if that helps? It was added
to ignore induction variable update code (scalar code) that is rewritten
when we unroll / peel the prolog epilog anyway.
Targets like Hexagon or PPC with dedicated loop control instructions for
pipelined loops don't need this, but
2020 Sep 02
2
[EXTERNAL] Re: Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Sorry to bring this thread from 3 months ago back, but I’m running into this issue too.
I do see that shouldIgnore is not called in the MachinePipeliner, however, James’ comment doesn’t really resolve the issue or make the story any clearer.
My summary of the comment is: “Hexagon and PPC9 do not need to ignore any instructions. However, in the case that you do, such as when the indvar update is
2020 Jun 01
2
Machinepipeliner interface. shouldIgnoreForPipelining, actually not ignoring.
Hi all,
I think there is a mistake in the machinepipeliner interface. In the
TargetInstrInfo.h in the class PipelinerLoopInfo there is a function
"bool shouldIgnoreForPipelining(const MachineInstr *MI)". The
description says that if this function returns true for a given
MachineInstr it will not be pipelined.
However in reality it is not ignored and is being considered for
2017 May 25
3
Some questions about software pipeline in LLVM 4.0.0
Hi,
I have some questions about the implementation of the software pipeliner in MachinePipeliner.cpp.
First, in the Hexagon backend, between the MachinePipeliner and regalloc passes, there are some other passes like phi elimination, two-address, and register coalescing, which may change or insert instructions like 'copy' in the MBB, and the SWP kernel loop may be destroyed by these passes.
Why not put
2019 May 10
2
[Pipeliner] MachinePipeliner TargetInstrInfo hooks need more information?
Hello,
I'm working on integrating the MachinePipeliner.cpp pass into our VLIW
backend, and so far we've managed to get it working with some nice
speedups.
Unlike Hexagon however, our backend doesn't generate hardware loop
instructions and so all our loops are a combination of induction
variables, comparisons and branches. So when it came to implementing
reduceLoopCount for our
2017 Jun 01
1
Some questions about software pipeline in LLVM 4.0.0
Hi - I replied to the original sender only by mistake. Sorry about that.
When we started working on the pipeliner, and added it before the scheduler,
we also were concerned that the scheduler or other passes would undo the
work of the pipeliner. The initial thought was that we would add information
(using metadata or some other way like you've suggested) to the basic block
to tell the
2018 Jun 08
4
[RFC] Porting MachinePipeliner to AArch64+SVE
Hi,
I am extending LLVM for HPC applications.
As part of that work, I am trying to make MachinePipeliner available in the
AArch64 + Scalable Vector Extension environment.
MachinePipeliner is currently used only by Hexagon.
Since it is a very portable implementation, I think that it will
actually work for many CPUs just by adding a little code (see Code [2]).
The current MachinePipeliner is written on
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hello,
I'd like to upstream work we have done over time that the community would
benefit from.
This is part of an effort toward minimizing code size, presented here
<https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>.
In particular, this RFC is about optimizing prolog and epilog for size.
*Homogeneous Prolog and Epilog for Size Optimization, D76570
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hi Vedant,
Thanks for your interest and comment.
Size optimization reduces page faults and start-up time for a large
application, and enabling this followed the same pattern.
Even though I didn't see a large regression/complaint in the CPU-bound case,
which is not a typical case for mobile workloads, I wanted to be cautious
about enabling it by default.
However, as with the default outlining case, I
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
    tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
    /* Loop over allpass sections */
    for( i = 0; i < order; i++ ) {
        /* Output of allpass section */
        tmp2_QS = silk_SMLAWB(
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
2017 Jun 09
2
Question about Prolog/Epilog Code Insertion
Hi All,
When seeing the title "Prolog/Epilog Code Insertion", I'd expect
something about XXXFrameLowering.cpp
(particularly emitPrologue/emitEpilogue). But the document [1] is about
unwinding. Is it in the right
place/section?
Thanks.
[1] http://llvm.org/docs/CodeGenerator.html#prolog-epilog-code-insertion
Regards,
chenwj
--
Wei-Ren Chen (陳韋任)
Homepage:
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be
configured to 12, 14, 16, 20 and 24 according to complexity parameter.
It's hard to get a universal function to handle all these orders
efficiently. Any suggestions?
Thanks,
Linfeng
On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions. Will get back to you once we have some updates.
Linfeng
On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to
the last patch). We have done the same internal testing as usual.
Also attached are 2 failed temporary versions that try to reduce code size
(for code review reference only).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Compared to master, this optimization patch speeds up the fixed-point SILK
encoder on NEON as follows: Complexity 5: 6.1%, Complexity 6: 5.8%,
Complexity 8: 5.5%, Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,