search for: epilog

Displaying 20 results from an estimated 254 matches for "epilog".

2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
...ke to upstream our work over the time which the community would benefit from. This is a part of effort toward minimizing code size presented in here <https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>. In particular, this RFC is about optimizing prolog and epilog for size. *Homogeneous Prolog and Epilog for Size Optimization, D76570 <https://reviews.llvm.org/D76570>:* Prolog and epilog to handle callee-save registers tend to be irregular with different immediate offsets, which are not often being outlined (by machine outliner) when optimizing for si...
2017 Feb 27
4
[Proposal][RFC] Epilog loop vectorization
Thanks for looking into this. 1) Issues with re running vectorizer: Vectorizer might generate redundant alias checks while vectorizing epilog loop. Redundant alias checks are expensive, we like to reuse the results of already computed alias checks. With metadata we can limit the width of epilog loop, but not sure about reusing alias check result. Any thoughts on rerunning vectorizer with reusing the alias check result ? 2) Best & wo...
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
...an opt-out option. Regards, Kyungwoo On Tue, Mar 24, 2020 at 12:01 PM Vedant Kumar <vedant_kumar at apple.com> wrote: > This looks really interesting. In the slides, it’s mentioned that the > combination of tuning the MachineOutliner for ThinLTO and of optimizing > function prolog/epilogs improved measured run-time performance. > > What kind of performance impact do you see from simply homogenizing > prolog/epilogs? (If, say across LNT/aarch64/-Oz the performance impact is > not large, it may make sense to have homogenization enabled by default.) > > best, > ve...
2019 Jul 16
2
MachinePipeliner refactoring
...I think you’ll need to account for the cycle, or position within a Stage as well. It is a complex problem with a lot of different edge cases when dealing with Phis, though I think they can be dealt with much better than the existing code. Generating code for the 3 main parts, the prolog, kernel, and epilog, can be challenging because each has a unique difference that makes it hard to generalize the code. For example, the prologs will not have any Phis, but the code generation needs to be aware of when the Phis occur in the sequence in order to get the correct remapped name. In the kernel, values may c...
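The prolog, kernel, and epilog named in this snippet can be pictured with a minimal Python sketch of a generic two-stage software-pipelined loop. It is only an illustration under stated assumptions, not the MachinePipeliner's code generation: the function name `pipelined_square`, the two stages (load, then square), and the `trace` list are all hypothetical.

```python
def pipelined_square(xs):
    """Square each element using a hypothetical two-stage pipeline.

    Stage 0 loads an element; stage 1 squares the previously loaded one.
    Returns the results and a trace of which part of the loop ran.
    """
    out = []
    trace = []
    if not xs:
        return out, trace

    # Prolog: run stage 0 of the first iteration to fill the pipeline.
    loaded = xs[0]
    trace.append("prolog")

    # Kernel: steady state, overlapping stage 1 of iteration i-1
    # with stage 0 of iteration i.
    for i in range(1, len(xs)):
        out.append(loaded * loaded)  # stage 1 of iteration i-1
        loaded = xs[i]               # stage 0 of iteration i
        trace.append("kernel")

    # Epilog: drain the pipeline by finishing stage 1 of the last iteration.
    out.append(loaded * loaded)
    trace.append("epilog")
    return out, trace
```

With only two stages, the prolog and epilog each cover a single partial iteration; a deeper pipeline would need more of both.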
2017 Mar 14
10
[Proposal][RFC] Epilog loop vectorization
Summarizing the discussion on the implementation approaches. Discussed about two approaches, first running ‘InnerLoopVectorizer’ again on the epilog loop immediately after vectorizing the original loop within the same vectorization pass, the second approach where re-running vectorization pass and limiting vectorization factor of epilog loop by metadata. <Approach-2> Challenges with re-running the vectorizer pass: 1) Reusing alias c...
2017 Mar 14
4
[Proposal][RFC] Epilog loop vectorization
...osh.Nema at amd.com >>>> <mailto:Ashutosh.Nema at amd.com>> wrote: >>>> >>>> Summarizing the discussion on the implementation approaches. >>>> Discussed about two approaches, first running ‘InnerLoopVectorizer’ >>>> again on the epilog loop immediately after vectorizing the original >>>> loop within the same vectorization pass, the second approach where >>>> re-running vectorization pass and limiting vectorization factor of >>>> epilog loop by metadata. >>>> <Approach-2> >...
2011 Nov 10
2
[LLVMdev] Possible Phi Removal Pass?
...ons without changing register/memory usage. Does this sort of translation of phi instructions seem reasonable? : =============================================================== ; Original stripped code example entry: ... br label %for.body for.body: ; preds = %entry, %sw.epilog %indvar = phi i32 [ 0, %entry ], [ %indvar.next, %sw.epilog ] %j.02 = phi i32 [10, %entry ], [ %j.2, %sw.epilog ] ... sw.epilog: ; preds = ... ... br i1 %exitcond, label %for.end, label %for.body ; End original stripped code example ===================================...
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
...17, at 6:00 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com >> <mailto:Ashutosh.Nema at amd.com>> wrote: >> >> Summarizing the discussion on the implementation approaches. >> Discussed about two approaches, first running ‘InnerLoopVectorizer’ >> again on the epilog loop immediately after vectorizing the original >> loop within the same vectorization pass, the second approach where >> re-running vectorization pass and limiting vectorization factor of >> epilog loop by metadata. >> <Approach-2> >> Challenges with re-runnin...
2017 Feb 27
2
[Proposal][RFC] Epilog loop vectorization
...2017, at 7:27 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > > On 02/27/2017 06:29 AM, Nema, Ashutosh wrote: >> Thanks for looking into this. >> >> 1) Issues with re running vectorizer: >> Vectorizer might generate redundant alias checks while vectorizing epilog loop. >> Redundant alias checks are expensive, we like to reuse the results of already computed alias checks. >> With metadata we can limit the width of epilog loop, but not sure about reusing alias check result. >> Any thoughts on rerunning vectorizer with reusing the alias check...
2017 Feb 22
3
[Proposal][RFC] Epilog loop vectorization
Hi, This is a proposal about epilog loop vectorization. Currently Loop Vectorizer inserts an epilogue loop for handling loops that don't have known iteration counts. The Loop Vectorizer supports loops with an unknown trip count, unknown trip count may not be a multiple of the vector width, and the vectorizer has to execute the...
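The epilogue loop described in this snippet can be illustrated with a minimal Python sketch. It is not LLVM's implementation: the function name `add_arrays`, the vectorization factor `VF = 4`, and the use of list slices to stand in for vector operations are assumptions for illustration. The main loop handles `VF` elements per iteration, and a scalar epilogue loop handles the leftover `n % VF` elements when the trip count is not a multiple of the vector width.

```python
VF = 4  # assumed vectorization factor (hypothetical, for illustration only)


def add_arrays(b, c, vf=VF):
    """Compute a[i] = b[i] + c[i] with a chunked main loop and a scalar epilogue.

    Assumes b and c have the same length. Returns the result and the
    number of iterations executed by the epilogue loop.
    """
    n = len(b)
    a = [0] * n
    main_end = n - (n % vf)  # largest multiple of vf not exceeding n
    epilogue_iters = 0

    # Main "vector" loop: each iteration handles vf elements at once.
    for i in range(0, main_end, vf):
        a[i:i + vf] = [x + y for x, y in zip(b[i:i + vf], c[i:i + vf])]

    # Epilogue loop: handles the remaining n % vf elements one at a time.
    for i in range(main_end, n):
        a[i] = b[i] + c[i]
        epilogue_iters += 1

    return a, epilogue_iters
```

The epilogue runs at most `vf - 1` scalar iterations, and that remainder is the part the proposal looks at vectorizing with a smaller width.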
2017 Feb 23
2
[Proposal][RFC] Epilog loop vectorization
On 02/22/2017 11:52 AM, Adam Nemet via llvm-dev wrote: > Hi Ashutosh, > >> On Feb 22, 2017, at 1:57 AM, Nema, Ashutosh via llvm-dev >> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi, >> This is a proposal about epilog loop vectorization. >> Currently Loop Vectorizer inserts an epilogue loop for handling loops >> that don’t have known iteration counts. >> The Loop Vectorizer supports loops with an unknown trip count, >> unknown trip count may not be a multiple of the vector width, and th...
2017 Mar 15
4
[Proposal][RFC] Epilog loop vectorization
...d.com <mailto:Ashutosh.Nema at amd.com>> wrote: >>>>>> >>>>>> Summarizing the discussion on the implementation approaches. >>>>>> Discussed about two approaches, first running >>>>>> ‘InnerLoopVectorizer’ again on the epilog loop immediately after >>>>>> vectorizing the original loop within the same vectorization pass, >>>>>> the second approach where re-running vectorization pass and >>>>>> limiting vectorization factor of epilog loop by metadata. >>>>...
2017 Jun 09
2
Question about Prolog/Epilog Code Insertion
Hi All, When seeing the title "Prolog/Epilog Code Insertion", I'd expect something about XXXFrameLowering.cpp (particular about emitPrologue/emitEpilogue). But the document [1] is about unwind. Is it placed at the right place/section? Thanks. [1] http://llvm.org/docs/CodeGenerator.html#prolog-epilog-code-insertion Regards, chenw...
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
...e processing input[x] at stage y. If there is no dependency between inx(sy) and in(x+1)(sy), then we can do this FOR in=0 TO N WITH in+=8 FOR y=0 TO order-1 WITH y++ PROC(in0(sy) in1(sy) in2(sy) in3(sy) in4(sy) in5(sy) in6(sy) in7(sy)) END FOR END FOR In that case no prolog or epilog is needed. However, the critical thing is that all the states in each stage when processing input[i] are reused by the next input[i+1]. That is, input[i+1] must wait for input[i] for 1 stage, and input[i+2] must wait for input[i+1] for 1 stage, etc. Then it becomes this FOR in=0 to N WITH in+=8 PROC(in0(s0...
2019 Jul 15
1
MachinePipeliner refactoring
...+llvm-dev for openness), Over the past week or so I've been attempting to extend the MachinePipeliner to support different idioms of code generation. To make this a bit more concrete, there are two areas where the currently generated code could be improved depending on architecture:   1) The epilog blocks peel off the final iterations in reverse order. This means that the overall execution of loop iterations isn't in a perfectly pipelined order. For architectures that have hardware constructs that insist on a first-in-first-out order (queues), the currently generated code cannot be used....
2009 Sep 18
0
[LLVMdev] OT: intel darwin losing primary target status
I dug into this. Based on the .s files in bugzilla, the latest gcc is now adding dwarf unwind info to describe the function epilog. If you run dwarfdump --eh-frame on the .o files made with the new compiler, you'll see extra dwarf unwind instructions at the end like: ... DW_CFA_advance_loc4 (64) #<-- advance to near end of function DW_CFA_restore (rbp)...
2017 Mar 14
2
[Proposal][RFC] Epilog loop vectorization
On 03/14/2017 11:58 AM, Michael Kuperstein wrote: > I'm still not sure about this, for a few reasons: > > 1) I'd like to try to treat epilogue loops the same way regardless of > whether the main loop was vectorized by hand or automatically. So if > someone hand-wrote an avx-512 16-wide loop, with alias checks, and we > decide it's profitable to vectorize the epilogue loop by 4 and re-use > the checks, it ought to be...
2009 Apr 09
3
[LLVMdev] Calling Conventions, function prologs and epilogs.
How/where are function prologs and epilogs generated, is it bespoke C++ code or TableGen generated? If someone could point me in the right direction please. Many thanks in advance, Aaron ...
2017 Feb 28
3
[Proposal][RFC] Epilog loop vectorization
...A[i] = B[i] + C[i]; } <Command> $ opt -O3 -gvn test.ll -o test.opt.ll $ opt -O3 -newgvn test.ll -o test.opt.ll “test.ll” is attached, it got already vectorized by the approach running vectorizer twice by annotate the remainder loop with metadata to limit the vectorization factor for epilog vector loop. Regards, Ashutosh From: anemet at apple.com [mailto:anemet at apple.com] Sent: Tuesday, February 28, 2017 1:33 AM To: Hal Finkel <hfinkel at anl.gov> Cc: Daniel Berlin <dberlin at dberlin.org>; Nema, Ashutosh <Ashutosh.Nema at amd.com>; Zaks, Ayal <ayal.zaks at i...
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, Attached is a patch with arm neon optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia ...