thr3ads.net - similar to: "[LLVMdev] Segmented Stacks (re-roll)"

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] Segmented Stacks (re-roll)"

2011 Aug 22

[LLVMdev] Segmented Stacks (re-roll)

Hi Sanjoy, The patch generally looks fine except for this part: diff --git a/lib/CodeGen/StackSegmenter.cpp b/lib/CodeGen/StackSegmenter.cpp new file mode 100644 index 0000000..5ffb8f2 --- /dev/null +++ b/lib/CodeGen/StackSegmenter.cpp @@ -0,0 +1,48 @@ +//===-- StackSegmenter.h - Prolog/Epilog code insertion -------*- C++ -* --===// The comment is obviously incorrect. diff --git

2011 Aug 23

[LLVMdev] Segmented Stacks (re-roll)

Hi! > diff --git a/lib/CodeGen/StackSegmenter.cpp b/lib/CodeGen/StackSegmenter.cpp > new file mode 100644 > index 0000000..5ffb8f2 > --- /dev/null > +++ b/lib/CodeGen/StackSegmenter.cpp > @@ -0,0 +1,48 @@ > +//===-- StackSegmenter.h - Prolog/Epilog code insertion -------*- C++ -* --===// > > The comment is obviously incorrect. Thanks. So much for lifting file

2011 Aug 23

[LLVMdev] Segmented Stacks (re-roll)

On Aug 23, 2011, at 9:24 AM, Sanjoy Das wrote: > Hi! > >> diff --git a/lib/CodeGen/StackSegmenter.cpp b/lib/CodeGen/StackSegmenter.cpp >> new file mode 100644 >> index 0000000..5ffb8f2 >> --- /dev/null >> +++ b/lib/CodeGen/StackSegmenter.cpp >> @@ -0,0 +1,48 @@ >> +//===-- StackSegmenter.h - Prolog/Epilog code insertion -------*- C++ -* --===//

2011 Aug 24

[LLVMdev] Segmented Stacks (re-roll)

Hi! > According to the patch you send, the pass is not doing anything: > > +bool StackSegmenter::runOnMachineFunction(MachineFunction &MF) { > + return false; > +} > + It is, in the next patch. diff --git a/lib/CodeGen/StackSegmenter.cpp b/lib/CodeGen/StackSegmenter.cpp index 5ffb8f2..cc2ca87 100644 --- a/lib/CodeGen/StackSegmenter.cpp +++ b/lib/CodeGen/StackSegmenter.cpp

2011 Aug 10

[LLVMdev] Segmented Stacks: Pre-midterm work

Hi! Attached is my pre-midterm GSoC work for segmented stacks for review (with the required fixes). Thanks! -- Sanjoy Das http://playingwithpointers.com

2011 May 30

[LLVMdev] [Segmented Stacks] Week 1

Hi! I've attached my first week of work as a patchset for review. This is also available on Github [1]. By next Monday I intend to (more or less) finish up the preliminary parts concerning the codegen; and start working on the runtime (so that I can do a basic sanity check). [1] https://github.com/sanjoy/llvm/tree/segmented-stacks -- Sanjoy Das http://playingwithpointers.com

2011 Jul 14

[LLVMdev] [PATCH] Segmented Stacks

Hi llvm-dev! I have attached the current state of my GSoC work in patches [1] for review; this currently allows LLVM to correctly handle functions running out of stack space and variable sized stack objects. Firstly, since I think it is better to get things merged in small chunks, I'd like to have some specific feedback on where my work stands in terms of mergeability. Secondly, I had been

2015 Aug 16

[LLVMdev] Adding a stack probe function attribute

I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://github.com/Microsoft/llvm/tree/MS. Do we know why the stack pointer cannot be updated in a loop (which results in ideal code)? I noticed that was commented in Microsoft's code. I suspect this is due to debug or unwinding information, since it is allowed on Windows x86-32. I

2011 Aug 25

[LLVMdev] Segmented Stacks (Re-Roll 2)

Hi all! I've attached a corrected set of patches (based on the input I received here). Please let me know if this work looks mergeable. The documentation is only partially filled in; I'll add more details once support for Go is also merged (the current co-routine work I'm doing). Thanks! -- Sanjoy Das http://playingwithpointers.com

2017 Feb 06

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(

2017 Jan 31

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi, Attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to the complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,

2017 Feb 07

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to

2010 Apr 07

[LLVMdev] Injecting code before function prolog

I'm trying to implement something similar to this: http://gcc.gnu.org/wiki/SplitStacks in LLVM. The reason I want this is so that I can have dynamically growing and shrinking stacks in my programming language. In order to do this, I need to be able to check for overflow of a stack frame. The methods of doing this are outlined in the link above, but my intention is to pass the current stack

2020 Mar 24

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

Hello, I'd like to upstream work we have done over time that the community would benefit from. This is part of an effort toward minimizing code size presented here <https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>. In particular, this RFC is about optimizing prolog and epilog for size. *Homogeneous Prolog and Epilog for Size Optimization, D76570

2020 Mar 24

[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization

Hi Vedant, Thanks for your interest and comment. Size-optimization improves page-faults and start-up time for a large application, which this enabling also followed. Even though I didn't see a large regression/complaint on a CPU-bound case, which is not a typical case for mobile workload, I wanted to be cautious about enabling it by default. However, as with default outlining case, I

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

I attached a new patch with a small cleanup (the disassembly is identical to that of the last patch). We have done the same internal testing as usual. Also attached are 2 failed temporary versions which try to reduce code size (just for code review reference purposes). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304

2019 Jul 16

MachinePipeliner refactoring

Hi James, I also think that refactoring the code generation part is a great idea. That code is very complicated and difficult to maintain. I’ve wanted to rewrite that code for a long time, but just have never got to it. There are quite a few edge cases to handle (at least in the current code). I’ll take a deeper look at your patch. The abstractions that you mention, Stage and Block, are good

2019 Jul 15

MachinePipeliner refactoring

Hi James: Personally, I like the idea of refactoring and more abstraction, but unfortunately, I don't know enough about the edge cases either. BTW: the prototype is still causing quite a few assertions on PowerPC - some nodes are not generated in the correct order. Best, Jinsong Ji (纪金松), PhD. XL/LLVM on Power Compiler Development E-mail: jji at us.ibm.com From: James Molloy <james at

2017 Apr 05

[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON

Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
