similar to: [LLVMdev] Injecting code before function prolog

Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] Injecting code before function prolog"

2010 Apr 10
0
[LLVMdev] Injecting code before function prolog
On Wed, Apr 7, 2010 at 12:43 PM, Arlen Cox <arlencox at gmail.com> wrote: > I'm trying to implement something similar to this: > http://gcc.gnu.org/wiki/SplitStacks in LLVM.  The reason I want this > is so that I can have dynamically growing and shrinking stacks in my > programming language.  In order to do this, I need to be able to check > for overflow of a stack frame.
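In the split-stack design on the GCC wiki page linked above, each function prologue compares the stack pointer against a per-thread stack limit. When the new frame would not fit, it calls a runtime routine (`__morestack` in GCC) before touching the stack.

The predicate at the heart of that check can be sketched as a minimal C model. It assumes a downward-growing stack. The name `frame_needs_new_segment` and its parameters are hypothetical. This is not GCC's or LLVM's generated code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a split-stack prologue check.
 * The stack grows down, so the new frame occupies [sp - frame_size, sp).
 * stack_limit is the lowest usable address of the current segment
 * (a real prologue reads a per-thread value instead of taking a parameter).
 * Returns true when the frame would cross the limit, i.e. when a real
 * prologue would have to switch to a new segment first. */
static bool frame_needs_new_segment(uintptr_t sp, size_t frame_size,
                                    uintptr_t stack_limit) {
    if (sp < stack_limit)
        return true;                      /* already past the limit */
    uintptr_t available = sp - stack_limit;
    return available < frame_size;        /* frame would not fit */
}
```

A frame that exactly fills the remaining space is allowed. The check fires only when even one more byte would be needed below the limit.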
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hello, I'd like to upstream work we have done over time that the community would benefit from. This is part of an effort toward minimizing code size, presented here <https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>. In particular, this RFC is about optimizing prolog and epilog for size. *Homogeneous Prolog and Epilog for Size Optimization, D76570
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hi Vedant, Thanks for your interest and comment. Size optimization improves page faults and start-up time for large applications, and enabling this feature followed the same pattern. Even though I didn't see a large regression or complaint in CPU-bound cases, which are not typical of mobile workloads, I wanted to be cautious about enabling it by default. However, as with the default outlining case, I
2017 Jun 09
2
Question about Prolog/Epilog Code Insertion
Hi All, When seeing the title "Prolog/Epilog Code Insertion", I'd expect something about XXXFrameLowering.cpp (particularly about emitPrologue/emitEpilogue). But the document [1] is about unwinding. Is it in the right place/section? Thanks. [1] http://llvm.org/docs/CodeGenerator.html#prolog-epilog-code-insertion Regards, chenwj -- Wei-Ren Chen (陳韋任) Homepage:
2010 Jun 18
1
[LLVMdev] Problem adding a MachineBasicBlock during X86 EmitPrologue
I'm attempting to add an error handler to functions with a custom calling convention. This error is checked upon function entry, before any code is run (specifically, I cannot allow any stack operations). Because of this, I figured a good place to do this code insertion is in EmitPrologue. I also, at this time, create the block that handles the error case. // create a new block for
2008 Nov 15
1
[LLVMdev] ARM libgcc dependencies
I was trying to build some code today for an ARM7TDMI, which does not have a hardware divider and I noticed that LLVM translated divide instructions into a call into libgcc's udivsi3. Is there any way of removing this library dependency and allowing LLVM's link time optimizer optimize the generated division code (inline it, merge the div/mod if using both, etc)? Thanks much, Arlen
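On cores without a hardware divider, the compiler lowers unsigned 32-bit division to a call into libgcc (the routine libgcc provides is `__udivsi3`). The following sketch shows the kind of loop such a helper can compute inline. It is a textbook restoring shift-and-subtract division, not libgcc's actual implementation. The name `soft_udivmod32` is hypothetical. Returning quotient and remainder together illustrates the div/mod merging asked about.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring shift-and-subtract division of 32-bit unsigned integers.
 * Returns the quotient and stores the remainder in *rem, so a caller
 * needing both n / d and n % d pays for a single loop.
 * Precondition: d != 0 (division by zero is undefined in C too). */
static uint32_t soft_udivmod32(uint32_t n, uint32_t d, uint32_t *rem) {
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;  /* 64-bit so (r << 1) cannot overflow when d > 2^31 */
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next dividend bit */
        if (r >= d) {                    /* divisor fits: subtract, set bit */
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    *rem = (uint32_t)r;
    return q;
}
```

The loop always runs 32 iterations. Real helpers typically add shortcuts (for example, skipping leading zero bits), which this sketch omits for clarity.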
2010 Apr 12
1
[LLVMdev] Question. about Machinefunction pass, funtion Prolog/Epilog code, stack frame
I am new to LLVM and need some help with these points. 1. How can we add special prolog/epilog code for certain functions? This should be done with a MachineFunction pass, right? 2. Basically, I want to get the function's stack frame, that is, its size and initial position. I found int64_t llvm::MachineFrameInfo::getObjectSize(int ObjectIdx) const [inline]. This method is
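A frame's size relates to its individual objects roughly as follows, assuming a downward-growing stack. Each object receives an aligned negative offset from the frame base (the stack pointer on entry), and the total is rounded up to the stack alignment. This is a simplified, hypothetical model. LLVM's real layout also handles fixed objects, callee-saved registers, and stack realignment. In LLVM, the resulting values are read through MachineFrameInfo (for example getObjectOffset and getStackSize), not computed by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical frame layout for a downward-growing stack.
 * offsets[i] receives the offset of object i's lowest byte relative to
 * the frame base; each object is placed below the previous one and
 * aligned to aligns[i]. Returns the frame size rounded to stack_align. */
static size_t layout_frame(const size_t *sizes, const size_t *aligns,
                           size_t n, size_t stack_align, long *offsets) {
    size_t used = 0;
    for (size_t i = 0; i < n; ++i) {
        used = align_up(used + sizes[i], aligns[i]);
        offsets[i] = -(long)used;
    }
    return align_up(used, stack_align);
}
```

For objects of sizes 4, 8, and 1 with matching alignments and 16-byte stack alignment, the offsets come out as -4, -16, and -17. The frame size rounds up to 32 bytes.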
2007 Sep 06
1
[LLVMdev] Prolog/Epilog Insertion Question
I've been looking through the code for prologue/epilogue generation and noticed this oddity: void PEI::replaceFrameIndices(MachineFunction &Fn) { [...] for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ) { [...] if (I->getOpcode() == FrameSetupOpcode || I->getOpcode() == FrameDestroyOpcode) { [...] } else {
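`replaceFrameIndices` is the step in which abstract frame-index operands are rewritten into concrete frame offsets once the frame has been laid out. In LLVM, the per-operand rewrite is delegated to the target hook `eliminateFrameIndex`. The toy model below captures only that rewrite, using hypothetical types and names. It does not model the FrameSetupOpcode/FrameDestroyOpcode handling the quoted loop is asking about.

```c
#include <assert.h>
#include <stddef.h>

enum operand_kind { OP_REG, OP_IMM, OP_FRAME_INDEX, OP_FRAME_OFFSET };

struct operand {
    enum operand_kind kind;
    long value;  /* register number, immediate, frame index, or offset */
};

/* Rewrite every frame-index operand in place into the frame-relative
 * offset that a prior layout step stored in object_offsets[].
 * All other operands are left untouched. */
static void replace_frame_indices(struct operand *ops, size_t n,
                                  const long *object_offsets,
                                  size_t num_objects) {
    for (size_t i = 0; i < n; ++i) {
        if (ops[i].kind != OP_FRAME_INDEX)
            continue;
        assert(ops[i].value >= 0 && (size_t)ops[i].value < num_objects);
        ops[i].kind = OP_FRAME_OFFSET;
        ops[i].value = object_offsets[ops[i].value];
    }
}
```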
2014 Mar 27
2
[LLVMdev] PR19267 - Add a feature to clobber non-calle-save regs in the prolog.
This is a feature I’m considering for the LLVM backend. Feel free to provide input in the following PR. llvm.org/pr19267 - Add a feature to clobber non-callee-save regs in the prolog. I’m copying llvm-dev because it seems like something that others must have already done or at least thought about at some point. -Andy
2014 Nov 02
3
[PATCH] customize: Add --ssh-inject option for injecting SSH keys.
This adds a customize option: virt-customize --ssh-inject USER[=KEY] virt-builder --ssh-inject USER[=KEY] virt-sysprep --ssh-inject USER[=KEY] In each case this either injects the current (host) user's ssh pubkey into the guest user USER (adding it to ~USER/.ssh/authorized_keys in the guest), or you can specify a particular key. For example: virt-builder fedora-20 --ssh-inject root
2019 Jul 16
2
MachinePipeliner refactoring
Hi James, I also think that refactoring the code generation part is a great idea. That code is very complicated and difficult to maintain. I’ve wanted to rewrite that code for a long time, but just have never got to it. There are quite a few edge cases to handle (at least in the current code). I’ll take a deeper look at your patch. The abstractions that you mention, Stage and Block, are good
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks a lot for reviewing this huge assembly function! silk_warped_autocorrelation_FIX_c()'s kernel part is for( n = 0; n < length; n++ ) { tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS ); /* Loop over allpass sections */ for( i = 0; i < order; i++ ) { /* Output of allpass section */ tmp2_QS = silk_SMLAWB(
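The quoted kernel passes each input sample through a cascade of first-order allpass sections and accumulates one correlation term per section. The floating-point sketch below is modeled on the structure of that loop (tmp1/tmp2, a state array, one update per section). It is not the fixed-point silk_warped_autocorrelation_FIX_c() itself, which uses Q-format arithmetic (silk_LSHIFT32, silk_SMLAWB). The names `warped_autocorrelation` and `WARPED_MAX_ORDER` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WARPED_MAX_ORDER 24  /* hypothetical bound; the thread mentions orders up to 24 */

/* Floating-point sketch of warped autocorrelation. Each sample runs
 * through `order` allpass sections; the output of every section is
 * correlated with the current sample, held in state[0].
 * With warping == 0 the chain is a plain delay line and C[k] reduces
 * to the ordinary autocorrelation sum of x[n] * x[n - k]. */
static void warped_autocorrelation(const double *x, size_t length,
                                   double warping, int order, double *C) {
    assert(order >= 0 && order <= WARPED_MAX_ORDER);
    double state[WARPED_MAX_ORDER + 1] = {0};
    for (int k = 0; k <= order; ++k)
        C[k] = 0.0;

    for (size_t n = 0; n < length; ++n) {
        double tmp1 = x[n];
        for (int i = 0; i < order; ++i) {
            /* Output of allpass section i */
            double tmp2 = state[i] + warping * (state[i + 1] - tmp1);
            state[i] = tmp1;
            C[i] += state[0] * tmp1;
            tmp1 = tmp2;
        }
        state[order] = tmp1;
        C[order] += state[0] * tmp1;
    }
}
```

The inner loop carries a dependency from one section to the next through tmp1/tmp2. That dependency is what makes vectorizing across sections harder than vectorizing a plain autocorrelation.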
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi, Attached is a patch with ARM NEON optimizations for silk_warped_autocorrelation_FIX(). Please review. Thanks, Felicia
2019 Jul 15
1
MachinePipeliner refactoring
Hi James: Personally, I like the idea of refactoring and more abstraction, but unfortunately I don't know enough about the edge cases either. BTW, the prototype still triggers quite a few assertions on PowerPC: some nodes are not generated in the correct order. Best, Jinsong Ji (纪金松), PhD. XL/LLVM on Power Compiler Development E-mail: jji at us.ibm.com From: James Molloy <james at
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be configured to 12, 14, 16, 20 and 24 according to complexity parameter. It's hard to get a universal function to handle all these orders efficiently. Any suggestions? Thanks, Linfeng On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc, Thanks for your suggestions. Will get back to you once we have some updates. Linfeng On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote: > Hi Linfeng, > > On 06/02/17 07:18 PM, Linfeng Zhang wrote: > > This is a great idea. But the order (psEncC->shapingLPCOrder) can be > > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with a small cleanup (the disassembly is identical to the last patch). We have done the same internal testing as usual. Also attached are two failed temporary versions that try to reduce code size (for code review reference only). The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of 3,228 bytes (with gcc). smaller_slower.c has a code size of 2,304
2004 Jun 09
2
[LLVMdev] X86 Frame info question
The X86 backend has this code: X86TargetMachine::X86TargetMachine(const Module &M, IntrinsicLowering *IL) : .... FrameInfo(TargetFrameInfo::StackGrowsDown, 8/*16 for SSE*/, 4), That is, it uses "4" as the local area offset. Based on prior discussion this should mean that the local area starts at address ESP+4. Is this really true? On X86 stack grows down, so
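Some background arithmetic for this question: on 32-bit x86, `call` pushes a 4-byte return address. At function entry, [ESP] therefore holds that address. Under the usual cdecl convention, the first stack argument sits at ESP+4, and new locals go below ESP. The small worked model below uses hypothetical names and shows where the first locals can land under that convention. It does not settle what value LLVM's local-area offset should take.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit x86 at function entry: the return address occupies
 * [entry_esp, entry_esp + 4) and the stack grows toward lower addresses.
 * Returns the lowest address of the k-th (0-based) slot_size-byte local
 * placed directly below the return address. */
static uint32_t local_slot_address(uint32_t entry_esp, uint32_t slot_size,
                                   uint32_t k) {
    assert(slot_size != 0 && entry_esp >= slot_size * (k + 1));
    return entry_esp - slot_size * (k + 1);
}
```

With an entry ESP of 0x1000, the first 4-byte local starts at 0xFFC (ESP-4) and the second at 0xFF8. Addresses at or above ESP belong to the return address and the caller's arguments.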
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thanks, Jean-Marc! The speedup percentages are all relative to the entire encoder. Compared to master, this optimization patch speeds up the fixed-point SILK encoder on NEON as follows: Complexity 5: 6.1% Complexity 6: 5.8% Complexity 8: 5.5% Complexity 10: 4.0% when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max MHz: 2116.5 Thanks, Linfeng On Wed, Apr 5, 2017 at 11:02 AM,
2019 Jul 15
2
MachinePipeliner refactoring
Hi Brendan (and friends of MachinePipeliner, +llvm-dev for openness), Over the past week or so I've been attempting to extend the MachinePipeliner to support different idioms of code generation. To make this a bit more concrete, there are two areas where the currently generated code could be improved depending on architecture: 1) The epilog blocks peel off the final iterations in reverse
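For readers new to the prolog/kernel/epilog terminology: in a modulo-scheduled loop whose schedule has S stages, consecutive iterations start one initiation interval apart. The prolog fills the pipeline, the kernel runs with every stage busy, and the epilog drains the final iterations. The toy model below assumes N >= S and classifies each step of an N-iteration loop, reporting which stage of which iteration is active. The names are hypothetical, and this is not MachinePipeliner's code.

```c
#include <assert.h>

enum phase { PROLOG, KERNEL, EPILOG };

/* Steps are counted in initiation intervals; the whole loop takes
 * n_iters + n_stages - 1 steps. Assumes n_iters >= n_stages. */
static enum phase phase_of_step(int t, int n_iters, int n_stages) {
    assert(n_stages >= 1 && n_iters >= n_stages);
    assert(t >= 0 && t < n_iters + n_stages - 1);
    if (t < n_stages - 1)
        return PROLOG;   /* pipeline still filling */
    if (t < n_iters)
        return KERNEL;   /* all n_stages iterations in flight */
    return EPILOG;       /* draining the final iterations */
}

/* Stage executed by iteration j at step t, or -1 if j is idle then. */
static int stage_of(int t, int j, int n_stages) {
    int s = t - j;
    return (s >= 0 && s < n_stages) ? s : -1;
}
```

With N = 5 iterations and S = 3 stages, steps 0 and 1 form the prolog, steps 2 to 4 the kernel, and steps 5 and 6 the epilog. Each epilog step finishes one more of the last S - 1 iterations.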