Displaying 20 results from an estimated 7000 matches similar to: "[LLVMdev] Injecting code before function prolog"
2010 Apr 10
0
[LLVMdev] Injecting code before function prolog
On Wed, Apr 7, 2010 at 12:43 PM, Arlen Cox <arlencox at gmail.com> wrote:
> I'm trying to implement something similar to this:
> http://gcc.gnu.org/wiki/SplitStacks in LLVM. The reason I want this
> is so that I can have dynamically growing and shrinking stacks in my
> programming language. In order to do this, I need to be able to check
> for overflow of a stack frame.
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hello,
I'd like to upstream our work over the time which the community would
benefit from.
This is a part of effort toward minimizing code size presented in here
<https://llvm.org/devmtg/2020-02-23/slides/Kyungwoo-GlobalMachineOutlinerForThinLTO.pdf>.
In particular, this RFC is about optimizing prolog and epilog for size.
*Homogeneous Prolog and Epilog for Size Optimization, D76570
2020 Mar 24
2
[RFC][AArch64] Homogeneous Prolog and Epilog for Size Optimization
Hi Vedant,
Thanks for your interest and comment.
Size-optimization improves page-faults and a start-up time for a large
application, which this enabling also followed.
Even though I didn't see a large regression/complaint on a CPU-bound case,
which is not a typical case for mobile workload, I wanted to be precautious
of enabling it by default.
However, as with default outlining case, I
2017 Jun 09
2
Question about Prolog/Epilog Code Insertion
Hi All,
When seeing the title "Prolog/Epilog Code Insertion", I'd expect
something about XXXFrameLowering.cpp
(particular about emitPrologue/emitEpilogue). But the document [1] is about
unwind. Is it placed at the right
place/section?
Thanks.
[1] http://llvm.org/docs/CodeGenerator.html#prolog-epilog-code-insertion
Regards,
chenwj
--
Wei-Ren Chen (陳韋任)
Homepage:
2010 Jun 18
1
[LLVMdev] Problem adding a MachineBasicBlock during X86 EmitPrologue
I'm attempting to add an error handler to functions with a custom
calling convention. This error is checked upon function entry, before
any code is run (specifically, I cannot allow any stack operations).
Because of this, I figured a good place to do this code insertion is
in EmitPrologue. I also, at this time, create the block that handles
the error case.
// create a new block for
2008 Nov 15
1
[LLVMdev] ARM libgcc dependencies
I was trying to build some code today for an ARM7TDMI, which does not
have a hardware divider and I noticed that LLVM translated divide
instructions into a call into libgcc's udivsi3. Is there any way of
removing this library dependency and allowing LLVM's link time
optimizer optimize the generated division code (inline it, merge the
div/mod if using both, etc)?
Thanks much,
Arlen
2010 Apr 12
1
[LLVMdev] Question. about Machinefunction pass, funtion Prolog/Epilog code, stack frame
I am new to the LLVM, and need some help with this points.
1. how can we add special code for the Prolog/Epilog for some
certain functions, this should be done with machinefunction pass, rt?
2. Basically, I want to get the function stack frame, that is the size and
the initial position.
I found
int64_t llvm::MachineFrameInfo::getObjectSize ( int *ObjectIdx* )
const[inline]
This method is
2007 Sep 06
1
[LLVMdev] Prolog/Epilog Insertion Question
I've been looking through the code for pologue/epilogoue generation and
noticed this oddity:
void PEI::replaceFrameIndices(MachineFunction &Fn) {
[...]
for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ) {
[...]
if (I->getOpcode() == FrameSetupOpcode ||
I->getOpcode() == FrameDestroyOpcode) {
[...]
} else {
2014 Mar 27
2
[LLVMdev] PR19267 - Add a feature to clobber non-calle-save regs in the prolog.
This is a feature I’m considering for the LLVM backend. Feel free to provide input in the following PR.
llvm.org/pr19267 - Add a feature to clobber non-callee-save regs in the prolog.
I’m copying llvm-dev because it seems like something that others must have already done or at least thought about at some point.
-Andy
2014 Nov 02
3
[PATCH] customize: Add --ssh-inject option for injecting SSH keys.
This adds a customize option:
virt-customize --ssh-inject USER[=KEY]
virt-builder --ssh-inject USER[=KEY]
virt-sysprep --ssh-inject USER[=KEY]
In each case this either injects the current (host) user's ssh pubkey
into the guest user USER (adding it to ~USER/.ssh/authorized_keys in
the guest), or you can specify a particular key.
For example:
virt-builder fedora-20 --ssh-inject root
2019 Jul 16
2
MachinePipeliner refactoring
Hi James,
I also think that refactoring the code generation part is a great idea. That code is very complicated and difficult to maintain. I’ve wanted to rewrite that code for a long time, but just have never got to it. There are quite a few edge cases to handle (at least in the current code). I’ll take a deeper look at your patch. The abstractions that you mention, Stage and Block, are good
2017 Feb 06
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks a lot for reviewing this huge assembly function!
silk_warped_autocorrelation_FIX_c()'s kernel part is
for( n = 0; n < length; n++ ) {
tmp1_QS = silk_LSHIFT32( (opus_int32)input[ n ], QS );
/* Loop over allpass sections */
for( i = 0; i < order; i++ ) {
/* Output of allpass section */
tmp2_QS = silk_SMLAWB(
2017 Jan 31
6
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi,
Attached is a patch with arm neon optimizations for
silk_warped_autocorrelation_FIX(). Please review.
Thanks,
Felicia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xiph.org/pipermail/opus/attachments/20170131/9a912bb4/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name:
2019 Jul 15
1
MachinePipeliner refactoring
Hi James:
Personally, I like the idea of refactoring and more abstraction,
But unfortunately, I don't know enough about the edges cases either.
BTW: the prototype is still causing quite some Asseertions in PowerPC -
some nodes are not generated in correct order.
Best,
Jinsong Ji (纪金松), PhD.
XL/LLVM on Power Compiler Development
E-mail: jji at us.ibm.com
From: James Molloy <james at
2017 Feb 07
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
This is a great idea. But the order (psEncC->shapingLPCOrder) can be
configured to 12, 14, 16, 20 and 24 according to complexity parameter.
It's hard to get a universal function to handle all these orders
efficiently. Any suggestions?
Thanks,
Linfeng
On Mon, Feb 6, 2017 at 12:40 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 02:51 PM,
2017 Feb 07
3
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Hi Jean-Marc,
Thanks for your suggestions. Will get back to you once we have some updates.
Linfeng
On Mon, Feb 6, 2017 at 5:47 PM, Jean-Marc Valin <jmvalin at jmvalin.ca> wrote:
> Hi Linfeng,
>
> On 06/02/17 07:18 PM, Linfeng Zhang wrote:
> > This is a great idea. But the order (psEncC->shapingLPCOrder) can be
> > configured to 12, 14, 16, 20 and 24 according to
2017 Apr 05
2
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
I attached a new patch with small cleanup (disassembly is identical as the
last patch). We have done the same internal testing as usual.
Also, attached 2 failed temporary versions which try to reduce code size
(just for code review reference purpose).
The new patch of silk_warped_autocorrelation_FIX_neon() has a code size of
3,228 bytes (with gcc).
smaller_slower.c has a code size of 2,304
2004 Jun 09
2
[LLVMdev] X86 Frame info question
The X86 backend has this code:
X86TargetMachine::X86TargetMachine(const Module &M, IntrinsicLowering *IL)
: ....
FrameInfo(TargetFrameInfo::StackGrowsDown, 8/*16 for SSE*/, 4),
That is, it uses "4" as local area offset. Based on prior discussion this
should mean that the local area starts and address ESP+4. Is this really
true? On X86 stack grows down, so
2017 Apr 05
4
[PATCH] Optimize silk_warped_autocorrelation_FIX() for ARM NEON
Thank Jean-Marc!
The speedup percentages are all relative to the entire encoder.
Comparing to master, this optimization patch speeds up fixed-point SILK
encoder on NEON as following: Complexity 5: 6.1% Complexity 6: 5.8%
Complexity 8: 5.5% Complexity 10: 4.0%
when testing on an Acer Chromebook, ARMv7 Processor rev 3 (v7l), CPU max
MHz: 2116.5
Thanks,
Linfeng
On Wed, Apr 5, 2017 at 11:02 AM,
2019 Jul 15
2
MachinePipeliner refactoring
Hi Brendan (and friends of MachinePipeliner, +llvm-dev for openness),
Over the past week or so I've been attempting to extend the
MachinePipeliner to support different idioms of code generation. To make
this a bit more concrete, there are two areas where the currently generated
code could be improved depending on architecture:
1) The epilog blocks peel off the final iterations in reverse