similar to: Recent -Os code size regressions

Displaying 20 results from an estimated 900 matches similar to: "Recent -Os code size regressions"

2015 Nov 21
2
Recent -Os code size regressions
On Thu, Nov 19, 2015 at 1:10 PM, Renato Golin <renato.golin at linaro.org> wrote: > On 19 November 2015 at 19:08, Steve King via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Does the community have bots or humans tracking code size for -Os >> builds? > > Hi Steve, > > I still haven't got around to doing a CI for EEMBC or SPEC on ARM. I do > track
2015 Nov 21
3
Recent -Os code size regressions
Maybe adjust this to be different for -Os, -Oz than for -O2? Kevin Smith From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of James Molloy via llvm-dev Sent: Friday, November 20, 2015 4:05 PM To: Steve King <steve at metrokings.com>; Renato Golin <renato.golin at linaro.org> Cc: llvm-dev <llvm-dev at lists.llvm.org> Subject: Re: [llvm-dev] Recent -Os code
2015 Nov 21
2
Recent -Os code size regressions
On Fri, Nov 20, 2015 at 5:06 PM, James Molloy <james at jamesmolloy.co.uk> wrote: > > Hi, > > We'd need to look precisely at what's causing the code size bloat. The midend commit pointed out by Steve shouldn't cause bloat in and of itself - it should reduce code size. It removes a load of stores and branches. > > I know a backend change I made to ARM isn't
2013 Oct 11
3
[LLVMdev] Generate code for ARM Cortex m0, m3, and m4.
Hi, I am trying to cross compile code for ARM Cortex m0, m3, and m4. For m0, I use: -target armv6--eabi -mcpu=cortex-m0 That seems to work. For m3 and m4, I use the following which does not work (fatal error: error in backend: CPU: 'cortex-m3' does not support ARM mode): -target armv7m--eabi -mcpu=cortex-m3 and -target armv7em--eabi -mcpu=cortex-m4 Who can help me with the
2013 Jan 30
1
[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce
The compilation time is measured for different benchmarks while compiling a .bc file into a shared object. The improvement across the range of benchmarks is listed in following table. If the reason behind the need for other performance metrics is to identify possible measurement errors, then I think this table would be of some help. However, we do not have the standard deviation and confidence
2013 Jan 29
0
[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce
On Tue, Jan 29, 2013 at 3:59 PM, Murali, Sriram <sriram.murali at intel.com> wrote: > Our benchmark results show that the compilation time performance improved by > ~0.5%. That's fairly small; what was the standard deviation, confidence interval, etc? -- Sean Silva
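The question above asks whether a ~0.5% improvement is distinguishable from run-to-run noise. A minimal sketch of how repeated compile-time measurements could be summarized follows. The function name `summarize` and the sample timings are illustrative assumptions, not data from the thread, and the interval uses a normal approximation (z = 1.96) that is only rough for small sample counts.

```python
import math
import statistics


def summarize(samples, z=1.96):
    """Return (mean, sample standard deviation, approximate 95% CI).

    Assumption: the timings are independent repeats of the same build.
    The interval uses a normal approximation; with few samples a
    Student's t quantile would give a wider, more honest interval.
    """
    if len(samples) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample (n-1) standard deviation
    half_width = z * stdev / math.sqrt(len(samples))
    return mean, stdev, (mean - half_width, mean + half_width)


# Hypothetical compile times in seconds, invented for illustration only.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
mean, stdev, (low, high) = summarize(baseline)
print(f"mean={mean:.3f}s stdev={stdev:.3f}s 95% CI=({low:.3f}, {high:.3f})")
```

If the confidence intervals of the baseline and patched builds overlap heavily, a 0.5% difference in the means says little on its own.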
2013 Oct 15
2
[LLVMdev] Unwanted push/pop on Cortex-M.
Hi Andrea, That is because LR is a fixed register as per http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf and the out_char() function is not a leaf function. Hence the compiler tends to save and restore LR, and the save and restore of register r11 is to align the stack to 8 bytes as per the ARM EABI. Thanks ~Umesh On Tuesday, October 15, 2013, Umesh Kalappa
2013 Oct 15
1
[LLVMdev] Unwanted push/pop on Cortex-M.
Hi Andrea, R11 is treated as the frame pointer in the ARM backend, which is again fixed. Thanks Umesh On Tuesday, October 15, 2013, Andrea Mucignat <andrea at nestlabs.com> wrote: > Umesh, > Makes some sort of sense to me, OTOH: > If instead of choosing r11 as a "dummy" to align the stack we had chosen some other register in the range r0-r7 then we could have emitted the PUSH
2013 Oct 15
0
[LLVMdev] Unwanted push/pop on Cortex-M.
Umesh, Makes some sort of sense to me, OTOH: If instead of choosing r11 as a "dummy" to align the stack we had chosen some other register in the range r0-r7 then we could have emitted the PUSH encoding T1 (a 2-byte opcode) as opposed to the encoding T2 (which is a 4-byte opcode). A On Tue, Oct 15, 2013 at 2:59 AM, Umesh Kalappa <umesh.kalappa0 at gmail.com> wrote: > Hi
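The encoding point in the message above can be sketched as a small size calculator. This is an illustrative model, not LLVM code: the function name `push_encoding_size` is hypothetical. It assumes the Thumb rules that encoding T1 of PUSH accepts only r0-r7 plus LR, while encoding T2 (Thumb-2 only, so not available on Cortex-M0) accepts r0-r12 plus LR and needs at least two registers.

```python
# Registers each multi-register PUSH encoding can name (assumed Thumb rules).
T1_REGS = {f"r{i}" for i in range(8)} | {"lr"}   # 16-bit encoding
T2_REGS = {f"r{i}" for i in range(13)} | {"lr"}  # 32-bit encoding


def push_encoding_size(regs):
    """Return the size in bytes of the smallest PUSH encoding for regs.

    Only the register-list forms T1 and T2 are modelled; anything
    else raises ValueError.
    """
    regs = set(regs)
    if not regs:
        raise ValueError("empty register list")
    if regs <= T1_REGS:
        return 2  # encoding T1: low registers and optionally LR
    if regs <= T2_REGS and len(regs) >= 2:
        return 4  # encoding T2: needs the wide Thumb-2 form
    raise ValueError(f"not encodable as PUSH T1/T2: {sorted(regs)}")


# r11 as the alignment "dummy" forces the 4-byte form; a low register does not.
print(push_encoding_size(["r11", "lr"]))  # 4
print(push_encoding_size(["r7", "lr"]))   # 2
```

Both register lists push 8 bytes, so either keeps the stack 8-byte aligned; only the instruction size differs.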
2013 Jan 29
4
[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce
Hello, This patch aims to improve compile time performance by increasing the SCEV vector size in LoopStrengthReduce. It is observed that the BaseRegs vector size is 4 in most cases, and elements are frequently copied when it is initialized as SmallVector<const SCEV *, 2> BaseRegs. Our benchmark results show that the compilation time performance improved by ~0.5%. Patch by Wan Xiaofei.
2013 Jul 17
2
[LLVMdev] Help with subtarget features and context-dependent asm parsers
Tim Northover <t.p.northover at gmail.com> writes: >> /tmp/foo.s:1:2: error: instruction requires: distinct-ops >> sllk %r2,%r3,1 >> ^ > > That seems like it would be a good improvement for all targets. Thanks, sounds like it might be more acceptable than I thought :-) >> ARM seems to rely on the current MatchOperandParserImpl() behaviour,
2011 Oct 13
3
[LLVMdev] LLC ARM Backend maintainer
> The ARM Holdings emulator does this; I used it with great success to > profile an Advanced Encryption Standard encryptor a while back. It is indeed a useful piece of kit. We do a lot of our internal regression tests on it, and also run LLVM's regression tests every night on it (as well as PlumHall, EEMBC and SpecInt). Unfortunately it's not exactly software we can give away or
2014 Jun 24
5
[LLVMdev] Contributing the Apple ARM64 compiler backend
Eric Christopher <echristo <at> gmail.com> writes: > > > The big pain issues I see merging from ARM64 to AArch64 are: > > 1. Apple have created a fairly complete scheduling model already for > > ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We > > risk regressing performance on Apple's targets here, and we can't
2011 Oct 13
0
[LLVMdev] LLC ARM Backend maintainer
Well how about as a strawman... taking some options from http://en.wikipedia.org/wiki/List_of_ARM_microprocessor_cores and http://en.wikipedia.org/wiki/List_of_applications_of_ARM_cores LLVM Supports: ARMv4T -> ARM7TDMI ARMv5TE -> ARM926EJ-S -> XScale ARMv6 -> ARM1136J(F)-S ARMv6ZK -> ARM1176JZ(F)-S ARMv7A -> Cortex-A8 Cortex-A9 ARMv7M -> Cortex-M3
2012 Jul 18
2
[LLVMdev] Setting up a cross-compiler for cortex-m3
On 18 July 2012 14:33, salvatore benedetto <salvatore.benedetto at gmail.com> wrote: > but I still haven't figured out how to build for cortex-m3 > > clang -march=armv7-m -mfloat-abi=soft <something missing?> testReference.cpp -c -march should have done the trick. You can also try -mcpu=cortex-m3, or try -ccc-host-triple armv7m-none-gnueabi (or -eabi), and possibly
2012 Oct 12
1
[LLVMdev] Target backend not converting char* to struct properly.
If you could point me towards the correct location in the standard I would appreciate that - I didn't realize it wasn't acceptable to turn pointer-data to structs. My example is reduced from the EEMBC benchmarks where I ran into the problem, so I may have reduced it too far by accident (but I'm fairly sure they do not use __attribute__ or similar). Adding a
2012 Jul 18
0
[LLVMdev] Setting up a cross-compiler for cortex-m3
On Wed, Jul 18, 2012 at 3:52 PM, Renato Golin <rengolin at systemcall.org> wrote: > On 18 July 2012 14:33, salvatore benedetto > <salvatore.benedetto at gmail.com> wrote: >> but I still haven't figured out how to build for cortex-m3 >> >> clang -march=armv7-m -mfloat-abi=soft <something missing?> testReference.cpp -c > > -march should have done
2015 Jan 10
2
[LLVMdev] LTO support on Mac
Hi, I'm building LLVM on Mac OS 10.10 and I'm having trouble making LTO work. The system linker dumps the following information when I executed "ld -v" @(#)PROGRAM:ld PROJECT:ld64-241.9 configured to support archs: armv6 armv7 armv7s arm64 i386 x86_64 x86_64h armv6m armv7m armv7em LTO support using: LLVM version 3.4.2 which tells me that it is correctly pointing to the LLVM
2015 Jul 21
2
[LLVMdev] Loop localize global variables
Hello all, I am writing to get some feedback on an optimization that I would like to upstream. The basic idea is to localize global variables inside loops so that it can be allocated into registers. For example, transform the following sequence static int gbl_var; void foo() { for () { ...access gbl_var... } } into something like static int gbl_var; void foo() { int lcl_var;
2014 Apr 08
2
[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM
Hi folks, As Tim pointed out, we recently had the opportunity to collect 64-bit benchmark performance data for GCC 4.9, AArch64 and ARM64 compilers on real hardware. It is a Cortex-A53 device. Due to proprietary reasons we cannot share the full hardware configuration. The preliminary results were shared at the hackers lab at EuroLLVM yesterday. For those who could not make it, below is