similar to: [LLVMdev] aarch64 status for generating SIMD instructions

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] aarch64 status for generating SIMD instructions"

2015 Feb 09
3
[LLVMdev] aarch64 status for generating SIMD instructions
So far, all I have tried is -O3, with and without "-mcpu=cortex-a57". I'm new to LLVM so I'm not familiar with what optimization flags are available. I tried poking around in the LLVM documentation but haven't found a definitive list. The clang man page is skimpy on details.
From: Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com]
Sent: Monday, February
2015 Feb 09
3
[LLVMdev] aarch64 status for generating SIMD instructions
% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0
I found "llvm-as < /dev/null | llc -march=aarch64 -mattr=help" which listed a
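
(Editorial aside for this entry: the remark above is the loop vectorizer explaining why it gave up. The actual dot.c is not shown in the snippet, so the following is only a hypothetical sketch of a dot-product kernel of the general shape the remark describes.)

    /* Hypothetical sketch only; the real dot.c from this thread is not shown. */
    #include <stddef.h>

    double dot(const double *a, const double *b, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += a[i] * b[i];   /* floating-point reduction across iterations */
        /* 'sum' is used outside the loop. The vectorizer only vectorizes the
           loop if it can recognize 'sum' as a reduction and is allowed to
           reassociate the FP additions (e.g. under -ffast-math); when it
           cannot match the pattern, it emits the "value ... used outside the
           loop" remark quoted above. */
        return sum;
    }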
2017 May 31
6
[RFC] Making -mcpu=generic the default for ARM armv7a and armv8a rather than -mcpu=cortex-a8 or -mcpu=cortex-a53
Motivation: At the moment, when targeting armv7a, clang defaults to generating code as if -mcpu=cortex-a8 had been specified. When targeting armv8a, it defaults to generating code as if -mcpu=cortex-a53 had been specified. This leads to surprising code generation: the compiler optimizes for a specific micro-architecture, whereas the user's intent was probably to generate code that is
2017 Jun 01
3
[RFC] Making -mcpu=generic the default for ARM armv7a and armv8a rather than -mcpu=cortex-a8 or -mcpu=cortex-a53
Thanks to everyone for giving their feedback! I saw pretty unanimous support for making -mcpu=generic the default and making -mcpu=generic schedule for an in-order CPU (Cortex-A8 in this case). I'll be making those changes shortly. I think the comments also make clear that it's less obvious whether we'd want -mcpu=native to become a default. It's probably good for some use cases, but
2009 Jul 10
3
strange strsplit gsub problem - is this a bug or a string length limitation?
I was working with the Rmetrics portfolioBacktesting function and dug into the code to try to find why my formula with 113 items, i.e. A1 through A113, was being truncated so that I only got 85 items, not 113. Is it due to a string length limitation in R, or is it a bug in the strsplit or gsub functions, or in my string? I'd very much appreciate any suggestions.
============ Input script:
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi James, Thanks for your reply and hints on what can be done for the AArch64 backend optimization in LLVM. We have a SPEC license and v8 hardware, so I will start looking into it. Warm regards, Manjunath
On Wed, Jun 25, 2014 at 8:42 PM, James Molloy <james.molloy at arm.com> wrote:
> Hi Manjunath,
> At the time of writing that status we had only done our initial analysis.
2014 Jun 24
5
[LLVMdev] Contributing the Apple ARM64 compiler backend
Eric Christopher <echristo <at> gmail.com> writes:
> > The big pain issues I see merging from ARM64 to AArch64 are:
> > 1. Apple have created a fairly complete scheduling model already for ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We risk regressing performance on Apple's targets here, and we can't
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi Sanjay, I've actually pinned the behaviour I'm talking about down to CodeGenPrepare not working too well with ISAs that don't have a good scaled load. I have a patch to fix it that is going through performance testing now. Your testcase seems specific to x86 – for aarch64 we get the rather spiffy:
_Z3fooPii:              // @_Z3fooPii
// BB#0:
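
(For readers without the original testcase: _Z3fooPii is the Itanium-mangled name of a function like foo(int*, int). The thread's actual testcase is not shown, so this is only a hypothetical sketch of the kind of code where a scaled-load addressing mode matters.)

    /* Hypothetical sketch of foo(int*, int), mangled as _Z3fooPii. The point
       of interest is whether the base + index*4 address computation folds
       into the load's addressing mode (a "scaled load") or is emitted as
       separate shift/add instructions. */
    int foo(int *p, int i) {
        return p[i];   /* on AArch64 this can fold into a single
                          ldr w0, [x0, w1, sxtw #2] */
    }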
2014 Aug 07
1
Passing literal -cpu model string to qemu
On aarch64 with -M virt, the default CPU model is cortex-a15 (a 32-bit CPU). This is IMHO a stupid default, but there we are. Therefore most users will need to pass the `-cpu cortex-a53' or `-cpu cortex-a57' flag to qemu, depending on a complex formula involving their host CPU and whether they are using TCG. However I cannot work out how to pass this through libvirt. The obvious one would
2017 Feb 15
2
(RFC) Adjusting default loop fully unroll threshold
Thanks for running these, Kristof! I'd still like to hear from Apple, and if we can get a few more x86 micro-architectures covered that'd be great, but it looks like -O3 is uncontroversial, and the question is whether this makes sense at -O2... To me, it would help a lot to know the actual breakdown of benchmarks such as yours, Kristof (as they seem to have more codesize impact than others
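
(For context on what the "fully unroll threshold" controls, here is an illustrative example that is not from the thread: a loop whose trip count is known at compile time is a candidate for full unrolling, and the threshold caps how large the unrolled body may become before the optimizer keeps the loop instead, which is why raising it trades code size for speed.)

    /* Illustrative only, not from the thread. */
    int sum16(const int *a) {
        int s = 0;
        for (int i = 0; i < 16; ++i)   /* trip count known statically */
            s += a[i];
        return s;   /* at -O2/-O3 this is typically fully unrolled into
                       straight-line code (and then vectorized) */
    }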
2014 Jan 27
2
[LLVMdev] [cfe-dev] AArch64 Clang CLI interface proposal
Ping. Can I assume that we're ok with this interface proposal then? Amara
2014 Apr 23
2
[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM
Hi Gerolf, Sorry for the delayed response. I had to get permission to share more details. I am allowed to share relative numbers but not absolute numbers. Any missing test is due to runtime failures (e.g., a gcc failure due to the fused-multiply pattern bug which Tim fixed later on). Thanks, Ana.
Benchmarks | ARM64 vs GCC 4.9 % | ARM64 vs AArch64 % | ARM64 vs AArch64 patched %
2015 Jun 17
3
[LLVMdev] Build times on ARM
I recently got a Tegra TK1 and was curious how fast it was compared to my previous ARM "build machine": the original ARM Samsung Chromebook. I timed running ninja to build just LLVM in Release+Asserts using clang as the host compiler.
Chromebook: real 84m30.939s, user 163m50.145s, sys 4m0.100s
TK1: real 34m7.376s, user 132m44.417s, sys 3m3.543s
A really nice
2015 May 15
6
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
tl;dr: in low-data situations we don't look at past information, and that increases the false-positive regression rate. We should look at the possibly incorrect recent past runs to fix that. Motivation: LNT's current regression detection system has a false-positive rate that is too high to make it useful. With test suites as large as the LLVM "test-suite", a single report will show hundreds of
2018 Mar 23
2
cuda cross compiling issue for target aarch64-linux-androideabi
I was wondering if anyone has encountered this issue when cross-compiling CUDA on an Nvidia TX2 running Android. The error is:
In file included from <built-in>:1:
In file included from prebuilts/clang/host/linux-x86/clang-4667116/lib64/clang/7.0.1/include/__clang_cuda_runtime_wrapper.h:219:
../cuda/targets/aarch64-linux-androideabi/include/math_functions.hpp:3477:19: error: no matching function
2018 Mar 23
0
cuda cross compiling issue for target aarch64-linux-androideabi
+Artem Belevich <tra at google.com>
On Fri, Mar 23, 2018 at 7:53 PM Bharath Bhoopalam via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I was wondering if anyone has encountered this issue when cross compiling cuda on Nvidia TX2 running android.
> The error is
> In file included from <built-in>:1:
> In file included from
2017 Sep 16
3
LLVM mtriple for aarch64-win32-msvc ?
Thanks, Martin. I'm generating the code using LLVM (writing llvm::Triple myself, and llvm::TargetRegistry::lookupTarget is working), and that's how my bitcode is generated; I then use llc to cross-compile it. So using armv7-win32-msvc is getting me a bit closer, but which CPU? The Raspberry Pi 3 is running a Cortex-A53, but when I specify that in the -mcpu argument I get this error: > llc.exe
2015 Jan 13
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Hi folks, Moving the discussion to llvmdev. None of the changes we talked about earlier help. Find attached the C source code that you can use to reproduce the issue.
clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm
LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, align 8, !tbaa !5
LICM hoisting to
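
(For readers unfamiliar with the debug output: the "LICM hoisting" lines report loop-invariant loads being moved into the loop preheader. The attached test.c is not shown, so the following is only a hypothetical C-level shape that produces a hoistable 'load double**' like the one in the line above.)

    /* Hypothetical sketch: the load of p[j] (a 'load double**' in the IR) is
       loop-invariant, so LICM can hoist it into the preheader, but only if
       alias analysis (e.g. CFL-AA, as discussed in this thread) can prove
       that the double stores inside the loop do not clobber the double*
       slot p[j]. */
    void scale(double **p, int j, int n) {
        for (int i = 0; i < n; ++i)
            p[j][i] *= 2.0;   /* p[j] does not change inside the loop */
    }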
2017 Feb 16
4
(RFC) Adjusting default loop fully unroll threshold
First off, I just want to say wow and thank you. This kind of data is amazing. =D
On Thu, Feb 16, 2017 at 2:46 AM Kristof Beyls <Kristof.Beyls at arm.com> wrote:
> The biggest relative code size increases indeed didn't happen for the biggest programs, but instead for a few programs weighing in at about 100KB.
> I'm assuming the Google benchmark set covers much bigger
2015 Jan 14
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Can you send me actual LLVM IR or a preprocessed source from using -E? I don't have a machine handy that has headers that target that arch.
On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> wrote:
> Anything other than noalias or mustalias should be getting passed down the stack, so either that is not happening or CFL aa is giving better answers and