similar to: [LLVMdev] aarch64 status for generating SIMD instructions

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] aarch64 status for generating SIMD instructions"

2015 Feb 09
3
[LLVMdev] aarch64 status for generating SIMD instructions
So far, all I have tried is -O3, with and without "-mcpu=cortex-a57". I'm new to LLVM so I'm not familiar with what optimization flags are available. I tried poking around in the LLVM documentation but haven't found a definitive list. The clang man page is skimpy on details.
From: Arnaud A. de Grandmaison [mailto:arnaud.degrandmaison at arm.com]
Sent: Monday, February
2015 Feb 09
3
[LLVMdev] aarch64 status for generating SIMD instructions
% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0
I found "llvm-as < /dev/null | llc -march=aarch64 -mattr=help" which listed a
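
(Editorial aside for this entry: the remark above is the loop vectorizer explaining why it gave up. The actual dot.c is not shown in the snippet, so the following is only a hypothetical sketch of a dot-product kernel of the general shape the remark describes.)

    /* Hypothetical sketch only; the real dot.c from this thread is not shown. */
    #include <stddef.h>

    double dot(const double *a, const double *b, size_t n) {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += a[i] * b[i];   /* floating-point reduction across iterations */
        /* 'sum' is used outside the loop. The vectorizer only vectorizes the
           loop if it can recognize 'sum' as a reduction and is allowed to
           reassociate the FP additions (e.g. under -ffast-math); when it
           cannot match the pattern, it emits the "value ... used outside the
           loop" remark quoted above. */
        return sum;
    }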
2017 May 31
6
[RFC] Making -mcpu=generic the default for ARM armv7a and armv8a rather than -mcpu=cortex-a8 or -mcpu=cortex-a53
Motivation: At the moment, when targeting armv7a, clang defaults to generating code as if -mcpu=cortex-a8 had been specified. When targeting armv8a, it defaults to generating code as if -mcpu=cortex-a53 had been specified. This leads to surprising code generation: the compiler optimizes for a specific micro-architecture, whereas the user's intent was probably to generate code that is
2017 Jun 01
3
[RFC] Making -mcpu=generic the default for ARM armv7a and armv8a rather than -mcpu=cortex-a8 or -mcpu=cortex-a53
Thanks to everyone for giving their feedback! I saw pretty unanimous support for making -mcpu=generic the default and making -mcpu=generic schedule for an in-order CPU (Cortex-A8 in this case). I'll be making those changes shortly. I think the comments also make clear that it's less obvious whether we'd want -mcpu=native to become a default. It's probably good for some use cases, but
2009 Jul 10
3
strange strsplit gsub problem - is this a bug or a string length limitation?
I was working with the Rmetrics portfolioBacktesting function and dug into the code to try to find why my formula with 113 items, i.e. A1 through A113, was being truncated so that I only got 85 items, not 113. Is it due to a string length limitation in R, or is it a bug in the strsplit or gsub functions, or in my string? I'd very much appreciate any suggestions.
============ Input script:
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi James, Thanks for your reply and hints on what can be done for the AArch64 backend optimization in LLVM. We have a SPEC license and v8 hardware, so I will start looking into it. Warm regards, Manjunath
On Wed, Jun 25, 2014 at 8:42 PM, James Molloy <james.molloy at arm.com> wrote:
> Hi Manjunath,
> At the time of writing that status we had only done our initial analysis.
2014 Jun 24
5
[LLVMdev] Contributing the Apple ARM64 compiler backend
Eric Christopher <echristo <at> gmail.com> writes:
> > The big pain issues I see merging from ARM64 to AArch64 are:
> > 1. Apple have created a fairly complete scheduling model already for ARM64, and we'd have to merge the partial? model in AArch64 and theirs. We risk regressing performance on Apple's targets here, and we can't
2014 Jun 26
2
[LLVMdev] Contributing the Apple ARM64 compiler backend
Hi Sanjay, I've actually pinned the behaviour I'm talking about down to CodeGenPrepare not working too well with ISAs that don't have a good scaled load. I have a patch to fix it that is going through performance testing now. Your testcase seems specific to x86 – for aarch64 we get the rather spiffy:
_Z3fooPii:              // @_Z3fooPii
// BB#0:
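
(For readers without the original testcase: _Z3fooPii is the Itanium-mangled name of a function like foo(int*, int). The thread's actual testcase is not shown, so this is only a hypothetical sketch of the kind of code where a scaled-load addressing mode matters.)

    /* Hypothetical sketch of foo(int*, int), mangled as _Z3fooPii. The point
       of interest is whether the base + index*4 address computation folds
       into the load's addressing mode (a "scaled load") or is emitted as
       separate shift/add instructions. */
    int foo(int *p, int i) {
        return p[i];   /* on AArch64 this can fold into a single
                          ldr w0, [x0, w1, sxtw #2] */
    }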
2014 Aug 07
1
Passing literal -cpu model string to qemu
On aarch64 with -M virt, the default CPU model is cortex-a15 (a 32-bit CPU). This is IMHO a stupid default, but there we are. Therefore most users will need to pass the `-cpu cortex-a53' or `-cpu cortex-a57' flag to qemu, depending on a complex formula involving their host CPU and whether they are using TCG. However I cannot work out how to pass this through libvirt. The obvious one would
2017 Feb 15
2
(RFC) Adjusting default loop fully unroll threshold
Thanks for running these, Kristof! I'd still like to hear from Apple, and if we can get a few more x86 micro-architectures covered that'd be great, but it looks like -O3 is uncontroversial, and the question is whether this makes sense at -O2... To me, it would help a lot to know the actual breakdown of benchmarks such as yours, Kristof (as they seem to have more codesize impact than others
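
(For context on what the "fully unroll threshold" controls, here is an illustrative example that is not from the thread: a loop whose trip count is known at compile time is a candidate for full unrolling, and the threshold caps how large the unrolled body may become before the optimizer keeps the loop instead, which is why raising it trades code size for speed.)

    /* Illustrative only, not from the thread. */
    int sum16(const int *a) {
        int s = 0;
        for (int i = 0; i < 16; ++i)   /* trip count known statically */
            s += a[i];
        return s;   /* at -O2/-O3 this is typically fully unrolled into
                       straight-line code (and then vectorized) */
    }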
2014 Jan 27
2
[LLVMdev] [cfe-dev] AArch64 Clang CLI interface proposal
Ping. Can I assume that we're ok with this interface proposal then? Amara
2014 Apr 23
2
[LLVMdev] Proposal: AArch64/ARM64 merge from EuroLLVM
Hi Gerolf, Sorry for the delayed response. I had to get permission to share more details. I am allowed to share relative numbers but not absolute numbers. Any missing test is due to runtime failures (e.g., a gcc failure due to the fused-multiply pattern bug which Tim fixed later on). Thanks, Ana.
Benchmarks | ARM64 vs GCC 4.9 % | ARM64 vs AArch64 % | ARM64 vs AArch64 patched %
2015 Jun 17
3
[LLVMdev] Build times on ARM
I recently got a Tegra TK1 and was curious how fast it was compared to my previous ARM "build machine": the original ARM Samsung Chromebook. I timed running ninja to build just LLVM in Release+Asserts using clang as the host compiler.
Chromebook: real 84m30.939s, user 163m50.145s, sys 4m0.100s
TK1: real 34m7.376s, user 132m44.417s, sys 3m3.543s
A really nice
2015 May 15
6
[LLVMdev] Proposal: change LNT’s regression detection algorithm and how it is used to reduce false positives
tl;dr: in low-data situations we don't look at past information, and that increases the false-positive regression rate. We should look at the possibly incorrect recent past runs to fix that. Motivation: LNT's current regression detection system has a false-positive rate that is too high to make it useful. With test suites as large as the LLVM "test-suite", a single report will show hundreds of
2018 Mar 23
2
cuda cross compiling issue for target aarch64-linux-androideabi
I was wondering if anyone has encountered this issue when cross-compiling CUDA on an Nvidia TX2 running Android. The error is:
In file included from <built-in>:1:
In file included from prebuilts/clang/host/linux-x86/clang-4667116/lib64/clang/7.0.1/include/__clang_cuda_runtime_wrapper.h:219:
../cuda/targets/aarch64-linux-androideabi/include/math_functions.hpp:3477:19: error: no matching function
2018 Mar 23
0
cuda cross compiling issue for target aarch64-linux-androideabi
+Artem Belevich <tra at google.com>
On Fri, Mar 23, 2018 at 7:53 PM Bharath Bhoopalam via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> I was wondering if anyone has encountered this issue when cross compiling cuda on Nvidia TX2 running android.
> The error is
> In file included from <built-in>:1:
> In file included from
2017 Sep 16
3
LLVM mtriple for aarch64-win32-msvc ?
Thanks, Martin. I'm generating the code using LLVM (writing llvm::Triple myself, and llvm::TargetRegistry::lookupTarget is working), and that's how my bitcode is generated; I then use llc to cross-compile it. So using armv7-win32-msvc is getting me a bit closer, but which CPU? The Raspberry Pi 3 is running a Cortex-A53, but when I specify that in the -mcpu argument I get this error: > llc.exe
2015 Jan 13
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Hi folks, Moving the discussion to llvmdev. None of the changes we talked about earlier help. Find attached the C source code that you can use to reproduce the issue.
clang --target=aarch64-linux-gnu -c -mcpu=cortex-a57 -Ofast -fno-math-errno test.c -S -o test.s -mllvm -debug-only=licm
LICM hoisting to while.body.lr.ph: %21 = load double** %arrayidx8, align 8, !tbaa !5
LICM hoisting to
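
(For readers unfamiliar with the debug output: the "LICM hoisting" lines report loop-invariant loads being moved into the loop preheader. The attached test.c is not shown, so the following is only a hypothetical C-level shape that produces a hoistable 'load double**' like the one in the line above.)

    /* Hypothetical sketch: the load of p[j] (a 'load double**' in the IR) is
       loop-invariant, so LICM can hoist it into the preheader, but only if
       alias analysis (e.g. CFL-AA, as discussed in this thread) can prove
       that the double stores inside the loop do not clobber the double*
       slot p[j]. */
    void scale(double **p, int j, int n) {
        for (int i = 0; i < n; ++i)
            p[j][i] *= 2.0;   /* p[j] does not change inside the loop */
    }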
2017 Feb 16
4
(RFC) Adjusting default loop fully unroll threshold
First off, I just want to say wow and thank you. This kind of data is amazing. =D
On Thu, Feb 16, 2017 at 2:46 AM Kristof Beyls <Kristof.Beyls at arm.com> wrote:
> The biggest relative code size increases indeed didn't happen for the biggest programs, but instead for a few programs weighing in at about 100KB.
> I'm assuming the Google benchmark set covers much bigger
2015 Jan 14
2
[LLVMdev] question about enabling cfl-aa and collecting a57 numbers
Can you send me actual LLVM IR or a preprocessed source from using -E? I don't have a machine handy that has headers that target that arch.
On Tue Jan 13 2015 at 4:33:29 PM Daniel Berlin <dberlin at dberlin.org> wrote:
> Anything other than noalias or mustalias should be getting passed down the stack, so either that is not happening or CFL aa is giving better answers and