thr3ads.net - similar to: "Matrix Multiplication not Vectorized using double pointers"

Displaying 20 results from an estimated 9000 matches similar to: "Matrix Multiplication not Vectorized using double pointers"

Filter optimization remarks by the hotness of the code region

2016 May 04

Filter optimization remarks by the hotness of the code region

This idea came up a few times recently [1][2] so I’d like start prototyping it. To summarize, we can emit optimization remarks using the -Rpass* options. These are currently emitted by optimizations like vectorization[3], unrolling, inlining and since last week loop distribution. For large programs however this can amount to a lot of diagnostics output to sift through. Filtering this by the

[LLVMdev] LLVM loop vectorizer

2016 Feb 18

[LLVMdev] LLVM loop vectorizer

Hi Alex, I'm not aware of efforts on loop coalescing in LLVM, but probably polly can do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation), which primarily aims at enabling other passes. Thanks, Michael > On Feb 15, 2016, at 6:44 AM, RCU

[LLVMdev] LLVM loop vectorizer

2016 Jun 04

[LLVMdev] LLVM loop vectorizer

Hi Alex, I think the changes you want are actually not vectorizer related. Vectorizer just uses data provided by other passes. What you probably might want is to look into routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:) Thanks, Michael > On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote: >

[LLVMdev] LLVM loop vectorizer

2016 Jun 07

[LLVMdev] LLVM loop vectorizer

Hi Alex, This has been very recently fixed by Hal. See http://reviews.llvm.org/rL270771 Adam > On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > Hello. > Mikhail, I come back to this older thread. > I need to do a few changes to LoopVectorize.cpp. > > One of them is related to figuring out the exact C source line

[LLVMdev] LLVM loop vectorizer

2015 Jul 08

[LLVMdev] LLVM loop vectorizer

Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure but the LLVM loop vectorizer is not able to handle such code. I am using cland and llvm version 3.4 (on Ubuntu 12.10). I use the -fvectorize option with clang and -loop-vectorize with opt-3.4 . The CSR SpMV function is inspired from

optimization remarks

2018 Aug 14

optimization remarks

Hi, I am trying to compare the loop vectorizers effectiveness for different targets relative to each other. That way, I am hoping to find loops that are not vectorized - but could be - on my target by finding other targets doing this successfully. With some luck, there might be something in the Target files that could be fixed with improved vectorization as a result... I would like to do

llvm is illegally vectorizing with a recurrence on skylake

2019 May 02

llvm is illegally vectorizing with a recurrence on skylake

Hi -- I have found a bug in an HPC code where llvm is vectorizing a loop on Skylake that has an obvious recurrence. I derived a small test case based on the original benchmark below: /*****************************************************************/ static void __attribute__ ((always_inline)) one( const int *restrict in, const int *const end, const unsigned shift, int *const restrict index,

How to get optimization remarks while testing with lnt in llvm

2018 Jun 05

How to get optimization remarks while testing with lnt in llvm

Hi, I'm new to llvm and am trying to run benchmarks from the test-suite using lnt to check loop-vectorization for various benchmarks. Test are compiling and executing fine, but I am not getting optimization remarks while using flags like -Rpass-missed=loop-vectorize and -Rpass-analysis=loop-vectorize I've tried running it like this: lnt runtest test-suite --sandbox SANDBOX --cc

Profile-based inlining status

2016 Mar 07

Profile-based inlining status

Hello, I'm learning how LLVM performs PGO (profile-guided optimizations) by using the instrumentation-based profile build (-fprofile-instr-generate and -fprofile-instr-use). However, I found there is no difference in inlining behaviors between with and without PGO for a few spec benchmarks by checking the emit optimization reports (-Rpass=inline -Rpass-missed=inline -Rpass-analysis=inline).

[LLVMdev] -gcolumn-info and PR 14106

2014 Jun 26

[LLVMdev] -gcolumn-info and PR 14106

For -Rpass, and other related uses, I am looking at enabling column info by default. David pointed me at PR 14106, which seems to be the original motivation for introducing -gcolumn-info. However, I am finding no differences when using it on this test. I've tried building with/without -gcolumn-info and found almost no difference in compile time (+0.4%): $ /usr/bin/time clang -w -fno-builtin

[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication

2016 May 02

[GSoC 2016] Attaining 90% of the turbo boost peak with a C version of Matrix-Matrix Multiplication

Hi Tobias, according to [1], we can expect 90% of the turbo boost peak of the processor with a C version of Matrix-Matrix Multiplication that is similar to the one presented in [1]. In case of Intel Core i7-3820 SandyBridge, the theoretical maximal performance of the machine is 28.8 gflops and hence the expected number is 25,92 gflops. However, in case of, for example, n = m = 1056 and k = 1024

Pattern not recognized as reduction

2018 Feb 12

Pattern not recognized as reduction

Reduction Not Captured By LLVM CODE_1 ------------------------------------------------------------ ------------------------------------------------------------ -------------------- #include <stdio.h> int main() { int sum[1000]={1,2,3,4}; for (int i=1;i<1000;i++) { sum[0] +=sum[i-1]; } }

On Loop Distribution pass

2016 Oct 09

On Loop Distribution pass

Dear community, Our team at IITH have been experimenting with loop-distribution pass in LLVM. We see the following results on few benchmarks. clang -O3 -mllvm -enable-loop-distribute -Rpass=loop-distribute file.c clang -O3 -mllvm -enable-loop-distribute -Rpass-analysis=loop-distribute file.c TORCH

Filter optimization remarks by the hotness of the code region

2016 May 11

Filter optimization remarks by the hotness of the code region

> On May 11, 2016, at 3:37 AM, Hal Finkel <hfinkel at anl.gov> wrote: > > ----- Original Message ----- >> From: "Adam Nemet" <anemet at apple.com> >> To: "Hal Finkel" <hfinkel at anl.gov> >> Cc: "llvm-dev (llvm-dev at lists.llvm.org)" <llvm-dev at lists.llvm.org> >> Sent: Wednesday, May 11, 2016 1:15:42 AM

Vectorization width not correct using #pragma clang loop vectorize_width

2018 Sep 20

Vectorization width not correct using #pragma clang loop vectorize_width

Hello, I m trying to set vector width using #pragma clang loop vectorize_width(32) but i m getting width 8 for the following kernel; #define M 128 #define N 128 #define SQRT_FUN(x) sqrtf(x) int main(int argc, char** argv) { /* Variable declaration/allocation. */ double float_n = (double)N; double data[N*M]; double corr[M*M]; double mean[M]; double stddev[M]; uint32_t

Loop vectorization and unsafe floating point math

2020 Jun 24

Loop vectorization and unsafe floating point math

Hi llvm-dev! We are doing some fuzzy testing using C program generators, and one question that came up when generating a program with both floating point arithmetic and loop pragmas was; Is the loop vectorizer really allowed to vectorize a loop when it can't prove that it is safe to reorder fp math, even if there is a loop pragma that hints about a preferred width. When reading here

Vectorization of math function failed?

2020 Sep 01

Vectorization of math function failed?

I've tried to do: clang++ -O3 -march=native -mtune=native \ -Rpass=loop-vectorize,slp-vectorize -Rpass-missed=loop-vectorize,slp-vectorize -Rpass-analysis=loop-vectorize,slp-vectorize \ -ffast-math -ffp-model=fast -ffp-exception-behavior=ignore -ffp-contract=fast \ -c -o vec.o vec.cc But I've got no feedback. -- Alexandre Bique

[LLVMdev] aarch64 status for generating SIMD instructions

2015 Feb 09

[LLVMdev] aarch64 status for generating SIMD instructions

% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c dot.c:15:1: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize] } ^ dot.c:15:1: note: could not determine the original source location for :0:0 I found “llvm-as < /dev/null | llc -march=aarch64 -mattr=help” which listed a

How to emit opt report when using LTO

2017 Oct 18

How to emit opt report when using LTO

Hi, I'm using clang frontend. I'm interested in some particular hot loop in my code and I emit a report from vectorizer optimizations passes. I receive nice output if passing -Rpass* flags as long as I'm building without LTO? But with -flto it just prints nothing. Is there a way to emit opt reports when using LTO? For now I can only approximate about whether my the loop will be

Filter optimization remarks by the hotness of the code region

2016 May 11

Filter optimization remarks by the hotness of the code region

Hi Hal, > On May 10, 2016, at 5:39 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > Hi Adam, > > I think would be a really useful feature to have. I don't think that the backend should be responsible for filtering, but should pass the relative hotness information to the frontend. Given that these diagnostics are not just going to be used for -Rpass and friends, but also

similar to: Matrix Multiplication not Vectorized using double pointers