thr3ads.net - similar to: "Polly loop offloading to Accelerator"

Displaying 20 results from an estimated 2000 matches similar to: "Polly loop offloading to Accelerator"

2018 Jan 20

Polly loop offloading to Accelerator

Hello, I have been working with an accelerator backend. The accelerator has large vector/SIMD units. I want the streaming (non-temporal) vectorized loops present in the code to be offloaded to the accelerator's SIMD units. I find Polly really suitable for this. I am thinking of passing the generated IR to Polly so that it analyzes each loop to determine whether it possesses no reuse; if such loop is identified accelerator

2019 Oct 13

Replicate Individual O3 optimizations

Hello, I want to study the individual O3 optimizations. For this I am using the following commands, but I am unable to replicate the O3 behavior.
1. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O1 -Xclang -disable-llvm-passes -emit-llvm -S vecsum.c -o vecsum-noopt.ll
2. Documents/clang+llvm-9.0.0-x86_64-linux-gnu-ubuntu-18.04/bin/clang -O3 -mllvm -debug-pass=Arguments -emit-llvm -S

2019 Oct 19

Replicate Individual O3 optimizations

On Thu, Oct 17, 2019 at 11:22 AM David Greene via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> hameeza ahmed via llvm-dev <llvm-dev at lists.llvm.org> writes:
>
> > Hello,
> > I want to study the individual O3 optimizations. For this I am using
> > following commands, but unable to replicate O3 behavior.
> >
> > 1.

2018 May 15

Four bitcode generated with plugin-opt=save-temps

Hi Teresa, thanks for your very quick and clear explanation. I have one more question.
The emit-llvm option will give you the IR for a single source file when you compile it with -c. All of those files when combined give the IR in the preopt.bc temp file.
===========
So if I use "clang -emit-llvm -c" to generate the .ll file, it should be the same as the one I generated by using

2019 Oct 24

Replicate Individual O3 optimizations

I ran the matrix multiplication code with both approaches, O3 in clang and O3 in opt. clang O3 is about 2.97x faster than opt O3.

On Mon, Oct 21, 2019 at 8:24 AM Neil Nelson <nnelson at infowest.com> wrote:
> is_sorted.cpp
> bool is_sorted(int *a, int n) {
>
> for (int i = 0; i < n - 1; i++)
>
> if (a[i] > a[i + 1])
> return false;
> return

2018 May 15

Four bitcode generated with plugin-opt=save-temps

These are the bitcode files from different stages of the LTO portion of the compile. LTO merges the IR for all files being linked and optimizes them as a single monolithic module. The preopt.bc is the merged IR just after merging and before performing any LTO optimizations. internalize.bc is after performing whole program internalization. opt.bc is after the optimization pipeline, and .precodegen.bc is
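As a rough illustration of the stages described above, the following sketch maps a save-temps file name to its LTO stage. The function name `classify`, the regular expression, and the labels "task" and "step" for the two numeric fields are assumptions made for this example, inferred only from sample names such as nohup.0.0.preopt.bc; they are not a documented naming scheme. The precodegen entry is left undescribed because the explanation above is cut off at that point.

```python
import re

# Stage descriptions paraphrased from the explanation above.
STAGES = {
    "preopt": "merged IR, before any LTO optimizations",
    "internalize": "after whole program internalization",
    "opt": "after the optimization pipeline",
    "precodegen": "(description truncated in the source)",
}

# Assumed pattern: <target>.<task>.<step>.<stage>.bc
# The field names "task" and "step" are hypothetical labels.
TEMP_NAME = re.compile(
    r"^(?P<target>.+)\.(?P<task>\d+)\.(?P<step>\d+)\.(?P<stage>[a-z]+)\.bc$"
)


def classify(filename):
    """Return (target, step, stage, description), or None if the name does not match."""
    match = TEMP_NAME.match(filename)
    if match is None or match.group("stage") not in STAGES:
        return None
    stage = match.group("stage")
    return match.group("target"), int(match.group("step")), stage, STAGES[stage]


if __name__ == "__main__":
    for name in ["nohup.0.0.preopt.bc", "nohup.0.2.internalize.bc",
                 "nohup.0.4.opt.bc", "nohup.0.5.precodegen.bc"]:
        print(name, "->", classify(name))
```

Sorting the results by the step number should list the files in the order the stages ran, assuming the numbering is monotonic as in the sample names.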

2019 Jan 16

Issues with using scalar evolution with newer versions of LLVM IR

Thank you. I used the following commands to generate the .bc or .ll files:
/Documents/clang+llvm-4.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang -O0 -emit-llvm -S -o vec4.ll vecsum.c
/Documents/clang+llvm-7.0.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang -O0 -emit-llvm -S -o vec7.ll vecsum.c

On Wed, Jan 16, 2019 at 6:49 AM Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> It is hard to tell

2016 Nov 10

Polly | Dependence detection details

Hi everyone, I'll be very thankful if anyone can help me. I want to extract the dependence details using Polly. I followed these steps on the example code matmul.c:
1. clang -S -emit-llvm matmul.c -o matmul.s
2. opt -S -polly-canonicalize matmul.s > matmul.preopt.ll
3. opt -basicaa -polly-dependences -analyze matmul.preopt.ll
But it doesn't show me the dependences. I

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

At 2013-08-16 12:44:02, "Tobias Grosser" <tobias at grosser.es> wrote:
>Hi,
>
>I tried to reproduce your findings, but could not do so.

Sorry, I did not put all the code in my previous email because the code seems a little too long and complicated. You can refer to the detailed C code and LLVM IR code at http://llvm.org/bugs/show_bug.cgi?id=16843
There are four attachments

2011 Nov 01

[LLVMdev] How to make Polly ignore some non-affine memory accesses

Mmm, this code seems to kill Polly:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        char *B;
        int i,j,k,h;
        const int x = 0, y=0;
        B = (char *)malloc(sizeof(char)*1024*1024);
        for (i = 1; i < 1024; i++)
            for (j = 1; j < 1024; j++)
            {
                if (i+j > 1000)
                    B[j] = i;
            }
        printf("Random Value: %d", B[rand() % 1024*1024]);
        return 0;
    }

running: opt

2018 May 15

Four bitcode generated with plugin-opt=save-temps

Hi, I use LDFLAGS=" -flto -fuse-ld=gold -Wl,-plugin-opt=save-temps " to generate the makefile and build the whole program. However, I found four different kinds of bitcode files for each target. For example, I am compiling coreutils. For the program "nohup", I get:
nohup.0.0.preopt.bc
nohup.0.2.internalize.bc
nohup.0.4.opt.bc
nohup.0.5.precodegen.bc
If I am right, I

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

On 08/16/2013 02:42 AM, Star Tan wrote:
> At 2013-08-16 12:44:02,"Tobias Grosser" <tobias at grosser.es> wrote:
>> Hi,
>>
>> I tried to reproduce your findings, but could not do so.
>
> Sorry, I did not put all code in my previous email because the code seems a little too long and complicated.
> You can refer to the detailed C code and LLVM IR

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

On 08/15/2013 03:32 AM, Star Tan wrote:
> Hi all,

Hi, I tried to reproduce your findings, but could not do so.

> I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is caused by

2011 Oct 07

[LLVMdev] How to make Polly ignore some non-affine memory accesses

I am also adding the output of these commands:

[hades at artemis examples]$ ./compile_ex.sh super_simple_loop
Printing analysis 'Polly - Detect Scops in functions' for function 'main':
[hades at artemis examples]$

Modifying it to:

    #include <stdio.h>

    int main()
    {
        int A[1024];
        int j, k=10;
        for (j = 0; j < 1024; j++)
            A[j] = k;

2011 Oct 08

[LLVMdev] How to make Polly ignore some non-affine memory accesses

On 10/07/2011 03:43 PM, Marcello Maggioni wrote:
> 2011/10/7 Marcello Maggioni<hayarms at gmail.com>:
>> Hi,
>>
>> for example this loop:
>>
>> #include<stdio.h>
>>
>> int main()
>> {
>> int A[1024];
>> int j, k=10;
>> for (j = 1; j< 1024; j++)
>> A[j] =

2015 Mar 13

[RFC PATCH v3] Intrinsics/RTCD related fixes. Mostly x86.

From: Jonathan Lennox <jonathan at vidyo.com>
* Makes --enable-intrinsics work with clang and other non-GCC compilers
* Enables RTCD for the floating-point-mode SSE code in Celt.
* Disables use of RTCD in cases where the compiler targets an instruction set by default.
* Enables the SSE4.1 Silk optimizations that apply to the common parts of Silk when Opus is built in floating-point mode, not

2015 Mar 12

[RFC PATCHv2] Intrinsics/RTCD related fixes. Mostly x86.

2013 Sep 30

[LLVMdev] [Polly] Move Polly's execution later

At 2013-09-25 18:03:18, "Tobias Grosser" <tobias at grosser.es> wrote:
>
>I think this is too early, as most of the canonicalization is not yet
>done. We probably don't need to investigate this bug immediately, but
>it would be nice if we could make it reproducible without your changes
>to polly. For this please run the command with -debug-pass=Arguments

2018 Jan 29

Polly Dependency Analysis in MyPass

I put the following line in CMakeLists.txt:
add_subdirectory(mypass)
Then I ran make -j9, and then ran the following on the canonicalized IR:
$ opt -load lib/LLVMmypass.so -mypass vec-sum.preopt.ll

On Mon, Jan 29, 2018 at 9:39 PM, Michael Kruse <llvmdev at meinersbur.de> wrote:
> 2018-01-29 10:18 GMT-06:00 hameeza ahmed <hahmed2305 at gmail.com>:
> > I tried writing

2019 Jan 18

Is it possible to generate the IR representation with the original macro information?

Hi, I use the following commands to compile the IR, but I don't see the macro information in the .ll file. Is there a way to preserve the macro information (print() in this case) for debugging purposes?
$ clang -std=gnu99 -g3 -flto -Wall -pedantic -c -o main.o main.c
$ clang main.o -flto -fuse-ld=gold '-Wl,-plugin-opt=save-temps' -o main.exe
$ llvm-dis main.exe.0.0.preopt.bc
/* vim:
