Displaying 20 results from an estimated 400 matches similar to: "Looking for suggestions: Inferring GPU memory accesses"
2020 Aug 23
2
Looking for suggestions: Inferring GPU memory accesses
@Ees,
Oh, I see what you mean now. Doing such analysis would be useful for a
thread block and not just a single thread, but as you say, you are onto
something bigger than a single thread.
We published a short paper at ICS on this, which uses polyhedral
techniques to do such analysis and reason about uncoalesced access patterns
in CUDA programs. You can find the paper at
2012 Mar 05
2
[LLVMdev] OpenCL backend for LLVM
Hi,
This is a follow-up to my email from August
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042737.html).
I have finally released my OpenCL backend and control-flow
restructuring framework for LLVM (AST-Extractor, or Axtor for short). The
framework restructures function CFGs so that they can be expressed
entirely without GOTOs or switch/loop trickery, hence making it
possible to
2012 Aug 17
1
[LLVMdev] Portable OpenCL (pocl) v0.6 released
Portable OpenCL (pocl) v0.6 released
------------------------------------
Portable OpenCL aims to be an efficient open source (MIT-licensed)
implementation of the OpenCL 1.2 standard.
In addition to producing an easily portable open source OpenCL
implementation, another major goal of the project is improving
performance portability of OpenCL programs with compiler
optimizations, reducing the
2012 Mar 06
2
[LLVMdev] OpenCL backend for LLVM
Hi Micah,
I just had a quick look at your structurizer. Here is what I found
(correct me if I am mistaken):
* Our approaches for handling Loops with multiple exits are identical.
("Loop-Exit Enumeration")
* Axtor implements Controlled-Node Splitting and can cope with
irreducible control-flow.
(http://cardit.et.tudelft.nl/MOVE/papers/cc96.ps)
* Axtor translates switches to cascading
2012 Mar 05
0
[LLVMdev] OpenCL backend for LLVM
Simon,
Have you looked at the control-flow structurizer that we have in the open-source AMDIL backend?
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Simon Moll
> Sent: Monday, March 05, 2012 1:01 PM
> To: llvmdev at cs.uiuc.edu
> Subject: [LLVMdev] OpenCL backend for LLVM
>
> Hi,
>
> this
2019 Mar 13
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
There are tooooooo(!) many changes; I don't know who's going to review sooooo
big a patch. You definitely need to split it into several smaller patches.
Also, I don't like the idea of adding one more class for NVPTX codegen.
All your changes should be on top of the existing solution.
-------------
Best regards,
Alexey Bataev
13.03.2019 15:08, Doerfert, Johannes wrote:
> Please consider
2012 Mar 06
0
[LLVMdev] OpenCL backend for LLVM
The person who wrote our structurizer agrees with your analysis. Too bad the licenses are incompatible; it would be nice to merge similar efforts.
> -----Original Message-----
> From: Simon Moll [mailto:simon.m.moll at googlemail.com]
> Sent: Tuesday, March 06, 2012 2:49 AM
> To: Villmow, Micah
> Cc: llvmdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] OpenCL backend for LLVM
>
2018 Mar 09
1
[Polly] Reduced code analyzability moving from LLVM 3.9.0 to 5.0.1
Hi Johannes,
Perfect, thanks! The CFG now looks very similar to what I got on LLVM
3.9.0 ([1] vs [2]).
Any idea why setting -simplifycfg-sink-common=false is necessary?
Similar to LLVM 5.0.1, the default for 3.9.0 is true [3], and setting it
to false wasn't necessary in the latter version.
[1]
https://nautilus.bjornweb.nl/files/polly501-cfg-simplifycfg-sink-common.pdf
[2]
2019 Mar 13
2
[RFC] Late (OpenMP) GPU code "SPMD-zation"
-------------
Best regards,
Alexey Bataev
13.03.2019 15:35, Doerfert, Johannes wrote:
>
> Hi Alexey,
>
>
> thank you for your quick feedback.
>
>
> There are tooooooo(!) many changes; I don't know who's going to review sooooo
> big a patch.
>
>
> I can certainly split it into the three components/repositories that are
> touched: clang, llvm, and openmp.
2019 Mar 13
4
[RFC] Late (OpenMP) GPU code "SPMD-zation"
1. You don't need to implement everything in a single patch. Development is a step-by-step process in which you commit changes in small pieces. The code need not be fully functional at first; you may start from some basic features. Currently it is very hard to review.
2. I rather doubt that it can be reused without changes for AMD etc., especially without being fully tested. The only tested
2019 Mar 13
3
[RFC] Late (OpenMP) GPU code "SPMD-zation"
Johannes, did you try it on AMD GPUs? If not, I think it might be too early to claim it as a general interface for NVidia/AMD GPUs. I'm OK if you want to introduce a base class for the GPU-specific codegen, but it must be done step by step and thoroughly tested and reviewed. There might be some parts common with NVPTX codegen. You can put the common functions into a base class and remove them from
2016 Oct 12
4
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Wed, Oct 12, 2016 at 10:53 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> I don't think that Clang/LLVM uses it by default on x86_64. If you're using -Ofast, however, that would explain it. I recommend looking at -O3 vs -O0 and making sure those are the same. -Ofast enables -ffast-math, which can legitimately cause differences.
>
The following tests pass at "-O3" and
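Not from the thread itself, but a minimal C++ illustration (assumed values, standard library only) of the kind of legitimate difference described above: contracting a*b + c into a fused multiply-add, which -Ofast/-ffast-math permit, drops the intermediate rounding, so the result can differ in the last bits.
#include <cmath>
#include <cstdio>
int main() {
  double a = 1.0 / 3.0, b = 3.0, c = -1.0;
  double separate = a * b + c;          // a*b rounded first, then the add rounded
  double fused = std::fma(a, b, c);     // single rounding, as fp contraction may emit
  // With contraction off, separate is 0.0; fused is about -5.6e-17 on IEEE-754 doubles.
  std::printf("separate = %.17g\nfused = %.17g\n", separate, fused);
  return 0;
}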
2007 Dec 09
1
package "growth" ... where is it ?
I would like to install the package "growth" as it contains the function
"corgram" and some other presumably useful stuff for time series analysis.
I can see it is in R standard library list:
http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/doc/html/packages.html
2004 Dec 03
1
Getting R to emit an image file as a pipe or Base64 stream: Mac OSX 10.3 - R 2.0.1
> From: Yuandan Zhang
>
> If you want to call R from perl, why don't you do a simple
> system call like:
>
> $callR = "/usr/local/bin/R CMD BATCH plotscript.R";
> system($callR);
>
> It is not necessary to start an X display if everything can be
> done in the background.
But the problem is that jpeg()/png() are not available unless an X display is
available to the
2013 Jun 06
2
[LLVMdev] [Polly] Set up performance tester for GSOC2013 FastPolly project
Hi Tobias,
I have recently been trying to set up the performance tester for the FastPolly project. According to your suggestion, I plan to use the LNT infrastructure to set up the performance tester. For this purpose, I think I should do this job in three steps:
First, I will add PolyBench to the LLVM test-suite, since PolyBench is the critical benchmark for FastPolly. I have adjusted PolyBench-c-3.2 so we
2013 Jun 09
1
[LLVMdev] [Patch] Apply for adding PolyBench to LLVM testsuite
Hi all,
PolyBench (http://www.cse.ohio-state.edu/~pouchet/software/polybench/) is a well-known benchmark for polyhedral compilers. Since LLVM-Polly (http://polly.llvm.org/) provides a very good polyhedral optimizer for LLVM, could we add this benchmark to the LLVM test-suite?
I have attached the patch file to add PolyBench to LLVM test-suite.
Best wishes,
Star Tan
2016 Oct 20
2
[test-suite] making polybench/symm succeed with "-Ofast" and "-ffp-contract=on"
On Fri, Oct 14, 2016 at 6:10 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> polybench/linear-algebra/kernels/symm, FP_ABSTOLERANCE=1e1
>> polybench/linear-algebra/solvers/gramschmidt, FP_ABSTOLERANCE=1e0
>> What should be a good relative tolerance to set for these two tests?
>
> What's the minimum relative tolerance that you need for them to pass?
Setting
2013 Jun 09
0
[LLVMdev] [Polly] Set up performance tester for GSOC2013 FastPolly project
On 06/06/2013 11:17 AM, Star Tan wrote:
> Hi Tobias,
>
>
> I have recently been trying to set up the performance tester for the FastPolly project. According to your suggestion, I plan to use the LNT infrastructure to set up the performance tester. For this purpose, I think I should do this job in three steps:
>
>
> First, I will add PolyBench to the LLVM test-suite, since PolyBench is the
2020 Oct 03
2
Information about the number of indices in memory accesses
Michael makes a great point about aliasing here and different indexing that
accesses the same element!
Another note: x = A[0][2] is fundamentally different depending on the type
of `A`. If, e.g., A was declared as int A[10][20], there's only _one_ load. A
is (and is treated as) a linear buffer,
and GEPs only pinpoint the specific position of A[0][2] in this buffer
(i.e. 0*20 + 2). But if A was
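A minimal C++ sketch of that distinction (not from the thread; the function names are made up for illustration):
int array_elem(int A[10][20]) {
  // A decays to int (*)[20]: one contiguous buffer, so A[0][2] is a single
  // load at element offset 0*20 + 2, computed by one GEP.
  return A[0][2];
}
int ptr_elem(int **A) {
  // Two loads: first load the row pointer A[0], then load element 2 from it.
  return A[0][2];
}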
2020 Sep 23
2
Information about the number of indices in memory accesses
Hi all,
For loads and stores I want to extract information about the number of
indices accessed. For instance:
struct S { int X; int *Y; };
__global__ void kernel(int *A, int **B, struct S s) {
int x = A[..][..]; // -> L: A[..][..]
int y = *B[2]; // -> L: B[0][2]
int z = s.Y[..]; // -> L: S.1[..]
// etc..
}
I am performing some preprocessing on IR to:
1. Move constant
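For context, a minimal C++ sketch of one way such index counts could be read off the IR, assuming the addresses come directly from getelementptr instructions (the helper name is made up; it is not from the thread):
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Hypothetical helper: for every load/store whose address is a GEP, report how
// many indices that GEP carries. Accesses reached through casts, PHIs, or
// chains of GEPs (as in the A[..][..] case) would need extra handling.
static void printIndexCounts(Function &F) {
  for (Instruction &I : instructions(F)) {
    Value *Ptr = nullptr;
    if (auto *LI = dyn_cast<LoadInst>(&I))
      Ptr = LI->getPointerOperand();
    else if (auto *SI = dyn_cast<StoreInst>(&I))
      Ptr = SI->getPointerOperand();
    if (!Ptr)
      continue;
    if (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr))
      errs() << I << "  -> " << GEP->getNumIndices() << " indices\n";
  }
}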