Displaying 14 results from an estimated 14 matches for "blockidx".
2020 Sep 23
2
Information about the number of indices in memory accesses
...without a GEP operand, explicitly create a
(trivial) GEP with index 0.
So now the operand of every load and store is a GEP instruction.
For simple cases I get the right answer, but when the index
expression becomes more complex, multiple GEPs are introduced. For instance:
*(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
produces:
%6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
%7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
%8 = mul i32 %6, %7
%9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
%10 = add i32 %8, %9
%11 = add i32 %10, 1
%12 = mul i32 2, %11
%13 =...
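The canonicalization described in this thread can be sketched in LLVM IR: a store whose pointer operand is not a GEP gets a trivial GEP with index 0 interposed (a hand-written sketch, not the output of any specific pass):

```llvm
; Before: the store uses %A directly, with no GEP operand.
store i32 5, i32* %A

; After: a trivial GEP with index 0 is interposed, so the pointer
; operand of every load and store is a getelementptr instruction.
%A.gep = getelementptr inbounds i32, i32* %A, i64 0
store i32 5, i32* %A.gep
```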
2014 Jun 16
3
[LLVMdev] Attaching range metadata to IntrinsicInst
...pute the known bits of
these intrinsics as special cases. This approach is already taken for the
x86_sse42_crc32_64_64 intrinsic. However, this approach may not be elegant
because the ranges of these CUDA special registers depend on the GPU
compute capability specified by -target-cpu. For instance, blockIdx.x is
bounded by 65535 in sm_20 but 2^31-1 in sm_30. Exposing -target-cpu to
ValueTracking is probably discouraged.
Therefore, the approach I am considering is to have clang annotate the
ranges of these CUDA special registers according to the -target-cpu flag,
and have ValueTracking pick the range...
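The annotation being proposed resembles LLVM's existing !range metadata on calls; a hand-written sketch of what clang could emit for blockIdx.x on sm_20 (values in [0, 65535], so a half-open !range interval of [0, 65536)) might look like:

```llvm
; ctaid.x (blockIdx.x) bounded by 65535 on sm_20; !range intervals
; are half-open, so the exclusive upper bound is 65536.
%bid = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x(), !range !0

!0 = !{ i32 0, i32 65536 }
```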
2013 Jan 22
1
[LLVMdev] Compiling to NVPTX
...ers using LLVM (llvm-c) and Clojure. As part of my talk I'd like to
give an example of a program running on CUDA.
Are there any papers, tutorials, examples, on writing a custom frontend for
NVPTX? For instance, I'm trying to figure out how to get access to "global"
variables like blockIdx. I know that libc won't be accessible, so I'm
probably just going to give a demonstration of an image blur filter written
in a custom-built programming language.
I was going to try simply taking my llvm module and having llvm write out
the object file using the nvptx triple. But to be honest...
2014 Jun 17
5
[LLVMdev] Attaching range metadata to IntrinsicInst
...ics as special cases. This approach is already
> taken for the x86_sse42_crc32_64_64 intrinsic. However, this
> approach may not be elegant because the ranges of these CUDA special
> registers depend on the GPU compute capability specified by
> -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20
> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is
> probably discouraged.
>
> Therefore, the approach I am considering is to have clang annotate
> the ranges of these CUDA special registers according to the
> -target...
2020 Oct 03
2
Information about the number of indices in memory accesses
...>
> > So now the operand of every load and store is a GEP instruction.
> >
> > For simple cases I get the right answer, but when the index
> > expression becomes more complex, multiple GEPs are introduced. For
> > instance:
> >
> > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
> >
> > produces:
> >
> > %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
> > %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
> > %8 = mul i32 %6, %7
> > %9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
> > %10 = a...
2014 Jun 17
4
[LLVMdev] Attaching range metadata to IntrinsicInst
...ach is already
>>> taken for the x86_sse42_crc32_64_64 intrinsic. However, this
>>> approach may not be elegant because the ranges of these CUDA special
>>> registers depend on the GPU compute capability specified by
>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20
>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is
>>> probably discouraged.
>>>
>>> Therefore, the approach I am considering is to have clang annotate
>>> the ranges of these CUDA special reg...
2014 Jun 17
2
[LLVMdev] Attaching range metadata to IntrinsicInst
...x86_sse42_crc32_64_64 intrinsic. However, this
>>>>> approach may not be elegant because the ranges of these CUDA
>>>>> special
>>>>> registers depend on the GPU compute capability specified by
>>>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in sm_20
>>>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is
>>>>> probably discouraged.
>>>>>
>>>>> Therefore, the approach I am considering is to have clang annotate
>>>>>...
2015 Jan 24
2
[LLVMdev] Proposal: pragma for branch divergence
...r can then enable certain
optimizations such as jump threading and loop unswitching only on
non-divergent branches. In the longer term, the optimizer can even adopt some
cheap data-flow analysis to conservatively compute whether a branch is
non-divergent. For example, if a condition is not derived from blockIdx or
threadIdx, it is guaranteed to hold the same value for all threads in a
warp. How the compiler can leverage these annotations: similar to the
annotations for loop optimizations
(http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
<http://clang.llvm.org/docs...
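The distinction drawn in this proposal can be illustrated with a small hand-written CUDA sketch (the kernel and variable names are made up for illustration):

```cuda
__global__ void sketch(const int *flags, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Divergent branch: the condition is derived from threadIdx/blockIdx,
    // so threads within one warp may take different sides.
    if (i < n && flags[i])
        out[i] *= 2.0f;

    // Non-divergent (uniform) branch: n has the same value for every
    // thread in the warp, so all threads take the same side; the
    // optimizer could safely apply jump threading or unswitching here.
    if (n > 1024)
        out[0] = 0.0f;  // e.g., a differently-tiled code path
}
```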
2020 Oct 03
2
Information about the number of indices in memory accesses
...and store is a GEP instruction.
>>> >
>>> > For simple cases I get the right answer, but when the index
>>> > expression becomes more complex, multiple GEPs are introduced. For
>>> > instance:
>>> >
>>> > *(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
>>> >
>>> > produces:
>>> >
>>> > %6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
>>> > %7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
>>> > %8 = mul i32 %6, %7
>>> > %9 = call...
2012 Feb 23
0
[LLVMdev] Clang support for CUDA
Hi,
I am trying to convert a simple CUDA program to LLVM IR using clang 3.0.
The program is as follows,
#include <stdio.h>
#include <clang/test/SemaCUDA/cuda.h>
__global__ void kernfunc(int *a)
{
    *a = threadIdx.x + blockIdx.x * blockDim.x;
}
int main()
{
int *h_a, *d_a, n;
n = sizeof(int);
h_a = (int*)malloc(n);
*h_a = 5;
cudaMalloc((void**)&d_a, n);
cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);
kernfunc<<<1,1>>>(d_a);
cudaMemcpy(h_a, d_a, n, cudaMemcpyDeviceToHost);
printf("%d", *h_a);
return 0;...
2014 Jun 17
3
[LLVMdev] Attaching range metadata to IntrinsicInst
...wever, this
>> >>>>> approach may not be elegant because the ranges of these CUDA
>> >>>>> special
>> >>>>> registers depend on the GPU compute capability specified by
>> >>>>> -target-cpu. For instance, blockIdx.x is bounded by 65535 in
>> >>>>> sm_20
>> >>>>> but 2^31-1 in sm_30. Exposing -target-cpu to ValueTracking is
>> >>>>> probably discouraged.
>> >>>>>
>> >>>>> Therefore, the approac...
2015 Jan 24
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
...rtain
> optimizations such as jump threading and loop unswitching only on
> non-divergent branches. In the longer term, the optimizer can even adopt some
> cheap data-flow analysis to conservatively compute whether a branch is
> non-divergent. For example, if a condition is not derived from blockIdx or
> threadIdx, it is guaranteed to hold the same value for all threads in a
> warp. How the compiler can leverage these annotations: similar to the
> annotations for loop optimizations
> (http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
> <h...
2012 Jul 21
3
Use GPU in R with .Call
...ext, the host function and the kernel are in a *SEPARATE* file
called "VecAdd_kernel.cu".
=======================file VecAdd_kernel.cu========================
#define THREAD_PER_BLOCK 100
__global__ void VecAdd(double *a,double *b, double *c,int len) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx<len){
c[idx] = a[idx] + b[idx];
}
}
void vecAdd_kernel(double *ain,double *bin,double *cout,int len){
int alloc_size;
alloc_size=len*sizeof(double);
/* Step 0a) Make device copies of ain, bin, and cout. */
double *a_copy,*b_copy,*cout_copy;
/*St...
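The host wrapper is truncated before the launch, but with the names already declared above (a_copy, b_copy, cout_copy, alloc_size) the usual pattern would round the block count up so the idx < len guard in the kernel covers the remainder (a sketch under that assumption, not the post's actual code):

```cuda
/* Sketch of the launch step, assuming the usual ceil-division grid size. */
int nblocks = (len + THREAD_PER_BLOCK - 1) / THREAD_PER_BLOCK;
VecAdd<<<nblocks, THREAD_PER_BLOCK>>>(a_copy, b_copy, cout_copy, len);

/* Copy the result back for R's .Call interface to see. */
cudaMemcpy(cout, cout_copy, alloc_size, cudaMemcpyDeviceToHost);
```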
2015 Jan 25
2
[LLVMdev] [cfe-dev] Proposal: pragma for branch divergence
...ptimizations such as jump threading and loop unswitching only on
>> non-divergent branches. In the longer term, the optimizer can even adopt some
>> cheap data-flow analysis to conservatively compute whether a branch is
>> non-divergent. For example, if a condition is not derived from blockIdx or
>> threadIdx, it is guaranteed to hold the same value for all threads in a
>> warp. How the compiler can leverage these annotations: similar to the
>> annotations for loop optimizations
>> (http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimiza...