Ees Kee via llvm-dev
2020-Aug-22 14:38 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi all,

As part of my research I want to investigate the relation between the grid's geometry and the memory accesses of a kernel in common GPU benchmarks (e.g. Rodinia, Polybench). As a first step I want to answer the following question:

- Given a kernel function with M possible memory accesses, for how many of those M accesses can we statically infer the accessed location, given concrete values for the grid/block dimensions and the executing thread?

(Assume CUDA only for now.)

My initial idea is to replace all uses of dimension-related values, e.g.:

  __cuda_builtin_blockDim_t::__fetch_builtin_x()
  __cuda_builtin_gridDim_t::__fetch_builtin_x()

and index-related values, e.g.:

  __cuda_builtin_blockIdx_t::__fetch_builtin_x()
  __cuda_builtin_threadIdx_t::__fetch_builtin_x()

with ConstantInts, then run constant folding on the result and check how many GEPs end up with constant operands.

Would something like this work, or are there complications I am not thinking of? I'd appreciate any suggestions.

P.S. I am new to LLVM.

Thanks in advance,
Ees
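As a toy illustration of the pipeline described above (substitute ConstantInts for the builtin reads, fold, then check which index expressions became constants), here is a minimal Python sketch. This is not LLVM code; the expression-tree classes and the `fold` helper are illustrative stand-ins for LLVM's IR and constant folder.

```python
# Toy model of the proposed analysis: kernel index expressions are trees
# whose leaves are either integers or symbols such as threadIdx.x.
# Substituting concrete values ("ConstantInts") and folding tells us which
# accesses become compile-time constants.

class Sym:
    """Symbolic leaf, e.g. 'threadIdx.x' or a data-dependent value."""
    def __init__(self, name):
        self.name = name

class Add:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

class Mul:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

def fold(expr, env):
    """Substitute known symbols from env and fold; return an int if the
    expression is fully constant, otherwise None."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, Sym):
        return env.get(expr.name)   # None if unknown (e.g. data-dependent)
    l, r = fold(expr.lhs, env), fold(expr.rhs, env)
    if l is None or r is None:
        return None
    return l + r if isinstance(expr, Add) else l * r

# idx = blockIdx.x * blockDim.x + threadIdx.x  (the typical global index)
idx = Add(Mul(Sym("blockIdx.x"), Sym("blockDim.x")), Sym("threadIdx.x"))
env = {"blockIdx.x": 2, "blockDim.x": 256, "threadIdx.x": 5}
print(fold(idx, env))                       # constant: 517 -> inferable

# An index involving a loaded value stays symbolic after substitution:
print(fold(Add(idx, Sym("data[i]")), env))  # None -> not inferable
```

In the real implementation the substitution would target the IR produced for the builtins above, and the final check would count GEPs whose index operands folded to constants.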
Madhur Amilkanthwar via llvm-dev
2020-Aug-22 15:38 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
CUDA/GPU programs are written for a SIMT (single instruction, multiple threads) model: programmers write a single program in such a way that each thread executes it on different data. So a program is one physical copy, but virtually it is run by many threads, and those grid/thread IDs are essential to the semantics of the program. You can't replace thread-specific variables with one thread ID.

Hence, I don't think what you're proposing would have much applicability in real-world benchmarks like Rodinia. If you have a strong motivating example then please provide a counter-argument, but in my experience it won't be very useful. In some corner cases it would be, but those reduce to the general case of uniform code blocks.

On Sat, Aug 22, 2020 at 8:09 PM Ees Kee via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]

Thank You.
Madhur D. Amilkanthwar
Ees Kee via llvm-dev
2020-Aug-22 16:11 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi Madhur, and thanks for your answer.

> You can't replace thread specific variables with one thread ID.

Why not? Let me rephrase. What I'm looking for at this stage is to be able to pick a thread in a block and see, for this particular thread, how many memory accesses in the kernel are (statically) inferable. For instance, for these kernels
https://github.com/yuhc/gpu-rodinia/blob/0739f8045ca9d8153b06973a8b10f6d97485cd72/cuda/gaussian/gaussian.cu#L309
if you provide concrete values for the grid, block, and index, as well as the scalar arguments, you can tell (manually) which offsets off of the pointer arguments are being accessed by the kernel. In contrast, in a kernel like this
https://github.com/yuhc/gpu-rodinia/blob/0739f8045ca9d8153b06973a8b10f6d97485cd72/cuda/huffman/hist.cu#L34
you can't infer them all, because some indices are data-dependent. What I'm looking for (and again, this is only a first step to something bigger) is to automate this process.

On Sat, Aug 22, 2020 at 5:38 PM Madhur Amilkanthwar <madhur13490 at gmail.com> wrote:

> [...]
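To make the "pick one thread and evaluate by hand" idea concrete, here is a small Python sketch in the spirit of the gaussian elimination kernel linked above. The guard and the index expression are illustrative (reconstructed from the general shape of such kernels, not copied verbatim from Rodinia): every term is derived from the grid geometry, the thread ID, or scalar arguments, so fixing those yields concrete offsets.

```python
# Hand evaluation of a gaussian-style index expression for one concrete
# thread -- the manual process the proposed analysis should automate.
# The guard and expression shapes are illustrative, not exact Rodinia code.

def fan1_offsets(Size, t, blockDim_x, blockIdx_x, threadIdx_x):
    """Return the element offsets a single thread accesses, or None if the
    thread's early-exit guard fires."""
    gid = threadIdx_x + blockIdx_x * blockDim_x
    if gid >= Size - 1 - t:                # guard: thread does nothing
        return None
    off = Size * (gid + t + 1) + t         # every operand is grid-derived
    return {"m_cuda write": off, "a_cuda reads": [off, Size * t + t]}

print(fan1_offsets(Size=16, t=0, blockDim_x=8, blockIdx_x=0, threadIdx_x=3))
# -> {'m_cuda write': 64, 'a_cuda reads': [64, 0]}
```

All offsets come out as concrete integers, i.e. the accesses are statically inferable once the launch configuration, thread ID, and scalar arguments are fixed. In a histogram-style kernel, by contrast, an index like `buffer[data[i]]` depends on a loaded value, so the same substitution leaves it symbolic.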
Johannes Doerfert via llvm-dev
2020-Aug-23 16:47 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi Ees,

a while back we started a project with similar scope. Unfortunately the development slowed down, and the plans to revive it this summer got tanked by the US travel restrictions.

Anyway, there is some existing code that might be useful, though in a prototype stage. While I'm obviously biased, I would suggest we continue from there.

@Alex @Holger, can we put the latest version on GitHub or some other place to share it? I'm unsure if the code I (might have) access to is the latest.

@Ees, I attached a recent paper, and you might find the following links useful:

* 2017 LLVM Developers' Meeting: J. Doerfert, "Polyhedral Value & Memory Analysis", https://youtu.be/xSA0XLYJ-G0

* "Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation.", P2S2 2020 (slides and video: https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php)

Let us know what you think :)

~ Johannes

On 8/22/20 9:38 AM, Ees Kee via llvm-dev wrote:
> [...]

[Attachment: icppworkshops20-13.pdf]
Madhur Amilkanthwar via llvm-dev
2020-Aug-23 17:43 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
@Ees, oh, I see what you mean now. Doing such an analysis would be useful for a thread block, not just a single thread, but as you say, you are onto something bigger than a single thread. We published a short paper in ICS around this, which uses polyhedral techniques to do such analysis and reason about uncoalesced access patterns in CUDA programs. You can find the paper at https://dl.acm.org/doi/10.1145/2464996.2467288

On Sun, Aug 23, 2020 at 11:00 PM Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
Ees via llvm-dev
2020-Aug-23 19:24 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hello Johannes,

Thank you very much for the material. I will have a look and get back to you (possibly with questions, if you don't mind :) ). I would also appreciate the code, if that's available.

- Ees

On 23-08-2020 18:47, Johannes Doerfert wrote:
> [...]