Ees Kee via llvm-dev
2020-Aug-22 14:38 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi all,

As part of my research I want to investigate the relation between the grid's geometry and the memory accesses of a kernel in common GPU benchmarks (e.g. Rodinia, Polybench). As a first step I want to answer the following question:

- Given a kernel function with M possible memory accesses, for how many of those M accesses can we statically infer the accessed location, given concrete values for the grid/block dimensions and the executing thread?

(Assume CUDA only for now.)

My initial idea is to replace all uses of dimension-related values, e.g.:

  __cuda_builtin_blockDim_t::__fetch_builtin_x()
  __cuda_builtin_gridDim_t::__fetch_builtin_x()

and index-related values, e.g.:

  __cuda_builtin_blockIdx_t::__fetch_builtin_x()
  __cuda_builtin_threadIdx_t::__fetch_builtin_x()

with ConstantInts, then run constant folding on the result and check how many GEPs end up with constant operands.

Would something like this work, or are there complications I am not thinking of? I'd appreciate any suggestions.

P.S. I am new to LLVM.

Thanks in advance,
Ees
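As a toy illustration of the pipeline described above (substitute ConstantInts for the builtin reads, fold, then check which index expressions became constants), here is a minimal Python sketch. This is not LLVM code; the expression-tree classes and the `fold` helper are illustrative stand-ins for LLVM's IR and constant folder.

```python
# Toy model of the proposed analysis: kernel index expressions are trees
# whose leaves are either integers or symbols such as threadIdx.x.
# Substituting concrete values ("ConstantInts") and folding tells us which
# accesses become compile-time constants.

class Sym:
    """Symbolic leaf, e.g. 'threadIdx.x' or a data-dependent value."""
    def __init__(self, name):
        self.name = name

class Add:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

class Mul:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs

def fold(expr, env):
    """Substitute known symbols from env and fold; return an int if the
    expression is fully constant, otherwise None."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, Sym):
        return env.get(expr.name)   # None if unknown (e.g. data-dependent)
    l, r = fold(expr.lhs, env), fold(expr.rhs, env)
    if l is None or r is None:
        return None
    return l + r if isinstance(expr, Add) else l * r

# idx = blockIdx.x * blockDim.x + threadIdx.x  (the typical global index)
idx = Add(Mul(Sym("blockIdx.x"), Sym("blockDim.x")), Sym("threadIdx.x"))
env = {"blockIdx.x": 2, "blockDim.x": 256, "threadIdx.x": 5}
print(fold(idx, env))                       # constant: 517 -> inferable

# An index involving a loaded value stays symbolic after substitution:
print(fold(Add(idx, Sym("data[i]")), env))  # None -> not inferable
```

In the real implementation the substitution would target the IR produced for the builtins above, and the final check would count GEPs whose index operands folded to constants.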
Madhur Amilkanthwar via llvm-dev
2020-Aug-22 15:38 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
CUDA/GPU programs are written for a SIMT (single instruction, multiple threads) model: programmers write a single program in such a way that each thread executes it on different data. So a program is one physical copy, but virtually it is run by many threads, and those grid/thread IDs are essential to the semantics of the program. You can't replace thread-specific variables with one thread ID.

Hence, I don't think what you're proposing would have much applicability in real-world benchmarks like Rodinia. If you have a strong motivating example then please provide a counter-argument, but in my experience it won't be very useful. In some corner cases it would be, but those reduce to the general case of uniform code blocks.

On Sat, Aug 22, 2020 at 8:09 PM Ees Kee via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]

Thank You.
Madhur D. Amilkanthwar
Ees Kee via llvm-dev
2020-Aug-22 16:11 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi Madhur, and thanks for your answer.

> You can't replace thread specific variables with one thread ID.

Why not? Let me rephrase. What I'm looking for at this stage is to be able to pick a thread in a block and see, for this particular thread, how many memory accesses in the kernel are (statically) inferable. For instance, for these kernels
https://github.com/yuhc/gpu-rodinia/blob/0739f8045ca9d8153b06973a8b10f6d97485cd72/cuda/gaussian/gaussian.cu#L309
if you provide concrete values for the grid, block, and index, as well as the scalar arguments, you can tell (manually) which offsets off of the pointer arguments are being accessed by the kernel. In contrast, in a kernel like this
https://github.com/yuhc/gpu-rodinia/blob/0739f8045ca9d8153b06973a8b10f6d97485cd72/cuda/huffman/hist.cu#L34
you can't infer them all, because some indices are data-dependent. What I'm looking for (and again, this is only a first step to something bigger) is to automate this process.

On Sat, Aug 22, 2020 at 5:38 PM Madhur Amilkanthwar <madhur13490 at gmail.com> wrote:

> [...]
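To make the "pick one thread and evaluate by hand" idea concrete, here is a small Python sketch in the spirit of the gaussian elimination kernel linked above. The guard and the index expression are illustrative (reconstructed from the general shape of such kernels, not copied verbatim from Rodinia): every term is derived from the grid geometry, the thread ID, or scalar arguments, so fixing those yields concrete offsets.

```python
# Hand evaluation of a gaussian-style index expression for one concrete
# thread -- the manual process the proposed analysis should automate.
# The guard and expression shapes are illustrative, not exact Rodinia code.

def fan1_offsets(Size, t, blockDim_x, blockIdx_x, threadIdx_x):
    """Return the element offsets a single thread accesses, or None if the
    thread's early-exit guard fires."""
    gid = threadIdx_x + blockIdx_x * blockDim_x
    if gid >= Size - 1 - t:                # guard: thread does nothing
        return None
    off = Size * (gid + t + 1) + t         # every operand is grid-derived
    return {"m_cuda write": off, "a_cuda reads": [off, Size * t + t]}

print(fan1_offsets(Size=16, t=0, blockDim_x=8, blockIdx_x=0, threadIdx_x=3))
# -> {'m_cuda write': 64, 'a_cuda reads': [64, 0]}
```

All offsets come out as concrete integers, i.e. the accesses are statically inferable once the launch configuration, thread ID, and scalar arguments are fixed. In a histogram-style kernel, by contrast, an index like `buffer[data[i]]` depends on a loaded value, so the same substitution leaves it symbolic.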
Johannes Doerfert via llvm-dev
2020-Aug-23 16:47 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hi Ees,

a while back we started a project with similar scope. Unfortunately the development slowed down, and the plans to revive it this summer got tanked by the US travel restrictions.

Anyway, there is some existing code that might be useful, though in a prototype stage. While I'm obviously biased, I would suggest we continue from there.

@Alex @Holger, can we put the latest version on GitHub or some other place to share it? I'm unsure if the code I (might have) access to is the latest.

@Ees, I attached a recent paper, and you might find the following links useful:

* 2017 LLVM Developers' Meeting: J. Doerfert, "Polyhedral Value & Memory Analysis", https://youtu.be/xSA0XLYJ-G0

* "Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation.", P2S2 2020 (slides and video: https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php)

Let us know what you think :)

~ Johannes

On 8/22/20 9:38 AM, Ees Kee via llvm-dev wrote:
> [...]

[Attachment: icppworkshops20-13.pdf]
Madhur Amilkanthwar via llvm-dev
2020-Aug-23 17:43 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
@Ees, oh, I see what you mean now. Doing such an analysis would be useful for a thread block, not just a single thread, but as you say, you are onto something bigger than a single thread. We published a short paper in ICS around this, which uses polyhedral techniques to do such analysis and reason about uncoalesced access patterns in CUDA programs. You can find the paper at https://dl.acm.org/doi/10.1145/2464996.2467288

On Sun, Aug 23, 2020 at 11:00 PM Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> [...]
Ees via llvm-dev
2020-Aug-23 19:24 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
Hello Johannes,

Thank you very much for the material. I will have a look and get back to you (possibly with questions, if you don't mind :) ). I would also appreciate the code, if that's available.

- Ees

On 23-08-2020 18:47, Johannes Doerfert wrote:
> [...]