Madhur Amilkanthwar via llvm-dev
2020-Aug-23 17:43 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
@Ees,
Oh, I see what you mean now. Doing such an analysis would be useful for a thread block and not just a single thread, but as you say, you are onto something bigger than just a thread.

We published a short paper at ICS around this which uses polyhedral techniques to do such analysis and to reason about uncoalesced access patterns in CUDA programs. You can find the paper at https://dl.acm.org/doi/10.1145/2464996.2467288

On Sun, Aug 23, 2020, 11:00 PM Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> Hi Ees,
>
> a while back we started a project with a similar scope. Unfortunately the
> development slowed down, and the plans to revive it this summer got tanked
> by the US travel restrictions.
>
> Anyway, there is some existing code that might be useful, though it is in
> a prototype stage. While I'm obviously biased, I would suggest we continue
> from there.
>
> @Alex @Holger can we put the latest version on GitHub or some other place
> to share it? I'm unsure whether the code I (might have) access to is the
> latest.
>
> @Ees I attached a recent paper, and you might find the following links
> useful:
>
> * 2017 LLVM Developers' Meeting: J. Doerfert, "Polyhedral Value &
>   Memory Analysis", https://youtu.be/xSA0XLYJ-G0
>
> * "Automated Partitioning of Data-Parallel Kernels using Polyhedral
>   Compilation.", P2S2 2020 (slides and video:
>   https://www.mcs.anl.gov/events/workshops/p2s2/2020/program.php)
>
> Let us know what you think :)
>
> ~ Johannes
>
>
> On 8/22/20 9:38 AM, Ees Kee via llvm-dev wrote:
> > Hi all,
> >
> > As part of my research I want to investigate the relation between the
> > grid's geometry and the memory accesses of a kernel in common GPU
> > benchmarks (e.g. Rodinia, Polybench, etc.). As a first step I want to
> > answer the following question:
> >
> > - Given a kernel function with M possible memory accesses, for how many
> > of those M accesses can we statically infer the accessed location, given
> > concrete values for the grid/block and the executing thread?
> >
> > (Assume CUDA only for now.)
> >
> > My initial idea is to replace all uses of dim-related values, e.g.:
> > __cuda_builtin_blockDim_t::__fetch_builtin_x()
> > __cuda_builtin_gridDim_t::__fetch_builtin_x()
> >
> > and index-related values, e.g.:
> > __cuda_builtin_blockIdx_t::__fetch_builtin_x()
> > __cuda_builtin_threadIdx_t::__fetch_builtin_x()
> >
> > with ConstantInts, then run constant folding on the result and check how
> > many GEPs have constant values.
> >
> > Would something like this work, or are there complications I am not
> > thinking of? I'd appreciate any suggestions.
> >
> > P.S. I am new to LLVM.
> >
> > Thanks in advance,
> > Ees
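For reference, a minimal sketch of the substitution step described above. It assumes the kernel has already been lowered to device-side LLVM IR, where the __cuda_builtin_* wrappers end up as calls to the llvm.nvvm.read.ptx.sreg.* intrinsics, so the sketch matches on those; the pass name, the set of handled intrinsics (x dimension only), and the concrete grid/block values are invented for illustration.

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical pass: pin the NVVM special-register reads (threadIdx,
// blockIdx, blockDim, gridDim) to one chosen configuration so that later
// folding can resolve the address computations that depend on them.
struct ConcretizeGridPass : PassInfoMixin<ConcretizeGridPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &) {
    // Assumed example configuration: thread 0 of block 0, blockDim.x = 256,
    // gridDim.x = 64. Extend with the .y/.z variants as needed.
    StringMap<uint64_t> Concrete;
    Concrete["llvm.nvvm.read.ptx.sreg.tid.x"] = 0;     // threadIdx.x
    Concrete["llvm.nvvm.read.ptx.sreg.ctaid.x"] = 0;   // blockIdx.x
    Concrete["llvm.nvvm.read.ptx.sreg.ntid.x"] = 256;  // blockDim.x
    Concrete["llvm.nvvm.read.ptx.sreg.nctaid.x"] = 64; // gridDim.x

    SmallVector<CallInst *, 16> Dead;
    for (Instruction &I : instructions(F)) {
      auto *CI = dyn_cast<CallInst>(&I);
      if (!CI || !CI->getCalledFunction())
        continue;
      auto It = Concrete.find(CI->getCalledFunction()->getName());
      if (It == Concrete.end())
        continue;
      // Replace the intrinsic call with the chosen constant value.
      CI->replaceAllUsesWith(ConstantInt::get(CI->getType(), It->second));
      Dead.push_back(CI);
    }
    for (CallInst *CI : Dead)
      CI->eraseFromParent();
    return Dead.empty() ? PreservedAnalyses::all() : PreservedAnalyses::none();
  }
};

After registering the pass under some name (say "concretize-grid"), running it followed by a folding pipeline, e.g. opt -passes='concretize-grid,instcombine,sccp', should fold most index arithmetic that depends only on these values; counting how many access addresses actually became constant can then be done over the folded IR.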
Ees via llvm-dev
2020-Aug-23 19:33 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
@Madhur Thank you, I will have a look at the paper.

> Doing such analysis would be useful for a thread block and not just a
> single thread

Do you have any concrete use cases in mind?

I was thinking that I could use such an analysis to, for instance, visualize the memory accesses performed by the kernel (or at least the ones that are possible to infer). The relevant literature I find always involves tracing every access, so I'm thinking that with something like this, tracing could (potentially) be reduced significantly.

-Ees

On 23-08-2020 19:43, Madhur Amilkanthwar wrote:
> @Ees,
> Oh, I see what you mean now. Doing such analysis would be useful for a
> thread block and not just a single thread but as you say you are onto
> something bigger than just a thread.
>
> We had published a short paper in ICS around this which uses polyhedral
> techniques to do such analysis and reason about uncoalesced access
> patterns in CUDA programs. You can find the paper at
> https://dl.acm.org/doi/10.1145/2464996.2467288
> [...]
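Regarding deciding which accesses can be inferred statically (and therefore need no tracing): after the substitution-and-folding step sketched earlier in the thread, a small helper over the folded IR could count how many loads and stores have a fully constant address computation. This is only a sketch; treating "all GEP indices are constant" as the criterion for "inferable" (relative to the kernel-argument base pointers) is one deliberately simple choice, and atomics and memory intrinsics are left out for brevity.

#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Operator.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Count memory accesses whose address is a base pointer (typically a kernel
// argument) plus purely constant offsets after folding.
static void countInferableAccesses(Function &F) {
  unsigned Total = 0, Inferable = 0;
  for (Instruction &I : instructions(F)) {
    Value *Ptr = nullptr;
    if (auto *LI = dyn_cast<LoadInst>(&I))
      Ptr = LI->getPointerOperand();
    else if (auto *SI = dyn_cast<StoreInst>(&I))
      Ptr = SI->getPointerOperand();
    if (!Ptr)
      continue;
    ++Total;
    // Walk the GEP chain (both GEP instructions and constant GEP
    // expressions); the access counts as inferable only if every index
    // along the way folded to a constant.
    bool AllConstant = true;
    Value *Cur = Ptr->stripPointerCasts();
    while (auto *GEP = dyn_cast<GEPOperator>(Cur)) {
      if (!GEP->hasAllConstantIndices()) {
        AllConstant = false;
        break;
      }
      Cur = GEP->getPointerOperand()->stripPointerCasts();
    }
    if (AllConstant)
      ++Inferable;
  }
  errs() << F.getName() << ": " << Inferable << " of " << Total
         << " accesses have constant offsets\n";
}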
Madhur Amilkanthwar via llvm-dev
2020-Aug-24 06:40 UTC
[llvm-dev] Looking for suggestions: Inferring GPU memory accesses
I don't have any concrete use cases off the top of my head, but the work by Johannes et al. is definitely interesting to me. I hope that doing some more literature survey along similar lines will be useful for your research; I think later work that cited our paper has built on some of the ideas we proposed, so you could probably look at those for use cases.

On Mon, Aug 24, 2020 at 1:03 AM Ees <kayesg42 at gmail.com> wrote:

> @Madhur Thank you, I will have a look at the paper.
>
> > Doing such analysis would be useful for a thread block and not just a
> > single thread
>
> Do you have any concrete use cases in mind?
>
> I was thinking that I could use such an analysis to, for instance,
> visualize the memory accesses performed by the kernel (or at least the
> ones that are possible to infer). The relevant literature I find always
> involves tracing every access, so I'm thinking that with something like
> this, tracing could (potentially) be reduced significantly.
>
> -Ees
> [...]

--
Disclaimer: Views, concerns, thoughts, questions, and ideas expressed in this mail are my own; my employer has no stake in them.

Thank You.
Madhur D. Amilkanthwar