Hi,

I am not sure whether LLVM has any feature for instrumenting CUDA/PTX code.

I want to generate a simple memory trace, and I know GPGPU Ocelot does that. But I was thinking, why not LLVM?

So I am looking at two optimizations implemented in LLVM for CUDA for some inspiration.

1. Address inference: Does this use PTX IR or LLVM IR? I would say LLVM IR, based on some code keywords like PHI nodes etc.

2. Bypass slow div: This is a generic optimization adopted for CUDA. I think it uses LLVM IR.

So my question is: to instrument PTX code, shall I focus on LLVM IR or PTX?

Some definite guidance along these lines would be very helpful. Thank you.

Sincerely,
Gurunath
Follow-up: Or should it be the SASS code that is instrumented? See: https://github.com/NVlabs/SASSI

On Fri, Nov 18, 2016 at 1:55 PM, Gurunath Kadam <gurunath.kadam at gmail.com> wrote:
> So my question is, to instrument PTX code, shall I focus on LLVM IR or PTX?
On Fri, Nov 18, 2016 at 10:55 AM, Gurunath Kadam via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> So I am looking at two optimizations implemented in LLVM for CUDA for some
> inspiration.
>
> 1. Address inference: Does this use PTX IR or LLVM IR? I would say LLVM IR
> based on some code keywords like PHI nodes etc.
>
> 2. Bypass slow div: This is a generic optimization adopted for CUDA.
> I think it uses LLVM IR.

Both optimizations are IR-level.

> So my question is, to instrument PTX code, shall I focus on LLVM IR or PTX?

It depends on what you want to trace. For memory tracing, instrumenting IR is probably enough, because there's an almost one-to-one mapping between a load/store in optimized IR and a load/store in PTX.
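
To make the idea of load/store instrumentation concrete, here is a rough host-side C++ sketch of what the instrumented code would behave like. It is not an LLVM pass and not anything that exists in LLVM: the hook name trace_access, the Access record, the g_trace buffer, and the saxpy_traced example are all hypothetical and written by hand. An IR-level pass would insert the hook calls automatically, and on a GPU the trace would presumably go to a device-side buffer rather than a host vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One record per memory access, as an instrumented load/store might log it.
// (Hypothetical layout, chosen only for this sketch.)
struct Access {
    std::uintptr_t addr;  // address touched
    std::size_t size;     // number of bytes accessed
    bool is_store;        // false = load, true = store
};

// Hypothetical trace sink. A host vector keeps the sketch runnable;
// real GPU instrumentation would need a device-side buffer instead.
static std::vector<Access> g_trace;

// Hypothetical hook that an instrumentation pass would call
// immediately before each load or store.
void trace_access(const void *p, std::size_t size, bool is_store) {
    g_trace.push_back({reinterpret_cast<std::uintptr_t>(p), size, is_store});
}

// y[i] = a * x[i] + y[i], written the way the instrumented code would
// behave: every load and every store is preceded by a call to the hook.
void saxpy_traced(float a, const float *x, float *y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        trace_access(&x[i], sizeof(float), false);  // load x[i]
        float xi = x[i];
        trace_access(&y[i], sizeof(float), false);  // load y[i]
        float yi = y[i];
        trace_access(&y[i], sizeof(float), true);   // store y[i]
        y[i] = a * xi + yi;
    }
}
```

Calling saxpy_traced on arrays of length n appends three records per element to g_trace (two loads and one store). Because the hook sits next to each IR-level load/store, the records should line up closely with the loads and stores that end up in the PTX.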