thr3ads.net - llvm dev - [llvm-dev] Static Analysis for GPU Program Performance in LLVM [Dec 2021]

Jason Eckhardt via llvm-dev

2021-Nov-17 01:53 UTC

[llvm-dev] Static Analysis for GPU Program Performance in LLVM

Hi Nimit,

This is interesting and promising work! I haven't seen anyone else respond
yet, so I'll just give a few "pre-review" observations/suggestions
that I noticed. As a first step, and if you haven't already, take a look at
https://llvm.org/docs/DeveloperPolicy.html and
https://llvm.org/docs/GettingStarted.html#sending-patches for concrete
information and steps for submitting your patch.

  1.  Reviewers will ask you to format your code according to LLVM Coding
Standards. I see a few issues, such as some non-standard variable naming and
code formatting. Since this is new code, you can perform blanket
"clang-format" and "clang-tidy" runs over all the code to
resolve most issues automatically before submitting your patch (or at least
before committing).
  2.  This work is billed in a way that sounds like a generic "GPU"
analysis. However, as written today, it is CUDA/NVIDIA specific. For example,
UncoalescedAnalysis::ExecuteInstruction references NVVM intrinsics (e.g.,
llvm.nvvm.read.ptx.sreg.tid.x). Also, UncoalescedAnalysis::getConstantExprValue
hardcodes NVIDIA-specific address spaces 3 and 4 for Shared and Constant,
respectively (hardcoding those values is itself another issue). That said, much
of the code is generic. Probably what makes sense here is to isolate the above
instances (and any other target dependences) into target-specific hooks.
TargetTransformInfo might be the right place for these (see similar APIs like
TTI::isSourceOfDivergence). In this way, the bulk of the code remains
target-independent, and the various GPUs only need to implement their (hopefully
very small) specific hooks/overrides to utilize the analyses. Of course, there
are still external issues, such as which specific vendor tools/libraries are
needed, and those would be documented separately by each GPU target.
  3.  There don't seem to be any simple unit/regression ("lit")
tests. I do see that there are the benchmark directories NVIDIA_CUDA-8.0_Samples
and rodinia_3.1, but these aren't suitable unit tests. In fact, those might
be good to add to the LLVM "test suite" (see
https://llvm.org/docs/TestSuiteGuide.html). Patches themselves, however, are
usually required to come with unit tests. Often these are small IR tests. In
some cases, C++ test cases are used when it isn't feasible to test via IR
(testing, say, operations on a new ADT). See, e.g., llvm/test/Analysis for
examples of testing analysis passes.
  4.  I wonder if it makes sense to promote the GPU Drano static analysis to a
full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually
running a series of clang+opt steps? That might make it a bit more convenient to
use. Even though the tool is CUDA-specific today, it could have a target flag
where only an NVIDIA/CUDA value is recognized and implemented. Just a thought
and totally optional.
  5.  Adding some documentation would be useful. At a minimum, you might add a
paragraph to llvm/docs/Passes.rst. But a more substantial write-up (like the one
for llvm/docs/Vectorizers.rst) would be even better.
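
As a rough illustration of the target-specific hooks suggested in item 2, here
is a minimal sketch in plain C++ with no LLVM dependencies. GPUTargetHooks,
NVPTXHooks and isSharedOrConstant are hypothetical names made up for this
example; they are not existing TargetTransformInfo APIs. The intrinsic name and
the address spaces 3 and 4 are the NVIDIA-specific values mentioned above.

```cpp
#include <string>

// Hypothetical hook interface (an assumption for illustration, not an LLVM
// API). Each GPU target would provide its own small implementation.
struct GPUTargetHooks {
  virtual ~GPUTargetHooks() = default;
  // True if the named intrinsic reads the thread index in the x dimension.
  virtual bool isThreadIdXIntrinsic(const std::string &Name) const = 0;
  // Address space numbers for shared and constant memory on this target.
  virtual unsigned getSharedAddressSpace() const = 0;
  virtual unsigned getConstantAddressSpace() const = 0;
};

// NVIDIA/CUDA implementation: the only place where the NVVM intrinsic name
// and the address space numbers 3 and 4 appear.
struct NVPTXHooks final : GPUTargetHooks {
  bool isThreadIdXIntrinsic(const std::string &Name) const override {
    return Name == "llvm.nvvm.read.ptx.sreg.tid.x";
  }
  unsigned getSharedAddressSpace() const override { return 3; }
  unsigned getConstantAddressSpace() const override { return 4; }
};

// Target-independent analysis code queries the interface instead of
// hardcoding target-specific constants.
bool isSharedOrConstant(const GPUTargetHooks &Hooks, unsigned AddrSpace) {
  return AddrSpace == Hooks.getSharedAddressSpace() ||
         AddrSpace == Hooks.getConstantAddressSpace();
}
```

With this split, supporting another GPU would mean adding one more small
subclass, while the analysis code itself stays unchanged.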

________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Nimit
Singhania via llvm-dev <llvm-dev at lists.llvm.org>
Sent: Monday, October 25, 2021 2:50 PM
To: llvm-dev at lists.llvm.org <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Static Analysis for GPU Program Performance in LLVM

Hi there,

I would like to propose adding two static analyses to the LLVM framework that
can help detect performance issues in GPU programs. The first analysis detects
uncoalesced memory accesses, which cause memory congestion across GPU threads.
The second checks block-size independence of synchronization-free programs,
which allows the block size to be tuned for performance without affecting
correctness. Both analyses were developed as part of my PhD thesis and are
available on GitHub. Please see the link below for more details:

https://github.com/upenn-acg/gpudrano-static-analysis_v1.0
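
To make the notion of an uncoalesced access concrete, here is a toy C++ model.
It is not the GPUDrano analysis, only a sketch under stated assumptions: a warp
of 32 threads, 4-byte elements, 128-byte aligned memory segments, and thread t
accessing element index Stride * t. The function name segmentsTouched and all
default values are made up for this example. It counts how many distinct
segments one warp-wide access touches; fewer segments means better coalescing.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct aligned memory segments touched when thread T of a warp
// accesses the element at index Stride * T. All defaults are assumptions.
std::size_t segmentsTouched(std::size_t Stride, std::size_t ElemBytes = 4,
                            std::size_t WarpSize = 32,
                            std::size_t SegmentBytes = 128) {
  assert(SegmentBytes > 0 && "segment size must be nonzero");
  std::set<std::size_t> Segments;
  for (std::size_t T = 0; T < WarpSize; ++T) {
    std::size_t ByteOffset = Stride * T * ElemBytes;
    Segments.insert(ByteOffset / SegmentBytes); // index of the segment hit
  }
  return Segments.size();
}
```

In this model a unit-stride access (Stride = 1) stays within a single segment,
while a stride of 32 elements places every thread in its own segment, which is
the kind of pattern the first analysis is meant to flag.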

We would like to upstream these analyses to LLVM. There are many advantages of
the work. These are ground-breaking analyses that allow light-weight
compile-time detection of performance and correctness issues in GPU programs
that concern inter-thread behavior, that is, behavior that arises from the
interaction between threads and is not local to any individual thread. Being
light-weight allows the analyses to run efficiently at compile time. Such
analysis is difficult to perform on a generic multi-threaded program; however,
due to the regularity of GPU parallelism, it is feasible at compile time.
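
Block-size independence is one such inter-thread property. The sketch below is
a toy illustration in plain C++, not the analysis itself: it emulates a 1-D
launch sequentially on the CPU and compares the output for two block sizes
while the total thread count is held fixed (an assumption of this example).
The names launch, scaleByGlobalId, writeLocalId and sameForBlockSizes are
hypothetical. Note that this checks by running the kernel, whereas the proposed
analysis would establish the property statically.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Kernel = std::function<void(std::size_t BlockIdx, std::size_t BlockDim,
                                  std::size_t ThreadIdx, std::vector<int> &Out)>;

// Sequentially emulates a 1-D launch of TotalThreads threads grouped into
// blocks of BlockDim threads. TotalThreads must be a multiple of BlockDim.
std::vector<int> launch(const Kernel &K, std::size_t TotalThreads,
                        std::size_t BlockDim) {
  assert(BlockDim > 0 && TotalThreads % BlockDim == 0);
  std::vector<int> Out(TotalThreads, 0);
  for (std::size_t B = 0; B < TotalThreads / BlockDim; ++B)
    for (std::size_t T = 0; T < BlockDim; ++T)
      K(B, BlockDim, T, Out);
  return Out;
}

// Depends only on the global thread id, so the block size does not matter.
void scaleByGlobalId(std::size_t BlockIdx, std::size_t BlockDim,
                     std::size_t ThreadIdx, std::vector<int> &Out) {
  std::size_t Gid = BlockIdx * BlockDim + ThreadIdx;
  Out[Gid] = static_cast<int>(2 * Gid);
}

// Stores the block-local thread id, so the result changes with block size.
void writeLocalId(std::size_t BlockIdx, std::size_t BlockDim,
                  std::size_t ThreadIdx, std::vector<int> &Out) {
  Out[BlockIdx * BlockDim + ThreadIdx] = static_cast<int>(ThreadIdx);
}

// True if the kernel produces identical output for both block sizes.
bool sameForBlockSizes(const Kernel &K, std::size_t TotalThreads,
                       std::size_t DimA, std::size_t DimB) {
  return launch(K, TotalThreads, DimA) == launch(K, TotalThreads, DimB);
}
```

Here scaleByGlobalId gives the same output for block sizes 8 and 16 over 64
threads, so its block size could be tuned freely, while writeLocalId does not.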

These analyses can be the basis for optimizations that improve the performance
of GPU programs multifold. Given the complexity of GPU programming and the lack
of tool support in this space, the analyses are a first step towards robust
tools for analysis and optimization of GPU programs. Two publications describe
this work; they are listed in the references below. I would be happy to answer
any questions or concerns regarding this work.

Regards,
Nimit

References:
1. FMSD 2021: Static analysis for detecting uncoalesced accesses in GPU
programs, Rajeev Alur, Joseph Devietti, Omar Navarro Leija, and Nimit Singhania.
2. SAS 2018: Block-Size Independence of GPU Programs, Rajeev Alur, Joseph
Devietti, and Nimit Singhania.

Madhur Amilkanthwar via llvm-dev

2021-Nov-17 03:44 UTC


Hi Nimit,
I am happy to see that the work related to uncoalesced access is becoming
mainstream and targeting LLVM. When we started working on this back in 2013
(during my masters), we had just a small codebase and we never upstreamed
it. Of course, our algorithm was the first attempt to detect uncoalesced
accesses and there has been a lot of intellectual progress on this topic
since then.

I am happy to see that you have cited our work in your paper :)

I second Jason's thoughts, especially the one below:
> I wonder if it makes sense to promote the GPU Drano static analysis to a
> full-fledged LLVM tool (e.g., llvm/tools/llvm-gpudrano) instead of manually
> running a series of clang+opt steps? That might make it a bit more
> convenient to use. Even though the tool is today CUDA specific, it could
> have a target flag where only a NVIDIA/CUDA value is recognized and
> implemented. Just a thought and totally optional.

It makes sense to have this as a standalone tool. It will definitely lower the
barrier to using it.

I would love to review the code when you create a patch!




-- 
*Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this
mail are of my own and my employer has no take in it. *
Thank You.
Madhur D. Amilkanthwar

Nimit Singhania via llvm-dev

2021-Dec-02 07:05 UTC


Thank you Jason and Madhur for your remarks. They are very helpful. I am
working on cleaning up the code base and porting the code into an LLVM tool.

I quickly wanted to check if you know of a tutorial or a reference tool
that could guide me on creating a standalone tool. I would very much
appreciate your help.

Kind regards,
Nimit

