Jakub (Kuba) Kuderski via llvm-dev
2021-Jul-04 18:34 UTC
[llvm-dev] LLVM GPU News Issue #15, July 02 2021
Hi folks, The 15th issue of LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella, is out: <https://llvm-gpu-news.github.io/2021/07/02/issue-15.html>. I also pasted the content below, in case you prefer to read in your email client. -Jakub ===================================================================== # LLVM GPU News Issue #15, July 02 2021 Authors: Jakub Kuderski Welcome to LLVM GPU News, a bi-weekly newsletter on all the GPU things under the LLVM umbrella. This issue covers the period from June 18 to July 1 2021. We welcome your feedback and suggestions. Let us know if we missed anything interesting, or want us to bring attention to your (sub)project, revisions under review, or proposals. Please see the bottom of the page for details on how to submit suggestions and contribute. ## Industry News and Conferences ## LLVM and Clang ### Discussions ### Commits * New NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 matrix operations: `wmma.load`, `wmma.store`, `wmma.mma`, and `mma`. [D104847]( https://reviews.llvm.org/D104847) * AMDGPU learned to optimize VGPR live-ranges in simple divergent if-else statements. [D102212](https://reviews.llvm.org/D102212) * New AMDGPU target: gfx1035. [D104804](https://reviews.llvm.org/D104804) * AMDGPU gfx90a memory model has been updated. [D105137]( https://reviews.llvm.org/D105137) * New 224-bit vector types for AMDGPU. These map to `v7i32`/`v7f32`, while existing 192-bit types to newly added `v3i64`/`v3f64`/`v6i32`/`v6f32`. [D104622](https://reviews.llvm.org/D104622) * The `ReplaceLDS` AMDGPU pass is now disabled by default in preparation to later remove the code. [D104962](https://reviews.llvm.org/D104962) ## MLIR ### Discussions ### Commits * New NVPTX ops for warp synchronous matrix operations for the GPU and NNVM dialects. [D95330](https://reviews.llvm.org/D95330), [D95331]( https://reviews.llvm.org/D95331), [D105175](https://reviews.llvm.org/D105175 ) ## OpenMP (Target Offloading) ### Discussions ### Commits * Multiple globalization improvements: * GPU memory globalization got simplified. The old implementation in the frontend that emulated standard CPU stack sharing is now replaced with a single allocation command, mimicking an `alloca` instruction for variables that must be shared between threads. [D97680]( https://reviews.llvm.org/D97680), [D97818](https://reviews.llvm.org/D97818) * OpenMP device routines will be internalized to facilitate interprocedural optimizations. [D102824](https://reviews.llvm.org/D102824) * The number of Attributor iterations is doubled from 64 to 128 on the GPU target. [D104920](https://reviews.llvm.org/D104920) * Remaining globalization optimizations will be reported as missed remarks instead of analysis remarks. [D104735]( https://reviews.llvm.org/D104735) * `clang-offload-bundler` can now unbundle archives containing bundled object files into device-specific archives. [D93525]( https://reviews.llvm.org/D93525) ## External Compilers ### LLPC * LLPC can now generate out-of-bounds checks for scratch accesses (stack variables). [LLPC#1260](https://github.com/GPUOpen-Drivers/llpc/pull/1260) * New utilities for iterating over enums using C++ iterators and ranges. [LLPC#1273](https://github.com/GPUOpen-Drivers/llpc/pull/1273), [LLPC#1299]( https://github.com/GPUOpen-Drivers/llpc/pull/1299) ### Mesa -- Jakub Kuderski -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210704/e8730364/attachment.html>