Similar to: "Implementing cross-thread reduction in the AMDGPU backend"

Displaying 20 results from an estimated 400 matches similar to: "Implementing cross-thread reduction in the AMDGPU backend"

2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/12/2017 07:15 PM, Tom Stellard via llvm-dev wrote: > cc some people who have worked on this. > > On 06/12/2017 05:58 PM, Connor Abbott via llvm-dev wrote: >> Hi all, >> >> I've been looking into how to implement the more advanced Shader Model >> 6 reduction operations in radv (and obviously most of the work would >> be useful for radeonsi too).
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/12/2017 08:03 PM, Connor Abbott wrote: > On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com> wrote: >> On 06/12/2017 07:15 PM, Tom Stellard via llvm-dev wrote: >>> cc some people who have worked on this. >>> >>> On 06/12/2017 05:58 PM, Connor Abbott via llvm-dev wrote: >>>> Hi all, >>>> >>>>
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
Sorry about the formatting... Anyway, I think there may be a misinterpretation of bound_cntl. My understanding is that:
0 => if the source is invalid or disabled, do not write a result
1 => if the source is invalid or disabled, use a 0 instead
So the problematic case is where bound_cntl is 0, not when it is 1. -----Original Message----- From: Tom Stellard [mailto:tstellar at redhat.com]
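A minimal C++ model of the bound_cntl reading described above may make the two cases easier to compare; it is purely illustrative, and the function and parameter names are invented rather than taken from AMDGPU or LLVM code:

#include <cassert>
#include <cstdint>

// Illustrative model of the interpretation above: for each lane, the DPP
// source may be "invalid or disabled" (e.g. it comes from an inactive lane).
//   bound_ctrl == 0 : do not write a result; the destination keeps its old value
//   bound_ctrl == 1 : use 0 in place of the invalid source
uint32_t dppLaneResult(bool srcValid, bool boundCtrl,
                       uint32_t srcValue, uint32_t oldDest) {
  if (srcValid)
    return srcValue;          // normal case: the shuffled source is used
  return boundCtrl ? 0u       // bound_ctrl = 1: substitute zero
                   : oldDest; // bound_ctrl = 0: destination left unchanged
}

int main() {
  assert(dppLaneResult(/*srcValid=*/false, /*boundCtrl=*/false, 7, 42) == 42);
  assert(dppLaneResult(/*srcValid=*/false, /*boundCtrl=*/true, 7, 42) == 0);
  assert(dppLaneResult(/*srcValid=*/true, /*boundCtrl=*/false, 7, 42) == 7);
  return 0;
}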
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
On 06/13/2017 07:33 PM, Matt Arsenault wrote: > >> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >> >> On 06/12/2017 08:03 PM, Connor Abbott wrote: >>> On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >>>> On
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/14/2017 05:05 PM, Connor Abbott wrote: > On Tue, Jun 13, 2017 at 6:13 PM, Tom Stellard <tstellar at redhat.com> wrote: >> On 06/13/2017 07:33 PM, Matt Arsenault wrote: >>> >>>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >>>> >>>> On 06/12/2017 08:03 PM, Connor
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
I'm wondering about the focus on bound_cntl. Any cleared bit in the row_mask or bank_mask will also disable updating the result. Brian -----Original Message----- From: Connor Abbott [mailto:cwabbott0 at gmail.com] Sent: Wednesday, June 14, 2017 6:13 PM To: tstellar at redhat.com Cc: Matt Arsenault; llvm-dev at lists.llvm.org; Kolton, Sam; Sumner, Brian; Pykhtin, Valery Subject: Re:
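As a sketch of that gating, the following assumes the usual GCN DPP lane grouping for a 64-lane wave (four rows of 16 lanes, four banks of 4 lanes within each row); that mapping comes from the DPP documentation rather than from this thread, so treat it as an assumption:

#include <cstdio>

// A lane only updates its result if both its row bit and its bank bit are
// set; a cleared row_mask or bank_mask bit disables the update regardless of
// bound_cntl.
bool laneUpdatesResult(unsigned lane, unsigned rowMask, unsigned bankMask) {
  unsigned row = lane / 16;         // 0..3 for a 64-lane wave
  unsigned bank = (lane % 16) / 4;  // 0..3 within the row
  return ((rowMask >> row) & 1u) && ((bankMask >> bank) & 1u);
}

int main() {
  // row_mask = 0xF, bank_mask = 0xE: bank 0 (lanes 0-3, 16-19, ...) is
  // disabled, so those lanes keep their old destination value.
  for (unsigned lane = 0; lane < 8; ++lane)
    printf("lane %u updates: %d\n", lane, laneUpdatesResult(lane, 0xF, 0xE));
  return 0;
}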
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Dear Developers, I compiled an OpenCL kernel before (in Nov. last year) like __kernel void g(__global float* array) { array[get_global_id(0)] = 1; } with libclc, which would originally use intrinsics like llvm.r600.read.local.size.x(). I executed the generated object file with one version of the hsa-runtime [1] provided by Mr. Stellard; when there was more than one workgroup, the output
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). The linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc) has the following code segment,
2020 Apr 15
3
Backend emitting to string instead of file
I can use llc to compile my IR module to amdgcn with some non-zero output. However, if I try to write the output (assembly or object) to a string (via buffer_ostream), the resulting string always has zero length. Here are the code changes I make: Original llc:       if (Target->addPassesToEmitFile(PM, *OS,                                       DwoOut ? &DwoOut->os() : nullptr,
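One likely culprit is that buffer_ostream only forwards its buffered bytes to the wrapped stream when it is destroyed, so reading the string before that point yields nothing. Below is a minimal sketch that writes straight into an in-memory SmallString instead; it assumes an LLVM ~10 era addPassesToEmitFile signature and should be checked against the version actually in use:

#include "llvm/ADT/SmallString.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"

using namespace llvm;

// Emit the module's object code into Out rather than a file. Passing a
// raw_svector_ostream directly sidesteps the buffer_ostream pitfall noted
// above, since the bytes land in Out as soon as PM.run() finishes.
bool emitToString(TargetMachine &TM, Module &M, SmallString<0> &Out) {
  raw_svector_ostream OS(Out);   // raw_pwrite_stream backed by Out
  legacy::PassManager PM;
  if (TM.addPassesToEmitFile(PM, OS, /*DwoOut=*/nullptr, CGFT_ObjectFile))
    return false;                // target cannot emit this file type
  PM.run(M);                     // Out now holds the object bytes
  return true;
}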
2017 Dec 14
2
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
Some optimizations depend on whether an alloca instruction always has a non-zero value. Currently, this checking is done by isKnownNonZero() in ValueTracking, and it assumes an alloca in address space 0 always has a non-zero value, while an alloca in a non-zero address space may not. However, this assumption is incorrect for certain targets. For example, the amdgcn---amdgiz target has
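For concreteness, a minimal sketch of the existing address-space-based rule described here; it illustrates the current behavior rather than the proposed TTI hook, and the helper name is invented:

#include "llvm/IR/Instructions.h"

using namespace llvm;

// Current rule as described above: only an alloca in address space 0 is
// treated as known non-zero. Allocas in any other address space get no such
// guarantee, even on targets (e.g. amdgcn---amdgiz, alloca in AS 5) where
// they are in fact never null; the RFC proposes letting the target say so.
static bool allocaAssumedNonZero(const AllocaInst &AI) {
  return AI.getAddressSpace() == 0;
}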
2017 May 08
2
[OpenCL][AMDGPU] Using AMDGPU generated kernel code for OpenCL
Hello everyone, I was wondering what the correct way of using AMDGPU-generated kernel code with OpenCL is. I am trying to provide Polly's GPGPU code generation with the ability to run on different GPU devices, such as AMD GPUs. For NVIDIA, I simply retrieve a pre-compiled PTX string from the NVPTX backend and pass that to OpenCL's 'clCreateProgramWithBinary' function. However,
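For reference, a sketch of that host-side pattern with error handling trimmed; whether an AMD OpenCL runtime accepts AMDGPU backend output through this path is exactly the open question of this thread, and the helper name is illustrative only:

#include <CL/cl.h>
#include <vector>

// Hand a backend-produced kernel binary (for NVIDIA, the PTX string) to the
// OpenCL runtime via clCreateProgramWithBinary, then build it.
cl_program createProgramFromBackendOutput(cl_context ctx, cl_device_id dev,
                                          const std::vector<unsigned char> &bin) {
  const unsigned char *ptr = bin.data();
  size_t len = bin.size();
  cl_int binStatus = CL_SUCCESS, err = CL_SUCCESS;
  cl_program prog = clCreateProgramWithBinary(ctx, /*num_devices=*/1, &dev,
                                              &len, &ptr, &binStatus, &err);
  if (err != CL_SUCCESS || binStatus != CL_SUCCESS)
    return nullptr;
  // The program still has to be built before kernels can be created from it.
  if (clBuildProgram(prog, 1, &dev, /*options=*/"", nullptr, nullptr) != CL_SUCCESS)
    return nullptr;
  return prog;
}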
2017 Dec 14
3
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
Hal, Thanks for your suggestion. I think that makes sense. Currently, a non-zero alloca address space is already represented by the data layout, e.g., the last component of the data layout of the amdgcn---amdgiz target is -A5, which means alloca is in address space 5. How about adding the letter z to -A5 to indicate that alloca may have a zero value? i.e. -A5 means alloca is in address space 5 and always has
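The existing -A5 component can already be read back through DataLayout; a small sketch follows (the layout string is trimmed and illustrative, and the proposed z suffix is deliberately not used, since it is only a proposal and would not parse today):

#include "llvm/IR/DataLayout.h"
#include <cassert>

using namespace llvm;

// The -A<n> data layout component sets the alloca address space, which is
// queryable via DataLayout::getAllocaAddrSpace().
void checkAllocaAddrSpace() {
  DataLayout DL("e-p:64:64-A5"); // trimmed, illustrative layout string
  assert(DL.getAllocaAddrSpace() == 5 && "alloca address space comes from -A5");
}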
2019 Sep 16
2
Changing behavior of lit.py's -v flag
Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> writes: > Hi Varun, > > I'm definitely in favour of making -v more useful like this. > > On Thu, 12 Sep 2019 at 19:31, Varun Gandhi via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Option 2 (less deviation from status quo): >> -v: Adopts behavior of -vvv from Option 1. :) >> -vv: Same
2015 Jun 08
2
[LLVMdev] R600 -> AMDGPU rename coming on Friday
Hi, I'm finally going to do the R600->AMDGPU rename this Friday. This is something I originally proposed last July [1], but had to put off in order to avoid creating really bad merge headaches for some users. The only change from my original proposal is that I'll just keep the existing r600 and amdgcn triples rather than adding a new one for amdgpu. If anyone has any strong
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote: > > The IR goes through a backend agnostic preparation phase that brings it into SSA from and changes the AS from 0 to 1. This sounds possibly problematic to me. The IR should be created with the correct address space to begin with. Changing this in the middle sounds suspect. > After this
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > >> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi dev list, >> >> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some
2016 May 03
4
Is the CppBackend still supported?
Hello, I was trying to compile a simple program with the CppBackend like so: $ clang str_arg.c -emit-llvm -S $ llc -march=cpp str_arg.ll It produces a file `str_arg.cpp` as expected; however, the resulting file doesn't seem to be correct. For one, it includes `<llvm/Analysis/Verifier.h>`, which seems to have been moved to `llvm/IR/Verifier.h` as far back as 2013. My question is
2015 Sep 29
2
OpenCL toolset (for AMD GPU)
Hi LLVM, I would like to compile an OpenCL kernel for a specific AMD GPU target. Is it possible with the current clang/LLVM? I started by using `clang -x cl` but it looks like at least some OpenCL specific headers are missing (e.g. uint2 is not recognized as a type). Any links to documentation / tutorials very welcome. Thanks. - Paweł
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
2015 Sep 04
2
Testing "normal" cross-compilers versus GPU backends
>>> The Hexagon precedent is interesting; Krzysztof said they set the default >>> triple, and didn't have to xfail all that much stuff. Searching the tree, >>> I find exactly 7 individual tests marked XFAIL: hexagon, plus it disables >>> all of ExecutionEngine, and turns off the 'object-emission' feature. >>> >>> I'm curious