search for: amdgcn

Displaying 20 results from an estimated 44 matches for "amdgcn".

2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...t;>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>>>> >>>>>>>> The problem is that the way these instructions use the DPP word isn't >>>>>>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>>>>>> intrinsic, but it isn't enough. For example, take the first >>>>>>>> instruction: >>>>>>>> >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>>>> >>>>&g...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>> 7 >>>>>>>>> >>>>>>>>> The problem is that the way these instructions use the DPP >>>>>>>>> word isn't currently expressible in LLVM. We have the >>>>>>>>> llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For >>>>>>>>> example, take the first >>>>>>>>> instruction: >>>>>>>>> >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>>>>...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...more than one workgroup, the output of the program wasn't correct at that time. I guessed this might be because get_group_id() always returned 1 (not quite sure what was going on at that time). When I compile such cases using current llvm trunk, it uses a set of intrinsics starting with llvm.amdgcn, while it still uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* nocapture %array) #0 { %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 %mul26.i = mul i32 %x.i12.i, %x.i....
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with libclc trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). The linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc) contains the following code segment: define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entr...
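For reference, the arithmetic that a get_global_id implementation composes from the workgroup/workitem intrinsics can be modeled in a few lines of Python. This is an illustrative sketch of the standard OpenCL relationship (ignoring any global work offset), not libclc's actual code:

```python
def get_global_id(workgroup_id, local_size, workitem_id):
    """Combine the per-dimension workgroup id, workgroup size, and
    local workitem id into a flat global id, as an OpenCL runtime
    conceptually does: global_id = group_id * local_size + local_id."""
    return workgroup_id * local_size + workitem_id

# Workgroup 2 of size 64, local id 5 -> global id 133
print(get_global_id(2, 64, 5))
```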
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The problem is that the way these instructions use the DPP word isn't >>>>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>>>> intrinsic, but it isn't enough. For example, take the first >>>>>> instruction: >>>>>> >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>> >>>>>> What it's doing is shifting v...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>> >>>> The problem is that the way these instructions use the DPP word isn't >>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>> intrinsic, but it isn't enough. For example, take the first >>>> instruction: >>>> >>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>> >>>> What it's doing is shifting v0 right by one within each row and adding >&g...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The problem is that the way these instructions use the DPP word >>>>>> isn't currently expressible in LLVM. We have the >>>>>> llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, >>>>>> take the first >>>>>> instruction: >>>>>> >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>> >>>>>> What it's doing is shifting...
2020 Apr 15
3
Backend emitting to string instead of file
I can use llc to compile my IR module to amdgcn and get non-empty output. However, if I try to write the output (assembly or object) to a string (via buffer_ostream), the resulting string always has zero length. Here are the code changes I made: Original llc:       if (Target->addPassesToEmitFile(PM, *OS,                                      ...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >> >> The problem is that the way these instructions use the DPP word isn't >> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >> intrinsic, but it isn't enough. For example, take the first >> instruction: >> >> v_foo_f32 v1, v0, v1 row_shr:1 >> >> What it's doing is shifting v0 right by one within each row and adding >> it to v1. v1 stays the same in the first lane...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ow_bcast:15 row_mask:0xa // Instruction 6 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction: v_foo_f32 v1, v0, v1 row_shr:1 What it's doing is shifting v0 right by one within each row and adding it to v1. v1 stays the same in the first lane of each row, however. With llvm.amdgcn.mov_dpp, we could try...
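The row_shr:1 behaviour described in this thread can be modeled in plain Python. This is a sketch of the semantics as explained in the quoted mail, not the backend's actual implementation — row/bank masks and bound_ctrl are ignored, and `foo` stands in for whatever combining op v_foo_f32 performs:

```python
# Software model of the DPP row_shr:N pattern: a 64-lane wavefront is
# split into 4 rows of 16 lanes; row_shr:N makes each lane read the
# value N lanes lower within its own row, and lanes with no in-row
# source (the first N lanes of each row) keep their old accumulator.

ROW = 16  # lanes per row on GCN hardware

def dpp_row_shr(v0, v1, shift, foo=lambda a, b: a + b):
    """v1[lane] = foo(v0[lane - shift], v1[lane]) when lane - shift is
    still inside the same 16-lane row; otherwise v1[lane] is kept."""
    out = list(v1)
    for lane in range(len(v0)):
        src = lane - shift
        if src >= 0 and src // ROW == lane // ROW:  # source is in-row
            out[lane] = foo(v0[src], v1[lane])
    return out

# With all lanes holding 1, every lane except the first of each row
# becomes 1 + 1 = 2; lanes 0, 16, 32, 48 keep their original 1.
v = [1] * 64
r = dpp_row_shr(v, v, 1)
print(r[0], r[1], r[16], r[17])  # 1 2 1 2
```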
2017 May 08
2
[OpenCL][AMDGPU] Using AMDGPU generated kernel code for OpenCL
...g from the NVPTX backend and pass that to OpenCL's 'clCreateProgramWithBinary' function. However, when doing the same with the AMDGPU backend and its returned kernel string, OpenCL complains about an invalid binary. This has been tried with a number of different target triples (e.g. 'amdgcn--', 'amdgcn-amd-amdhsa', etc.), and my assumption so far is that I am not using the correct triple. Or am I missing something entirely, and are there additional steps to get the correct ELF binary? Thank you in advance for any help and pointers! Best, Philipp ...
2017 Dec 14
2
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
...n-zero value. Currently, this checking is done by isKnownNonZero() in ValueTracking, and it assumes alloca in address space 0 always has a non-zero value but alloca in non-zero address spaces does not always have a non-zero value. However, this assumption is incorrect for certain targets. For example, the amdgcn---amdgiz target has alloca in address space 5, and its allocas always have non-zero values. This assumption causes some optimizations to be disabled for the amdgcn---amdgiz target. After discussions at https://reviews.llvm.org/D40670, I propose to introduce TargetTransformInfo::isAllocaPtrValueNonZero for repr...
2017 Dec 14
3
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
Hal, Thanks for your suggestion. I think that makes sense. Currently, non-zero alloca address space is already represented by data layout, e.g., the last component of the data layout of amdgcn---amdgiz target is -A5, which means alloca is in address space 5. How about adding a letter z to -A5 to indicate alloca may have zero value? i.e. -A5 means alloca is in address space 5 and always has non-zero value, -A5z means alloca is in address space 5 and may have zero value. Then we can add f...
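Under this proposal, the alloca component of the target's data layout string would look roughly as follows. The `z` suffix is the extension suggested in the mail, not existing data-layout syntax, and the `...` stands in for the rest of the (long) amdgcn---amdgiz layout string:

```
; Today: alloca in address space 5 (non-zero value assumed under the proposal)
target datalayout = "...-A5"

; Proposed: alloca in address space 5, but its value may be zero
target datalayout = "...-A5z"
```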
2019 Sep 16
2
Changing behavior of lit.py's -v flag
Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> writes: > Hi Varun, > > I'm definitely in favour of making -v more useful like this. > > On Thu, 12 Sep 2019 at 19:31, Varun Gandhi via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Option 2 (less deviation from status quo): >> -v: Adopts behavior of -vvv from Option 1. :) >> -vv: Same
2015 Jun 08
2
[LLVMdev] R600 -> AMDGPU rename coming on Friday
...I'm finally going to do the R600->AMDGPU rename this Friday. This is something I originally proposed last July [1], but had to put off in order to avoid creating really bad merge headaches for some users. The only change from my original proposal is that I'll just keep the existing r600 and amdgcn triples rather than adding a new one for amdgpu. If anyone has any strong objections to me doing this on Friday, please let me know. Thanks, Tom [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/075151.html
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
...208 208 208 208 208 208 208 208 208 208 208 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 > > It looks like the addressing in as1.ll is incorrectly concluded to be uniform: > > %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 > %7 = tail call i32 @llvm.amdgcn.workgroup.id.x() #0 > %mul.i.i.i.i.i = mul nsw i32 %7, %3 > %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 > %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 > %8 = getelementptr i32, i32 addrspace(1)* %callab...
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
...pass or analysis on the examples it does the right thing and sees the load as divergent. $ opt -S -analyze -divergence -o - as1.ll Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_': DIVERGENT: %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 DIVERGENT: %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 DIVERGENT: %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 DIVERGENT: %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i DIVERGENT: %9 = load i32, i32 addrspace(1)* %8, align 4...
2015 Sep 04
2
Testing "normal" cross-compilers versus GPU backends
...ries. >> >> Failing tests attached, let me know which ones you’d like me to investigate. > > Tests: > <failing_tests> > > (note I forced enable the “native” feature on this run) So, just looking at the first one in that list for an example, running: llc -mtriple amdgcn test/CodeGen/Generic/2002-04-14-UnexpectedUnsignedType.ll you get an error of: error: unsupported call to function bar in foo ...because apparently AMDGPU simply doesn't support function calls. At all. That is a rather unusual feature to be missing, and I'd imagine it's causing a fair numbe...
2016 May 03
4
Is the CppBackend still supported?
...(http://llvm.org/): LLVM version 3.7.1 Optimized build. Built Apr 4 2016 (15:04:44). Default target: x86_64-unknown-linux-gnu Host CPU: ivybridge Registered Targets: aarch64 - AArch64 (little endian) aarch64_be - AArch64 (big endian) amdgcn - AMD GCN GPUs arm - ARM arm64 - ARM64 (little endian) armeb - ARM (big endian) bpf - BPF (host endian) bpfeb - BPF (big endian) bpfel - BPF (little endian) cpp - C++ backend hexagon - He...
2015 Sep 29
2
OpenCL toolset (for AMD GPU)
...> Hi, > > You need to include OpenCL library headers from libclc > (http://libclc.llvm.org/) to compile most OpenCL code. > > Here is an example command: > > clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE Hi Tom, to piggy-back on this question: to load this kernel in OpenCL, is it sufficient to just pass $OUTPUT_FILE to clCreateProgramWithBinary? Also, assuming this is enough, is the code for recent AMD GPUs quality-wise on the level of w...