search for: amdgcn

Displaying 20 results from an estimated 44 matches for "amdgcn".

2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
...t;>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>>>> >>>>>>>> The problem is that the way these instructions use the DPP word isn't >>>>>>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>>>>>> intrinsic, but it isn't enough. For example, take the first >>>>>>>> instruction: >>>>>>>> >>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>>>> >>>>&g...
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
...>>>>>> 7 >>>>>>>>> >>>>>>>>> The problem is that the way these instructions use the DPP >>>>>>>>> word isn't currently expressible in LLVM. We have the >>>>>>>>> llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For >>>>>>>>> example, take the first >>>>>>>>> instruction: >>>>>>>>> >>>>>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>>>>...
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
...more than one workgroup, the output of the program wasn't correct at that time. I guessed this might be because get_group_id() always returned 1 (not quite sure what was going on at that time). When I compile such cases using current llvm trunk, it uses a set of intrinsics starting with llvm.amdgcn, while it still uses llvm.r600.read.local.size.x(). The output LLVM IR code is like: define void @g(float addrspace(1)* nocapture %array) #0 { %x.i.i = tail call i32 @llvm.amdgcn.workgroup.id.x() #2 %x.i12.i = tail call i32 @llvm.r600.read.local.size.x() #1 %mul26.i = mul i32 %x.i12.i, %x.i....
2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with libclc trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). The linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc) contains the following code segment: define linkonce_odr i32 @get_global_id(i32 %dim) #5 { entr...
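For reference, the arithmetic that a get_global_id implementation composes from the workgroup/workitem intrinsics can be modeled in a few lines of Python. This is an illustrative sketch of the standard OpenCL relationship (ignoring any global work offset), not libclc's actual code:

```python
def get_global_id(workgroup_id, local_size, workitem_id):
    """Combine the per-dimension workgroup id, workgroup size, and
    local workitem id into a flat global id, as an OpenCL runtime
    conceptually does: global_id = group_id * local_size + local_id."""
    return workgroup_id * local_size + workitem_id

# Workgroup 2 of size 64, local id 5 -> global id 133
print(get_global_id(2, 64, 5))
```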
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
...>> v_nop >>>>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The problem is that the way these instructions use the DPP word isn't >>>>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>>>> intrinsic, but it isn't enough. For example, take the first >>>>>> instruction: >>>>>> >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>> >>>>>> What it's doing is shifting v...
2017 Jun 13
2
Implementing cross-thread reduction in the AMDGPU backend
...to avoid a data hazard >>>> v_nop >>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>> >>>> The problem is that the way these instructions use the DPP word isn't >>>> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >>>> intrinsic, but it isn't enough. For example, take the first >>>> instruction: >>>> >>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>> >>>> What it's doing is shifting v0 right by one within each row and adding >&g...
2017 Jun 14
0
Implementing cross-thread reduction in the AMDGPU backend
...>>> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >>>>>> >>>>>> The problem is that the way these instructions use the DPP word >>>>>> isn't currently expressible in LLVM. We have the >>>>>> llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, >>>>>> take the first >>>>>> instruction: >>>>>> >>>>>> v_foo_f32 v1, v0, v1 row_shr:1 >>>>>> >>>>>> What it's doing is shifting...
2020 Apr 15
3
Backend emitting to string instead of file
I can use llc to compile my IR module to amdgcn and get non-empty output. However, if I try to write the output (assembly or object) to a string (via buffer_ostream), the resulting string always has zero length. Here are the code changes I made: Original llc:       if (Target->addPassesToEmitFile(PM, *OS,                                      ...
2017 Jun 12
2
Implementing cross-thread reduction in the AMDGPU backend
...nop // Add two independent instructions to avoid a data hazard >> v_nop >> v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 >> >> The problem is that the way these instructions use the DPP word isn't >> currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp >> intrinsic, but it isn't enough. For example, take the first >> instruction: >> >> v_foo_f32 v1, v0, v1 row_shr:1 >> >> What it's doing is shifting v0 right by one within each row and adding >> it to v1. v1 stays the same in the first lane...
2017 Jun 12
4
Implementing cross-thread reduction in the AMDGPU backend
...ow_bcast:15 row_mask:0xa // Instruction 6 v_nop // Add two independent instructions to avoid a data hazard v_nop v_foo_f32 v1, v1, v1 row_bcast:31 row_mask:0xc // Instruction 7 The problem is that the way these instructions use the DPP word isn't currently expressible in LLVM. We have the llvm.amdgcn.mov_dpp intrinsic, but it isn't enough. For example, take the first instruction: v_foo_f32 v1, v0, v1 row_shr:1 What it's doing is shifting v0 right by one within each row and adding it to v1. v1 stays the same in the first lane of each row, however. With llvm.amdgcn.mov_dpp, we could try...
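The row_shr:1 behaviour described in this thread can be modeled in plain Python. This is a sketch of the semantics as explained in the quoted mail, not the backend's actual implementation — row/bank masks and bound_ctrl are ignored, and `foo` stands in for whatever combining op v_foo_f32 performs:

```python
# Software model of the DPP row_shr:N pattern: a 64-lane wavefront is
# split into 4 rows of 16 lanes; row_shr:N makes each lane read the
# value N lanes lower within its own row, and lanes with no in-row
# source (the first N lanes of each row) keep their old accumulator.

ROW = 16  # lanes per row on GCN hardware

def dpp_row_shr(v0, v1, shift, foo=lambda a, b: a + b):
    """v1[lane] = foo(v0[lane - shift], v1[lane]) when lane - shift is
    still inside the same 16-lane row; otherwise v1[lane] is kept."""
    out = list(v1)
    for lane in range(len(v0)):
        src = lane - shift
        if src >= 0 and src // ROW == lane // ROW:  # source is in-row
            out[lane] = foo(v0[src], v1[lane])
    return out

# With all lanes holding 1, every lane except the first of each row
# becomes 1 + 1 = 2; lanes 0, 16, 32, 48 keep their original 1.
v = [1] * 64
r = dpp_row_shr(v, v, 1)
print(r[0], r[1], r[16], r[17])  # 1 2 1 2
```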
2017 May 08
2
[OpenCL][AMDGPU] Using AMDGPU generated kernel code for OpenCL
...g from the NVPTX backend and pass that to OpenCL's 'clCreateProgramWithBinary' function. However, when doing the same with the AMDGPU backend and its returned kernel string, OpenCL complains about an invalid binary. This has been tried with a number of different target triples (e.g. 'amdgcn--', 'amdgcn-amd-amdhsa', etc.), and my assumption so far is that I am not using the correct triple. Or am I missing something entirely, and are there additional steps to get the correct ELF binary? Thank you in advance for any help and pointers! Best, Philipp ...
2017 Dec 14
2
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
...n-zero value. Currently, this checking is done by isKnownNonZero() in ValueTracking, and it assumes alloca in address space 0 always has a non-zero value but alloca in non-zero address spaces does not always have a non-zero value. However, this assumption is incorrect for certain targets. For example, the amdgcn---amdgiz target has alloca in address space 5, and its allocas always have non-zero values. This assumption causes some optimizations to be disabled for the amdgcn---amdgiz target. After discussions at https://reviews.llvm.org/D40670, I propose to introduce TargetTransformInfo::isAllocaPtrValueNonZero for repr...
2017 Dec 14
3
[RFC] Add TargetTransformInfo::isAllocaPtrValueNonZero and let ValueTracking depend on TargetTransformInfo
Hal, Thanks for your suggestion. I think that makes sense. Currently, non-zero alloca address space is already represented by data layout, e.g., the last component of the data layout of amdgcn---amdgiz target is -A5, which means alloca is in address space 5. How about adding a letter z to -A5 to indicate alloca may have zero value? i.e. -A5 means alloca is in address space 5 and always has non-zero value, -A5z means alloca is in address space 5 and may have zero value. Then we can add f...
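Under this proposal, the alloca component of the target's data layout string would look roughly as follows. The `z` suffix is the extension suggested in the mail, not existing data-layout syntax, and the `...` stands in for the rest of the (long) amdgcn---amdgiz layout string:

```
; Today: alloca in address space 5 (non-zero value assumed under the proposal)
target datalayout = "...-A5"

; Proposed: alloca in address space 5, but its value may be zero
target datalayout = "...-A5z"
```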
2019 Sep 16
2
Changing behavior of lit.py's -v flag
Tim Northover via llvm-dev <llvm-dev at lists.llvm.org> writes: > Hi Varun, > > I'm definitely in favour of making -v more useful like this. > > On Thu, 12 Sep 2019 at 19:31, Varun Gandhi via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> Option 2 (less deviation from status quo): >> -v: Adopts behavior of -vvv from Option 1. :) >> -vv: Same
2015 Jun 08
2
[LLVMdev] R600 -> AMDGPU rename coming on Friday
...I'm finally going to do the R600->AMDGPU rename this Friday. This is something I originally proposed last July [1], but had to put off in order to avoid creating really bad merge headaches for some users. The only change from my original proposal is that I'll just keep the existing r600 and amdgcn triples rather than adding a new one for amdgpu. If anyone has any strong objections to me doing this on Friday, please let me know. Thanks, Tom [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/075151.html
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
...208 208 208 208 208 208 208 208 208 208 208 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 224 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 240 > > It looks like the addressing in as1.ll is incorrectly concluded to be uniform: > > %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 > %7 = tail call i32 @llvm.amdgcn.workgroup.id.x() #0 > %mul.i.i.i.i.i = mul nsw i32 %7, %3 > %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 > %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 > %8 = getelementptr i32, i32 addrspace(1)* %callab...
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
...pass or analysis on the examples it does the right thing and sees the load as divergent. $ opt -S -analyze -divergence -o - as1.ll Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_': DIVERGENT: %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 DIVERGENT: %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 DIVERGENT: %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 DIVERGENT: %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i DIVERGENT: %9 = load i32, i32 addrspace(1)* %8, align 4...
2015 Sep 04
2
Testing "normal" cross-compilers versus GPU backends
...ries. >> >> Failing tests attached, let me know which ones you’d like me to investigate. > > Tests: > <failing_tests> > > (note I forced enable the “native” feature on this run) So, just looking at the first one in that list for an example, running: llc -mtriple amdgcn test/CodeGen/Generic/2002-04-14-UnexpectedUnsignedType.ll you get an error of: error: unsupported call to function bar in foo ...because apparently AMDGPU simply doesn't support function calls. At all. That is a rather unusual feature to be missing, and I'd imagine it's causing a fair numbe...
2016 May 03
4
Is the CppBackend still supported?
...(http://llvm.org/): LLVM version 3.7.1 Optimized build. Built Apr 4 2016 (15:04:44). Default target: x86_64-unknown-linux-gnu Host CPU: ivybridge Registered Targets: aarch64 - AArch64 (little endian) aarch64_be - AArch64 (big endian) amdgcn - AMD GCN GPUs arm - ARM arm64 - ARM64 (little endian) armeb - ARM (big endian) bpf - BPF (host endian) bpfeb - BPF (big endian) bpfel - BPF (little endian) cpp - C++ backend hexagon - He...
2015 Sep 29
2
OpenCL toolset (for AMD GPU)
...> Hi, > > You need to include OpenCL library headers from libclc > (http://libclc.llvm.org/) to compile most OpenCL code. > > Here is an example command: > > clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE Hi Tom, to piggy-back on this question: to load this kernel in OpenCL, is it sufficient to just pass $OUTPUT_FILE to clCreateProgramWithBinary? Also, assuming this is enough, is the code for recent AMD GPUs quality-wise on the level of w...