similar to: [AMDGPU] non-hsa intrinsic with hsa target

Displaying 20 results from an estimated 700 matches similar to: "[AMDGPU] non-hsa intrinsic with hsa target"

2016 Mar 05
2
[AMDGPU] non-hsa intrinsic with hsa target
Hi Mr. Liu, Thanks for your quick reply. I compiled the code with the libclc_trunk and linked the bitcode file under $LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc. After looking into the libclc, it is currently using the new workitem intrinsics (commit ba9858caa1e927a6fcc601e3466faa693835db5e). In the linked bitcode ($LIBCLC_DIR/built_libs/tahiti-amdgcn--.bc), it has the following code segment,
2018 Sep 05
4
Can I control HSA config generated by AMDGPU backend?
I finally modified LLVM to generate assembly that can run on the AMDGPU pro drivers. One problem is that the code generated by LLVM is about 10% slower than the code from amdgpu's online compiler. Is there anything I can tune to improve the performance of the LLVM-generated code? Thanks! On Tue, Sep 4, 2018 at 9:23 AM 董昌道 <dongchangdao at gmail.com> wrote: > I am writing a miner of crypto
2015 Sep 29
2
OpenCL toolset (for AMD GPU)
On 09/29/2015 04:19 PM, Tom Stellard via llvm-dev wrote: > On Tue, Sep 29, 2015 at 01:20:57PM +0000, Paweł Bylica via llvm-dev wrote: >> Hi LLVM, >> >> I would like to compile OpenCL kernel for a specific AMD GPU target. Is it >> possible with the current clang/LLVM? >> >> I started by using `clang -x cl` but it looks like at least some OpenCL >>
2015 Oct 23
3
[AMDGPU] AMDGPUAsmParser fails to parse several instructions
Dear Developers, I am compiling an OpenCL kernel, FFT, from the AMDAPP SDK v2.5 using clang 3.8 + libclc and assembling the code with lld (the LLVM linker). The assembly contains the following instructions (and many others in a similar format) that AMDGPUAsmParser fails to parse. Both appear to be valid instructions according to the SI instruction spec. s_mov_b32 s0,
2010 Sep 29
3
[LLVMdev] spilling & xmm register usage
Hello everybody, I have stumbled upon a test case (the attached module is a slightly reduced version) that shows extremely reduced performance on linux compared to windows when executed using LLVM's JIT. We narrowed the problem down to the actual code being generated, the source IR on both systems is the same. Try compiling the attached module: llc -O3 -filetype=asm -o BAD.s BAD.ll Under
2011 Oct 10
3
[LLVMdev] Disable Short-Circuit Evaluation?
Is there any way to disable short-circuit evaluation of expressions in Clang/LLVM? Let's say I have C code like the following: bool validX = get_group_id(0) > 32; int globalIndexY0 = get_group_id(1)*186 + 6*get_local_id(1) + 0 + 1; bool valid0 = validX && globalIndexY0 >= 4 && globalIndexY0 < 3910; int globalIndexY1 = get_group_id(1)*186 + 6*get_local_id(1) +
2017 Dec 05
2
[AMDGPU] Strange results with different address spaces
> On Dec 5, 2017, at 13:53, Matt Arsenault <arsenm2 at gmail.com> wrote: > > > >> On Dec 5, 2017, at 02:51, Haidl, Michael via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> Hi dev list, >> >> I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some
2006 May 16
0
[PATCH][SVM][5/5] add hsa for ucode
SVM patch to add a host save area per core for the hypervisor and also for the microcode. The microcode area is not guaranteed to be compatible with the vmcb layout, and therefore requires its own "scratch pad". Consolidate the per core areas into a single structure. Applies cleanly to 10002. Please apply to xen-unstable.hg. Please apply to xen-3.0-testing.hg.
2009 Sep 29
4
How can I avoid a for-loop through sapply or lapply ?
Through converting a miRNAs file from FASTA to character format I get a vector which looks like the following: > nml [1] "hsa-let-7a MIMAT0000062 Homo sapiens let-7a" [2] "hsa-let-7b MIMAT0000063 Homo sapiens let-7b" [3] "hsa-let-7c MIMAT0000064 Homo sapiens let-7c" [4] "hsa-let-7d MIMAT0000065 Homo sapiens
2017 Jun 15
2
Implementing cross-thread reduction in the AMDGPU backend
On 06/14/2017 05:05 PM, Connor Abbott wrote: > On Tue, Jun 13, 2017 at 6:13 PM, Tom Stellard <tstellar at redhat.com> wrote: >> On 06/13/2017 07:33 PM, Matt Arsenault wrote: >>> >>>> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >>>> >>>> On 06/12/2017 08:03 PM, Connor
2013 Jan 27
2
rpart
Hi, When I look at the summary of an rpart object run on my data, I get 7 nodes, but when I plot the rpart object, I get only 3 nodes. Should the number of nodes match in the results of the 2 functions (summary and plot), or is it not always the same? Look forward to your reply, Carol -------------------------------------------- summary(rpart.res) Call: rpart(formula = mydata$class ~ ., data
2008 Nov 06
1
replacing values in a vector
Hello list. I have a vector of values: eg > head(diff_mirs_list) [1] "hsa-miR-26b" "hsa-miR-26b" "hsa-miR-23a" "hsa-miR-27b" "hsa-miR-29a" [6] "hsa-miR-29b" and I would like to conditionally replace each value in this vector with a number defined in a dataframe: > fc Probe ave.fc 1 hsa-let-7a 1.28 2
2010 May 27
3
how to extract the 1st field from a vector of strings
I have the following vector of strings (shown only the first 3 elements) > desc[1:3] [1] "hsa-let-7a MIMAT0000062 Homo sapiens let-7a" [2] "hsa-let-7a* MIMAT0004481 Homo sapiens let-7a*" [3] "hsa-let-7a-2* MIMAT0010195 Homo sapiens let-7a-2*" > is.vector(desc) [1] TRUE > A <- unlist(strsplit(desc[1:3], " ")) > A [1]
2017 Dec 05
3
[AMDGPU] Strange results with different address spaces
Hi dev list, I am currently exploring the integration of AMDGPU/ROCm into the PACXX project and observing some strange behavior of the AMDGPU backend. The following IR is generated for a simple address space test that copies from global to shared memory and back to global after a barrier synchronization. The IR is attached as as1.ll. The output is as follows: 0 0 0 0 0 0 0 0 0 0 0 0 0
2017 Dec 06
2
[AMDGPU] Strange results with different address spaces
> On Dec 6, 2017, at 02:28, Haidl, Michael <michael.haidl at uni-muenster.de> wrote: > > The IR goes through a backend agnostic preparation phase that brings it into SSA form and changes the AS from 0 to 1. This sounds possibly problematic to me. The IR should be created with the correct address space to begin with. Changing this in the middle sounds suspect. > After this
2011 Sep 30
1
Hi
Hi, There is a question that confuses me. I have a set of data like this: hsa-miR-205--GATA3 0.797882767 1.08E-13 hsa-miR-205--ITGB4 0.750217593 1.85E-11 hsa-miR-187--PGF 0.797604155 3.24E-11 hsa-miR-205--SERPINB5 0.744124886 3.28E-11 hsa-miR-205--PBX1 0.734487224 7.89E-11 hsa-miR-205--MCC 0.72499934 1.80E-10 hsa-miR-205--WNT5B 0.717705259 3.33E-10 hsa-miR-200c--PKN2 0.721746815
2017 Jun 14
5
Implementing cross-thread reduction in the AMDGPU backend
On 06/13/2017 07:33 PM, Matt Arsenault wrote: > >> On Jun 12, 2017, at 17:23, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >> >> On 06/12/2017 08:03 PM, Connor Abbott wrote: >>> On Mon, Jun 12, 2017 at 4:56 PM, Tom Stellard <tstellar at redhat.com <mailto:tstellar at redhat.com>> wrote: >>>> On
2011 Aug 15
3
Plot from function
*I have the following function:* /plot_mi_time = function(mdata, miname) { mdata2 = mdata[row.names(hakat) == miname, ] print(mdata2) xcoords <- c(1,1,2,2,3,3,4,4,5,5,6,6) plot(c(xcoords), mdata2, xaxt="n", ylab="Expression", xlab="Time(h)", , main=miname) axis(1, at=xcoords,
2017 Jun 15
1
Implementing cross-thread reduction in the AMDGPU backend
I'm wondering about the focus on bound_cntl. Any cleared bit in the row_mask or bank_mask will also disable updating the result. Brian -----Original Message----- From: Connor Abbott [mailto:cwabbott0 at gmail.com] Sent: Wednesday, June 14, 2017 6:13 PM To: tstellar at redhat.com Cc: Matt Arsenault; llvm-dev at lists.llvm.org; Kolton, Sam; Sumner, Brian; Pykhtin, Valery Subject: Re:
2012 Nov 01
1
GraphNEL object retrieve edgenumber from acc() or is it list of lists?
Hello everyone, I'm working with a graphNEL object and want to extract all the nodes which have adjacent nodes with at least 20 nodes in between them. acc(graph, graphnodes) provides, for every node, a list of the nodes accessible from it and the number of edges between them. Like this: $`hsa:8379` hsa:100131844 hsa:10393 hsa:246184 hsa:29882 hsa:29945 1