similar to: CUDA fixed VA allocations and sparse mappings

Displaying 20 results from an estimated 2000 matches similar to: "CUDA fixed VA allocations and sparse mappings"

2015 Jul 07
2
CUDA fixed VA allocations and sparse mappings
On Tue, Jul 07, 2015 at 11:29:38AM -0400, Ilia Mirkin wrote: > On Mon, Jul 6, 2015 at 8:42 PM, Andrew Chew <achew at nvidia.com> wrote: > > Hello, > > > > I am currently looking into ways to support fixed virtual address allocations > > and sparse mappings in nouveau, as a step towards supporting CUDA. > > > > CUDA requires that the GPU virtual address
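The requirement under discussion (reserve a GPU virtual address range up front, back it with physical memory later) is what CUDA's later virtual memory management driver API exposes directly. A minimal sketch of that pattern, assuming a CUDA 10.2+ toolkit and a device reporting CU_DEVICE_ATTRIBUTE_VIRTUAL_MEMORY_MANAGEMENT_SUPPORTED; error handling is elided:

/* Sketch: reserve a fixed GPU VA range, then sparsely back one chunk of it.
 * Uses the CUDA driver VMM API (CUDA 10.2+); error handling elided. */
#include <cuda.h>
#include <stdio.h>

int main(void)
{
    CUdeviceptr va;
    CUmemGenericAllocationHandle handle;
    CUmemAllocationProp prop = {0};
    CUmemAccessDesc access = {0};
    size_t chunk = 2 << 20;   /* one 2 MiB chunk of physical backing */
    size_t range = 64 << 20;  /* 64 MiB of reserved, mostly unbacked VA */

    CUcontext ctx;
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    /* Reserve a contiguous VA range with no physical backing behind it. */
    cuMemAddressReserve(&va, range, 0, 0, 0);

    /* Create physical memory and map it into the first chunk of the range. */
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;
    cuMemCreate(&handle, chunk, &prop, 0);
    cuMemMap(va, chunk, 0, handle, 0);

    /* Make the mapped chunk accessible before any kernel touches it. */
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(va, chunk, &access, 1);

    printf("reserved VA %p, backed first %zu bytes\n", (void *)va, chunk);

    cuMemUnmap(va, chunk);
    cuMemRelease(handle);
    cuMemAddressFree(va, range);
    cuCtxDestroy(ctx);
    return 0;
}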
2015 Jul 08
2
CUDA fixed VA allocations and sparse mappings
On Wed, Jul 08, 2015 at 10:37:34AM +1000, Ben Skeggs wrote: > On 8 July 2015 at 10:31, Andrew Chew <achew at nvidia.com> wrote: > > On Wed, Jul 08, 2015 at 10:18:36AM +1000, Ben Skeggs wrote: > >> > There's some minimal state that needs to be mapped into GPU address space. > >> > One thing that comes to mind are pushbuffers, which are needed to submit
2018 Feb 13
2
[drm-nouveau-mmu] question about potential NULL pointer dereference
Hi all,

While doing some static analysis I ran into the following piece of code at drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c:957:

957 #define node(root, dir) ((root)->head.dir == &vmm->list) ? NULL : \
958 	list_entry((root)->head.dir, struct nvkm_vma, head)
959
960 void
961 nvkm_vmm_unmap_region(struct nvkm_vmm *vmm, struct nvkm_vma *vma)
962 {
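The worry here is that node() evaluates to NULL whenever the neighbour is the list head, so any caller that dereferences the result unconditionally would crash. A reduced userspace sketch of the same pattern, with simplified stand-in types rather than the actual nouveau structures:

/* Reduced model of the node() pattern from vmm.c: the macro yields NULL
 * when the neighbour is the list head, so callers must check the result. */
#include <stddef.h>
#include <stdio.h>

struct vma {
    struct vma *prev, *next;   /* stand-in for the kernel's list_head */
    unsigned long addr;
};

/* Returns the neighbour in direction `dir`, or NULL at the list boundary. */
#define node(list, v, dir) ((v)->dir == (list) ? NULL : (v)->dir)

int main(void)
{
    struct vma head = { &head, &head, 0 };     /* circular list head */
    struct vma a = { &head, &head, 0x1000 };   /* sole region on the list */
    head.next = head.prev = &a;

    struct vma *prev = node(&head, &a, prev);
    if (prev)                       /* the check static analysis asks for */
        printf("prev region at %#lx\n", prev->addr);
    else
        printf("no previous region\n");
    return 0;
}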
2015 Apr 16
15
[PATCH 0/6] map big page by platform IOMMU
Hi, Generally the imported buffers which have memory type TTM_PL_TT are mapped as small pages, probably due to the lack of big page allocation. But a platform device which also uses memory type TTM_PL_TT, like GK20A, can *allocate* big pages through the IOMMU hardware inside the SoC. This is an attempt to map the imported buffers as big pages in the GMMU via the platform IOMMU. With some preparation work to
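To illustrate the aggregation idea in the cover letter: a GPU big page (128 KiB on GK20A-class hardware, assumed here) is assembled from scattered 4 KiB system pages by mapping them back-to-back in IOMMU space, so the GMMU can cover them with a single big-page PTE. A toy sketch of that address arithmetic, not the kernel code itself:

/* Toy sketch: scattered 4 KiB pages are laid out contiguously in IOMMU
 * space so the GMMU sees one 128 KiB big page. Sizes are assumptions
 * (GK20A-style big page); this is not the actual kernel code. */
#include <stdio.h>

#define SMALL_PAGE (4UL << 10)     /* 4 KiB CPU/system page */
#define BIG_PAGE   (128UL << 10)   /* 128 KiB GPU big page */

int main(void)
{
    unsigned long iova = 0x80000000UL;   /* hypothetical IOVA window base */
    unsigned long pages = BIG_PAGE / SMALL_PAGE;

    /* In the real series each small page would be iommu_map()ed at
     * iova + i * SMALL_PAGE; here we only print the layout built up. */
    for (unsigned long i = 0; i < pages; i++)
        printf("small page %2lu -> IOVA %#lx\n", i, iova + i * SMALL_PAGE);

    printf("%lu small pages aggregate into one %lu KiB big page at %#lx\n",
           pages, BIG_PAGE >> 10, iova);
    return 0;
}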
2015 Jul 08
3
CUDA fixed VA allocations and sparse mappings
On Tue, Jul 07, 2015 at 08:13:28PM -0400, Ilia Mirkin wrote: > On Tue, Jul 7, 2015 at 8:11 PM, C Bergström <cbergstrom at pathscale.com> wrote: > > On Wed, Jul 8, 2015 at 7:08 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote: > >> On Tue, Jul 7, 2015 at 8:07 PM, C Bergström <cbergstrom at pathscale.com> wrote: > >>> On Wed, Jul 8, 2015 at 6:58 AM, Ben
2018 Mar 10
17
[RFC PATCH 00/13] SVM (share virtual memory) with HMM in nouveau
From: Jérôme Glisse <jglisse at redhat.com> (mm is cced just to allow exposure of device driver work without ccing a long list of people. I do not think there is anything useful to discuss from the mm point of view, but I might be wrong, so just for the curious :)). git://people.freedesktop.org/~glisse/linux branch: nouveau-hmm-v00
2012 Jul 26
6
[LLVMdev] [PROPOSAL] LLVM multi-module support
Hi, a couple of weeks ago I discussed with Peter how to improve LLVM's support for heterogeneous computing. One weakness we (and others) have seen is the absence of multi-module support in LLVM. Peter came up with a nice idea for how to improve here. I would like to put this idea up for discussion. ## The problem ## LLVM-IR modules can currently only contain code for a single target
2015 Apr 20
3
[PATCH 3/6] mmu: map small pages into big pages(s) by IOMMU if possible
On Sat, Apr 18, 2015 at 12:37 AM, Terje Bergstrom <tbergstrom at nvidia.com> wrote: > > On 04/17/2015 02:11 AM, Alexandre Courbot wrote: >> >> Tracking the PDE and PTE of each memory chunk can probably be avoided >> if you change your unmapping strategy. Currently you are going through >> the list of nvkm_vm_bp_list, but you know your PDE and PTE are always
2015 Apr 17
2
[PATCH 3/6] mmu: map small pages into big pages(s) by IOMMU if possible
On Thu, Apr 16, 2015 at 8:06 PM, Vince Hsu <vinceh at nvidia.com> wrote: > This patch implements a way to aggregate the small pages and map them as > big page(s) by utilizing the platform IOMMU if supported. And then we can > enable compression support for these big pages later. > > Signed-off-by: Vince Hsu <vinceh at nvidia.com> > --- >
2020 Jun 22
7
[RESEND PATCH 0/3] nouveau: fixes for SVM
These are based on 5.8.0-rc2 and intended for Ben Skeggs' nouveau tree. I believe the changes can be queued for 5.8-rcX after being reviewed. These were part of a larger series but I'm resending them separately as suggested by Jason Gunthorpe. https://lore.kernel.org/linux-mm/20200619215649.32297-1-rcampbell at nvidia.com/ Note that in order to exercise/test patch 2 here, you will need a
2012 Jul 26
7
[LLVMdev] [PROPOSAL] LLVM multi-module support
In our project we combine regular binary code and LLVM IR code for kernels, embedded as a special data symbol of the ELF object. The LLVM IR for a kernel that exists at compile time is preliminary and may be optimized further at runtime (pointer analysis, Polly, etc.). During application startup, the runtime system builds an index of all kernel sources embedded into the executable. Host and kernel code
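The embedding step described above can be sketched with the LLVM-C API: serialize the kernel module to bitcode in memory and attach it to the host module as a constant byte array, which the backend then emits as a data symbol. Names such as __kernel_bitcode are illustrative, not the poster's actual code:

/* Sketch: embed a kernel module's bitcode as a data symbol of the host
 * module, roughly the scheme described above. Illustrative names only. */
#include <llvm-c/Core.h>
#include <llvm-c/BitWriter.h>

void embed_kernel(LLVMModuleRef host, LLVMModuleRef kernel)
{
    /* Serialize the (preliminary) kernel IR to an in-memory bitcode blob. */
    LLVMMemoryBufferRef buf = LLVMWriteBitcodeToMemoryBuffer(kernel);
    const char *data = LLVMGetBufferStart(buf);
    size_t size = LLVMGetBufferSize(buf);

    /* Store the blob as a private constant byte array; at startup the
     * runtime locates this symbol and hands the bitcode to the optimizer. */
    LLVMTypeRef ty = LLVMArrayType(LLVMInt8Type(), (unsigned)size);
    LLVMValueRef gv = LLVMAddGlobal(host, ty, "__kernel_bitcode");
    LLVMSetInitializer(gv, LLVMConstString(data, (unsigned)size, 1));
    LLVMSetGlobalConstant(gv, 1);
    LLVMSetLinkage(gv, LLVMPrivateLinkage);

    LLVMDisposeMemoryBuffer(buf);
}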
2012 Jul 26
0
[LLVMdev] [PROPOSAL] LLVM multi-module support
Hi Tobias, I didn't really get it. Is the idea that the same bitcode is going to be codegen'd for different architectures, or is each sub-module going to contain different bitcode? In the latter case you may as well just use multiple modules, perhaps in conjunction with a scheme to store more than one module in the same file on disk as a convenience. Ciao, Duncan. > a couple of weeks
2019 Apr 04
1
Proof of concept for GPU forwarding for Linux guest on Linux host.
Hi, This is a proof of concept of GPU forwarding for a Linux guest on a Linux host. I'd like to get comments and suggestions from the community before I put more time into it. To summarize what it is: 1. It's a solution to bring GPU acceleration to a Linux VM guest on a Linux host. It could work with different GPUs, although the current proof of concept only works with an Intel GPU. 2. The basic idea
2015 Jul 08
2
CUDA fixed VA allocations and sparse mappings
On Wed, Jul 08, 2015 at 10:18:36AM +1000, Ben Skeggs wrote: > > There's some minimal state that needs to be mapped into GPU address space. > > One thing that comes to mind are pushbuffers, which are needed to submit > > stuff to any engine. > I guess you can probably use the start of the kernel's address space > carveout for these kinds of mappings, actually?
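A toy model of the split being suggested: a fixed carveout at the bottom of the GPU VA space for driver-internal buffers such as pushbuffers, with the remainder left for user-requested fixed-VA mappings. All addresses and sizes below are illustrative assumptions, not nouveau's actual layout:

/* Toy model: a kernel carveout at the bottom of the GPU VA space for
 * driver-internal mappings (pushbuffers etc.), the rest left for
 * user-requested fixed-VA allocations. Illustrative values only. */
#include <stdio.h>

#define VA_BASE         0x0000000000100000UL
#define VA_SIZE         (1UL << 40)   /* 1 TiB of GPU VA */
#define KERNEL_CARVEOUT (1UL << 30)   /* 1 GiB reserved for the kernel */

static unsigned long kern_next = VA_BASE;

/* Bump-allocate driver-internal mappings inside the carveout. */
static unsigned long kernel_va_alloc(unsigned long size)
{
    if (kern_next + size > VA_BASE + KERNEL_CARVEOUT)
        return 0;                     /* carveout exhausted */
    unsigned long va = kern_next;
    kern_next += size;
    return va;
}

int main(void)
{
    unsigned long pushbuf = kernel_va_alloc(64 << 10);
    printf("pushbuffer VA %#lx (inside carveout)\n", pushbuf);
    printf("user fixed-VA range %#lx - %#lx\n",
           VA_BASE + KERNEL_CARVEOUT, VA_BASE + VA_SIZE);
    return 0;
}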
2016 Jun 07
3
NVPTX compilation problems - ptxas error
Hello everybody, I am currently testing the NVPTX back-end and playing around with the IR it generates. Unfortunately I have run into a compilation error I cannot solve on my own. Maybe someone reading this knows what is causing the trouble and has a possible solution. I am using Ubuntu 16.04, CUDA 7.5 and clang version 3.9.0 (https://github.com/llvm-mirror/clang.git
2017 Jun 14
4
[CUDA] Lost debug information when compiling CUDA code
Hi, I needed to debug some CUDA code in my project; however, although I used -g when compiling the source code, no source-level information is available in cuda-gdb or cuda-memcheck. Specifically, below is what I did: 1) For a CUDA file a.cu, generate IR files: clang++ -g -emit-llvm --cuda-gpu-arch=sm_35 -c a.cu; 2) Instrument the device code a-cuda-nvptx64-nvidia-cuda-sm_35.bc (generated
2018 Mar 23
2
cuda cross compiling issue for target aarch64-linux-androideabi
I was wondering if anyone has encountered this issue when cross-compiling CUDA for an NVIDIA TX2 running Android. The error is: In file included from <built-in>:1: In file included from prebuilts/clang/host/linux-x86/clang-4667116/lib64/clang/7.0.1/include/__clang_cuda_runtime_wrapper.h:219: ../cuda/targets/aarch64-linux-androideabi/include/math_functions.hpp:3477:19: error: no matching function
2017 Aug 02
2
CUDA compilation "No available targets are compatible with this triple." problem
Yes, I followed the guide. The same error showed up: >clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_35 -L/usr/local/cuda/lib64 -I/usr/local/cuda/include -lcudart_static -ldl -lrt -pthread error: unable to create target: 'No available targets are compatible with this triple.' ________________________________ From: Kevin Choi <code.kchoi at gmail.com> Sent: Wednesday, August 2,
2020 Jul 30
2
Status of CUDA 11 support
Hi, I work in a large CUDA codebase and use Clang to build some of our CUDA code to improve compilation speed. We're planning to upgrade to CUDA 11 soon, and it appears that CUDA 11 is not yet supported in LLVM. From the LLVM commit history, I can see that work on CUDA 11 has started. Is this currently being worked on? What is the remaining work left? And is any help needed to finish
2017 Oct 05
4
CUDA tools?
vychytraly . wrote: > On Thu, Oct 5, 2017 at 9:51 PM, <m.roth at 5-cent.us> wrote: >> >> So, kmod-nvidia is installed. Trouble is, I have no tool to test it. And my >> user might need nvcc, which, of course, is only provided by the NVIDIA >> CUDA toolkit, which won't install because it conflicts with kmod-nvidia. >> >> Has *anyone* dealt with this? If so,