Alexandre Courbot
2015-Feb-11 07:21 UTC
[Nouveau] [PATCH v2 0/6] nouveau/gk20a: RAM device removal & IOMMU support
Changes since v1:
- Add missing else condition in ltc
- Remove extra flags that slipped into nouveau_display.c and nv84_fence.c.

Original cover letter:

Patches 1-3 make the presence of a RAM device optional, and remove GK20A's
dummy RAM driver we were using so far. On chips using shared memory, such a
device can confuse the driver into moving objects where there is no need to,
and can trick user-space into believing it can allocate "video" memory that
does not exist. By making it possible to run Nouveau without a RAM device and
systematically returning errors when VRAM allocations are attempted, we force
user-space to do the right thing and always employ the optimal path.
Contiguous memory allocation for GK20A is now handled directly by a custom
instmem driver.

The remaining patches are not related to the RAM device removal, but since
they touch code that has been moved by patch 2 I took the liberty of including
them in this series.

Patch 4 is a small improvement to GK20A's instmem implementation, which
suppresses the permanent and unneeded CPU mapping created by the DMA API, and
frees up some CPU virtual address space.

Patches 5 and 6 implement initial IOMMU support for GK20A. On top of the GPU
MMU, GK20A also has an independent IOMMU that stands between the GPU and
system RAM. Whether RAM accesses are performed directly or through the IOMMU
is determined by bit 34 of each address. If an IOMMU is present, GK20A's
instmem takes advantage of it to make unrelated pages of memory appear
contiguous to the GPU instead of using the DMA API.

Another benefit of the IOMMU is that it can be used by a custom VM
implementation to make GPU objects allocated via TTM appear contiguous in the
IOMMU space, allowing us to maximize the use of large pages and improve
performance, but that part will come once the basic support is agreed on and
merged.

All in all, this series should be largely unintrusive for non-Tegra GPUs, with
only patch 1 changing common code parts, in a way that looks safe.

Alexandre Courbot (6):
  make RAM device optional
  instmem/gk20a: move memory allocation to instmem
  gk20a: remove RAM device
  instmem/gk20a: use DMA attributes
  platform: probe IOMMU if present
  instmem/gk20a: add IOMMU support

 drm/nouveau/include/nvkm/subdev/instmem.h |   1 +
 drm/nouveau/nouveau_display.c             |   8 +-
 drm/nouveau/nouveau_platform.c            |  75 ++++-
 drm/nouveau/nouveau_platform.h            |  18 ++
 drm/nouveau/nouveau_ttm.c                 |   3 +
 drm/nouveau/nv84_fence.c                  |  14 +-
 drm/nouveau/nvkm/engine/device/base.c     |   9 +-
 drm/nouveau/nvkm/engine/device/gk104.c    |   2 +-
 drm/nouveau/nvkm/subdev/clk/base.c        |   2 +-
 drm/nouveau/nvkm/subdev/fb/Kbuild         |   1 -
 drm/nouveau/nvkm/subdev/fb/base.c         |  26 +-
 drm/nouveau/nvkm/subdev/fb/gk20a.c        |   1 -
 drm/nouveau/nvkm/subdev/fb/priv.h         |   1 -
 drm/nouveau/nvkm/subdev/fb/ramgk20a.c     | 149 ----------
 drm/nouveau/nvkm/subdev/instmem/Kbuild    |   1 +
 drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 438 ++++++++++++++++++++++++++++++
 drm/nouveau/nvkm/subdev/ltc/gf100.c       |  15 +-
 lib/include/nvif/os.h                     |  63 +++++
 18 files changed, 653 insertions(+), 174 deletions(-)
 delete mode 100644 drm/nouveau/nvkm/subdev/fb/ramgk20a.c
 create mode 100644 drm/nouveau/nvkm/subdev/instmem/gk20a.c

--
2.3.0
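As an aside, the bit 34 convention described above can be illustrated with a
small sketch (the helper name and the example values are made up here; patch 6
contains the actual implementation for instmem):

  /* GK20A: bit 34 of an address selects whether the access goes through
   * the IOMMU (bit set) or directly to system RAM (bit clear). */
  #define GK20A_IOMMU_BIT 34

  /* Hypothetical helper: 'mm_offset' is the page offset handed out by the
   * IOMMU address-space allocator, 'pgshift' the IOMMU page shift. */
  static u64 gk20a_iommu_instmem_addr(u64 mm_offset, unsigned int pgshift)
  {
          /* tag the region so the GPU resolves it through the IOMMU... */
          mm_offset |= 1ULL << (GK20A_IOMMU_BIT - pgshift);
          /* ...and convert from IOMMU pages back to a byte address */
          return mm_offset << pgshift;
  }

With pgshift == 12 and mm_offset == 0x100, this yields 0x400100000: one MiB
into the IOMMU address space, with bit 34 set.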
Having a RAM device does not make sense for chips like GK20A which have no dedicated video memory. The dummy RAM device that we used so far works as a temporary band-aid, but in the long-term it is desirable for the driver to be able to work without any kind of VRAM. This patch adds a few conditionals in places where a RAM device was assumed to be present and allows some more objects to be allocated from the TT domain, allowing Nouveau to handle GPUs for which pfb->ram == NULL. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/nouveau_display.c | 8 +++++++- drm/nouveau/nouveau_ttm.c | 3 +++ drm/nouveau/nv84_fence.c | 14 +++++++++++--- drm/nouveau/nvkm/engine/device/base.c | 9 ++++++--- drm/nouveau/nvkm/subdev/clk/base.c | 2 +- drm/nouveau/nvkm/subdev/fb/base.c | 26 ++++++++++++++++++-------- drm/nouveau/nvkm/subdev/ltc/gf100.c | 15 +++++++++++---- 7 files changed, 57 insertions(+), 20 deletions(-) diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c index 860b0e2..68ee0af 100644 --- a/drm/nouveau/nouveau_display.c +++ b/drm/nouveau/nouveau_display.c @@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev, struct drm_mode_create_dumb *args) { struct nouveau_bo *bo; + uint32_t domain; int ret; args->pitch = roundup(args->width * (args->bpp / 8), 256); args->size = args->pitch * args->height; args->size = roundup(args->size, PAGE_SIZE); - ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo); + if (nvxx_fb(&nouveau_drm(dev)->device)->ram) + domain = NOUVEAU_GEM_DOMAIN_VRAM; + else + domain = NOUVEAU_GEM_DOMAIN_GART; + + ret = nouveau_gem_new(dev, args->size, 0, domain, 0, 0, &bo); if (ret) return ret; diff --git a/drm/nouveau/nouveau_ttm.c b/drm/nouveau/nouveau_ttm.c index 273e501..a3c2e9b 100644 --- a/drm/nouveau/nouveau_ttm.c +++ b/drm/nouveau/nouveau_ttm.c @@ -85,6 +85,9 @@ nouveau_vram_manager_new(struct ttm_mem_type_manager *man, if (nvbo->tile_flags & NOUVEAU_GEM_TILE_NONCONTIG) size_nc = 1 << nvbo->page_shift; + if (!pfb->ram) + return -ENOMEM; + ret = pfb->ram->get(pfb, mem->num_pages << PAGE_SHIFT, mem->page_alignment << PAGE_SHIFT, size_nc, (nvbo->tile_flags >> 8) & 0x3ff, &node); diff --git a/drm/nouveau/nv84_fence.c b/drm/nouveau/nv84_fence.c index bf429ca..b981f85 100644 --- a/drm/nouveau/nv84_fence.c +++ b/drm/nouveau/nv84_fence.c @@ -215,6 +215,7 @@ nv84_fence_create(struct nouveau_drm *drm) { struct nvkm_fifo *pfifo = nvxx_fifo(&drm->device); struct nv84_fence_priv *priv; + u32 domain; int ret; priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL); @@ -231,10 +232,17 @@ nv84_fence_create(struct nouveau_drm *drm) priv->base.context_base = fence_context_alloc(priv->base.contexts); priv->base.uevent = true; - ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0, - TTM_PL_FLAG_VRAM, 0, 0, NULL, NULL, &priv->bo); + domain = nvxx_fb(&drm->device)->ram ? + TTM_PL_FLAG_VRAM : + /* + * fences created in TT must be coherent or we will + * wait on old CPU cache values! 
+ */ + TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED; + ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0, domain, 0, + 0, NULL, NULL, &priv->bo); if (ret == 0) { - ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false); + ret = nouveau_bo_pin(priv->bo, domain, false); if (ret == 0) { ret = nouveau_bo_map(priv->bo); if (ret) diff --git a/drm/nouveau/nvkm/engine/device/base.c b/drm/nouveau/nvkm/engine/device/base.c index 6efa8f3..48f8537 100644 --- a/drm/nouveau/nvkm/engine/device/base.c +++ b/drm/nouveau/nvkm/engine/device/base.c @@ -139,9 +139,12 @@ nvkm_devobj_info(struct nvkm_object *object, void *data, u32 size) args->v0.chipset = device->chipset; args->v0.revision = device->chiprev; - if (pfb) args->v0.ram_size = args->v0.ram_user = pfb->ram->size; - else args->v0.ram_size = args->v0.ram_user = 0; - if (imem) args->v0.ram_user = args->v0.ram_user - imem->reserved; + if (pfb && pfb->ram) + args->v0.ram_size = args->v0.ram_user = pfb->ram->size; + else + args->v0.ram_size = args->v0.ram_user = 0; + if (imem) + args->v0.ram_user = args->v0.ram_user - imem->reserved; return 0; } diff --git a/drm/nouveau/nvkm/subdev/clk/base.c b/drm/nouveau/nvkm/subdev/clk/base.c index b24a9cc..39a83d8 100644 --- a/drm/nouveau/nvkm/subdev/clk/base.c +++ b/drm/nouveau/nvkm/subdev/clk/base.c @@ -184,7 +184,7 @@ nvkm_pstate_prog(struct nvkm_clk *clk, int pstatei) nv_debug(clk, "setting performance state %d\n", pstatei); clk->pstate = pstatei; - if (pfb->ram->calc) { + if (pfb->ram && pfb->ram->calc) { int khz = pstate->base.domain[nv_clk_src_mem]; do { ret = pfb->ram->calc(pfb, khz); diff --git a/drm/nouveau/nvkm/subdev/fb/base.c b/drm/nouveau/nvkm/subdev/fb/base.c index 16589fa..61fde43 100644 --- a/drm/nouveau/nvkm/subdev/fb/base.c +++ b/drm/nouveau/nvkm/subdev/fb/base.c @@ -55,9 +55,11 @@ _nvkm_fb_fini(struct nvkm_object *object, bool suspend) struct nvkm_fb *pfb = (void *)object; int ret; - ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend); - if (ret && suspend) - return ret; + if (pfb->ram) { + ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend); + if (ret && suspend) + return ret; + } return nvkm_subdev_fini(&pfb->base, suspend); } @@ -72,9 +74,11 @@ _nvkm_fb_init(struct nvkm_object *object) if (ret) return ret; - ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram)); - if (ret) - return ret; + if (pfb->ram) { + ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram)); + if (ret) + return ret; + } for (i = 0; i < pfb->tile.regions; i++) pfb->tile.prog(pfb, i, &pfb->tile.region[i]); @@ -91,9 +95,12 @@ _nvkm_fb_dtor(struct nvkm_object *object) for (i = 0; i < pfb->tile.regions; i++) pfb->tile.fini(pfb, i, &pfb->tile.region[i]); nvkm_mm_fini(&pfb->tags); - nvkm_mm_fini(&pfb->vram); - nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram); + if (pfb->ram) { + nvkm_mm_fini(&pfb->vram); + nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram); + } + nvkm_subdev_destroy(&pfb->base); } @@ -127,6 +134,9 @@ nvkm_fb_create_(struct nvkm_object *parent, struct nvkm_object *engine, pfb->memtype_valid = impl->memtype; + if (!impl->ram) + return 0; + ret = nvkm_object_ctor(nv_object(pfb), NULL, impl->ram, NULL, 0, &ram); if (ret) { nv_fatal(pfb, "error detecting memory configuration!!\n"); diff --git a/drm/nouveau/nvkm/subdev/ltc/gf100.c b/drm/nouveau/nvkm/subdev/ltc/gf100.c index 8e7cc62..7ae8e91 100644 --- a/drm/nouveau/nvkm/subdev/ltc/gf100.c +++ b/drm/nouveau/nvkm/subdev/ltc/gf100.c @@ -136,7 +136,8 @@ gf100_ltc_dtor(struct nvkm_object *object) struct nvkm_ltc_priv *priv = (void *)object; 
nvkm_mm_fini(&priv->tags); - nvkm_mm_free(&pfb->vram, &priv->tag_ram); + if (pfb->ram) + nvkm_mm_free(&pfb->vram, &priv->tag_ram); nvkm_ltc_destroy(priv); } @@ -150,7 +151,10 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv) int ret; /* tags for 1/4 of VRAM should be enough (8192/4 per GiB of VRAM) */ - priv->num_tags = (pfb->ram->size >> 17) / 4; + if (pfb->ram) + priv->num_tags = (pfb->ram->size >> 17) / 4; + else + priv->num_tags = (1 << 17); if (priv->num_tags > (1 << 17)) priv->num_tags = 1 << 17; /* we have 17 bits in PTE */ priv->num_tags = (priv->num_tags + 63) & ~63; /* round up to 64 */ @@ -170,8 +174,11 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv) tag_size += tag_align; tag_size = (tag_size + 0xfff) >> 12; /* round up */ - ret = nvkm_mm_tail(&pfb->vram, 1, 1, tag_size, tag_size, 1, - &priv->tag_ram); + if (pfb->ram) + ret = nvkm_mm_tail(&pfb->vram, 1, 1, tag_size, tag_size, 1, + &priv->tag_ram); + else + ret = -1; if (ret) { priv->num_tags = 0; } else { -- 2.3.0
Alexandre Courbot
2015-Feb-11 07:21 UTC
[Nouveau] [PATCH v2 2/6] instmem/gk20a: move memory allocation to instmem
GK20A does not have dedicated RAM, thus having a RAM device for it does not make sense. Move the contiguous physical memory allocation to instmem. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/include/nvkm/subdev/instmem.h | 1 + drm/nouveau/nvkm/engine/device/gk104.c | 2 +- drm/nouveau/nvkm/subdev/fb/ramgk20a.c | 86 +----------- drm/nouveau/nvkm/subdev/instmem/Kbuild | 1 + drm/nouveau/nvkm/subdev/instmem/gk20a.c | 212 ++++++++++++++++++++++++++++++ 5 files changed, 217 insertions(+), 85 deletions(-) create mode 100644 drm/nouveau/nvkm/subdev/instmem/gk20a.c diff --git a/drm/nouveau/include/nvkm/subdev/instmem.h b/drm/nouveau/include/nvkm/subdev/instmem.h index d104c1a..1bcb763 100644 --- a/drm/nouveau/include/nvkm/subdev/instmem.h +++ b/drm/nouveau/include/nvkm/subdev/instmem.h @@ -45,4 +45,5 @@ nvkm_instmem(void *obj) extern struct nvkm_oclass *nv04_instmem_oclass; extern struct nvkm_oclass *nv40_instmem_oclass; extern struct nvkm_oclass *nv50_instmem_oclass; +extern struct nvkm_oclass *gk20a_instmem_oclass; #endif diff --git a/drm/nouveau/nvkm/engine/device/gk104.c b/drm/nouveau/nvkm/engine/device/gk104.c index bf58934..8f266a9 100644 --- a/drm/nouveau/nvkm/engine/device/gk104.c +++ b/drm/nouveau/nvkm/engine/device/gk104.c @@ -171,7 +171,7 @@ gk104_identify(struct nvkm_device *device) device->oclass[NVDEV_SUBDEV_FB ] = gk20a_fb_oclass; device->oclass[NVDEV_SUBDEV_LTC ] = gk104_ltc_oclass; device->oclass[NVDEV_SUBDEV_IBUS ] = &gk20a_ibus_oclass; - device->oclass[NVDEV_SUBDEV_INSTMEM] = nv50_instmem_oclass; + device->oclass[NVDEV_SUBDEV_INSTMEM] = gk20a_instmem_oclass; device->oclass[NVDEV_SUBDEV_MMU ] = &gf100_mmu_oclass; device->oclass[NVDEV_SUBDEV_BAR ] = &gk20a_bar_oclass; device->oclass[NVDEV_ENGINE_DMAOBJ ] = gf110_dmaeng_oclass; diff --git a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c index 5f30db1..60d8e1c 100644 --- a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c +++ b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c @@ -23,99 +23,17 @@ #include <core/device.h> -struct gk20a_mem { - struct nvkm_mem base; - void *cpuaddr; - dma_addr_t handle; -}; -#define to_gk20a_mem(m) container_of(m, struct gk20a_mem, base) - static void gk20a_ram_put(struct nvkm_fb *pfb, struct nvkm_mem **pmem) { - struct device *dev = nv_device_base(nv_device(pfb)); - struct gk20a_mem *mem = to_gk20a_mem(*pmem); - - *pmem = NULL; - if (unlikely(mem == NULL)) - return; - - if (likely(mem->cpuaddr)) - dma_free_coherent(dev, mem->base.size << PAGE_SHIFT, - mem->cpuaddr, mem->handle); - - kfree(mem->base.pages); - kfree(mem); + BUG(); } static int gk20a_ram_get(struct nvkm_fb *pfb, u64 size, u32 align, u32 ncmin, u32 memtype, struct nvkm_mem **pmem) { - struct device *dev = nv_device_base(nv_device(pfb)); - struct gk20a_mem *mem; - u32 type = memtype & 0xff; - u32 npages, order; - int i; - - nv_debug(pfb, "%s: size: %llx align: %x, ncmin: %x\n", __func__, size, - align, ncmin); - - npages = size >> PAGE_SHIFT; - if (npages == 0) - npages = 1; - - if (align == 0) - align = PAGE_SIZE; - align >>= PAGE_SHIFT; - - /* round alignment to the next power of 2, if needed */ - order = fls(align); - if ((align & (align - 1)) == 0) - order--; - align = BIT(order); - - /* ensure returned address is correctly aligned */ - npages = max(align, npages); - - mem = kzalloc(sizeof(*mem), GFP_KERNEL); - if (!mem) - return -ENOMEM; - - mem->base.size = npages; - mem->base.memtype = type; - - mem->base.pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL); - if (!mem->base.pages) { - 
kfree(mem); - return -ENOMEM; - } - - *pmem = &mem->base; - - mem->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT, - &mem->handle, GFP_KERNEL); - if (!mem->cpuaddr) { - nv_error(pfb, "%s: cannot allocate memory!\n", __func__); - gk20a_ram_put(pfb, pmem); - return -ENOMEM; - } - - align <<= PAGE_SHIFT; - - /* alignment check */ - if (unlikely(mem->handle & (align - 1))) - nv_warn(pfb, "memory not aligned as requested: %pad (0x%x)\n", - &mem->handle, align); - - nv_debug(pfb, "alloc size: 0x%x, align: 0x%x, paddr: %pad, vaddr: %p\n", - npages << PAGE_SHIFT, align, &mem->handle, mem->cpuaddr); - - for (i = 0; i < npages; i++) - mem->base.pages[i] = mem->handle + (PAGE_SIZE * i); - - mem->base.offset = (u64)mem->base.pages[0]; - return 0; + BUG(); } static int diff --git a/drm/nouveau/nvkm/subdev/instmem/Kbuild b/drm/nouveau/nvkm/subdev/instmem/Kbuild index e6f35ab..13bb7fc 100644 --- a/drm/nouveau/nvkm/subdev/instmem/Kbuild +++ b/drm/nouveau/nvkm/subdev/instmem/Kbuild @@ -2,3 +2,4 @@ nvkm-y += nvkm/subdev/instmem/base.o nvkm-y += nvkm/subdev/instmem/nv04.o nvkm-y += nvkm/subdev/instmem/nv40.o nvkm-y += nvkm/subdev/instmem/nv50.o +nvkm-y += nvkm/subdev/instmem/gk20a.o diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c new file mode 100644 index 0000000..6176f50 --- /dev/null +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c @@ -0,0 +1,212 @@ +/* + * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ */ + +#include <subdev/fb.h> +#include <core/mm.h> +#include <core/device.h> + +#include "priv.h" + +struct gk20a_instobj_priv { + struct nvkm_instobj base; + /* Must be second member here - see nouveau_gpuobj_map_vm() */ + struct nvkm_mem *mem; + /* Pointed by mem */ + struct nvkm_mem _mem; + void *cpuaddr; + dma_addr_t handle; + struct nvkm_mm_node r; +}; + +struct gk20a_instmem_priv { + struct nvkm_instmem base; + spinlock_t lock; + u64 addr; +}; + +static u32 +gk20a_instobj_rd32(struct nvkm_object *object, u64 offset) +{ + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(object); + struct gk20a_instobj_priv *node = (void *)object; + unsigned long flags; + u64 base = (node->mem->offset + offset) & 0xffffff00000ULL; + u64 addr = (node->mem->offset + offset) & 0x000000fffffULL; + u32 data; + + spin_lock_irqsave(&priv->lock, flags); + if (unlikely(priv->addr != base)) { + nv_wr32(priv, 0x001700, base >> 16); + priv->addr = base; + } + data = nv_rd32(priv, 0x700000 + addr); + spin_unlock_irqrestore(&priv->lock, flags); + return data; +} + +static void +gk20a_instobj_wr32(struct nvkm_object *object, u64 offset, u32 data) +{ + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(object); + struct gk20a_instobj_priv *node = (void *)object; + unsigned long flags; + u64 base = (node->mem->offset + offset) & 0xffffff00000ULL; + u64 addr = (node->mem->offset + offset) & 0x000000fffffULL; + + spin_lock_irqsave(&priv->lock, flags); + if (unlikely(priv->addr != base)) { + nv_wr32(priv, 0x001700, base >> 16); + priv->addr = base; + } + nv_wr32(priv, 0x700000 + addr, data); + spin_unlock_irqrestore(&priv->lock, flags); +} + +static void +gk20a_instobj_dtor(struct nvkm_object *object) +{ + struct gk20a_instobj_priv *node = (void *)object; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node); + struct device *dev = nv_device_base(nv_device(priv)); + + if (unlikely(!node->handle)) + return; + + dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr, + node->handle); + + nvkm_instobj_destroy(&node->base); +} + +static int +gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine, + struct nvkm_oclass *oclass, void *data, u32 _size, + struct nvkm_object **pobject) +{ + struct nvkm_instobj_args *args = data; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent); + struct device *dev = nv_device_base(nv_device(priv)); + struct gk20a_instobj_priv *node; + u32 size, align; + u32 npages; + int ret; + + nv_debug(parent, "%s: size: %x align: %x\n", __func__, + args->size, args->align); + + size = max((args->size + 4095) & ~4095, (u32)4096); + align = max((args->align + 4095) & ~4095, (u32)4096); + + npages = size >> PAGE_SHIFT; + + ret = nvkm_instobj_create_(parent, engine, oclass, sizeof(*node), + (void **)&node); + *pobject = nv_object(node); + if (ret) + return ret; + + node->mem = &node->_mem; + + node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT, + &node->handle, GFP_KERNEL); + if (!node->cpuaddr) { + nv_error(priv, "cannot allocate DMA memory\n"); + return -ENOMEM; + } + + /* alignment check */ + if (unlikely(node->handle & (align - 1))) + nv_warn(priv, "memory not aligned as requested: %pad (0x%x)\n", + &node->handle, align); + + node->mem->offset = node->handle; + node->mem->size = size >> 12; + node->mem->memtype = 0; + node->mem->page_shift = 12; + INIT_LIST_HEAD(&node->mem->regions); + + node->r.type = 12; + node->r.offset = node->handle >> 12; + node->r.length = npages; + list_add_tail(&node->r.rl_entry, &node->mem->regions); + + 
node->base.addr = node->mem->offset; + node->base.size = size; + + nv_debug(parent, "alloc size: 0x%x, align: 0x%x, gaddr: 0x%llx\n", + size, align, node->mem->offset); + + return 0; +} + +static struct nvkm_instobj_impl +gk20a_instobj_oclass = { + .base.ofuncs = &(struct nvkm_ofuncs) { + .ctor = gk20a_instobj_ctor, + .dtor = gk20a_instobj_dtor, + .init = _nvkm_instobj_init, + .fini = _nvkm_instobj_fini, + .rd32 = gk20a_instobj_rd32, + .wr32 = gk20a_instobj_wr32, + }, +}; + + + +static int +gk20a_instmem_fini(struct nvkm_object *object, bool suspend) +{ + struct gk20a_instmem_priv *priv = (void *)object; + priv->addr = ~0ULL; + return nvkm_instmem_fini(&priv->base, suspend); +} + +static int +gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine, + struct nvkm_oclass *oclass, void *data, u32 size, + struct nvkm_object **pobject) +{ + struct gk20a_instmem_priv *priv; + int ret; + + ret = nvkm_instmem_create(parent, engine, oclass, &priv); + *pobject = nv_object(priv); + if (ret) + return ret; + + spin_lock_init(&priv->lock); + + return 0; +} + +struct nvkm_oclass * +gk20a_instmem_oclass = &(struct nvkm_instmem_impl) { + .base.handle = NV_SUBDEV(INSTMEM, 0xea), + .base.ofuncs = &(struct nvkm_ofuncs) { + .ctor = gk20a_instmem_ctor, + .dtor = _nvkm_instmem_dtor, + .init = _nvkm_instmem_init, + .fini = gk20a_instmem_fini, + }, + .instobj = &gk20a_instobj_oclass.base, +}.base; + -- 2.3.0
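A short worked example of the PRAMIN access pattern used by
gk20a_instobj_rd32()/wr32() above: register 0x001700 selects which 1 MiB-aligned
chunk of memory the 1 MiB PRAMIN aperture at BAR0 offset 0x700000 exposes. The
object address below is hypothetical:

  /* Reading 32 bits at offset 0x10 of an object whose mem->offset is
   * 0x12345678: */
  u64 target = 0x12345678ULL + 0x10;      /* 0x12345688 */
  u64 base   = target & 0xffffff00000ULL; /* 0x12300000: window base */
  u64 addr   = target & 0x000000fffffULL; /* 0x45688: offset inside window */
  u32 data;

  nv_wr32(priv, 0x001700, base >> 16);    /* program the window: 0x1230 */
  data = nv_rd32(priv, 0x700000 + addr);  /* read BAR0 + 0x745688 */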
Now that Nouveau can operate even when there is no RAM device, remove the dummy one used by GK20A. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/nvkm/subdev/fb/Kbuild | 1 - drm/nouveau/nvkm/subdev/fb/gk20a.c | 1 - drm/nouveau/nvkm/subdev/fb/priv.h | 1 - drm/nouveau/nvkm/subdev/fb/ramgk20a.c | 67 ----------------------------------- 4 files changed, 70 deletions(-) delete mode 100644 drm/nouveau/nvkm/subdev/fb/ramgk20a.c diff --git a/drm/nouveau/nvkm/subdev/fb/Kbuild b/drm/nouveau/nvkm/subdev/fb/Kbuild index 904d601..d6be4c6 100644 --- a/drm/nouveau/nvkm/subdev/fb/Kbuild +++ b/drm/nouveau/nvkm/subdev/fb/Kbuild @@ -37,7 +37,6 @@ nvkm-y += nvkm/subdev/fb/ramgt215.o nvkm-y += nvkm/subdev/fb/rammcp77.o nvkm-y += nvkm/subdev/fb/ramgf100.o nvkm-y += nvkm/subdev/fb/ramgk104.o -nvkm-y += nvkm/subdev/fb/ramgk20a.o nvkm-y += nvkm/subdev/fb/ramgm107.o nvkm-y += nvkm/subdev/fb/sddr2.o nvkm-y += nvkm/subdev/fb/sddr3.o diff --git a/drm/nouveau/nvkm/subdev/fb/gk20a.c b/drm/nouveau/nvkm/subdev/fb/gk20a.c index 6762847..a5d7857 100644 --- a/drm/nouveau/nvkm/subdev/fb/gk20a.c +++ b/drm/nouveau/nvkm/subdev/fb/gk20a.c @@ -65,5 +65,4 @@ gk20a_fb_oclass = &(struct nvkm_fb_impl) { .fini = _nvkm_fb_fini, }, .memtype = gf100_fb_memtype_valid, - .ram = &gk20a_ram_oclass, }.base; diff --git a/drm/nouveau/nvkm/subdev/fb/priv.h b/drm/nouveau/nvkm/subdev/fb/priv.h index d82da02..485c4b6 100644 --- a/drm/nouveau/nvkm/subdev/fb/priv.h +++ b/drm/nouveau/nvkm/subdev/fb/priv.h @@ -32,7 +32,6 @@ extern struct nvkm_oclass gt215_ram_oclass; extern struct nvkm_oclass mcp77_ram_oclass; extern struct nvkm_oclass gf100_ram_oclass; extern struct nvkm_oclass gk104_ram_oclass; -extern struct nvkm_oclass gk20a_ram_oclass; extern struct nvkm_oclass gm107_ram_oclass; int nvkm_sddr2_calc(struct nvkm_ram *ram); diff --git a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c deleted file mode 100644 index 60d8e1c..0000000 --- a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c +++ /dev/null @@ -1,67 +0,0 @@ -/* - * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved. - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included in - * all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER - * DEALINGS IN THE SOFTWARE. 
- */ -#include "priv.h" - -#include <core/device.h> - -static void -gk20a_ram_put(struct nvkm_fb *pfb, struct nvkm_mem **pmem) -{ - BUG(); -} - -static int -gk20a_ram_get(struct nvkm_fb *pfb, u64 size, u32 align, u32 ncmin, - u32 memtype, struct nvkm_mem **pmem) -{ - BUG(); -} - -static int -gk20a_ram_ctor(struct nvkm_object *parent, struct nvkm_object *engine, - struct nvkm_oclass *oclass, void *data, u32 datasize, - struct nvkm_object **pobject) -{ - struct nvkm_ram *ram; - int ret; - - ret = nvkm_ram_create(parent, engine, oclass, &ram); - *pobject = nv_object(ram); - if (ret) - return ret; - ram->type = NV_MEM_TYPE_STOLEN; - ram->size = get_num_physpages() << PAGE_SHIFT; - - ram->get = gk20a_ram_get; - ram->put = gk20a_ram_put; - return 0; -} - -struct nvkm_oclass -gk20a_ram_oclass = { - .ofuncs = &(struct nvkm_ofuncs) { - .ctor = gk20a_ram_ctor, - .dtor = _nvkm_ram_dtor, - .init = _nvkm_ram_init, - .fini = _nvkm_ram_fini, - }, -}; -- 2.3.0
Alexandre Courbot
2015-Feb-11 07:21 UTC
[Nouveau] [PATCH v2 4/6] instmem/gk20a: use DMA attributes
instmem for GK20A is allocated using dma_alloc_coherent(), which provides us with a coherent CPU mapping that we never use because instmem objects are accessed through PRAMIN. Switch to dma_alloc_attrs() which gives us the option to dismiss that CPU mapping and free up some CPU virtual space. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/nvkm/subdev/instmem/gk20a.c | 24 ++++++++++++++++++++---- lib/include/nvif/os.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 4 deletions(-) diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c index 6176f50..4c8af6e 100644 --- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c @@ -24,6 +24,10 @@ #include <core/mm.h> #include <core/device.h> +#ifdef __KERNEL__ +#include <linux/dma-attrs.h> +#endif + #include "priv.h" struct gk20a_instobj_priv { @@ -34,6 +38,7 @@ struct gk20a_instobj_priv { struct nvkm_mem _mem; void *cpuaddr; dma_addr_t handle; + struct dma_attrs attrs; struct nvkm_mm_node r; }; @@ -91,8 +96,8 @@ gk20a_instobj_dtor(struct nvkm_object *object) if (unlikely(!node->handle)) return; - dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr, - node->handle); + dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr, + node->handle, &node->attrs); nvkm_instobj_destroy(&node->base); } @@ -126,8 +131,19 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine, node->mem = &node->_mem; - node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT, - &node->handle, GFP_KERNEL); + init_dma_attrs(&node->attrs); + /* + * We will access this memory through PRAMIN and thus do not need a + * consistent CPU pointer + */ + dma_set_attr(DMA_ATTR_NON_CONSISTENT, &node->attrs); + dma_set_attr(DMA_ATTR_WEAK_ORDERING, &node->attrs); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &node->attrs); + dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &node->attrs); + + node->cpuaddr = dma_alloc_attrs(dev, npages << PAGE_SHIFT, + &node->handle, GFP_KERNEL, + &node->attrs); if (!node->cpuaddr) { nv_error(priv, "cannot allocate DMA memory\n"); return -ENOMEM; diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h index f6391a5..b4d307e 100644 --- a/lib/include/nvif/os.h +++ b/lib/include/nvif/os.h @@ -683,6 +683,37 @@ dma_free_coherent(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus) { } +enum dma_attr { + DMA_ATTR_WRITE_BARRIER, + DMA_ATTR_WEAK_ORDERING, + DMA_ATTR_WRITE_COMBINE, + DMA_ATTR_NON_CONSISTENT, + DMA_ATTR_NO_KERNEL_MAPPING, + DMA_ATTR_SKIP_CPU_SYNC, + DMA_ATTR_FORCE_CONTIGUOUS, + DMA_ATTR_MAX, +}; + +struct dma_attrs { +}; + +static inline void init_dma_attrs(struct dma_attrs *attrs) {} +static inline void dma_set_attr(enum dma_attr attr, struct dma_attrs *attrs) {} + +static inline void * +dma_alloc_attrs(struct device *dev, size_t sz, dma_addr_t *hdl, gfp_t gfp, + struct dma_attrs *attrs) +{ + return NULL; +} + +static inline void +dma_free_attrs(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus, + struct dma_attrs *attrs) +{ +} + + /****************************************************************************** * PCI *****************************************************************************/ -- 2.3.0
Alexandre Courbot
2015-Feb-11 07:21 UTC
[Nouveau] [PATCH v2 5/6] platform: probe IOMMU if present
Tegra SoCs have an IOMMU that can be used to present non-contiguous physical memory as contiguous to the GPU and maximize the use of large pages in the GPU MMU, leading to performance gains. This patch adds support for probing such a IOMMU if present and make its properties available in the nouveau_platform_gpu structure so subsystems can take advantage of it. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/nouveau_platform.c | 75 +++++++++++++++++++++++++++++++++++++++++- drm/nouveau/nouveau_platform.h | 18 ++++++++++ lib/include/nvif/os.h | 32 ++++++++++++++++++ 3 files changed, 124 insertions(+), 1 deletion(-) diff --git a/drm/nouveau/nouveau_platform.c b/drm/nouveau/nouveau_platform.c index dc5900b..3691982 100644 --- a/drm/nouveau/nouveau_platform.c +++ b/drm/nouveau/nouveau_platform.c @@ -27,6 +27,7 @@ #include <linux/of.h> #include <linux/reset.h> #include <linux/regulator/consumer.h> +#include <linux/iommu.h> #include <soc/tegra/fuse.h> #include <soc/tegra/pmc.h> @@ -91,6 +92,71 @@ static int nouveau_platform_power_down(struct nouveau_platform_gpu *gpu) return 0; } +static void nouveau_platform_probe_iommu(struct device *dev, + struct nouveau_platform_gpu *gpu) +{ + int err; + unsigned long pgsize_bitmap; + + mutex_init(&gpu->iommu.mutex); + + if (iommu_present(&platform_bus_type)) { + gpu->iommu.domain = iommu_domain_alloc(&platform_bus_type); + if (IS_ERR(gpu->iommu.domain)) + goto error; + + /* + * A IOMMU is only usable if it supports page sizes smaller + * or equal to the system's PAGE_SIZE, with a preference if + * both are equal. + */ + pgsize_bitmap = gpu->iommu.domain->ops->pgsize_bitmap; + if (pgsize_bitmap & PAGE_SIZE) { + gpu->iommu.pgshift = PAGE_SHIFT; + } else { + gpu->iommu.pgshift = fls(pgsize_bitmap & ~PAGE_MASK); + if (gpu->iommu.pgshift == 0) { + dev_warn(dev, "unsupported IOMMU page size\n"); + goto free_domain; + } + gpu->iommu.pgshift -= 1; + } + + err = iommu_attach_device(gpu->iommu.domain, dev); + if (err) + goto free_domain; + + err = nvkm_mm_init(&gpu->iommu._mm, 0, + (1ULL << 40) >> gpu->iommu.pgshift, 1); + if (err) + goto detach_device; + + gpu->iommu.mm = &gpu->iommu._mm; + } + + return; + +detach_device: + iommu_detach_device(gpu->iommu.domain, dev); + +free_domain: + iommu_domain_free(gpu->iommu.domain); + +error: + gpu->iommu.domain = NULL; + gpu->iommu.pgshift = 0; + dev_err(dev, "cannot initialize IOMMU MM\n"); +} + +static void nouveau_platform_remove_iommu(struct device *dev, + struct nouveau_platform_gpu *gpu) +{ + if (gpu->iommu.domain) { + iommu_detach_device(gpu->iommu.domain, dev); + iommu_domain_free(gpu->iommu.domain); + } +} + static int nouveau_platform_probe(struct platform_device *pdev) { struct nouveau_platform_gpu *gpu; @@ -118,6 +184,8 @@ static int nouveau_platform_probe(struct platform_device *pdev) if (IS_ERR(gpu->clk_pwr)) return PTR_ERR(gpu->clk_pwr); + nouveau_platform_probe_iommu(&pdev->dev, gpu); + err = nouveau_platform_power_up(gpu); if (err) return err; @@ -154,10 +222,15 @@ static int nouveau_platform_remove(struct platform_device *pdev) struct nouveau_drm *drm = nouveau_drm(drm_dev); struct nvkm_device *device = nvxx_device(&drm->device); struct nouveau_platform_gpu *gpu = nv_device_to_platform(device)->gpu; + int err; nouveau_drm_device_remove(drm_dev); - return nouveau_platform_power_down(gpu); + err = nouveau_platform_power_down(gpu); + + nouveau_platform_remove_iommu(&pdev->dev, gpu); + + return err; } #if IS_ENABLED(CONFIG_OF) diff --git a/drm/nouveau/nouveau_platform.h 
b/drm/nouveau/nouveau_platform.h index 268bb72..392874c 100644 --- a/drm/nouveau/nouveau_platform.h +++ b/drm/nouveau/nouveau_platform.h @@ -24,10 +24,12 @@ #define __NOUVEAU_PLATFORM_H__ #include "core/device.h" +#include "core/mm.h" struct reset_control; struct clk; struct regulator; +struct iommu_domain; struct platform_driver; struct nouveau_platform_gpu { @@ -36,6 +38,22 @@ struct nouveau_platform_gpu { struct clk *clk_pwr; struct regulator *vdd; + + struct { + /* + * Protects accesses to mm from subsystems + */ + struct mutex mutex; + + struct nvkm_mm _mm; + /* + * Just points to _mm. We need this to avoid embedding + * struct nvkm_mm in os.h + */ + struct nvkm_mm *mm; + struct iommu_domain *domain; + unsigned long pgshift; + } iommu; }; struct nouveau_platform_device { diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h index b4d307e..275fa84 100644 --- a/lib/include/nvif/os.h +++ b/lib/include/nvif/os.h @@ -715,6 +715,29 @@ dma_free_attrs(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus, /****************************************************************************** + * IOMMU + *****************************************************************************/ +struct iommu_domain; + +#define IOMMU_READ (1 << 0) +#define IOMMU_WRITE (1 << 1) +#define IOMMU_CACHE (1 << 2) /* DMA cache coherency */ +#define IOMMU_NOEXEC (1 << 3) + +static inline int +iommu_map(struct iommu_domain *domain, unsigned long iova, dma_addr_t paddr, + size_t size, int prot) +{ + return 0; +} + +static inline size_t +iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) +{ + return 0; +} + +/****************************************************************************** * PCI *****************************************************************************/ #include <pciaccess.h> @@ -1208,9 +1231,18 @@ regulator_get_voltage(struct regulator *regulator) * nouveau drm platform device *****************************************************************************/ +struct nvkm_mm; + struct nouveau_platform_gpu { struct clk *clk; struct regulator *vdd; + + struct { + struct mutex mutex; + struct nvkm_mm *mm; + struct iommu_domain *domain; + unsigned long pgshift; + } iommu; }; struct nouveau_platform_device { -- 2.3.0
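A small worked example of the page-size selection above, assuming a 4 KiB
PAGE_SIZE; the bitmap values are hypothetical:

  /* pgsize_bitmap has one bit set per page size the IOMMU supports.
   * With PAGE_SIZE = 4 KiB (PAGE_SHIFT = 12, ~PAGE_MASK = 0xfff):
   *
   * - bitmap = 0x21000 (4 KiB and 128 KiB pages supported):
   *     bitmap & PAGE_SIZE != 0       -> pgshift = PAGE_SHIFT = 12
   *
   * - bitmap = 0x800 (only 2 KiB pages supported):
   *     fls(bitmap & ~PAGE_MASK) = 12 -> pgshift = 12 - 1 = 11,
   *     i.e. the GPU address space is managed in 2 KiB IOMMU pages
   *
   * - bitmap = 0x10000 (only 64 KiB pages supported):
   *     fls(bitmap & ~PAGE_MASK) = 0  -> "unsupported IOMMU page size",
   *     the domain is freed and instmem falls back to the DMA API
   */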
Alexandre Courbot
2015-Feb-11 07:21 UTC
[Nouveau] [PATCH v2 6/6] instmem/gk20a: add IOMMU support
Let GK20A's instmem take advantage of the IOMMU if it is present. Having an IOMMU means that instmem is no longer allocated using the DMA API, but instead obtained through page_alloc and made contiguous to the GPU by IOMMU mappings. Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> --- drm/nouveau/nvkm/subdev/instmem/gk20a.c | 272 ++++++++++++++++++++++++++++---- 1 file changed, 241 insertions(+), 31 deletions(-) diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c index 4c8af6e..a20b93c 100644 --- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c @@ -20,12 +20,32 @@ * DEALINGS IN THE SOFTWARE. */ +/* + * GK20A does not have dedicated video memory, and to accurately represent this + * fact Nouveau will not create a RAM device for it. Therefore its instmem + * implementation must be done directly on top of system memory, while providing + * coherent read and write operations. + * + * Instmem can be allocated through two means: + * 1) If a IOMMU mapping has been probed, the IOMMU API is used to make memory + * pages contiguous to the GPU. This is the preferred way. + * 2) If no IOMMU mapping is probed, the DMA API is used to allocated physically + * contiguous memory. + * + * In both cases CPU read and writes are performed using PRAMIN (i.e. using the + * GPU path) to ensure these operations are coherent for the GPU. This allows us + * to use more "relaxed" allocation parameters when using the DMA API, since we + * never need a kernel mapping. + */ + #include <subdev/fb.h> #include <core/mm.h> #include <core/device.h> #ifdef __KERNEL__ #include <linux/dma-attrs.h> +#include <linux/iommu.h> +#include <nouveau_platform.h> #endif #include "priv.h" @@ -36,18 +56,51 @@ struct gk20a_instobj_priv { struct nvkm_mem *mem; /* Pointed by mem */ struct nvkm_mem _mem; +}; + +/* + * Used for objects allocated using the DMA API + */ +struct gk20a_instobj_dma { + struct gk20a_instobj_priv base; + void *cpuaddr; dma_addr_t handle; struct dma_attrs attrs; struct nvkm_mm_node r; }; +/* + * Used for objects flattened using the IOMMU API + */ +struct gk20a_instobj_iommu { + struct gk20a_instobj_priv base; + + /* array of base.mem->size pages */ + struct page *pages[]; +}; + struct gk20a_instmem_priv { struct nvkm_instmem base; spinlock_t lock; u64 addr; + + /* Only used if IOMMU if present */ + struct mutex *mm_mutex; + struct nvkm_mm *mm; + struct iommu_domain *domain; + unsigned long iommu_pgshift; }; +/* + * Use PRAMIN to read/write data and avoid coherency issues. + * PRAMIN uses the GPU path and ensures data will always be coherent. + * + * A dynamic mapping based solution would be desirable in the future, but + * the issue remains of how to maintain coherency efficiently. On ARM it is + * not easy (if possible at all?) to create uncached temporary mappings. 
+ */ + static u32 gk20a_instobj_rd32(struct nvkm_object *object, u64 offset) { @@ -87,50 +140,79 @@ gk20a_instobj_wr32(struct nvkm_object *object, u64 offset, u32 data) } static void -gk20a_instobj_dtor(struct nvkm_object *object) +gk20a_instobj_dtor_dma(struct gk20a_instobj_priv *_node) { - struct gk20a_instobj_priv *node = (void *)object; + struct gk20a_instobj_dma *node = (void *)_node; struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node); struct device *dev = nv_device_base(nv_device(priv)); if (unlikely(!node->handle)) return; - dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr, + dma_free_attrs(dev, _node->mem->size << PAGE_SHIFT, node->cpuaddr, node->handle, &node->attrs); +} + +static void +gk20a_instobj_dtor_iommu(struct gk20a_instobj_priv *_node) +{ + struct gk20a_instobj_iommu *node = (void *)_node; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node); + struct nvkm_mm_node *r; + int i; + + if (unlikely(list_empty(&_node->mem->regions))) + return; + + r = list_first_entry(&_node->mem->regions, struct nvkm_mm_node, + rl_entry); + + /* clear bit 34 to unmap pages */ + r->offset &= ~BIT(34 - priv->iommu_pgshift); + + /* Unmap pages from GPU address space and free them */ + for (i = 0; i < _node->mem->size; i++) { + iommu_unmap(priv->domain, + (r->offset + i) << priv->iommu_pgshift, PAGE_SIZE); + __free_page(node->pages[i]); + } + + /* Release area from GPU address space */ + mutex_lock(priv->mm_mutex); + nvkm_mm_free(priv->mm, &r); + mutex_unlock(priv->mm_mutex); +} + +static void +gk20a_instobj_dtor(struct nvkm_object *object) +{ + struct gk20a_instobj_priv *node = (void *)object; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node); + + if (priv->domain) + gk20a_instobj_dtor_iommu(node); + else + gk20a_instobj_dtor_dma(node); nvkm_instobj_destroy(&node->base); } static int -gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine, - struct nvkm_oclass *oclass, void *data, u32 _size, - struct nvkm_object **pobject) +gk20a_instobj_ctor_dma(struct nvkm_object *parent, struct nvkm_object *engine, + struct nvkm_oclass *oclass, u32 npages, u32 align, + struct gk20a_instobj_priv **_node) { - struct nvkm_instobj_args *args = data; + struct gk20a_instobj_dma *node; struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent); - struct device *dev = nv_device_base(nv_device(priv)); - struct gk20a_instobj_priv *node; - u32 size, align; - u32 npages; + struct device *dev = nv_device_base(nv_device(parent)); int ret; - nv_debug(parent, "%s: size: %x align: %x\n", __func__, - args->size, args->align); - - size = max((args->size + 4095) & ~4095, (u32)4096); - align = max((args->align + 4095) & ~4095, (u32)4096); - - npages = size >> PAGE_SHIFT; - ret = nvkm_instobj_create_(parent, engine, oclass, sizeof(*node), - (void **)&node); - *pobject = nv_object(node); + (void **)&node); + *_node = &node->base; if (ret) return ret; - node->mem = &node->_mem; - init_dma_attrs(&node->attrs); /* * We will access this memory through PRAMIN and thus do not need a @@ -154,16 +236,132 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine, nv_warn(priv, "memory not aligned as requested: %pad (0x%x)\n", &node->handle, align); - node->mem->offset = node->handle; + /* present memory for being mapped using small pages */ + node->r.type = 12; + node->r.offset = node->handle >> 12; + node->r.length = (npages << PAGE_SHIFT) >> 12; + + node->base._mem.offset = node->handle; + + INIT_LIST_HEAD(&node->base._mem.regions); + 
list_add_tail(&node->r.rl_entry, &node->base._mem.regions); + + return 0; +} + +static int +gk20a_instobj_ctor_iommu(struct nvkm_object *parent, struct nvkm_object *engine, + struct nvkm_oclass *oclass, u32 npages, u32 align, + struct gk20a_instobj_priv **_node) +{ + struct gk20a_instobj_iommu *node; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent); + struct nvkm_mm_node *r; + int ret; + int i; + + ret = nvkm_instobj_create_(parent, engine, oclass, + sizeof(*node) + sizeof(node->pages[0]) * npages, + (void **)&node); + *_node = &node->base; + if (ret) + return ret; + + /* Allocate backing memory */ + for (i = 0; i < npages; i++) { + struct page *p = alloc_page(GFP_KERNEL); + + if (p == NULL) { + ret = -ENOMEM; + goto free_pages; + } + node->pages[i] = p; + } + + mutex_lock(priv->mm_mutex); + /* Reserve area from GPU address space */ + ret = nvkm_mm_head(priv->mm, 0, 1, npages, npages, + align >> priv->iommu_pgshift, &r); + mutex_unlock(priv->mm_mutex); + if (ret) { + nv_error(priv, "virtual space is full!\n"); + goto free_pages; + } + + /* Map into GPU address space */ + for (i = 0; i < npages; i++) { + struct page *p = node->pages[i]; + u32 offset = (r->offset + i) << priv->iommu_pgshift; + + ret = iommu_map(priv->domain, offset, page_to_phys(p), + PAGE_SIZE, IOMMU_READ | IOMMU_WRITE); + if (ret < 0) { + nv_error(priv, "IOMMU mapping failure: %d\n", ret); + + while (i-- > 0) { + offset -= PAGE_SIZE; + iommu_unmap(priv->domain, offset, PAGE_SIZE); + } + goto release_area; + } + } + + /* Bit 34 tells that an address is to be resolved through the IOMMU */ + r->offset |= BIT(34 - priv->iommu_pgshift); + + node->base._mem.offset = ((u64)r->offset) << priv->iommu_pgshift; + + INIT_LIST_HEAD(&node->base._mem.regions); + list_add_tail(&r->rl_entry, &node->base._mem.regions); + + return 0; + +release_area: + mutex_lock(priv->mm_mutex); + nvkm_mm_free(priv->mm, &r); + mutex_unlock(priv->mm_mutex); + +free_pages: + for (i = 0; i < npages && node->pages[i] != NULL; i++) + __free_page(node->pages[i]); + + return ret; +} + +static int +gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine, + struct nvkm_oclass *oclass, void *data, u32 _size, + struct nvkm_object **pobject) +{ + struct nvkm_instobj_args *args = data; + struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent); + struct gk20a_instobj_priv *node; + u32 size, align; + int ret; + + nv_debug(parent, "%s (%s): size: %x align: %x\n", __func__, + priv->domain ? 
"IOMMU" : "DMA", args->size, args->align); + + /* Round size and align to page bounds */ + size = max((args->size + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE); + align = max((args->align + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE); + + if (priv->domain) + ret = gk20a_instobj_ctor_iommu(parent, engine, oclass, + size >> PAGE_SHIFT, align, &node); + else + ret = gk20a_instobj_ctor_dma(parent, engine, oclass, + size >> PAGE_SHIFT, align, &node); + *pobject = nv_object(node); + if (ret) + return ret; + + node->mem = &node->_mem; + + /* present memory for being mapped using small pages */ node->mem->size = size >> 12; node->mem->memtype = 0; node->mem->page_shift = 12; - INIT_LIST_HEAD(&node->mem->regions); - - node->r.type = 12; - node->r.offset = node->handle >> 12; - node->r.length = npages; - list_add_tail(&node->r.rl_entry, &node->mem->regions); node->base.addr = node->mem->offset; node->base.size = size; @@ -202,6 +400,7 @@ gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine, struct nvkm_object **pobject) { struct gk20a_instmem_priv *priv; + struct nouveau_platform_device *plat; int ret; ret = nvkm_instmem_create(parent, engine, oclass, &priv); @@ -211,6 +410,18 @@ gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine, spin_lock_init(&priv->lock); + plat = nv_device_to_platform(nv_device(parent)); + if (plat->gpu->iommu.domain) { + priv->domain = plat->gpu->iommu.domain; + priv->mm = plat->gpu->iommu.mm; + priv->iommu_pgshift = plat->gpu->iommu.pgshift; + priv->mm_mutex = &plat->gpu->iommu.mutex; + + nv_info(priv, "using IOMMU\n"); + } else { + nv_info(priv, "using DMA API\n"); + } + return 0; } @@ -225,4 +436,3 @@ gk20a_instmem_oclass = &(struct nvkm_instmem_impl) { }, .instobj = &gk20a_instobj_oclass.base, }.base; - -- 2.3.0
On Wed, Feb 11, 2015 at 2:21 AM, Alexandre Courbot <gnurou at gmail.com> wrote:
> @@ -150,7 +151,10 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
>         int ret;
>
>         /* tags for 1/4 of VRAM should be enough (8192/4 per GiB of VRAM) */
> -       priv->num_tags = (pfb->ram->size >> 17) / 4;
> +       if (pfb->ram)
> +               priv->num_tags = (pfb->ram->size >> 17) / 4;
> +       else
> +               priv->num_tags = (1 << 17);
>         if (priv->num_tags > (1 << 17))
>                 priv->num_tags = 1 << 17; /* we have 17 bits in PTE */
>         priv->num_tags = (priv->num_tags + 63) & ~63; /* round up to 64 */
> @@ -170,8 +174,11 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
>         tag_size += tag_align;
>         tag_size = (tag_size + 0xfff) >> 12; /* round up */
>
> -       ret = nvkm_mm_tail(&pfb->vram, 1, 1, tag_size, tag_size, 1,
> -                          &priv->tag_ram);
> +       if (pfb->ram)
> +               ret = nvkm_mm_tail(&pfb->vram, 1, 1, tag_size, tag_size, 1,
> +                                  &priv->tag_ram);
> +       else
> +               ret = -1;
>         if (ret) {
>                 priv->num_tags = 0;
>         } else {

Seems like you could just avoid a lot of this and just start the function out
with

  if (!pfb->ram) {
      priv->num_tags = 0; (is that even necessary? the object is probably kzalloc'd)
      ret = nouveau_mm_init(&priv->tags, 0, 0, 1);
      return ret;
  }

That's essentially what happens, no? Perhaps you could go a step further and
teach the base code about uninitialized tags mm, but that may not be worth it.

  -ilia
Ilia Mirkin
2015-Feb-11 07:55 UTC
[Nouveau] [PATCH v2 2/6] instmem/gk20a: move memory allocation to instmem
On Wed, Feb 11, 2015 at 2:21 AM, Alexandre Courbot <gnurou at gmail.com> wrote:
> +static int
> +gk20a_instmem_fini(struct nvkm_object *object, bool suspend)
> +{
> +       struct gk20a_instmem_priv *priv = (void *)object;
> +       priv->addr = ~0ULL;
> +       return nvkm_instmem_fini(&priv->base, suspend);
> +}
> +
> +static int
> +gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
> +                  struct nvkm_oclass *oclass, void *data, u32 size,
> +                  struct nvkm_object **pobject)
> +{
> +       struct gk20a_instmem_priv *priv;
> +       int ret;
> +
> +       ret = nvkm_instmem_create(parent, engine, oclass, &priv);
> +       *pobject = nv_object(priv);
> +       if (ret)
> +               return ret;
> +
> +       spin_lock_init(&priv->lock);
> +
> +       return 0;
> +}
> +
> +struct nvkm_oclass *
> +gk20a_instmem_oclass = &(struct nvkm_instmem_impl) {
> +       .base.handle = NV_SUBDEV(INSTMEM, 0xea),
> +       .base.ofuncs = &(struct nvkm_ofuncs) {
> +               .ctor = gk20a_instmem_ctor,
> +               .dtor = _nvkm_instmem_dtor,
> +               .init = _nvkm_instmem_init,
> +               .fini = gk20a_instmem_fini,
> +       },
> +       .instobj = &gk20a_instobj_oclass.base,
> +}.base;

Minor point, but you could just reuse the nv50 functions here, instead of
redefining them (and get rid of gk20a_instmem_priv). Your call... not like
it's a ton of code being duplicated, and it would require exposing the
nv50_instmem_priv stuff.

Also, I know you're just copy/pasting the nv50 code for the instmem stuff...
but that open-coded ALIGN() is really painful to look at.

  -ilia
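Regarding the open-coded ALIGN() remark: the rounding in gk20a_instobj_ctor()
could be spelled with the kernel's ALIGN() helper, for example (just a sketch
of the equivalent, assuming 4 KiB pages, not part of the series):

  /* as in the patch */
  size  = max((args->size  + 4095) & ~4095, (u32)4096);
  align = max((args->align + 4095) & ~4095, (u32)4096);

  /* equivalent using ALIGN() */
  size  = max_t(u32, ALIGN(args->size,  PAGE_SIZE), PAGE_SIZE);
  align = max_t(u32, ALIGN(args->align, PAGE_SIZE), PAGE_SIZE);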
Ilia Mirkin
2015-Feb-11 08:03 UTC
[Nouveau] [PATCH v2 6/6] instmem/gk20a: add IOMMU support
On Wed, Feb 11, 2015 at 2:21 AM, Alexandre Courbot <gnurou at gmail.com> wrote:
> Let GK20A's instmem take advantage of the IOMMU if it is present. Having
> an IOMMU means that instmem is no longer allocated using the DMA API,
> but instead obtained through page_alloc and made contiguous to the GPU
> by IOMMU mappings.
>
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
> ---
>  drm/nouveau/nvkm/subdev/instmem/gk20a.c | 272 ++++++++++++++++++++++++++++----
>  1 file changed, 241 insertions(+), 31 deletions(-)
>
> diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> index 4c8af6e..a20b93c 100644
> --- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> @@ -20,12 +20,32 @@
>   * DEALINGS IN THE SOFTWARE.
>   */
>
> +/*
> + * GK20A does not have dedicated video memory, and to accurately represent this
> + * fact Nouveau will not create a RAM device for it. Therefore its instmem
> + * implementation must be done directly on top of system memory, while providing
> + * coherent read and write operations.
> + *
> + * Instmem can be allocated through two means:
> + * 1) If a IOMMU mapping has been probed, the IOMMU API is used to make memory

an IOMMU mapping

> + *    pages contiguous to the GPU. This is the preferred way.
> + * 2) If no IOMMU mapping is probed, the DMA API is used to allocated physically

to allocate physically

> + *    contiguous memory.
> + *
> + * In both cases CPU read and writes are performed using PRAMIN (i.e. using the
> + * GPU path) to ensure these operations are coherent for the GPU. This allows us
> + * to use more "relaxed" allocation parameters when using the DMA API, since we
> + * never need a kernel mapping.
> + */

Er yeah... obviously ignore my comments about sharing nv50_instmem_priv
stuff.

  -ilia
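To summarize the IOMMU path the header comment describes, here is a condensed
sketch of the map/unmap symmetry from patch 6 (error handling and locking
omitted, names as in the patch):

  /* ctor: map one system page per IOMMU slot, consecutively, then tag the
   * region with bit 34 so the GPU resolves it through the IOMMU */
  for (i = 0; i < npages; i++)
          iommu_map(priv->domain, (r->offset + i) << priv->iommu_pgshift,
                    page_to_phys(node->pages[i]), PAGE_SIZE,
                    IOMMU_READ | IOMMU_WRITE);
  r->offset |= BIT(34 - priv->iommu_pgshift);

  /* dtor: strip bit 34 to recover the IOMMU-space offset, then undo the
   * mappings, free the pages and release the address range */
  r->offset &= ~BIT(34 - priv->iommu_pgshift);
  for (i = 0; i < npages; i++) {
          iommu_unmap(priv->domain, (r->offset + i) << priv->iommu_pgshift,
                      PAGE_SIZE);
          __free_page(node->pages[i]);
  }
  nvkm_mm_free(priv->mm, &r);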