Alexandre Courbot
2014-Jul-08  08:25 UTC
[Nouveau] [PATCH v4 0/6] drm: nouveau: memory coherency on ARM
Another revision of this patchset critical for GK20A to operate. Previous attempts were exclusively using either TTM's regular page allocator or the DMA API one. Both have their advantages and drawbacks: the page allocator is fast but requires explicit synchronization on non-coherent architectures, whereas the DMA allocator always returns coherent memory, but is also slower, creates a permanent kernel mapping, and is more constrained as to which memory it can use. This version attempts to use the most-fit allocator according to the buffer use-case: - buffers that are passed to user-space can explicitly be synced during their validation and preparation for CPU access, as previously shown by Lucas (http://lists.freedesktop.org/archives/nouveau/2013-August/014029.html ). For these, we don't mind if the memory is not coherent and prefer to use the page allocator. - buffers that are used by the kernel, typically fences and GPFIFO buffers, are accessed rarely and thus should not trigger a costly flush or cache invalidation. For these, we want to guarantee coherent access and use the DMA API if necessary. This series attempts to implement this behavior by allowing the TTM_PL_FLAG_UNCACHED flag to be passed to nouveau_bo_new(). On coherent architectures this flag is a no-op ; on non-coherent architectures, it will force the creation of a coherent buffer using the DMA-API. Several fixes and changes were necessary to enable this behavior: - CPU addresses of DMA-allocated BOs must be made visible (patch 1) so the coherent mapping can be used by drivers - The DMA-sync functions are required for BOs populated using the page allocator (patch 4). Pages need to be mapped to the device using the correct API if we are to call the sync functions (patch 2). Additionally, we need to understand whether we are on a CPU-coherent architecture (patch 3). - Coherent BOs need to be detected by Nouveau so their coherent kernel mapping can be used instead of creating a new one (patch 5). - Finally, buffers that are used by the kernel should be requested to be coherent (page 6). Changes since v3: - Only use the DMA allocator for BOs that strictly require to be coherent - Fixed the way pages are mapped to the GPU on platform devices - Thoroughly checked with CONFIG_DMA_API_DEBUG that there were no API violations Alexandre Courbot (6): drm/ttm: expose CPU address of DMA-allocated pages drm/nouveau: map pages using DMA API on platform devices drm/nouveau: introduce nv_device_is_cpu_coherent() drm/nouveau: synchronize BOs when required drm/nouveau: implement explicitly coherent BOs drm/nouveau: allocate GPFIFOs and fences coherently drivers/gpu/drm/nouveau/core/engine/device/base.c | 14 ++- drivers/gpu/drm/nouveau/core/include/core/device.h | 3 + drivers/gpu/drm/nouveau/nouveau_bo.c | 132 +++++++++++++++++++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 3 + drivers/gpu/drm/nouveau/nouveau_chan.c | 2 +- drivers/gpu/drm/nouveau/nouveau_gem.c | 12 ++ drivers/gpu/drm/nouveau/nv84_fence.c | 4 +- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 2 + drivers/gpu/drm/ttm/ttm_tt.c | 6 +- include/drm/ttm/ttm_bo_driver.h | 2 + 10 files changed, 167 insertions(+), 13 deletions(-) -- 2.0.0
Alexandre Courbot
2014-Jul-08  08:25 UTC
[Nouveau] [PATCH v4 1/6] drm/ttm: expose CPU address of DMA-allocated pages
Pages allocated using the DMA API have a coherent memory mapping. Make
this mapping visible to drivers so they can decide to use it instead of
creating their own redundant one.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 2 ++
 drivers/gpu/drm/ttm/ttm_tt.c             | 6 +++++-
 include/drm/ttm/ttm_bo_driver.h          | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index fb8259f69839..0301fac5badd 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -847,6 +847,7 @@ static int ttm_dma_pool_get_pages(struct dma_pool *pool,
 	if (count) {
 		d_page = list_first_entry(&pool->free_list, struct dma_page,
page_list);
 		ttm->pages[index] = d_page->p;
+		ttm_dma->cpu_address[index] = d_page->vaddr;
 		ttm_dma->dma_address[index] = d_page->dma;
 		list_move_tail(&d_page->page_list, &ttm_dma->pages_list);
 		r = 0;
@@ -978,6 +979,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct
device *dev)
 	INIT_LIST_HEAD(&ttm_dma->pages_list);
 	for (i = 0; i < ttm->num_pages; i++) {
 		ttm->pages[i] = NULL;
+		ttm_dma->cpu_address[i] = 0;
 		ttm_dma->dma_address[i] = 0;
 	}
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 75f319090043..341594ede596 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -58,6 +58,8 @@ static void ttm_dma_tt_alloc_page_directory(struct ttm_dma_tt
*ttm)
 	ttm->ttm.pages = drm_calloc_large(ttm->ttm.num_pages, sizeof(void*));
 	ttm->dma_address = drm_calloc_large(ttm->ttm.num_pages,
 					    sizeof(*ttm->dma_address));
+	ttm->cpu_address = drm_calloc_large(ttm->ttm.num_pages,
+					    sizeof(*ttm->cpu_address));
 }
 
 #ifdef CONFIG_X86
@@ -228,7 +230,7 @@ int ttm_dma_tt_init(struct ttm_dma_tt *ttm_dma, struct
ttm_bo_device *bdev,
 
 	INIT_LIST_HEAD(&ttm_dma->pages_list);
 	ttm_dma_tt_alloc_page_directory(ttm_dma);
-	if (!ttm->pages || !ttm_dma->dma_address) {
+	if (!ttm->pages || !ttm_dma->dma_address || !ttm_dma->cpu_address) {
 		ttm_tt_destroy(ttm);
 		pr_err("Failed allocating page table\n");
 		return -ENOMEM;
@@ -243,6 +245,8 @@ void ttm_dma_tt_fini(struct ttm_dma_tt *ttm_dma)
 
 	drm_free_large(ttm->pages);
 	ttm->pages = NULL;
+	drm_free_large(ttm_dma->cpu_address);
+	ttm_dma->cpu_address = NULL;
 	drm_free_large(ttm_dma->dma_address);
 	ttm_dma->dma_address = NULL;
 }
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index a5183da3ef92..7d30f0666d24 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -133,6 +133,7 @@ struct ttm_tt {
  * struct ttm_dma_tt
  *
  * @ttm: Base ttm_tt struct.
+ * @cpu_address: The CPU address of the pages
  * @dma_address: The DMA (bus) addresses of the pages
  * @pages_list: used by some page allocation backend
  *
@@ -142,6 +143,7 @@ struct ttm_tt {
  */
 struct ttm_dma_tt {
 	struct ttm_tt ttm;
+	void **cpu_address;
 	dma_addr_t *dma_address;
 	struct list_head pages_list;
 };
-- 
2.0.0
Alexandre Courbot
2014-Jul-08  08:25 UTC
[Nouveau] [PATCH v4 2/6] drm/nouveau: map pages using DMA API on platform devices
page_to_phys() is not the correct way to obtain the DMA address of a
buffer on a non-PCI system. Use the DMA API functions for this, which
are portable and will allow us to use other DMA API functions for
buffer synchronization.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/core/engine/device/base.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c
b/drivers/gpu/drm/nouveau/core/engine/device/base.c
index 18c8c7245b73..e4e9e64988fe 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c
@@ -489,7 +489,10 @@ nv_device_map_page(struct nouveau_device *device, struct
page *page)
 		if (pci_dma_mapping_error(device->pdev, ret))
 			ret = 0;
 	} else {
-		ret = page_to_phys(page);
+		ret = dma_map_page(&device->platformdev->dev, page, 0,
+				   PAGE_SIZE, DMA_BIDIRECTIONAL);
+		if (dma_mapping_error(&device->platformdev->dev, ret))
+			ret = 0;
 	}
 
 	return ret;
@@ -501,6 +504,9 @@ nv_device_unmap_page(struct nouveau_device *device,
dma_addr_t addr)
 	if (nv_device_is_pci(device))
 		pci_unmap_page(device->pdev, addr, PAGE_SIZE,
 			       PCI_DMA_BIDIRECTIONAL);
+	else
+		dma_unmap_page(&device->platformdev->dev, addr,
+			       PAGE_SIZE, DMA_BIDIRECTIONAL);
 }
 
 int
-- 
2.0.0
Alexandre Courbot
2014-Jul-08  08:25 UTC
[Nouveau] [PATCH v4 3/6] drm/nouveau: introduce nv_device_is_cpu_coherent()
Add a function allowing us to know whether a device is CPU-coherent,
i.e. accesses performed by the CPU on GPU-mapped buffers will
be immediately visible on the GPU side and vice-versa.
For now, a device is considered to be coherent if it uses the PCI bus on
a non-ARM architecture.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/core/engine/device/base.c  | 6 ++++++
 drivers/gpu/drm/nouveau/core/include/core/device.h | 3 +++
 2 files changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c
b/drivers/gpu/drm/nouveau/core/engine/device/base.c
index e4e9e64988fe..23c7061aac3c 100644
--- a/drivers/gpu/drm/nouveau/core/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c
@@ -520,6 +520,12 @@ nv_device_get_irq(struct nouveau_device *device, bool
stall)
 	}
 }
 
+bool
+nv_device_is_cpu_coherent(struct nouveau_device *device)
+{
+	return (!IS_ENABLED(CONFIG_ARM) && nv_device_is_pci(device));
+}
+
 static struct nouveau_oclass
 nouveau_device_oclass = {
 	.handle = NV_ENGINE(DEVICE, 0x00),
diff --git a/drivers/gpu/drm/nouveau/core/include/core/device.h
b/drivers/gpu/drm/nouveau/core/include/core/device.h
index a8a9a9cf16cb..1f9d5d87cf06 100644
--- a/drivers/gpu/drm/nouveau/core/include/core/device.h
+++ b/drivers/gpu/drm/nouveau/core/include/core/device.h
@@ -171,4 +171,7 @@ nv_device_unmap_page(struct nouveau_device *device,
dma_addr_t addr);
 int
 nv_device_get_irq(struct nouveau_device *device, bool stall);
 
+bool
+nv_device_is_cpu_coherent(struct nouveau_device *device);
+
 #endif
-- 
2.0.0
Alexandre Courbot
2014-Jul-08  08:25 UTC
[Nouveau] [PATCH v4 4/6] drm/nouveau: synchronize BOs when required
On architectures for which access to GPU memory is non-coherent,
caches need to be flushed and invalidated explicitly when BO control
changes between CPU and GPU.
This patch adds buffer synchronization functions which invokes the
correct API (PCI or DMA) to ensure synchronization is effective.
Based on the TTM DMA cache helper patches by Lucas Stach.
Signed-off-by: Lucas Stach <dev at lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  | 56 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.h  |  2 ++
 drivers/gpu/drm/nouveau/nouveau_gem.c | 12 ++++++++
 3 files changed, 70 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 67e9e8e2e2ec..47e4e8886769 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -402,6 +402,60 @@ nouveau_bo_unmap(struct nouveau_bo *nvbo)
 		ttm_bo_kunmap(&nvbo->kmap);
 }
 
+void
+nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
+{
+	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
+	struct nouveau_device *device = nouveau_dev(drm->dev);
+	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
+	int i;
+
+	if (!ttm_dma)
+		return;
+
+	if (nv_device_is_cpu_coherent(device) || nvbo->force_coherent)
+		return;
+
+	if (nv_device_is_pci(device)) {
+		for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+			pci_dma_sync_single_for_device(device->pdev,
+				ttm_dma->dma_address[i], PAGE_SIZE,
+				PCI_DMA_TODEVICE);
+	} else {
+		for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+			dma_sync_single_for_device(nv_device_base(device),
+				ttm_dma->dma_address[i], PAGE_SIZE,
+				DMA_TO_DEVICE);
+	}
+}
+
+void
+nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
+{
+	struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev);
+	struct nouveau_device *device = nouveau_dev(drm->dev);
+	struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm;
+	int i;
+
+	if (!ttm_dma)
+		return;
+
+	if (nv_device_is_cpu_coherent(device) || nvbo->force_coherent)
+		return;
+
+	if (nv_device_is_pci(device)) {
+		for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+			pci_dma_sync_single_for_cpu(device->pdev,
+				ttm_dma->dma_address[i], PAGE_SIZE,
+				PCI_DMA_FROMDEVICE);
+	} else {
+		for (i = 0; i < ttm_dma->ttm.num_pages; i++)
+			dma_sync_single_for_cpu(nv_device_base(device),
+				ttm_dma->dma_address[i], PAGE_SIZE,
+				DMA_FROM_DEVICE);
+	}
+}
+
 int
 nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 		    bool no_wait_gpu)
@@ -418,6 +472,8 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool
interruptible,
 	if (!ttm)
 		return ret;
 
+	nouveau_bo_sync_for_device(nvbo);
+
 	device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
 	nv_wr32(device, 0x70004, 0x00000001);
 	if (!nv_wait(device, 0x070004, 0x00000001, 0x00000000))
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h
b/drivers/gpu/drm/nouveau/nouveau_bo.h
index ff17c1f432fc..fa42298d2dca 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -81,6 +81,8 @@ void nouveau_bo_wr32(struct nouveau_bo *, unsigned index, u32
val);
 void nouveau_bo_fence(struct nouveau_bo *, struct nouveau_fence *);
 int  nouveau_bo_validate(struct nouveau_bo *, bool interruptible,
 			 bool no_wait_gpu);
+void nouveau_bo_sync_for_device(struct nouveau_bo *nvbo);
+void nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo);
 
 struct nouveau_vma *
 nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..08829a720891 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -896,6 +896,7 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void
*data,
 	spin_lock(&nvbo->bo.bdev->fence_lock);
 	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
 	spin_unlock(&nvbo->bo.bdev->fence_lock);
+	nouveau_bo_sync_for_cpu(nvbo);
 	drm_gem_object_unreference_unlocked(gem);
 	return ret;
 }
@@ -904,6 +905,17 @@ int
 nouveau_gem_ioctl_cpu_fini(struct drm_device *dev, void *data,
 			   struct drm_file *file_priv)
 {
+	struct drm_nouveau_gem_cpu_fini *req = data;
+	struct drm_gem_object *gem;
+	struct nouveau_bo *nvbo;
+
+	gem = drm_gem_object_lookup(dev, file_priv, req->handle);
+	if (!gem)
+		return -ENOENT;
+	nvbo = nouveau_gem_object(gem);
+
+	nouveau_bo_sync_for_device(nvbo);
+	drm_gem_object_unreference_unlocked(gem);
 	return 0;
 }
 
-- 
2.0.0
Alexandre Courbot
2014-Jul-08  08:26 UTC
[Nouveau] [PATCH v4 5/6] drm/nouveau: implement explicitly coherent BOs
Allow nouveau_bo_new() to recognize the TTM_PL_FLAG_UNCACHED flag, which
means that we want the allocated BO to be perfectly coherent between the
CPU and GPU. This is useful on non-coherent architectures for which we
do not want to manually sync some rarely-accessed buffers: typically,
fences and pushbuffers.
A TTM BO allocated with the TTM_PL_FLAG_UNCACHED on a non-coherent
architecture will be populated using the DMA API, and accesses to it
performed using the coherent mapping performed by dma_alloc_coherent().
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 76 ++++++++++++++++++++++++++++++++----
 drivers/gpu/drm/nouveau/nouveau_bo.h |  1 +
 2 files changed, 69 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 47e4e8886769..23a29adfabf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -219,6 +219,9 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	nvbo->tile_flags = tile_flags;
 	nvbo->bo.bdev = &drm->ttm.bdev;
 
+	if (!nv_device_is_cpu_coherent(nouveau_dev(dev)))
+		nvbo->force_coherent = flags & TTM_PL_FLAG_UNCACHED;
+
 	nvbo->page_shift = 12;
 	if (drm->client.base.vm) {
 		if (!(flags & TTM_PL_FLAG_TT) && size > 256 * 1024)
@@ -289,8 +292,9 @@ void
 nouveau_bo_placement_set(struct nouveau_bo *nvbo, uint32_t type, uint32_t busy)
 {
 	struct ttm_placement *pl = &nvbo->placement;
-	uint32_t flags = TTM_PL_MASK_CACHING |
-		(nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
+	uint32_t flags = (nvbo->force_coherent ? TTM_PL_FLAG_UNCACHED :
+						 TTM_PL_MASK_CACHING) |
+			 (nvbo->pin_refcnt ? TTM_PL_FLAG_NO_EVICT : 0);
 
 	pl->placement = nvbo->placements;
 	set_placement_list(nvbo->placements, &pl->num_placement,
@@ -390,7 +394,14 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 	if (ret)
 		return ret;
 
-	ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages,
&nvbo->kmap);
+	/*
+	 * TTM buffers allocated using the DMA API already have a mapping, let's
+	 * use it instead.
+	 */
+	if (!nvbo->force_coherent)
+		ret = ttm_bo_kmap(&nvbo->bo, 0, nvbo->bo.mem.num_pages,
+				  &nvbo->kmap);
+
 	ttm_bo_unreserve(&nvbo->bo);
 	return ret;
 }
@@ -398,7 +409,14 @@ nouveau_bo_map(struct nouveau_bo *nvbo)
 void
 nouveau_bo_unmap(struct nouveau_bo *nvbo)
 {
-	if (nvbo)
+	if (!nvbo)
+		return;
+
+	/*
+	 * TTM buffers allocated using the DMA API already had a coherent
+	 * mapping which we used, no need to unmap.
+	 */
+	if (!nvbo->force_coherent)
 		ttm_bo_kunmap(&nvbo->kmap);
 }
 
@@ -482,12 +500,36 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool
interruptible,
 	return 0;
 }
 
+static inline void *
+_nouveau_bo_mem_index(struct nouveau_bo *nvbo, unsigned index, void *mem, u8
sz)
+{
+	struct ttm_dma_tt *dma_tt;
+	u8 *m = mem;
+
+	index *= sz;
+
+	if (m) {
+		/* kmap'd address, return the corresponding offset */
+		m += index;
+	} else {
+		/* DMA-API mapping, lookup the right address */
+		dma_tt = (struct ttm_dma_tt *)nvbo->bo.ttm;
+		m = dma_tt->cpu_address[index / PAGE_SIZE];
+		m += index % PAGE_SIZE;
+	}
+
+	return m;
+}
+#define nouveau_bo_mem_index(o, i, m) _nouveau_bo_mem_index(o, i, m,
sizeof(*m))
+
 u16
 nouveau_bo_rd16(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread16_native((void __force __iomem *)mem);
 	else
@@ -499,7 +541,9 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16
val)
 {
 	bool is_iomem;
 	u16 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite16_native(val, (void __force __iomem *)mem);
 	else
@@ -511,7 +555,9 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		return ioread32_native((void __force __iomem *)mem);
 	else
@@ -523,7 +569,9 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32
val)
 {
 	bool is_iomem;
 	u32 *mem = ttm_kmap_obj_virtual(&nvbo->kmap, &is_iomem);
-	mem = &mem[index];
+
+	mem = nouveau_bo_mem_index(nvbo, index, mem);
+
 	if (is_iomem)
 		iowrite32_native(val, (void __force __iomem *)mem);
 	else
@@ -1426,6 +1474,14 @@ nouveau_ttm_tt_populate(struct ttm_tt *ttm)
 	device = nv_device(drm->device);
 	dev = drm->dev;
 
+	/*
+	 * Objects matching this condition have been marked as force_coherent,
+	 * so use the DMA API for them.
+	 */
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		return ttm_dma_populate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		return ttm_agp_tt_populate(ttm);
@@ -1476,6 +1532,10 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
 	device = nv_device(drm->device);
 	dev = drm->dev;
 
+	if (!nv_device_is_cpu_coherent(device) &&
+	    ttm->caching_state == tt_uncached)
+		ttm_dma_unpopulate(ttm_dma, dev->dev);
+
 #if __OS_HAS_AGP
 	if (drm->agp.stat == ENABLED) {
 		ttm_agp_tt_unpopulate(ttm);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h
b/drivers/gpu/drm/nouveau/nouveau_bo.h
index fa42298d2dca..9a111b92fb34 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -11,6 +11,7 @@ struct nouveau_bo {
 	u32 valid_domains;
 	u32 placements[3];
 	u32 busy_placements[3];
+	bool force_coherent;
 	struct ttm_bo_kmap_obj kmap;
 	struct list_head head;
 
-- 
2.0.0
Alexandre Courbot
2014-Jul-08  08:26 UTC
[Nouveau] [PATCH v4 6/6] drm/nouveau: allocate GPFIFOs and fences coherently
Specify TTM_PL_FLAG_UNCACHED when allocating GPFIFOs and fences to
allow them to be safely accessed by the kernel without being synced
on non-coherent architectures.
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_chan.c | 2 +-
 drivers/gpu/drm/nouveau/nv84_fence.c   | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index ccb6b452d6d0..155b1b192676 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -110,7 +110,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct
nouveau_cli *cli,
 	chan->handle = handle;
 
 	/* allocate memory for dma push buffer */
-	target = TTM_PL_FLAG_TT;
+	target = TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED;
 	if (nouveau_vram_pushbuf)
 		target = TTM_PL_FLAG_VRAM;
 
diff --git a/drivers/gpu/drm/nouveau/nv84_fence.c
b/drivers/gpu/drm/nouveau/nv84_fence.c
index 9fd475c89820..b5d6737b6b8d 100644
--- a/drivers/gpu/drm/nouveau/nv84_fence.c
+++ b/drivers/gpu/drm/nouveau/nv84_fence.c
@@ -257,8 +257,8 @@ nv84_fence_create(struct nouveau_drm *drm)
 
 	if (ret == 0)
 		ret = nouveau_bo_new(drm->dev, 16 * (pfifo->max + 1), 0,
-				     TTM_PL_FLAG_TT, 0, 0, NULL,
-				     &priv->bo_gart);
+				     TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED, 0,
+				     0, NULL, &priv->bo_gart);
 	if (ret == 0) {
 		ret = nouveau_bo_pin(priv->bo_gart, TTM_PL_FLAG_TT);
 		if (ret == 0) {
-- 
2.0.0
Daniel Vetter
2014-Jul-10  12:58 UTC
[Nouveau] [PATCH v4 2/6] drm/nouveau: map pages using DMA API on platform devices
On Tue, Jul 08, 2014 at 05:25:57PM +0900, Alexandre Courbot wrote:> page_to_phys() is not the correct way to obtain the DMA address of a > buffer on a non-PCI system. Use the DMA API functions for this, which > are portable and will allow us to use other DMA API functions for > buffer synchronization. > > Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> > --- > drivers/gpu/drm/nouveau/core/engine/device/base.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/core/engine/device/base.c b/drivers/gpu/drm/nouveau/core/engine/device/base.c > index 18c8c7245b73..e4e9e64988fe 100644 > --- a/drivers/gpu/drm/nouveau/core/engine/device/base.c > +++ b/drivers/gpu/drm/nouveau/core/engine/device/base.c > @@ -489,7 +489,10 @@ nv_device_map_page(struct nouveau_device *device, struct page *page) > if (pci_dma_mapping_error(device->pdev, ret)) > ret = 0; > } else { > - ret = page_to_phys(page); > + ret = dma_map_page(&device->platformdev->dev, page, 0, > + PAGE_SIZE, DMA_BIDIRECTIONAL); > + if (dma_mapping_error(&device->platformdev->dev, ret)) > + ret = 0; > } > > return ret; > @@ -501,6 +504,9 @@ nv_device_unmap_page(struct nouveau_device *device, dma_addr_t addr) > if (nv_device_is_pci(device)) > pci_unmap_page(device->pdev, addr, PAGE_SIZE, > PCI_DMA_BIDIRECTIONAL);pci_map/unmap alias to dma_unmap/map when called on the underlying struct device embedded in pci_device (like for platform drivers). Dunno whether it's worth to track a pointer to the struct device directly and always call dma_unmap/map. Just drive-by comment since I'm interested in how you solve this - i915 has similar fun with buffer sharing and coherent and non-coherent platforms. Although we don't have fun with pci and non-pci based platforms. -Daniel> + else > + dma_unmap_page(&device->platformdev->dev, addr, > + PAGE_SIZE, DMA_BIDIRECTIONAL); > } > > int > -- > 2.0.0 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/nouveau-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter
2014-Jul-10  13:04 UTC
[Nouveau] [PATCH v4 4/6] drm/nouveau: synchronize BOs when required
On Tue, Jul 08, 2014 at 05:25:59PM +0900, Alexandre Courbot wrote:> On architectures for which access to GPU memory is non-coherent, > caches need to be flushed and invalidated explicitly when BO control > changes between CPU and GPU. > > This patch adds buffer synchronization functions which invokes the > correct API (PCI or DMA) to ensure synchronization is effective. > > Based on the TTM DMA cache helper patches by Lucas Stach. > > Signed-off-by: Lucas Stach <dev at lynxeye.de> > Signed-off-by: Alexandre Courbot <acourbot at nvidia.com> > --- > drivers/gpu/drm/nouveau/nouveau_bo.c | 56 +++++++++++++++++++++++++++++++++++ > drivers/gpu/drm/nouveau/nouveau_bo.h | 2 ++ > drivers/gpu/drm/nouveau/nouveau_gem.c | 12 ++++++++ > 3 files changed, 70 insertions(+) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c > index 67e9e8e2e2ec..47e4e8886769 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c > @@ -402,6 +402,60 @@ nouveau_bo_unmap(struct nouveau_bo *nvbo) > ttm_bo_kunmap(&nvbo->kmap); > } > > +void > +nouveau_bo_sync_for_device(struct nouveau_bo *nvbo) > +{ > + struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev); > + struct nouveau_device *device = nouveau_dev(drm->dev); > + struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; > + int i; > + > + if (!ttm_dma) > + return; > + > + if (nv_device_is_cpu_coherent(device) || nvbo->force_coherent) > + return;Is the is_cpu_coherent check really required? On coherent platforms the sync_for_foo should be a noop. It's the dma api's job to encapsulate this knowledge so that drivers can be blissfully ignorant. The explicit is_coherent check makes this a bit leaky. And same comment that underlying the bus-specifics dma-mapping functions are identical. -Daniel> + > + if (nv_device_is_pci(device)) { > + for (i = 0; i < ttm_dma->ttm.num_pages; i++) > + pci_dma_sync_single_for_device(device->pdev, > + ttm_dma->dma_address[i], PAGE_SIZE, > + PCI_DMA_TODEVICE); > + } else { > + for (i = 0; i < ttm_dma->ttm.num_pages; i++) > + dma_sync_single_for_device(nv_device_base(device), > + ttm_dma->dma_address[i], PAGE_SIZE, > + DMA_TO_DEVICE); > + } > +} > + > +void > +nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo) > +{ > + struct nouveau_drm *drm = nouveau_bdev(nvbo->bo.bdev); > + struct nouveau_device *device = nouveau_dev(drm->dev); > + struct ttm_dma_tt *ttm_dma = (struct ttm_dma_tt *)nvbo->bo.ttm; > + int i; > + > + if (!ttm_dma) > + return; > + > + if (nv_device_is_cpu_coherent(device) || nvbo->force_coherent) > + return; > + > + if (nv_device_is_pci(device)) { > + for (i = 0; i < ttm_dma->ttm.num_pages; i++) > + pci_dma_sync_single_for_cpu(device->pdev, > + ttm_dma->dma_address[i], PAGE_SIZE, > + PCI_DMA_FROMDEVICE); > + } else { > + for (i = 0; i < ttm_dma->ttm.num_pages; i++) > + dma_sync_single_for_cpu(nv_device_base(device), > + ttm_dma->dma_address[i], PAGE_SIZE, > + DMA_FROM_DEVICE); > + } > +} > + > int > nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible, > bool no_wait_gpu) > @@ -418,6 +472,8 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible, > if (!ttm) > return ret; > > + nouveau_bo_sync_for_device(nvbo); > + > device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev); > nv_wr32(device, 0x70004, 0x00000001); > if (!nv_wait(device, 0x070004, 0x00000001, 0x00000000)) > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h > index ff17c1f432fc..fa42298d2dca 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_bo.h > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.h > @@ -81,6 +81,8 @@ void nouveau_bo_wr32(struct nouveau_bo *, unsigned index, u32 val); > void nouveau_bo_fence(struct nouveau_bo *, struct nouveau_fence *); > int nouveau_bo_validate(struct nouveau_bo *, bool interruptible, > bool no_wait_gpu); > +void nouveau_bo_sync_for_device(struct nouveau_bo *nvbo); > +void nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo); > > struct nouveau_vma * > nouveau_bo_vma_find(struct nouveau_bo *, struct nouveau_vm *); > diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c > index c90c0dc0afe8..08829a720891 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_gem.c > +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c > @@ -896,6 +896,7 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data, > spin_lock(&nvbo->bo.bdev->fence_lock); > ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait); > spin_unlock(&nvbo->bo.bdev->fence_lock); > + nouveau_bo_sync_for_cpu(nvbo); > drm_gem_object_unreference_unlocked(gem); > return ret; > } > @@ -904,6 +905,17 @@ int > nouveau_gem_ioctl_cpu_fini(struct drm_device *dev, void *data, > struct drm_file *file_priv) > { > + struct drm_nouveau_gem_cpu_fini *req = data; > + struct drm_gem_object *gem; > + struct nouveau_bo *nvbo; > + > + gem = drm_gem_object_lookup(dev, file_priv, req->handle); > + if (!gem) > + return -ENOENT; > + nvbo = nouveau_gem_object(gem); > + > + nouveau_bo_sync_for_device(nvbo); > + drm_gem_object_unreference_unlocked(gem); > return 0; > } > > -- > 2.0.0 > > _______________________________________________ > Nouveau mailing list > Nouveau at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/nouveau-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Maybe Matching Threads
- [PATCH v5 0/4] drm: nouveau: memory coherency on ARM
- [PATCH v4 4/6] drm/nouveau: synchronize BOs when required
- [PATCH v4 4/6] drm/nouveau: synchronize BOs when required
- [PATCH 0/4] drm/ttm: nouveau: memory coherency fixes for ARM
- [PATCH v2 0/3] drm/ttm: nouveau: memory coherency for ARM