thr3ads.net - Nouveau - [Nouveau] [PATCH 2/4] drm/ttm: introduce dma cache sync helpers [May 2014]

Alexandre Courbot

2014-May-19 07:10 UTC

[Nouveau] [PATCH 0/4] drm/ttm: nouveau: memory coherency fixes for ARM

This small series introduces TTM helper functions as well as Nouveau hooks that
are needed to ensure buffer coherency on ARM. Most of this series is a
forward-port of some patches Lucas Stach sent last year and that are also
needed for Nouveau GK20A support:

http://lists.freedesktop.org/archives/nouveau/2013-August/014026.html

Another patch takes care of flushing the CPU write-buffer when writing BOs
through a non-BAR path.

Alexandre Courbot (1):
  drm/nouveau: introduce CPU cache flushing macro

Lucas Stach (3):
  drm/ttm: recognize ARM arch in ioprot handler
  drm/ttm: introduce dma cache sync helpers
  drm/nouveau: hook up cache sync functions

 drivers/gpu/drm/nouveau/core/os.h     | 17 +++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.c  | 40 +++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/nouveau/nouveau_bo.h  | 20 ++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c |  8 ++++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c     |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c          | 25 ++++++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h       | 28 ++++++++++++++++++++++++
 7 files changed, 136 insertions(+), 4 deletions(-)

-- 
1.9.2

Alexandre Courbot

2014-May-19 07:10 UTC

[Nouveau] [PATCH 1/4] drm/ttm: recognize ARM arch in ioprot handler

From: Lucas Stach <dev at lynxeye.de>

Signed-off-by: Lucas Stach <dev at lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 1df856f78568..30e5d90cb7bc 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -500,7 +500,7 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 			pgprot_val(tmp) |= _PAGE_GUARDED;
 	}
 #endif
-#if defined(__ia64__)
+#if defined(__ia64__) || defined(__arm__)
 	if (caching_flags & TTM_PL_FLAG_WC)
 		tmp = pgprot_writecombine(tmp);
 	else
-- 
1.9.2

Alexandre Courbot

2014-May-19 07:10 UTC

[Nouveau] [PATCH 2/4] drm/ttm: introduce dma cache sync helpers

From: Lucas Stach <dev at lynxeye.de>

On arches with non-coherent PCI, we need to flush caches ourselfes at
the appropriate places. Introduce two small helpers to make things easy
for TTM based drivers.

Signed-off-by: Lucas Stach <dev at lynxeye.de>
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c    | 25 +++++++++++++++++++++++++
 include/drm/ttm/ttm_bo_driver.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 75f319090043..05a316b71ad1 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -38,6 +38,7 @@
 #include <linux/swap.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/dma-mapping.h>
 #include <drm/drm_cache.h>
 #include <drm/drm_mem_util.h>
 #include <drm/ttm/ttm_module.h>
@@ -248,6 +249,30 @@ void ttm_dma_tt_fini(struct ttm_dma_tt *ttm_dma)
 }
 EXPORT_SYMBOL(ttm_dma_tt_fini);
 
+void ttm_dma_tt_cache_sync_for_device(struct ttm_dma_tt *ttm_dma,
+				      struct device *dev)
+{
+	int i;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++) {
+		dma_sync_single_for_device(dev, ttm_dma->dma_address[i],
+					   PAGE_SIZE, DMA_TO_DEVICE);
+	}
+}
+EXPORT_SYMBOL(ttm_dma_tt_cache_sync_for_device);
+
+void ttm_dma_tt_cache_sync_for_cpu(struct ttm_dma_tt *ttm_dma,
+				   struct device *dev)
+{
+	int i;
+
+	for (i = 0; i < ttm_dma->ttm.num_pages; i++) {
+		dma_sync_single_for_cpu(dev, ttm_dma->dma_address[i],
+					PAGE_SIZE, DMA_FROM_DEVICE);
+	}
+}
+EXPORT_SYMBOL(ttm_dma_tt_cache_sync_for_cpu);
+
 void ttm_tt_unbind(struct ttm_tt *ttm)
 {
 	int ret;
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index a5183da3ef92..52fb709568fc 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -41,6 +41,7 @@
 #include <linux/fs.h>
 #include <linux/spinlock.h>
 #include <linux/reservation.h>
+#include <linux/device.h>
 
 struct ttm_backend_func {
 	/**
@@ -690,6 +691,33 @@ extern int ttm_tt_swapout(struct ttm_tt *ttm,
  */
 extern void ttm_tt_unpopulate(struct ttm_tt *ttm);
 
+/**
+ * ttm_dma_tt_cache_sync_for_device:
+ *
+ * @ttm A struct ttm_tt of the type returned by ttm_dma_tt_init.
+ * @dev A struct device representing the device to which to sync.
+ *
+ * This function will flush the CPU caches on arches where snooping in the
+ * TT is not available. On fully coherent arches this will turn into an (almost)
+ * noop. This makes sure that data written by the CPU is visible to the device.
+ */
+extern void ttm_dma_tt_cache_sync_for_device(struct ttm_dma_tt *ttm_dma,
+					     struct device *dev);
+
+/**
+ * ttm_dma_tt_cache_sync_for_cpu:
+ *
+ * @ttm A struct ttm_tt of the type returned by ttm_dma_tt_init.
+ * @dev A struct device representing the device from which to sync.
+ *
+ * This function will invalidate the CPU caches on arches where snooping in the
+ * TT is not available. On fully coherent arches this will turn into an (almost)
+ * noop. This makes sure that the CPU does not read any stale cached or
+ * prefetched data.
+ */
+extern void ttm_dma_tt_cache_sync_for_cpu(struct ttm_dma_tt *ttm_dma,
+					  struct device *dev);
+
 /*
  * ttm_bo.c
  */
-- 
1.9.2

Alexandre Courbot

2014-May-19 07:10 UTC

[Nouveau] [PATCH 3/4] drm/nouveau: hook up cache sync functions

From: Lucas Stach <dev at lynxeye.de>

Signed-off-by: Lucas Stach <dev at lynxeye.de>
[acourbot at nvidia.com: make conditional and platform-friendly]
Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  | 32 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.h  | 20 ++++++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_gem.c |  8 +++++++-
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..0886f47e5244 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -407,6 +407,8 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
 {
 	int ret;
 
+	nouveau_bo_sync_for_device(nvbo);
+
 	ret = ttm_bo_validate(&nvbo->bo, &nvbo->placement,
 			      interruptible, no_wait_gpu);
 	if (ret)
@@ -487,6 +489,36 @@ nouveau_bo_invalidate_caches(struct ttm_bo_device *bdev, uint32_t flags)
 	return 0;
 }
 
+#ifdef NOUVEAU_NEED_CACHE_SYNC
+void
+nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
+{
+	struct nouveau_device *device;
+	struct ttm_tt *ttm = nvbo->bo.ttm;
+
+	device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
+
+	if (nvbo->bo.ttm && nvbo->bo.ttm->caching_state == tt_cached)
+		ttm_dma_tt_cache_sync_for_cpu((struct ttm_dma_tt *)nvbo->bo.ttm,
+					      nv_device_base(device));
+}
+
+void
+nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
+{
+	struct ttm_tt *ttm = nvbo->bo.ttm;
+
+	if (ttm && ttm->caching_state == tt_cached) {
+		struct nouveau_device *device;
+
+		device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
+
+		ttm_dma_tt_cache_sync_for_device((struct ttm_dma_tt *)ttm,
+						 nv_device_base(device));
+	}
+}
+#endif
+
 static int
 nouveau_bo_init_mem_type(struct ttm_bo_device *bdev, uint32_t type,
 			 struct ttm_mem_type_manager *man)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index ff17c1f432fc..ead214931223 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -89,6 +89,26 @@ int  nouveau_bo_vma_add(struct nouveau_bo *, struct nouveau_vm *,
 			struct nouveau_vma *);
 void nouveau_bo_vma_del(struct nouveau_bo *, struct nouveau_vma *);
 
+#if IS_ENABLED(CONFIG_ARCH_TEGRA)
+#define NOUVEAU_NEED_CACHE_SYNC
+#endif
+
+#ifdef NOUVEAU_NEED_CACHE_SYNC
+void nouveau_bo_sync_for_cpu(struct nouveau_bo *);
+void nouveau_bo_sync_for_device(struct nouveau_bo *);
+#else
+static inline void
+nouveau_bo_sync_for_cpu(struct nouveau_bo *)
+{
+}
+
+static inline void
+nouveau_bo_sync_for_device(struct nouveau_bo *)
+{
+}
+#endif
+
+
 /* TODO: submit equivalent to TTM generic API upstream? */
 static inline void __iomem *
 nvbo_kmap_obj_iovirtual(struct nouveau_bo *nvbo)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..b7e42fdc9634 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -897,7 +897,13 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
 	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
 	spin_unlock(&nvbo->bo.bdev->fence_lock);
 	drm_gem_object_unreference_unlocked(gem);
-	return ret;
+
+	if (ret)
+		return ret;
+
+	nouveau_bo_sync_for_cpu(nvbo);
+
+	return 0;
 }
 
 int
-- 
1.9.2

Alexandre Courbot

2014-May-19 07:10 UTC

[Nouveau] [PATCH 4/4] drm/nouveau: introduce CPU cache flushing macro

Some architectures (e.g. ARM) need the CPU buffers to be explicitely
flushed for a memory write to take effect. Not doing so results in
synchronization issues, especially after writing to BOs.

This patch introduces a macro that flushes the caches on ARM and
translates to a no-op on other architectures, and uses it when
writing to in-memory BOs. It will also be useful for implementations of
instmem that access shared memory directly instead of going through
PRAMIN.

Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
---
 drivers/gpu/drm/nouveau/core/os.h    | 17 +++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_bo.c |  8 ++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/os.h b/drivers/gpu/drm/nouveau/core/os.h
index d0ced94ca54c..274b4460bb03 100644
--- a/drivers/gpu/drm/nouveau/core/os.h
+++ b/drivers/gpu/drm/nouveau/core/os.h
@@ -38,4 +38,21 @@
 #endif /* def __BIG_ENDIAN else */
 #endif /* !ioread32_native */
 
+#if defined(__arm__)
+
+#define nv_cpu_cache_flush_area(va, size)	\
+do {						\
+	phys_addr_t pa = virt_to_phys(va);	\
+	__cpuc_flush_dcache_area(va, size);	\
+	outer_flush_range(pa, pa + size);	\
+} while (0)
+
+#else
+
+#define nv_cpu_cache_flush_area(va, size)	\
+do {						\
+} while (0)
+
+#endif /* defined(__arm__) */
+
 #endif
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 0886f47e5244..b9c9729c5733 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -437,8 +437,10 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
 	mem = &mem[index];
 	if (is_iomem)
 		iowrite16_native(val, (void __force __iomem *)mem);
-	else
+	else {
 		*mem = val;
+		nv_cpu_cache_flush_area(mem, 2);
+	}
 }
 
 u32
@@ -461,8 +463,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
 	mem = &mem[index];
 	if (is_iomem)
 		iowrite32_native(val, (void __force __iomem *)mem);
-	else
+	else {
 		*mem = val;
+		nv_cpu_cache_flush_area(mem, 4);
+	}
 }
 
 static struct ttm_tt *
-- 
1.9.2

Thierry Reding

2014-May-19 08:33 UTC

[Nouveau] [PATCH 2/4] drm/ttm: introduce dma cache sync helpers

On Mon, May 19, 2014 at 04:10:56PM +0900, Alexandre Courbot wrote:
> From: Lucas Stach <dev at lynxeye.de>
> 
> On arches with non-coherent PCI,
I guess since this applies to gk20a
> we need to flush caches ourselfes at
"ourselves". Or perhaps even reword to something like: "..., caches need
to be flushed and invalidated explicitly", since dma_sync_for_cpu() does
invalidate rather than flush.
> the appropriate places. Introduce two small helpers to make things easy
> for TTM based drivers.
> 
> Signed-off-by: Lucas Stach <dev at lynxeye.de>
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
> ---
>  drivers/gpu/drm/ttm/ttm_tt.c    | 25 +++++++++++++++++++++++++
>  include/drm/ttm/ttm_bo_driver.h | 28 ++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
[...]
> +void ttm_dma_tt_cache_sync_for_device(struct ttm_dma_tt *ttm_dma,
> +				      struct device *dev)
> +{
> +	int i;
This should probably be unsigned long to match the type of
ttm_dma->ttm.num_pages.

Thierry
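
The index-type remark above can be illustrated with a small userspace sketch. Everything prefixed fake_ is a hypothetical stand-in introduced for illustration, not TTM code; the only detail taken from the thread is that num_pages is an unsigned long, so the loop index is declared with the same type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ttm_dma_tt: only the fields the loop uses. */
struct fake_dma_tt {
	unsigned long num_pages;          /* same type as ttm.num_pages */
	const unsigned long *dma_address; /* one bus address per page */
};

/* Stand-in for dma_sync_single_for_device(): it only counts calls. */
static unsigned long synced_pages;

static void fake_sync_page(unsigned long addr)
{
	(void)addr;
	synced_pages++;
}

static void fake_cache_sync_for_device(const struct fake_dma_tt *tt)
{
	unsigned long i; /* matches num_pages: no signed/unsigned comparison */

	assert(tt != NULL);
	for (i = 0; i < tt->num_pages; i++)
		fake_sync_page(tt->dma_address[i]);
}
```

With an int index the comparison against num_pages mixes signed and unsigned operands, and the index cannot cover the full range of the counter; matching the types avoids both.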

Thierry Reding

2014-May-19 08:46 UTC

[Nouveau] [PATCH 3/4] drm/nouveau: hook up cache sync functions

On Mon, May 19, 2014 at 04:10:57PM +0900, Alexandre Courbot wrote:
> From: Lucas Stach <dev at lynxeye.de>
> 
> Signed-off-by: Lucas Stach <dev at lynxeye.de>
> [acourbot at nvidia.com: make conditional and platform-friendly]
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
Perhaps having a proper commit message here would be good.
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
[...]
> +#ifdef NOUVEAU_NEED_CACHE_SYNC
> +void
> +nouveau_bo_sync_for_cpu(struct nouveau_bo *nvbo)
> +{
> +	struct nouveau_device *device;
> +	struct ttm_tt *ttm = nvbo->bo.ttm;
> +
> +	device = nouveau_dev(nouveau_bdev(ttm->bdev)->dev);
> +
> +	if (nvbo->bo.ttm && nvbo->bo.ttm->caching_state == tt_cached)
> +		ttm_dma_tt_cache_sync_for_cpu((struct ttm_dma_tt *)nvbo->bo.ttm,
> +					      nv_device_base(device));
Can we be certain at this point that the struct ttm_tt is in fact a
struct ttm_dma_tt?
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
[...]
> +#if IS_ENABLED(CONFIG_ARCH_TEGRA)
> +#define NOUVEAU_NEED_CACHE_SYNC
> +#endif
I know I gave this as an example myself when we discussed this offline,
but I'm now thinking that this might actually be better off in Kconfig.
> +#ifdef NOUVEAU_NEED_CACHE_SYNC
> +void nouveau_bo_sync_for_cpu(struct nouveau_bo *);
> +void nouveau_bo_sync_for_device(struct nouveau_bo *);
> +#else
> +static inline void
> +nouveau_bo_sync_for_cpu(struct nouveau_bo *)
> +{
> +}
> +
> +static inline void
> +nouveau_bo_sync_for_device(struct nouveau_bo *)
> +{
> +}
> +#endif
> +
> +
There's a gratuitous blank line here.
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index c90c0dc0afe8..b7e42fdc9634 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -897,7 +897,13 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
>  	ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
>  	spin_unlock(&nvbo->bo.bdev->fence_lock);
>  	drm_gem_object_unreference_unlocked(gem);
> -	return ret;
> +
> +	if (ret)
> +		return ret;
> +
> +	nouveau_bo_sync_for_cpu(nvbo);
> +
> +	return 0;
>  }
This could be rewritten as:

	if (!ret)
		nouveau_bo_sync_for_cpu(nvbo);

	return ret;

Which would be slightly shorter.

On second thought, perhaps part of nouveau_gem_ioctl_cpu_prep() could be
refactored into a separate function to make this more symmetric. If we
put that in nouveau_bo.c and name it nouveau_bo_wait() for example, the
dummies can go away and both nouveau_bo_sync_for_{cpu,device}() can be
made static. I also think that's cleaner because it has both variants of
the nouveau_bo_sync_for_*() calls in the same file.

Thierry
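
The refactoring suggested above might look roughly like the following userspace sketch. It is only an illustration under stated assumptions: fake_bo, fake_ttm_bo_wait() and fake_bo_sync_for_cpu() are hypothetical stand-ins, and fake_bo_wait() plays the role of the proposed nouveau_bo_wait(), which does not exist in the posted series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nouveau_bo. */
struct fake_bo {
	int wait_result; /* what the stubbed wait reports: 0 or -errno */
	int cpu_syncs;   /* how many times the CPU-side sync ran */
};

/* Stand-in for ttm_bo_wait(). */
static int fake_ttm_bo_wait(struct fake_bo *bo)
{
	return bo->wait_result;
}

/* Stand-in for nouveau_bo_sync_for_cpu(); it can stay static here. */
static void fake_bo_sync_for_cpu(struct fake_bo *bo)
{
	bo->cpu_syncs++;
}

/* Plays the role of the proposed nouveau_bo_wait(): sync only on success. */
static int fake_bo_wait(struct fake_bo *bo)
{
	int ret;

	assert(bo != NULL);
	ret = fake_ttm_bo_wait(bo);
	if (!ret)
		fake_bo_sync_for_cpu(bo);

	return ret;
}
```

Keeping the wait and the sync in one helper means the ioctl only forwards the return value, and both sync variants can live in the same file.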

Thierry Reding

2014-May-19 09:02 UTC

[Nouveau] [PATCH 4/4] drm/nouveau: introduce CPU cache flushing macro

On Mon, May 19, 2014 at 04:10:58PM +0900, Alexandre Courbot wrote:
> Some architectures (e.g. ARM) need the CPU buffers to be explicitely
> flushed for a memory write to take effect. Not doing so results in
> synchronization issues, especially after writing to BOs.
It seems to me that the above is generally true for all architectures,
not just ARM.

Also: s/explicitely/explicitly/
> This patch introduces a macro that flushes the caches on ARM and
> translates to a no-op on other architectures, and uses it when
> writing to in-memory BOs. It will also be useful for implementations of
> instmem that access shared memory directly instead of going through
> PRAMIN.
Presumably instmem can access shared memory on all architectures, so
this doesn't seem like a property of the architecture but rather of the
memory pool backing the instmem.

In that case I wonder if this shouldn't be moved into an operation that
is implemented by the backing memory pool and be a noop where the cache
doesn't need explicit flushing.
> diff --git a/drivers/gpu/drm/nouveau/core/os.h b/drivers/gpu/drm/nouveau/core/os.h
> index d0ced94ca54c..274b4460bb03 100644
> --- a/drivers/gpu/drm/nouveau/core/os.h
> +++ b/drivers/gpu/drm/nouveau/core/os.h
> @@ -38,4 +38,21 @@
>  #endif /* def __BIG_ENDIAN else */
>  #endif /* !ioread32_native */
>  
> +#if defined(__arm__)
> +
> +#define nv_cpu_cache_flush_area(va, size)	\
> +do {						\
> +	phys_addr_t pa = virt_to_phys(va);	\
> +	__cpuc_flush_dcache_area(va, size);	\
> +	outer_flush_range(pa, pa + size);	\
> +} while (0)
Couldn't this be a static inline function?
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
[...]
> index 0886f47e5244..b9c9729c5733 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -437,8 +437,10 @@ nouveau_bo_wr16(struct nouveau_bo *nvbo, unsigned index, u16 val)
>  	mem = &mem[index];
>  	if (is_iomem)
>  		iowrite16_native(val, (void __force __iomem *)mem);
> -	else
> +	else {
>  		*mem = val;
> +		nv_cpu_cache_flush_area(mem, 2);
> +	}
>  }
>  
>  u32
> @@ -461,8 +463,10 @@ nouveau_bo_wr32(struct nouveau_bo *nvbo, unsigned index, u32 val)
>  	mem = &mem[index];
>  	if (is_iomem)
>  		iowrite32_native(val, (void __force __iomem *)mem);
> -	else
> +	else {
>  		*mem = val;
> +		nv_cpu_cache_flush_area(mem, 4);
> +	}
This looks rather like a sledgehammer to me. Effectively this turns nvbo
into an uncached buffer. With additional overhead of constantly flushing
caches. Wouldn't it make more sense to locate the places where these are
called and flush the cache after all the writes have completed?

Thierry
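
The two review points above (a static inline helper instead of a macro, and one flush after a run of writes instead of one per write) can be compared in a small userspace sketch. All fake_ names are hypothetical stand-ins: the flush helper only counts how often it is called and performs no real cache maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned int flush_calls;

/* Stand-in for nv_cpu_cache_flush_area(), written as a static inline. */
static inline void fake_cpu_cache_flush_area(void *va, size_t size)
{
	assert(va != NULL);
	(void)size;
	flush_calls++;
}

/* Per-write flush, mirroring what the patch does in nouveau_bo_wr32(). */
static void fake_wr32_flush_each(uint32_t *mem, size_t index, uint32_t val)
{
	mem[index] = val;
	fake_cpu_cache_flush_area(&mem[index], sizeof(mem[index]));
}

/* Batched variant: do all the writes, then flush the whole range once. */
static void fake_wr32_batch(uint32_t *mem, const uint32_t *vals, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		mem[i] = vals[i];
	fake_cpu_cache_flush_area(mem, count * sizeof(*mem));
}
```

For four consecutive writes the per-write variant calls the flush helper four times and the batched variant once; on real hardware the saving would depend on how the callers group their writes, which the sketch does not model.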

Lucas Stach

2014-May-19 09:31 UTC

[Nouveau] [PATCH 3/4] drm/nouveau: hook up cache sync functions

On Monday, 19.05.2014 at 16:10 +0900, Alexandre Courbot wrote:
> From: Lucas Stach <dev at lynxeye.de>
> 
> Signed-off-by: Lucas Stach <dev at lynxeye.de>
> [acourbot at nvidia.com: make conditional and platform-friendly]
> Signed-off-by: Alexandre Courbot <acourbot at nvidia.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_bo.c  | 32 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/nouveau/nouveau_bo.h  | 20 ++++++++++++++++++++
>  drivers/gpu/drm/nouveau/nouveau_gem.c |  8 +++++++-
>  3 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index b6dc85c614be..0886f47e5244 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -407,6 +407,8 @@ nouveau_bo_validate(struct nouveau_bo *nvbo, bool interruptible,
>  {
>  	int ret;
>  
> +	nouveau_bo_sync_for_device(nvbo);
> +
>  	ret = ttm_bo_validate(&nvbo->bo, &nvbo->placement,
>  			      interruptible, no_wait_gpu);
>  	if (ret)
> @@ -487,6 +489,36 @@ nouveau_bo_invalidate_caches(struct ttm_bo_device *bdev, uint32_t flags)
>  	return 0;
>  }
>  
> +#ifdef NOUVEAU_NEED_CACHE_SYNC
I don't like this ifdef at all. I know calling these functions will add a
little overhead on x86, where it isn't strictly required, but I think
it's negligible.

When I looked at them, the dma_sync_single_for_[device|cpu] functions
called here map to just a drain of the PCI store buffer on x86, which
should be fast enough to be done unconditionally. They won't do any
time-consuming cache synchronization on PCI-coherent arches.

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
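
The argument above for dropping the ifdef can be modelled in a few lines of userspace C. This is a sketch under loud assumptions: fake_device, fake_bo and both fake_ functions are hypothetical, and the coherent flag simply stands in for whatever the DMA layer knows about the bus. The store-buffer drain mentioned for x86 is not modelled; the coherent path is treated as free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model. */
struct fake_device {
	bool coherent;          /* true on PCI-coherent arches */
	unsigned int cache_ops; /* counts expensive cache maintenance */
};

/* Stand-in for dma_sync_single_for_device(): cheap when coherent. */
static void fake_dma_sync_for_device(struct fake_device *dev)
{
	if (!dev->coherent)
		dev->cache_ops++;
}

/* Hypothetical stand-in for struct nouveau_bo. */
struct fake_bo {
	struct fake_device *dev;
	bool cached; /* models caching_state == tt_cached */
};

/* Always compiled: no NOUVEAU_NEED_CACHE_SYNC-style ifdef around it. */
static void fake_bo_sync_for_device(struct fake_bo *bo)
{
	if (bo == NULL || !bo->cached)
		return;
	fake_dma_sync_for_device(bo->dev);
}
```

The call site is identical on every platform; whether any cache maintenance happens is decided below it, which is the behaviour the reply attributes to the DMA API.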
